Parallel performance

Old   August 22, 2005, 13:50
  #1
New Member
 
Ho Hsing
Join Date: Mar 2009
Posts: 13
Hi, I just tested the parallel performance of the icoFoam solver on a cluster. In single-CPU mode the run takes about 31 hours, while in 2-CPU mode it takes 26 hours, so the efficiency is around 60%. Is that reasonable?
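For reference, the 60% figure follows from the usual definitions: speedup is serial time over parallel time, and efficiency is speedup over the number of processes. A minimal C++ sketch, where the helper names `speedup` and `efficiency` are hypothetical and not part of OpenFOAM:

```cpp
#include <cassert>

// Speedup: how many times faster the parallel run is than the serial run.
// Both times must be in the same unit (hours here).
double speedup(double serialTime, double parallelTime)
{
    assert(serialTime > 0 && parallelTime > 0);
    return serialTime / parallelTime;
}

// Parallel efficiency: speedup divided by the number of processes.
// 1.0 means ideal scaling; values above 1.0 are superlinear.
double efficiency(double serialTime, double parallelTime, int nProcs)
{
    assert(nProcs > 0);
    return speedup(serialTime, parallelTime) / nProcs;
}
```

With the timings quoted above, `efficiency(31.0, 26.0, 2)` evaluates to 31 / 26 / 2, which is about 0.596, i.e. roughly 60%.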

Old   August 22, 2005, 13:54
  #2
Senior Member
 
Join Date: Mar 2009
Posts: 854
It depends on the speed of the interconnect, the size of the case and the parallel comms settings you have specified in .OpenFoam-1.1/controlDict.

Old   August 22, 2005, 14:11
  #3
New Member
 
Ho Hsing
Join Date: Mar 2009
Posts: 13
The case is quite large: there are 262392 cells in the computational domain.

I do not know how to edit .OpenFoam-1.1/controlDict. In fact, I have not changed it at all. It currently reads:

InfoSwitches
{
    writeJobInfo 0;
    FoamXwriteComments 1;
}

OptimisationSwitches
{
    fileModificationSkew 10;
    scheduledTransfer 1;
    floatTransfer 0;
    nProcsSimpleSum 16;
}

Old   August 22, 2005, 14:14
  #4
Senior Member
 
Join Date: Mar 2009
Posts: 854
And the speed of the interconnect?

Old   August 22, 2005, 15:37
  #5
New Member
 
Ho Hsing
Join Date: Mar 2009
Posts: 13
Every node of my cluster is a dual-CPU system, so the two-CPU run was actually running inside a single node. The interconnect between the machine nodes is 1 GByte.

Old   August 22, 2005, 15:44
  #6
Senior Member
 
Join Date: Mar 2009
Posts: 854
I assume from your results that the two CPUs are sharing the memory bus in each of your nodes and you are only getting 60% efficiency because the memory bus is saturated. Try running the case between two nodes.
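One way to check this hypothesis is to measure sustained memory bandwidth with one copy of a simple streaming kernel running, and then with two copies running at once on the same node. The following is a rough sketch in the spirit of the STREAM triad benchmark. The function name `triadBandwidthGBs` and the array sizes are illustrative assumptions, and this is not an OpenFOAM utility.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Runs a[i] = b[i] + s*c[i] over large arrays `repeats` times and returns
// the sustained memory bandwidth in GB/s (10^9 bytes per second).
// n should be large enough that the arrays do not fit in cache.
double triadBandwidthGBs(std::size_t n, int repeats)
{
    assert(n > 0 && repeats > 0);
    std::vector<double> a(n, 0.0), b(n, 1.0), c(n, 2.0);
    const double s = 3.0;

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
    {
        for (std::size_t i = 0; i < n; ++i)
        {
            a[i] = b[i] + s*c[i];
        }
        // Feed the result into the next pass so no pass can be skipped.
        std::swap(a, b);
    }
    const auto stop = std::chrono::steady_clock::now();

    // Read one result so the compiler must keep the computation.
    volatile double sink = b[n/2];
    (void)sink;

    const double seconds = std::chrono::duration<double>(stop - start).count();
    // Two arrays read and one written per pass, 8 bytes per double.
    const double bytes = 3.0 * sizeof(double) * static_cast<double>(n) * repeats;
    return seconds > 0.0 ? bytes / seconds / 1.0e9 : 0.0;
}
```

Run one copy and note the GB/s, then start two copies at the same time on one dual-CPU node. If each copy reports far less than the single-copy figure, the two CPUs are competing for the same memory bus.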

Old   August 22, 2005, 15:52
  #7
New Member
 
Ho Hsing
Join Date: Mar 2009
Posts: 13
Thanks Henry,
I will try. Also, there is a typo in my previous post: the interconnect is 1 Gbit.

Old   August 24, 2005, 15:25
  #8
New Member
 
Ho Hsing
Join Date: Mar 2009
Posts: 13
I have run the code on two CPUs, this time on two different nodes. Now the efficiency seems to be higher than 100%! Does that mean the bottleneck is the bus speed in my cluster, and that I had better upgrade the motherboards?

BTW, the running time given by icoFoam is CPU time, instead of wall-clock time, right?

Old   August 24, 2005, 15:47
  #9
Senior Member
 
Join Date: Mar 2009
Posts: 854
Recent multi-CPU motherboards like the Tyan dual and quad Opteron boards (and I am guessing the recent Xeon boards as well) have a separate memory bus for each CPU. The AMD-based boards have HyperTransport buses between the CPUs as well, but I don't know if there is an equivalent for Xeon processors. This arrangement is far preferable to the old shared-memory multi-CPU machines: CPU speeds outstrip memory access, so memory-access-intensive codes like CFD become memory-access limited unless each CPU has its own memory.

All the OpenFOAM applications print CPU time but you can easily add a print for the wall-clock time using the clockTime() member function in the same way as cpuTime() is used.
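To illustrate the difference between the two timers without relying on OpenFOAM itself, here is a small standard-C++ sketch. It uses `std::clock` for CPU time (which reports processor time on POSIX systems) and `std::chrono::steady_clock` for wall-clock time. The `Timings` struct and the `timeWork` function are hypothetical names for this example only, and OpenFOAM's own timing classes are not shown.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

struct Timings
{
    double cpuSeconds;   // processor time consumed by this process
    double wallSeconds;  // real elapsed time
};

// Burns CPU for roughly busyMs milliseconds, then sleeps for idleMs
// milliseconds to mimic waiting on communication, and reports both clocks.
Timings timeWork(int busyMs, int idleMs)
{
    assert(busyMs >= 0 && idleMs >= 0);
    const std::clock_t cpuStart = std::clock();
    const auto wallStart = std::chrono::steady_clock::now();

    // Busy phase: spin until busyMs of wall time has passed.
    volatile double x = 0.0;
    while (std::chrono::steady_clock::now() - wallStart
           < std::chrono::milliseconds(busyMs))
    {
        x = x + 1.0;
    }

    // Idle phase: sleeping advances wall time but uses almost no CPU time.
    std::this_thread::sleep_for(std::chrono::milliseconds(idleMs));

    const std::clock_t cpuEnd = std::clock();
    const auto wallEnd = std::chrono::steady_clock::now();

    return {
        static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC,
        std::chrono::duration<double>(wallEnd - wallStart).count()
    };
}
```

Calling `timeWork(50, 100)` should report a wall-clock time of at least 0.15 s but a noticeably smaller CPU time, because the sleeping phase is not charged to the process. A parallel solver that spends time blocked on communication shows the same gap.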

Old   August 24, 2005, 15:57
  #10
Senior Member
 
Michael Prinkey
Join Date: Mar 2009
Location: Pittsburgh PA
Posts: 363
>(and I am guessing the recent Xeon boards as well)

I am pretty sure this is not correct. The Nocona Xeon dual CPU motherboards still use a shared memory bus. Based on our experience, these systems are not a good target platform for the current incarnation of OpenFOAM.

Old   August 24, 2005, 16:03
  #11
Senior Member
 
Join Date: Mar 2009
Posts: 854
Current CPU performance far outstrips memory access performance, and it doesn't look like this situation will improve anytime soon. This means that all codes that rely on rapid memory access to large amounts of data (that is, all CFD codes, not just the current incarnation of OpenFOAM) will benefit from each CPU having its own memory bus.

Old   August 25, 2005, 09:02
  #12
Assistant Moderator
 
Bernhard Gschaider
Join Date: Mar 2009
Posts: 4,225
Does this mean that the dual-core Opterons are not good for CFD computations? If I interpret the block diagrams correctly, both cores share the same memory interface (leading to a problem similar to that of the Xeon motherboards discussed above).

Does anyone have experience with OpenFOAM on dual-core CPUs?

Old   August 25, 2005, 09:08
  #13
Senior Member
 
Join Date: Mar 2009
Posts: 854
I would expect dual-core CPUs to suffer from the same problem because they share the same memory bus.

Old   August 29, 2005, 03:54
  #14
ulf
Guest
 
Posts: n/a
On dual Opterons (Athlon X2) each CPU has its own RAM channel: the dual DDR RAM bus is divided into a single channel for each CPU. The performance of one CPU (aka Socket 939/940) decreases by approx. 8%, to roughly that of a Socket 754 CPU.

Old   August 29, 2005, 04:25
  #15
ulf
Guest
 
Posts: n/a
I don't have such a CPU; the 8% above is just the difference between a Socket 939 CPU with and without dual DDR RAM. There is a crossbar switch between the CPU and the RAM, which should behave in that way. Graphics, hard disk and Ethernet use one (Athlon) to three (Opteron) HT links with 3.2 GB each.

At Tomshardware.de the difference between two single Opterons and one dual Opteron is negligible. But they didn't use CFD for the comparisons!

Old   August 30, 2005, 14:30
  #16
New Member
 
Ho Hsing
Join Date: Mar 2009
Posts: 13
Thanks to all of you for your ideas about the parallel performance.

Now I have a question about the CPU time. The CPU time returned by elapsedCpuTime() counts only the master node's CPU time rather than that of all the parallel nodes, right?

Another question: why does every machine use only a portion of the CPU, given that I am quite sure nobody else is using the cluster?
Here is the output of my top command:

PID  USER  PRI NI SIZE  RSS SHARE STAT %CPU %MEM TIME  CPU COMMAND
9397 hsing 25  0  18996 18M 8400  R    69.9 0.4  1000m 0   hsingFlow

Old   August 30, 2005, 14:38
  #17
Senior Member
 
Join Date: Mar 2009
Posts: 854
Each node calculates its own CPU time, but only the master writes to the log via the Info statement. If you want to see the CPU time for all the nodes, replace Info with Sout or Serr.

Only a fraction of the CPU is being used because the rest of the time it is probably waiting for data communication between nodes.
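The %CPU column in top is roughly CPU time divided by wall-clock time over the sampling interval, so the share of time spent waiting can be estimated directly. A small sketch, where `waitFraction` is a hypothetical helper name:

```cpp
#include <cassert>

// Fraction of wall-clock time a process spent not executing on the CPU
// (for example, blocked waiting for messages from other nodes).
double waitFraction(double cpuSeconds, double wallSeconds)
{
    assert(wallSeconds > 0 && cpuSeconds >= 0);
    const double busy = cpuSeconds / wallSeconds;
    return busy >= 1.0 ? 0.0 : 1.0 - busy;
}
```

For the top output quoted earlier in the thread, 69.9% CPU corresponds to `waitFraction(69.9, 100.0)`, which is about 0.30: roughly 30% of the time is spent waiting rather than computing.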
