CFD Online Discussion Forums

CFD Online Discussion Forums > OpenFOAM Running, Solving & CFD > Parallel performance

hsing August 22, 2005 13:50

Hi, I just tested the parallel performance of the solver, icoFoam, on a cluster. In single-CPU mode it takes about 31 hours, while the 2-CPU mode takes 26 hours, so the efficiency is around 60%. Is that reasonable?

henry August 22, 2005 13:54

It depends on the speed of the interconnect, the size of the case and the parallel comms settings you have specified in .OpenFoam-1.1/controlDict.

hsing August 22, 2005 14:11

The case is quite large; there are 262392 cells in the computational domain.

I have no idea how to edit .OpenFoam-1.1/controlDict; actually, I have not changed it at all. It takes the form:

writeJobInfo 0;
FoamXwriteComments 1;

fileModificationSkew 10;
scheduledTransfer 1;
floatTransfer 0;
nProcsSimpleSum 16;
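(A sketch of the kind of tweak that gets tried here, assuming the OpenFOAM 1.1 global controlDict format shown above. Setting floatTransfer to 1 sends field data between processors as single-precision floats, roughly halving message sizes at some cost in precision of the transferred values; whether it helps depends on the case and the interconnect:)

```
// ~/.OpenFoam-1.1/controlDict -- parallel comms settings to experiment with
fileModificationSkew 10;
scheduledTransfer 1;
floatTransfer 1;      // was 0; worth testing on a slow interconnect
nProcsSimpleSum 16;
```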

henry August 22, 2005 14:14

and the speed of the interconnect?

hsing August 22, 2005 15:37

Every node of my cluster is a dual-CPU system, so the two-CPU mode is actually running inside one node. And the interconnect between the machine nodes is 1 GByte.

henry August 22, 2005 15:44

I assume from your results that the two CPUs are sharing the memory bus in each of your nodes and you are only getting 60% efficiency because the memory bus is saturated. Try running the case between two nodes.

hsing August 22, 2005 15:52

Thanks Henry,
I will try. And there is a typo in my previous post: the interconnect is 1 Gbit.

hsing August 24, 2005 15:25

I have run the code on two CPUs, but on two different nodes. Now the efficiency seems to be higher than 100%! That means the bottleneck is the bus speed in my cluster, and I'd better upgrade the motherboards?

BTW, the running time given by icoFoam is CPU time, instead of wall-clock time, right?

henry August 24, 2005 15:47

Recent multi-CPU motherboards like the Tyan dual and quad Opteron boards (and I am guessing the recent Xeon boards as well) have a separate memory bus for each CPU. The AMD-based boards have HyperTransport buses between the CPUs as well, but I don't know if there is an equivalent for Xeon processors. This arrangement is far preferable to the old shared-memory multi-CPU machines: CPU speeds outstrip memory access, which means that memory-access-intensive codes like CFD become memory-access limited unless each CPU has its own memory.

All the OpenFOAM applications print CPU time but you can easily add a print for the wall-clock time using the clockTime() member function in the same way as cpuTime() is used.

mprinkey August 24, 2005 15:57

>(and I am guessing the recent Xeon boards as well)

I am pretty sure this is not correct. The Nocona Xeon dual CPU motherboards still use a shared memory bus. Based on our experience, these systems are not a good target platform for the current incarnation of OpenFOAM.

henry August 24, 2005 16:03

Current CPU performance far outstrips memory-access performance, and it doesn't look like this situation will improve anytime soon, which means that all codes that rely on rapid memory access to large amounts of data (that is, all CFD codes, not just the current incarnation of OpenFOAM) will benefit from each CPU having its own memory bus.

gschaider August 25, 2005 09:02

Does this mean that the dual-core Opterons are not good for CFD computations? If I interpret the block diagrams correctly, both cores share the same memory interface (leading to a problem similar to the Xeon motherboards discussed above).

Does anyone have experience with OpenFOAM on DualCores?

henry August 25, 2005 09:08

I would expect dual-core CPUs to suffer from the same problem because they share the same memory bus.

ulf August 29, 2005 03:54

On dual-Opteron systems (Athlon X2), each CPU has its own RAM channel; the dual-channel DDR-RAM bus is divided into a single channel for each CPU. The performance of one CPU (Socket 939/940) decreases by approx. 8% compared to a Socket 754 CPU.

ulf August 29, 2005 04:25

I don't have such a CPU; the 8% above is just the difference between a Socket 939 CPU with and without dual-channel DDR-RAM! There is a crossbar switch between the CPU and the RAM, which should act in that way. Graphics, hard disk and Ethernet use one (Athlon) to three (Opteron) HT links with 3.2 GB/s each.

And the difference between two single Opterons and one dual-Opteron is negligible. But they didn't use CFD for the comparisons!

hsing August 30, 2005 14:30

Thanks for all your ideas about the parallel performance.

Now I have a question about the CPU time. The CPU time provided by the elapsedCpuTime() function counts only the main node's CPU time, instead of all the parallel nodes', right?

Another question is why every machine only uses a portion of the CPU, as I am quite sure nobody else is using the cluster.
Here is the output of my top command:

9397 hsing 25 0 18996 18M 8400 R 69.9 0.4 1000m 0 hsingFlow

henry August 30, 2005 14:38

Each node calculates its own CPU time, but only the master writes to the log via the Info statement. If you want to see the CPU time for all the nodes, replace Info with Sout or Serr.

Only a fraction of the CPU is being used because the rest of the time it is probably waiting for data communication between the nodes.
