CFD Online Discussion Forums

CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (http://www.cfd-online.com/Forums/openfoam-solving/)
-   -   mpirun strange behavior (http://www.cfd-online.com/Forums/openfoam-solving/109307-mpirun-strange-behavior.html)

Djub November 14, 2012 13:03

mpirun strange behavior
 
Hi dear all,

I am experiencing some very strange behavior in my calculations.

I ran the same calculation with different numbers of cores, using mpirun (changing both decomposeParDict and the command line).
During the calculation I recorded the time needed to reach a given point, and the memory used.
The results were as follows:
8 proc => time 31 h , Mem 6.5 GiB
4 proc => time 17 h , Mem 4.5 GiB
2 proc => time 9 h , Mem 3.5 GiB
1 proc => time 7 h , Mem 3.0 GiB

:confused: The more processors used, the slower the calculation? :confused:

Here is the configuration of my computer:
Server: Dell PowerEdge 2950 with 2 Intel Xeon E5420 processors @ 2.50 GHz (4 cores each, on VMware ESXi 5.0 U1) and 20 GiB of memory.
I am using GeekoCFD 3.1.0 (not the latest version): Linux 3.1.10-1.9-default x86_64 , openSUSE 12.1 (x86_64), OpenFOAM 2.1.x

Two possibilities:
1/ MPI does not work at all and the entire OpenFOAM community is mistaken
2/ I am missing something...

Lieven November 14, 2012 17:41

Hello Julien,

Quite odd behaviour indeed, and certainly not how it should be (since it is not so likely that the entire OpenFOAM community is mistaken) ;-)

Which command do you use to run your jobs? Which decomposition method do you set in decomposeParDict? How many cells are assigned to each processor?
How did you measure the times you quote in your post? Is it the real (wall-clock) time, or the ExecutionTime or ClockTime reported by OpenFOAM?

Regards,

L

Djub November 15, 2012 04:51

Hi Lieven,

some details about my setup:

- command: mpirun -n 8 (or 4, or 2) pimpleFoam > log &
compared with a direct pimpleFoam > log &

- decomposition method: scotch . It seems to work well, with a nice balance between the different processors (if N is the total number of cells, here about 300k, and n is the number of processors, here 8, 4 or 2, then the number of cells per processor is quite uniform, very close to N/n)

- I am monitoring using the ExecutionTime given in the log file.
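As an aside, pulling the ExecutionTime values out of a log like this can be done with a short Python sketch (not from the thread; it assumes the standard `ExecutionTime = ... s  ClockTime = ... s` line that OpenFOAM solvers print every time step):

```python
import re

# Each time step in an OpenFOAM solver log prints a line like:
#   ExecutionTime = 123.45 s  ClockTime = 130 s
# ExecutionTime is CPU time; ClockTime is wall-clock time.
PATTERN = re.compile(r"ExecutionTime = ([\d.eE+-]+) s\s+ClockTime = ([\d.eE+-]+) s")

def last_times(log_text):
    """Return (ExecutionTime, ClockTime) from the last time step in the log."""
    matches = PATTERN.findall(log_text)
    if not matches:
        raise ValueError("no ExecutionTime lines found")
    exec_t, clock_t = matches[-1]
    return float(exec_t), float(clock_t)

sample = """Time = 0.1
ExecutionTime = 12.3 s  ClockTime = 13 s
Time = 0.2
ExecutionTime = 25.0 s  ClockTime = 26 s
"""
print(last_times(sample))  # → (25.0, 26.0)
```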

Any advice or explanation?

florian_krause November 15, 2012 05:25

Quote:

Originally Posted by Djub (Post 392207)
Hi Lieven,

some details about my setup:

- command: mpirun -n 8 (or 4, or 2) pimpleFoam > log &
compared with a direct pimpleFoam > log &

- decomposition method: scotch . It seems to work well, with a nice balance between the different processors (if N is the total number of cells, here about 300k, and n is the number of processors, here 8, 4 or 2, then the number of cells per processor is quite uniform, very close to N/n)

- I am monitoring using the ExecutionTime given in the log file.

Any advice or explanation?

You should have followed the user guide http://www.openfoam.org/docs/user/ru...p#x12-820003.4

If you don't use the -parallel option, mpirun will probably just start the same serial job np times, one copy on each processor (that's my guess)

Regards,
Florian

Lieven November 15, 2012 06:50

Florian is right, adding the '-parallel' option should solve your problem:

mpirun -n 8 pimpleFoam -parallel > log &

or (what I prefer to do)

mpirun -n 8 foamJob pimpleFoam -parallel &

Regards,

L

Djub November 15, 2012 07:37

Bullshit!
You are right!
I am so confused...:o
But quite angry against myself!

Thanks, Lieven and Florian

Just a remark: -n and -np are the same (as are -c and --n).

By the way, could you tell me why you prefer foamJob? I made my own "log file interpreter" in MatLab, and it works well. What happens with foamJob if I stop my calculation, change a parameter, and start it again? With my version, I just change the name of the log file (for example with the date: log20121115 ). Does foamJob just append the new data? And what if I want to "rewind" my calculation, for example if I restart from 100 s before the end of my previous run?

I am still seeing some strange things, but I will check everything before thinking that "the entire OpenFOAM community is mistaken" ;):D

Lieven November 15, 2012 09:11

Hey Julien, just found out that you can even write

foamJob -p pimpleFoam

instead of

mpirun -n 8 foamJob pimpleFoam

(it does not make any difference with respect to the log-file output), so with foamJob there is no need to invoke mpirun, specify the number of processors, name the output file, or redirect the process to the background. It basically shortens things + foamJob also checks whether the solver exists and whether the case is decomposed.

Regarding the "log file interpreter", don't forget there is also a tool available in openFoam called 'foamLog' which creates xy-data files of the residuals etc. for all variables

Regards,

Lieven

Djub November 16, 2012 06:20

Hi! :)
I ran the same tests as before, but with the -parallel option (grrr... so stupid! I have wasted so much time!). Here are the results:
1 proc => time 6.6 h , Mem 3.1 GiB
2 proc => time 4.3 h , Mem 3.2 GiB
4 proc => time 4.1 h , Mem 3.3 GiB
6 proc => time 3.0 h , Mem 3.3 GiB
8 proc => time 7.3 h , Mem 3.3 GiB

So thanks :) ! I understand everything now.

A piece of advice for other people: do not use all of your processors; keep a little headroom for the MPI processes and the system. In my example, running on all my processors was slower than running on only 1!
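For what it's worth, the parallel speedup and efficiency implied by these measurements can be computed in a few lines (a Python sketch using the times from this post; ideal efficiency is 1.0):

```python
# Measured run times in hours from the -parallel test above.
times = {1: 6.6, 2: 4.3, 4: 4.1, 6: 3.0, 8: 7.3}

serial = times[1]
for n in sorted(times):
    speedup = serial / times[n]   # ideal value: n
    efficiency = speedup / n      # ideal value: 1.0 (100%)
    print(f"{n} procs: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Even at the sweet spot of 6 processors, the speedup is only about 2.2x (roughly 37% efficiency), which already hints at a communication bottleneck.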

Djub November 19, 2012 05:28

optimizing MPIRUN
 
Hi!
Another test of mpirun. I ran exactly the same calculation with different numbers of processors, and checked the global CPU usage and the time needed to complete the calculation. The results are below:
Nproc   CPU     Time
  8     100%    16.0
  7     100%    10.8
  6      99%     9.9
  5      90%    10.0
  4      84%    10.0
  3      68%     8.0
  2      52%     9.9
  1      40%    15.5

:confused: Isn't that strange? The optimum is 3??? :confused:

I made another, longer comparison between 3 and 6 processors. The winner is still 3 procs!

Any explanation?

Lieven November 19, 2012 06:00

Hey Julien,

Here's my guess. The speedup from parallelization of OpenFOAM calculations is (strongly) limited by the speed of communication between the different processors. You can consider the speed of data transfer within one processor to be infinite compared with the data transfer between two processors. In your case you have 2 × 4-core CPUs, and therefore this is what happens:

In the case of OpenFOAM on 3 CPUs:
1. Operating system
2. OpenFOAM processor 0
3. OpenFOAM processor 1
4. OpenFOAM processor 2
-- (no communication)
5. free
6. free
7. free
8. free

In the case of 4 CPUs (and more):
1. Operating system
2. OpenFOAM processor 0
3. OpenFOAM processor 1
4. OpenFOAM processor 2
-- communication
5. OpenFOAM processor 3
6. free
7. free
8. free

So I think the communication is the bottleneck in your case. Changing the simulation time won't solve the problem either. You could verify this by forcing mpirun to fill the other processor first (in that case the optimum should shift to 4).

Another thing you can test is how the calculation time evolves if you keep the number of cells per CPU constant while increasing the number of CPUs. If you test two different cell counts per CPU (e.g. 100k and 300k cells/CPU), the latter should scale better than the former, because the ratio of computation to communication is more advantageous.
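The compute-versus-communication trade-off described above can be illustrated with a toy timing model (illustrative only: the constants below are made up, not measured on this machine):

```python
def model_time(cells, nprocs, t_cell=1.0e-5, t_comm=0.5):
    """Toy model of one time step: per-processor compute work plus a
    communication penalty that grows with the number of processors.
    t_cell and t_comm are made-up constants, not measurements."""
    compute = (cells / nprocs) * t_cell   # work shrinks with more procs
    comm = t_comm * (nprocs - 1)          # overhead grows with more procs
    return compute + comm

# Strong scaling (fixed 300k cells): total time is non-monotonic,
# so adding processors eventually makes the run slower.
for n in (1, 2, 3, 4, 8):
    print(f"{n} procs: {model_time(300_000, n):.3f} s/step")

# Weak scaling (fixed 100k cells per processor): the compute term
# stays flat while the communication term keeps growing.
for n in (1, 2, 4, 8):
    print(f"{n} procs: {model_time(100_000 * n, n):.3f} s/step")
```

With these particular constants the strong-scaling optimum sits at a small processor count, mirroring the measurements earlier in the thread; the real crossover point depends on the actual interconnect and cell count.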

Regards,

L

Hanzo November 19, 2012 22:22

Thanks for your studies Lieven. I am very interested in this post :)

Quote:

Originally Posted by Lieven (Post 392907)
You could verify this by forcing mpirun to first fill the other processor (in this case the optimum should shift to 4).

Could you give a hint on how to do such processor shifting with mpirun?

Lieven November 20, 2012 03:55

Have a look at http://www.openfoam.org/docs/user/ru...s-parallel.php ; there they give some explanation about the '<machines> file'. I'm pretty sure that should enable you to do it.

I've never done it myself, so I can't really help you with the practical implementation. I guess the machine list printed at the beginning of the OpenFOAM output when you run a parallel case can help you (but maybe it requires a different format).
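For reference, with Open MPI the placement is typically steered with a hostfile passed via `mpirun -hostfile <machines>`; a minimal sketch might look like this (hypothetical and untested here — the exact syntax depends on the MPI implementation):

```
# hypothetical hostfile: allow at most 4 MPI ranks on this node
localhost slots=4
```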

Regards,

L

