CFD Online Discussion Forums

Inconsistent parallel jobs running time (https://www.cfd-online.com/Forums/openfoam-solving/147676-unconsistent-parallel-jobs-running-time.html)

arnaud6 January 27, 2015 06:08

Inconsistent parallel jobs running time
 
Hello all!

I keep posting on this forum as I find it really useful.

I have recently run into some issues with parallel jobs. I am running potentialFoam and simpleFoam on several cluster nodes, and I am seeing very different run times depending on which nodes are selected.
The run time can increase by a factor of 5, or the job can even hang on the cluster, depending on the nodes selected! I am running OpenFOAM 2.3.1 with OpenMPI (mpirun) 1.6.5 over InfiniBand.

Before I give you more information: has anyone seen this kind of problem before? I would also like to know whether there is a tool or an OpenFOAM utility to report the amount of data transferred between the processors. I know Fluent has something to report parallel data transfer.
I have tried setting the Pstream debug switches to 1 in OpenFOAM, but the output is so low-level that it is impossible to draw any conclusions from it...
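(For reference, this is roughly how such a switch can be set, e.g. via a DebugSwitches entry in a user-level controlDict under ~/.OpenFOAM/2.3.1/ or in the global etc/controlDict; the entry name and value are just the ones I used for Pstream.)

Code:

DebugSwitches
{
    Pstream         1;
}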

dkxls January 27, 2015 10:05

I'm not aware of any utility to measure the parallel data transfer.

Couple of hints/questions:
  1. Are you using the stock OpenFOAM applications or did you make some modifications to the application?
  2. How many cells per core (meaning MPI process) are you using?
  3. Is your case IO heavy, i.e. how often do you read/write data?
  4. Renumbering your mesh (prior to decomposition, as well as the decomposed mesh) can improve your performance significantly; see the example commands after this list.
  5. How do your ExecutionTime and ClockTime compare?
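A rough sketch of what I mean by renumbering (the core count below is just an example; renumberMesh uses Cuthill-McKee by default):

Code:

# renumber the serial mesh before decomposition
renumberMesh -overwrite

# decompose, then renumber each processor mesh as well
decomposePar
mpirun -np 8 renumberMesh -parallel -overwrite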
Cheers,
Armin

arnaud6 January 30, 2015 14:44

Thanks for your reply Armin,

To answer your questions,
1) No, I am using the standard OpenFOAM solvers, utilities, etc. coming with OpenFOAM 2.3.1.
2) Between 300k and 1M cells per core, which I think should be OK.
3) I don't write any data, nor do I read any (I start from steady boundary conditions)!
4) I am running this test at the moment, I will let you know!
5) ExecutionTime and ClockTime are very similar; should I see a major difference?

dkxls January 30, 2015 16:32

Quote:

Originally Posted by arnaud6 (Post 529744)
2)Between 300k and 1M which I think should be ok

Jep, that should be OK. If you have more than 100k cells per CPU, your application should scale well. I wouldn't run with fewer than 50k per CPU, but that also depends a bit on the application.
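As a rough illustration (the mesh size and decomposition method below are just assumed numbers), staying around 200k cells per core for a 4.8M-cell mesh would mean something like this in system/decomposeParDict:

Code:

numberOfSubdomains  24;     // ~4.8M cells / 24 ranks = ~200k cells per rank

method              scotch;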

Quote:

Originally Posted by arnaud6 (Post 529744)
ExecutionTime and ClockTime are very similar; should I see a major difference?

Nope, the closer ExecutionTime and ClockTime are, the better!
Meaning, the closer they are, the more time you are actually spending on computation and the less time is spent on other things like IO. At least that's how it typically goes; there are exceptions though.
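Both values are printed at every time step in the solver log, so a quick way to keep an eye on them (the log file name is just an assumption):

Code:

# each matching line looks like: ExecutionTime = <seconds> s  ClockTime = <seconds> s
grep ExecutionTime log.simpleFoam | tail -n 5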

arnaud6 February 10, 2015 12:42

Hello, I am coming back to you with more information.

I have run OpenFOAM's Test-Parallel and the output looks fine to me.
Here is an excerpt from the log file:

Code:

Create time

[0] Starting transfers
[0]
[0] master receiving from slave 1
[144] Starting transfers
[144]
[144] slave sending to master 0
[144] slave receiving from master 0
[153] Starting transfers
[153]
[153] slave sending to master 0
[153] slave receiving from master 0
I don't know how to interpret all the processor numbers at the end of the test, and I don't find them really useful. Should I be getting more information out of this Test-Parallel?
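For completeness, this is roughly how the test can be built and run (the source location, binary name and core count are what I assume for 2.3.x; it has to be started from inside a case directory since it reads system/controlDict, cf. the "Create time" line above):

Code:

cd $WM_PROJECT_DIR/applications/test/parallel
wmake

# from inside a case directory:
mpirun -np 4 Test-parallel -parallel > log.Test-parallel 2>&1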

Just as a quick reminder, we observe this behaviour:
Running on a single switch, the case runs as expected, at let's say 80 seconds per iteration.
Running the same job across multiple switches, each iteration takes 250 seconds, so about 3 times longer.

I want to emphasize that the IB fabric seems to work correctly, as we don't observe any issue running commercial-grade CFD applications.

We have built MPICH 3.1.3 from source and we observe exactly the same behaviour as with OpenMPI (slow across switches and fast on a single switch), so this suggests it is not MPI-related.
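For reference, with OpenMPI it is possible to check (or force) which BTL the ranks actually use across switches, in case they silently fall back to TCP; the MCA options below are standard OpenMPI flags, and the core count and hostfile are just placeholders:

Code:

# report which BTL each rank selects (look for openib vs tcp in the output)
mpirun --mca btl_base_verbose 30 -np 64 -hostfile hosts simpleFoam -parallel

# force the InfiniBand BTL; the job aborts instead of silently falling back to TCP
mpirun --mca btl openib,sm,self -np 64 -hostfile hosts simpleFoam -parallel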

Has anyone experienced this behaviour when running parallel OpenFOAM jobs? Any pointer would be greatly appreciated!

