Inconsistent parallel jobs running time

Post #1 by arnaud6 (New Member), January 27, 2015, 06:08
Hello all!

I keep posting on this forum as I find it really useful.

I have recently run into some issues with parallel jobs. I am running potentialFoam and simpleFoam on several cluster nodes, and the running times differ widely depending on which nodes are selected.
The running time can increase by a factor of 5, or the job can even hang on the cluster, depending on the nodes selected! I am running openfoam-2.3.1 with mpirun-1.6.5 over InfiniBand.

Before I give you more information, has anyone seen this kind of problem? I would like to know whether there is software or an OpenFOAM utility that outputs the amount of data transferred between the processors. I know Fluent has something to report the parallel data transfer.
I have tried setting the Pstream debug switches to 1 in OpenFOAM, but the output is so low-level that it is impossible to draw any conclusions from it...

Post #2 by dkxls (Armin, Senior Member), January 27, 2015, 10:05
I'm not aware of any utility to measure the parallel data transfer.

A couple of hints/questions:
  1. Are you using the stock OpenFOAM applications or did you make some modifications to the application?
  2. How many cells per core (meaning MPI process) are you using?
  3. Is your case IO heavy, i.e. how often do you read/write data?
  4. Renumbering your mesh (prior to decomposition as well as the decomposed mesh) can improve your performance significantly.
  5. How do your ExecutionTime and ClockTime compare?
Cheers,
Armin

Post #3 by arnaud6 (New Member), January 30, 2015, 14:44
Thanks for your reply, Armin.

To answer your questions:
1) No, I am using the standard OpenFOAM solvers, utilities, etc. that come with openfoam-2.3.1.
2) Between 300k and 1M, which I think should be OK.
3) I neither write nor read any data (I start from steady boundary conditions)!
4) I am running this test at the moment; I will let you know!
5) ExecutionTime and ClockTime are very similar. Should I see a major difference?

Post #4 by dkxls (Armin, Senior Member), January 30, 2015, 16:32
Quote (originally posted by arnaud6):
2) Between 300k and 1M, which I think should be OK.

Yep, that should be OK. If you have more than 100k cells per CPU, your application should scale well. I wouldn't run with fewer than 50k per CPU, but that also depends somewhat on the application.
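
As a rough illustration of the rule of thumb above, the cells-per-process load can be checked with a few lines of Python. This is only a sketch: the function names and the example case size (48 million cells on 160 processes) are made up for illustration, and the 100k and 50k thresholds are just the ones quoted in this post, not hard limits.

```python
def cells_per_process(total_cells: int, n_processes: int) -> float:
    """Average number of cells handled by each MPI process."""
    if n_processes <= 0:
        raise ValueError("n_processes must be positive")
    return total_cells / n_processes


def scaling_hint(cells_per_proc: float) -> str:
    """Classify a load using the rule-of-thumb thresholds from this thread."""
    if cells_per_proc >= 100_000:
        return "should scale well"
    if cells_per_proc >= 50_000:
        return "borderline, depends on the application"
    return "probably too few cells per process"


if __name__ == "__main__":
    # Hypothetical case size, chosen only to illustrate the arithmetic.
    load = cells_per_process(48_000_000, 160)
    print(f"{load:.0f} cells per process: {scaling_hint(load)}")
```

The average hides any imbalance between subdomains, so the per-processor cell counts reported by the decomposition are still worth a look.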

Quote (originally posted by arnaud6):
ExecutionTime and ClockTime are very similar. Should I see a major difference?

Nope, the closer ExecutionTime and ClockTime are, the better!
That is, the closer they are, the more time you are actually spending on computation and the less time is spent on other things such as I/O. At least that's how it typically goes; there are exceptions, though.
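
To compare the two timers across runs, something like the following Python sketch could be used. It assumes the solver log contains lines of the form `ExecutionTime = 79.5 s  ClockTime = 80 s`; the helper names and the sample log text are invented for illustration, not taken from the actual case.

```python
import re

# Matches solver log lines such as "ExecutionTime = 79.5 s  ClockTime = 80 s".
TIME_LINE = re.compile(
    r"ExecutionTime\s*=\s*([0-9.eE+-]+)\s*s\s+ClockTime\s*=\s*([0-9.eE+-]+)\s*s"
)


def parse_times(log_text):
    """Return (execution_time, clock_time) pairs found in a solver log."""
    return [(float(e), float(c)) for e, c in TIME_LINE.findall(log_text)]


def cpu_fraction(log_text):
    """ExecutionTime / ClockTime at the last reported step."""
    times = parse_times(log_text)
    if not times:
        raise ValueError("no ExecutionTime/ClockTime lines found")
    execution, clock = times[-1]
    return execution / clock if clock > 0 else float("nan")


def seconds_per_iteration(log_text):
    """Wall-clock seconds between consecutive reported steps."""
    clocks = [c for _, c in parse_times(log_text)]
    return [later - earlier for earlier, later in zip(clocks, clocks[1:])]


if __name__ == "__main__":
    # Made-up log excerpt; the numbers are illustrative only.
    sample = (
        "Time = 1\n"
        "ExecutionTime = 79.5 s  ClockTime = 80 s\n"
        "Time = 2\n"
        "ExecutionTime = 159.1 s  ClockTime = 160 s\n"
    )
    print(cpu_fraction(sample), seconds_per_iteration(sample))
```

One caveat, as far as I know: many MPI implementations poll in a busy loop while waiting for messages, so time spent waiting on communication can still be counted as CPU time. A ratio close to 1 therefore does not by itself rule out a communication bottleneck.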

Post #5 by arnaud6 (New Member), February 10, 2015, 12:42
Hello, I am coming back to you with more information.

I have run the Test-Parallel of OpenFOAM and the output looks fine to me.
Here is an example of the log file:

Code:
Create time

[0]
Starting transfers
[0]
[0] master receiving from slave 1
[144]
Starting transfers
[144]
[144] slave sending to master 0
[144] slave receiving from master 0
[153]
Starting transfers
[153]
[153] slave sending to master 0
[153] slave receiving from master 0
I don't know how to interpret all the processor numbers at the end of the test, and I don't find them really useful. Should I get more information from this Test-Parallel?
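
One way to get something out of this log is to check that every rank that printed anything reported both a send to and a receive from the master. The Python sketch below does that. It assumes each message line looks like `[144] slave sending to master 0` (the number in brackets being the rank that printed the line); the function names and the sample text are made up for illustration.

```python
import re
from collections import defaultdict

# "[144] slave sending to master 0"; the closing bracket is optional so that
# a log that lost it in copy/paste is still parsed.
MESSAGE = re.compile(
    r"\[(\d+)\]?\s*(?:master|slave) (sending to|receiving from) (?:master|slave) (\d+)"
)


def summarize(log_text):
    """Map each reporting rank to the set of (action, peer) pairs it printed."""
    seen = defaultdict(set)
    for rank, action, peer in MESSAGE.findall(log_text):
        seen[int(rank)].add((action, int(peer)))
    return dict(seen)


def incomplete_slaves(summary, master=0):
    """Slave ranks that did not report both a send to and a receive from the master."""
    expected = {("sending to", master), ("receiving from", master)}
    return sorted(
        rank for rank, msgs in summary.items()
        if rank != master and not expected <= msgs
    )


if __name__ == "__main__":
    # Made-up excerpt in which rank 153 never reports its receive.
    sample = (
        "[0] master receiving from slave 1\n"
        "[144] slave sending to master 0\n"
        "[144] slave receiving from master 0\n"
        "[153] slave sending to master 0\n"
    )
    print(incomplete_slaves(summarize(sample)))
```

This only covers ranks that appear in the log at all, so a rank that printed nothing has to be found by comparing against the expected process count. It also says nothing about how long the transfers took, only whether they were reported.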

Just as a quick reminder, we observe this behaviour:
Running on a single switch, the case runs as expected at, let's say, 80 seconds per iteration.
Running the same job across multiple switches, each iteration takes 250 seconds, so about 3 times longer.

I want to emphasize that the IB fabric seems to work correctly, as we don't observe any issues running commercial-grade CFD applications.

We have built mpich3.1.3 from source and we observe exactly the same behaviour as with openmpi (slow across switches and fast on a single switch), which suggests the problem is not MPI-related.

Has anyone experienced this behaviour when running parallel OpenFOAM jobs? Any pointers would be greatly appreciated!
Tags
cluster, discrepancy, mpirun, openfoam-2.3.1, parallel
