
Cluster Parallelization Performance

November 20, 2013, 07:50   #1
minger (Member, joined Apr 2009, 36 posts)
I have been fortunate enough to be given some hardware to try to set up an OpenFOAM cluster. I have it up and running with two nodes at the moment and am getting unexpectedly poor performance. I am hoping someone can provide some input as to where to look. Here is the info:

Hardware:
2 identical HP Z400 with
Xeon CPU W3550 @ 3.07 GHz x 4
11.7 GB Memory

They are connected via a Linksys RVS4000 Gigabit switch. I have used iperf and can vouch that the machines are transferring at gigabit speed.
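(For anyone checking their own setup: bandwidth is only half the story for MPI; the small-message latency across the switch probably matters more. Roughly what I mean by checking the link, using the IPs from my hostfile below; the ping latency check is a suggestion rather than something I have dug into yet:)
Code:
# bandwidth between the nodes
iperf -s                        # run on 192.168.0.4
iperf -c 192.168.0.4 -t 10      # run on 192.168.0.3

# round-trip latency, which matters for the many small MPI messages
ping -c 100 -q 192.168.0.4      # run on 192.168.0.3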

The OpenFOAM version is 2.2. Both machines are running Ubuntu 13.10 and have identical setups.

My test case is the pimpleDyMFoam tutorial mixerVesselAMI2D. I have thrown away the default mesh and created two levels of refinement; the first refinement level is 307,200 cells.
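(If anyone wants to build similar meshes, the generic route is to bump up the block divisions in the blockMeshDict for the case and re-mesh; take the commands below as a sketch rather than my exact steps.)
Code:
# after scaling up the in-plane divisions in the case's blockMeshDict
blockMesh
checkMesh | grep -i cells    # confirm the cell count for this refinement level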

I have three results for the first case: one on a single core (no parallel option), one on a single node with 4 processes, and one on both nodes with 8 processes.

I am using the scotch decomposition method (per the tutorial) for both the single-host and multi-host runs. The 2-node case is decomposed as:
Code:
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      decomposeParDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

//- Force AMI to be on single processor. Can cause imbalance with some
//  decomposers.
//singleProcessorFaceSets ((AMI -1));

numberOfSubdomains 8;

method          scotch;

distributed     yes;

roots           ( );


// ************************************************************************* //
The single node case is simply changed to 4 subdomains.
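Something I should probably also look at is how balanced the scotch decomposition actually comes out: decomposePar reports per-processor cell counts and processor-face counts, and a lot of processor faces relative to cells means a lot of MPI traffic. A quick way to eyeball it (the grep is just a rough filter on the log text):
Code:
decomposePar -force > log.decomposePar
grep -A 2 "Processor" log.decomposePar    # per-processor cell and shared-face counts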

To launch the cases:
Parallel: time mpirun -hostfile hostfile pimpleDyMFoam -parallel > log
Single Core: time pimpleDyMFoam > log

The hostfile looks like
Code:
192.168.0.3 slots=4
192.168.0.4 slots=4
for the 2 node case, and I simply remove the second node for the single node case.
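One more thing on my to-try list is explicit core binding, since I gather process migration and memory-bandwidth contention can hurt on these 4-core boxes. With the Open MPI that ships with 13.10 I believe the flag is spelled like this (newer releases spell it --bind-to core instead):
Code:
time mpirun -hostfile hostfile -np 8 --bind-to-core pimpleDyMFoam -parallel > log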

Results of the first level of mesh refinement look like:
Code:
Single Core Run
real    79m05.605s
user    78m17.394s
sys    0m44.688s

Single Host 4 core machine
real    42m49.394s
user    168m27.428s
sys    0m13.658s

Full Parallel Run
real    60m58.221s
user    104m3.251s
sys    137m15.823s
The second case is further refined to a cell count of 1.2 million. Results from that look like this (no single-core run; note that I also reduced the simulated physical time, which is why the times are similar to the first case):

Code:
Single Node 4 Core
real    65m20.622s
user    256m19.924s
sys    0m56.965s

Full Parallel Run
real    58m50.084s
user    143m23.455s
sys    90m40.328s
I guess I'm "content" with a 25% speedup at 1.2 million elements, but I would have expected better scalability ... and more so, I would not have expected to have to run a million elements to get any return from the second node.

So, as I'm trying this, I'm seeing a couple of things. Firstly, I suppose the AMI may be causing issues? I wouldn't expect the AMI to degrade the parallel performance this much. Also, I am seeing something about a
Code:
nCellsInCoarsestLevel 10;
that is supposed to be set to the square root of the number of cells ... is that right? Is that number of cells in the entire domain, or in the partitioned subdomain?
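For reference, that setting lives in the GAMG block for the pressure solver in system/fvSolution, something like this (values here are illustrative, not copied from the tutorial):
Code:
p
{
    solver                GAMG;
    smoother              GaussSeidel;
    agglomerator          faceAreaPair;
    cacheAgglomeration    on;
    nCellsInCoarsestLevel 10;     // size of the coarsest agglomeration level
    mergeLevels           1;
    tolerance             1e-7;
    relTol                0.01;
}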

Anyways, I will keep churning through these -- but any insight or help is appreciated; thanks!

edit: The single-node job finished 10 minutes faster than projected, so the 2-node, 8-process run is barely a 10% speedup over it.

edit 2:
Job finished with modified nCellsInCoarsestLevel. I went back to the "fine" case, and set that value:
Code:
nCellsInCoarsestLevel 550;    // ~sqrt(300000)
The results are largely unchanged:
Code:
Single Core Run 
real    79m05.605s 
user    78m17.394s 
sys    0m44.688s  

Single Host 4 core machine 
real    42m49.394s 
user    168m27.428s 
sys   0m13.658s  

Full Parallel Run 
real    60m58.221s 
user    104m3.251s 
sys    137m15.823s

Full Parallel Run, Modified nCellsInCoarsestLevel
real    57m39.171s
user    90m7.315s
sys    138m22.254s
I am currently running a case with
Code:
singleProcessorFaceSets ((AMI -1));
in decomposeParDict.

Last edited by minger; November 20, 2013 at 11:00. Reason: added run with nCellsInCoarsestLevel

November 21, 2013, 17:45   #2
minger (Member, joined Apr 2009, 36 posts)
It seems that the AMI and/or the dynamic mesh motion was SEVERELY slowing the parallelization down. I went to a more basic test case and chose the pitzDaily simpleFoam tutorial. Results are:
Code:
================================
pitzDaily

Single Host 4 core machine
real    0m23.332s
user    1m24.179s
sys    0m0.688s

Full Parallel
real    0m48.747s
user    1m5.969s
sys    1m56.091s

================================
pitzDaily Fine - 49k cells

Single Host 4 core machine
real    2m33.846s
user    10m8.397s
sys    0m0.982s

Full Parallel 
real    2m36.021s
user    5m30.923s
sys    4m32.839s

================================
pitzDaily xFine - 195k cells

Single Host 4 core machine
real    45m59.531s
user    182m16.379s
sys    0m6.847s

Full Parallel
real    19m44.253s
user    61m9.221s
sys    16m59.335s
So, I was able to get roughly linear scaling once the load reached somewhere between 7k and 25k cells per core (i.e. somewhere between the fine and xFine meshes on 8 processes).
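In case it helps anyone repeat this, the timings above were done by hand, but a small loop makes the sweep less tedious. The case directory names and hostfile path in this sketch are made up; adjust them to your own layout:
Code:
#!/bin/bash
# crude scaling sweep over the pitzDaily variants: set the subdomain count,
# decompose, run in parallel, and keep the wall-clock time for each run
for c in pitzDaily pitzDailyFine pitzDailyXFine; do
    for np in 4 8; do
        ( cd "$c" || exit 1
          sed -i "s/^numberOfSubdomains.*/numberOfSubdomains $np;/" system/decomposeParDict
          decomposePar -force > log.decompose.np$np
          { time mpirun -hostfile ../hostfile -np $np simpleFoam -parallel > log.np$np ; } 2> time.np$np
        )
    done
done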

It does raise the question of whether it's the AMI or the dynamic mesh motion that is causing the slowdown.
