CFD Online Forums: OpenFOAM Running, Solving & CFD

Problem with parallelization on cluster

August 4, 2015, 05:00   #1
GiuMan, New Member (Join Date: Sep 2014; Posts: 10)
Hi guys,

I have a serious problem with my parallel runs. I have a cluster with 5 blades and 12 CPUs per blade.
When I use only 1 blade (12 CPUs), simpleFoam solves 1 iteration in about 30 seconds; using 2 blades (24 CPUs) it takes 10 seconds; but when I use 3 or 4 blades it takes 60-90 seconds for a single iteration.

How is this possible? Has anyone seen the same problem?

Thanks

August 4, 2015, 16:41   #2
Alexey Matveichev (alexeym), Senior Member (Join Date: Aug 2011; Location: Nancy, France; Posts: 1,419)
Hi,

I guess everybody will need more details, like:

1. Size of the problem (i.e. number of cells in the mesh).
2. Type of interconnect in your cluster.

Since the problem appears when the number of subdomains reaches 36, maybe you are losing time waiting for processes to exchange data.

August 5, 2015, 05:01   #3
GiuMan, New Member
Thanks for the answer.

The problem is a test case that I've built, with about 12M cells.
Our cluster doesn't have InfiniBand interconnect, but tests with other applications (CFX, Radioss, and others) don't show these problems during parallel runs.

August 5, 2015, 05:13   #4
Alexey Matveichev (alexeym), Senior Member
Hi,

Well, at this point there will be even more technical questions:

0. Is this solution-process slowdown reproducible?
1. Which solver do you use?
2. Which decomposition method do you use?
3. Which linear solvers do you use?
4. Does the convergence of the linear solvers depend on the number of blades used?

August 5, 2015, 05:23   #5
GiuMan, New Member
Thanks for your time.
About your questions:

0 - I've tested the cluster with several test cases and every time, using 3 or more blades, I have the same problem.
1 - simpleFoam.
2 - Hierarchical with different coefficients; for example for 48 CPUs I've used 4/4/3 or 48/1/1, obtaining different calculation times.
3 - See the attached files.
4 - I don't have information about convergence yet because I'm testing with 30-40 steps, just to understand the calculation time.
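
For reference, a hierarchical split like 4/4/3 for 48 CPUs is set in system/decomposeParDict roughly like this (a sketch, not taken from the attached case; the delta and order values are common defaults, not values confirmed here):

Code:
    numberOfSubdomains  48;

    method          hierarchical;

    hierarchicalCoeffs
    {
        n           (4 4 3);    // subdivisions in x, y, z: 4*4*3 = 48
        delta       0.001;      // cell skew factor
        order       xyz;        // order in which directions are split
    }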
Attached Files
File Type: txt fvSchemes.txt (1.4 KB, 3 views)
File Type: txt fvSolution.txt (2.0 KB, 3 views)

August 5, 2015, 05:33   #6
Alexey Matveichev (alexeym), Senior Member
Hi,

2. Could you visualize the decomposition? Also try the scotch decomposition method instead of hierarchical.
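
(Switching to Scotch is a small change in system/decomposeParDict; a sketch, assuming 48 subdomains - scotch needs no coefficients sub-dictionary by default:

Code:
    numberOfSubdomains  48;

    method          scotch;     // graph-based partitioning; tends to minimize
                                // the number of processor boundary faces
)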

3. Try changing the linear solver for pressure to PCG (there have been reports of poor GAMG performance in parallel). I.e. in fvSolution, instead of

Code:
    p
    {
        solver           GAMG;
        tolerance        1e-7;
        relTol           0.01;
        smoother         GaussSeidel;
        nPreSweeps       0;
        nPostSweeps      2;
        cacheAgglomeration on;
        agglomerator     faceAreaPair;
        nCellsInCoarsestLevel 10;
        mergeLevels      1;
    }
put

Code:
    p
    {
        solver           PCG;
        preconditioner DIC;
        tolerance        1e-7;
        relTol           0.01;
    }
4. In fact I am talking about the convergence of the linear solvers, not the PDE solver. In your log file you have something like:

Code:
GAMG:  Solving for p, Initial residual = 0.330664, Final residual = 0.0304151, No Iterations 4
Does the number of iterations (for the same time steps) depend on the number of blades used?

August 5, 2015, 06:48   #7
GiuMan, New Member
Hi,

2 - I enclose one decomposition output. Do you have any suggestions about the scotch decomposition?
3 - OK, I will try this solver now.
4 - The number of iterations is the same using 1 or 2 blades; it changes using 3 blades.
Attached Files
File Type: txt foam-yourban_9-decomposePar.txt (15.4 KB, 1 views)

August 5, 2015, 08:26   #8
Alexey Matveichev (alexeym), Senior Member
Hi,

The decomposition you attached is for the case of two blades, yet the strange behavior begins with 3 blades. I have nothing particular to suggest about Scotch decomposition; just set the number of subdomains.

GAMG behavior/performance also depends on the number of cells per subdomain and on the nCellsInCoarsestLevel setting.
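
(nCellsInCoarsestLevel lives in the p solver entry of fvSolution; the value below is an illustrative assumption to experiment with, not a recommendation:

Code:
    p
    {
        solver           GAMG;
        smoother         GaussSeidel;
        tolerance        1e-7;
        relTol           0.01;
        agglomerator     faceAreaPair;
        cacheAgglomeration on;
        mergeLevels      1;
        // a larger coarsest level means fewer agglomeration levels;
        // worth testing values like 50-500 instead of 10
        nCellsInCoarsestLevel 100;
    }
)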

For further diagnostics you could attach the solver (simpleFoam) output for the first 10 time steps, with decomposition into 24 subdomains (2 blades) and 36 subdomains (3 blades).

August 5, 2015, 08:28   #9
GiuMan, New Member
Perfect,

I will now prepare these 2 tests with 10 steps and attach them to this post!

August 5, 2015, 09:14   #10
GiuMan, New Member
So,

I've done several tests and attached the results here (simpleFoam and decomposePar logs).
I've used the new p solver and scotch decomposition.

Now it looks like OF scales well:
- 12 CPUs: 1 step in about 50 s
- 24 CPUs: 1 step in about 30 s
- 48 CPUs: 1 step in about 15 s

But if I compare the new 12 and 24 CPU results with the old ones, they are slower (I've attached the old results too):

12 CPUs
- Now: 50 s
- Before: 30 s

24 CPUs
- Now: 30 s
- Before: 12 s

But maybe the geometry is not big enough to scale the right way? If I use a bigger geometry (our real geometries have about 30-50M cells), will I get better results?

Or maybe the lack of InfiniBand makes this happen?

Thanks
Attached Files
File Type: zip new_test_48.zip (8.5 KB, 0 views)
File Type: zip new_test_12_partial.zip (4.6 KB, 0 views)
File Type: zip new_test_24_partial.zip (5.4 KB, 1 views)
File Type: zip old_test.zip (13.9 KB, 1 views)

August 5, 2015, 11:22   #11
Alexey Matveichev (alexeym), Senior Member
Well,

This time you spend more time solving the pressure equation:

Old test:

Code:
GAMG:  Solving for p, Initial residual = 0.450677, Final residual = 0.00449642,
No Iterations 10
New test:

Code:
DICPCG:  Solving for p, Initial residual = 0.246676, Final residual = 0.00245312
, No Iterations 245
That is why the overall simulation is slower.

Could you run a test with the GAMG linear solver (i.e. take fvSolution from the old test) + Scotch decomposition on 36/48 cores and post the log files? Just to see what happens with GAMG after you go from 24 to a higher number of subdomains.

In general the PCG linear solver requires more iterations to converge than GAMG. So increasing the number of cells will help create the feeling that we are scaling better, since the time spent in computation increases as you increase the number of cells while keeping the number of subdomains constant.

I would point to the hierarchical decomposition method as the culprit here. With that method too many processor boundary faces were created, and since data exchange between processors is quite expensive (it is Ethernet), overall performance was poor. I have compared the output from decomposePar, and the Scotch method seems to create far fewer processor boundary faces.

August 6, 2015, 05:08   #12
GiuMan, New Member
Morning,

I've done the tests with 36/48 CPUs using GAMG, and now the computation seems to be very fast and scales very well.
Using 48 CPUs it takes 72 s for 10 steps.

Now the next step is to evaluate the convergence time: I'm thinking of running 2 different tests with 5000 steps, using the 2 different pressure solvers, to evaluate the minimum number of steps needed for good convergence. Do you think this is a good way to proceed?

Thanks

August 14, 2015, 05:11   #13
Alexey Matveichev (alexeym), Senior Member
Hi,

Your plan is reasonable, yet I would suggest:

1. Use a convergence criterion instead of a fixed number of iterations (see the residualControl section in the fvSolution dictionary of the airFoil2D tutorial case, for example).

2. Use setups closer to the real problems that will be solved in the future. Obviously, if you spend more time in computation (i.e. instead of just the momentum equation you also solve temperature/mass transfer/reaction kinetics), you scale better and better.
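
(A minimal sketch of the residualControl suggestion in point 1, added to fvSolution; the thresholds here are illustrative assumptions, not values taken from the tutorial:

Code:
    SIMPLE
    {
        nNonOrthogonalCorrectors 0;

        residualControl
        {
            // the SIMPLE loop stops once initial residuals fall below these
            p               1e-4;
            U               1e-4;
            "(k|omega|epsilon)" 1e-4;
        }
    }
)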

Tags
clusters, decomposepar, openfoam 2.2.x, parallel computing

