Problem with parallelization on cluster
Hi guys,
I have a serious problem with my parallel runs. My cluster has 5 blades with 12 CPUs per blade. Using only 1 blade (12 CPUs), simpleFoam solves 1 iteration in about 30 seconds; using 2 blades (24 CPUs) it takes 10 seconds. But when I use 3 or 4 blades, a single iteration takes 60-90 seconds. How is this possible? Has anyone run into the same problem? Thanks |
Hi,
I guess everybody will need more details, like:
1. Size of the problem (i.e. number of cells in the mesh).
2. Type of interconnect in your cluster.
Since the problem appears when the number of subdomains reaches 36 or more, maybe you are losing time waiting for processes to exchange data. |
Thanks for the answer.
The problem is a test case that I've built with about 12M cells. Our cluster doesn't have an InfiniBand interconnect, but tests with other applications (CFX, Radioss, and others) don't show these problems during parallel runs. |
Hi,
Well, at this point there will be even more technical questions:
0. Is this slowdown of the solution process reproducible?
1. What solver do you use?
2. What decomposition method do you use?
3. What linear solvers do you use?
4. Does the convergence of the linear solvers depend on the number of blades used? |
2 Attachment(s)
Thanks for your time,
about your questions:
0 - I've tested the cluster with several test cases, and every time I use 3 or more blades I get the same problem.
1 - simpleFoam
2 - Hierarchical with different coefficients; for example, with 48 CPUs I've used 4/4/3 or 48/1/1, obtaining different calculation times.
3 - See the attached files.
4 - I don't have information about convergence yet, because I'm testing with 30-40 steps just to understand the calculation time. |
Hi,
2. Could you visualize the decomposition? Also, try scotch decomposition instead of hierarchical.
3. Try changing the linear solver for pressure to PCG (there have been reports of poor GAMG performance in parallel runs). I.e. in fvSolution instead of Code:
p Code:
p Code:
GAMG: Solving for p, Initial residual = 0.330664, Final residual = 0.0304151, No Iterations 4 |
1 Attachment(s)
Hi,
2 - I enclose one decomposition output. Do you have any suggestions about scotch decomposition?
3 - OK, I will try this solver now.
4 - The number of time steps is the same using 1 or 2 blades; it changes using 3 blades. |
Hi,
The decomposition you attached is for the two-blade case, yet the strange behavior begins with 3 blades. I have nothing particular to suggest about Scotch decomposition: just set the number of subdomains ;) GAMG behavior and performance also depend on the number of cells in each subdomain and on the nCellsInCoarsestLevel setting. For further diagnostics you can attach the solver (simpleFoam) output for the first 10 time steps with decomposition into 24 subdomains (2 blades) and 36 subdomains (3 blades). |
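To get a rough feel for the per-subdomain load mentioned above, one can divide the mesh size by the number of subdomains. This is only a sketch: it assumes the roughly 12M-cell mesh quoted earlier in the thread and a perfectly even split, which a real decomposition only approximates. The function name is made up for illustration.

```python
# Rough cells-per-subdomain estimate; assumes a perfectly even split,
# which real decompositions only approximate.
TOTAL_CELLS = 12_000_000  # approximate mesh size quoted in the thread


def cells_per_subdomain(total_cells: int, n_subdomains: int) -> int:
    """Integer number of cells each subdomain would get under an even split."""
    return total_cells // n_subdomains


for n in (12, 24, 36, 48):
    print(f"{n:2d} subdomains: ~{cells_per_subdomain(TOTAL_CELLS, n)} cells each")
```

Fewer cells per subdomain means less computation per exchanged boundary value, which is one reason communication cost can start to dominate as the core count grows.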
Perfect,
I'll now prepare these 2 tests with 10 steps and attach them to this post! |
4 Attachment(s)
So,
I've done several tests and attached the results here (simpleFoam and decomposePar logs). I've used the new p solver and scotch decomposition. Now it looks like OF scales well:
- 12 CPU: 1 step in about 50 s
- 24 CPU: 1 step in about 30 s
- 48 CPU: 1 step in about 15 s
But if I compare the new 12 and 24 CPU results with the old ones, they are slower (I've also attached the old results):
- 12 CPU: now 50 s, before 30 s
- 24 CPU: now 30 s, before 12 s
But maybe the geometry is not big enough to scale properly? If I use a bigger geometry (our real geometries have about 30-50M cells), will I get better results? Or could the lack of InfiniBand be causing this? Thanks |
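The per-step times quoted above can be turned into speedup and parallel efficiency figures. This is a minimal sketch that uses only the three "new" timings from the post and takes the 12-CPU run as the baseline (an assumption, since no single-core time is reported); the function name is illustrative.

```python
# Per-iteration wall times (seconds) for the new setup, as quoted in the post.
timings = {12: 50.0, 24: 30.0, 48: 15.0}


def scaling_table(timings):
    """Return [(cores, speedup, efficiency)] relative to the smallest core count."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # Ideal speedup would equal cores / base_cores.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


for cores, speedup, efficiency in scaling_table(timings):
    print(f"{cores:3d} cores: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

An efficiency close to 100% means the extra blades are being used well; a value that drops sharply points to communication or load-balance overhead.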
Well,
This time you spend more time solving the pressure equation. Old test:
Code:
GAMG: Solving for p, Initial residual = 0.450677, Final residual = 0.00449642,
New test:
Code:
DICPCG: Solving for p, Initial residual = 0.246676, Final residual = 0.00245312
Could you run a test with the GAMG linear solver (i.e. take fvSolution from the old test) + Scotch decomposition on 36/48 cores and post the log files? Just to see what happens with GAMG when you go from 24 to a higher number of subdomains. In general, the PCG linear solver requires more iterations to converge than GAMG. As for increasing the number of cells: well, it will help create the feeling that we are scaling better, since the time spent in calculation increases as you increase the number of cells while keeping the number of subdomains constant. I would blame the hierarchical decomposition method here. With that method too many processor boundary faces were created, and since data exchange between processors is quite expensive (it is Ethernet), overall performance was poor. I have compared the decomposePar output, and the Scotch method seems to create far fewer processor boundary faces. |
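To compare how many iterations each linear solver needs, the solver lines in a simpleFoam log can be collected with a small script. This is a sketch that assumes the log lines follow the complete format quoted earlier in the thread (ending in "No Iterations N"); the function name and the second sample line are made up for illustration and are not taken from the attached logs.

```python
import re
from collections import defaultdict

# Matches lines such as:
# GAMG: Solving for p, Initial residual = 0.330664, Final residual = 0.0304151, No Iterations 4
LINE_RE = re.compile(
    r"^\s*(?P<solver>\w+):\s+Solving for (?P<field>\w+), "
    r"Initial residual = (?P<initial>[-+.\deE]+), "
    r"Final residual = (?P<final>[-+.\deE]+), "
    r"No Iterations (?P<iters>\d+)"
)


def iterations_per_field(log_text):
    """Return {(solver, field): [iteration counts]} for every matching log line."""
    counts = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            counts[(m["solver"], m["field"])].append(int(m["iters"]))
    return dict(counts)


# The first line is quoted from the thread; the second is invented for illustration.
sample = """\
GAMG: Solving for p, Initial residual = 0.330664, Final residual = 0.0304151, No Iterations 4
DICPCG: Solving for p, Initial residual = 0.25, Final residual = 0.0025, No Iterations 300
"""

for key, iters in iterations_per_field(sample).items():
    print(key, "mean iterations:", sum(iters) / len(iters))
```

Running this on the 24-, 36- and 48-subdomain logs would show whether the pressure solver's iteration count changes with the number of subdomains.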
Morning,
I've run the test with 36/48 CPUs using GAMG, and now the computation seems to be very fast and it scales very well. Using 48 CPUs it takes 72 s for 10 steps. The next step is to evaluate the convergence time: I'm thinking of running 2 different tests with 5000 steps, one for each of the 2 pressure solvers, to evaluate the minimum number of steps needed to obtain good convergence. Do you think this is a good way to proceed? Thanks |
Hi,
Your plan is reasonable, yet I would suggest:
1. Use a convergence criterion instead of a fixed number of iterations (for an example, see the residualControl section of the fvSolution dictionary in the airFoil2D tutorial case).
2. Use setups closer to the real problems that will be solved in the future. Obviously, if you spend more time in calculations (i.e. instead of just the momentum equation, you also solve temperature/mass transfer/reaction kinetics) you scale better and better ;) |
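The idea behind a convergence criterion, as opposed to a fixed step count, can be sketched in a few lines: stop at the first iteration where every monitored residual is below its tolerance. This only illustrates the concept and is not OpenFOAM code; the function name, the tolerances and the residual history are all hypothetical.

```python
def first_converged_iteration(history, tolerances):
    """Return the 1-based iteration at which all monitored residuals are below
    their tolerances, or None if that never happens."""
    for i, residuals in enumerate(history, start=1):
        if all(residuals[field] < tol for field, tol in tolerances.items()):
            return i
    return None


# Hypothetical tolerances and residual history, for illustration only.
tolerances = {"p": 1e-2, "U": 1e-3}
history = [
    {"p": 0.33, "U": 0.05},
    {"p": 0.02, "U": 0.004},
    {"p": 0.008, "U": 0.0009},
]

print(first_converged_iteration(history, tolerances))
```

Comparing the two pressure solvers by the wall time needed to reach the same tolerances is fairer than comparing them over an arbitrary fixed number of steps.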