
Large case parallel efficiency

May 18, 2011, 13:38   #41
lakeat (Daniel WEI, 老魏), Senior Member, Beijing, China
I came across this website; is anyone interested?
WSMP: Watson Sparse Matrix Package (Version 11.1.19)

And I'm curious: what is the largest case ever simulated with OpenFOAM, and how many CPUs were used?

May 18, 2011, 14:50   #42
jdiorio, New Member
Quote:
Originally Posted by flavio_galeazzo View Post
Hi jdiorio,

If you cannot guarantee that the nodes work only for you, there is no surprise that your computation time varies greatly. One time the machine is working for your job only, and another time it is splitting the resources between X jobs.

If you are using the LSF scheduler, it is possible to reserve the nodes for your job only. Then your results will be consistent.
Thanks for the response, Flavio.

But isn't that the purpose of the LSF scheduler (i.e., it should only assign processors that are available and not running other jobs)? As I mentioned, the problem does not occur only during heavy-load periods, and I have monitored the CPU usage during the runs once the machines have been assigned. But I'm certainly willing to give it a try. Do you happen to know the bsub options that reserve the nodes solely for one job? I think the bsub -R option could do it by requesting only machines that are lightly loaded. I could also try submitting to the priority queue on the cluster to see if that makes a difference.

August 22, 2011, 09:13   #43
lakeat (Daniel WEI, 老魏), Senior Member, Beijing, China
FYI, I just saw the presentation (PPT) by Dr. Jasak.

So is AMG currently not good for scalability? Then which solver are you all using?

August 22, 2011, 10:05   #44
alberto (Alberto Passalacqua), Senior Member, Ames, Iowa, United States
His statement is that AMG scales worse as a strategy than Krylov solvers (gradient methods, to use a term probably more familiar to students), but that AMG solvers require only about one third of the iterations.

You see this very clearly on the pressure equation, for example, where GAMG significantly beats the other methods.

I generally tend to use GAMG for the pressure and conjugate gradients for the other variables.
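
For anyone who wants to try the same split, a minimal fvSolution sketch along these lines is given below. It is only an illustration of the idea, not Alberto's actual settings; tolerances, the smoother and the agglomeration parameters need to be tuned per case and per OpenFOAM version.

Code:
solvers
{
    p
    {
        solver                 GAMG;          // multigrid on the (symmetric) pressure equation
        smoother               GaussSeidel;
        cacheAgglomeration     true;
        nCellsInCoarsestLevel  100;           // stop agglomerating around this many cells
        agglomerator           faceAreaPair;
        mergeLevels            1;
        tolerance              1e-07;
        relTol                 0.01;
    }

    U
    {
        solver                 PBiCG;         // bi-conjugate gradient for the asymmetric momentum matrix
        preconditioner         DILU;
        tolerance              1e-07;
        relTol                 0;
    }
}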

August 22, 2011, 10:10   #45
lakeat (Daniel WEI, 老魏), Senior Member, Beijing, China
Quote:
AMG solvers require only about one third of the iterations.
Thank you. Is this rough estimate based on a single processor or on multiple processors? I am not talking about using a Krylov method; I am just wondering whether there is any better option than the AMG solver, one that works best so far. Thanks.

August 22, 2011, 10:19   #46
alberto (Alberto Passalacqua), Senior Member, Ames, Iowa, United States
Hi,

it is quite general. In my experience, using GAMG on the pressure equation for large cases leads to very nice improvements in performance (a much lower number of iterations; 1/3 is a bit on the pessimistic side, since in many cases the improvement is larger).

Best,

August 22, 2011, 20:58   #47
arjun (Arjun), Senior Member, Nurenberg, Germany
Quote:
Originally Posted by lakeat View Post
FYI, I just saw the presentation (PPT) by Dr. Jasak.

So is AMG currently not good for scalability? Then which solver are you all using?
I read that presentation a long time ago. My opinion is that it would be better to say that *his* implementation of AMG does not scale well.


Here is a small note about parallel AMG (again, a very personal opinion).

There are three major types of AMG in common use (there are many more, but these are the main ones based on current usage):

1. Additive corrective multigrid, which OpenFOAM, Fluent, STAR-CCM+, etc. use.
2. Classical AMG (Ruge-Stueben).
3. Smoothed aggregation AMG.

The major sources of problems in parallelization are:
1. The performance of the smoother degrades. For example, Gauss-Seidel behaves more like a Jacobi smoother in parallel.
Here is one proposed remedy: http://citeseerx.ist.psu.edu/viewdoc...=rep1&type=pdf
I will switch to this approach in the coming months as I find time to implement it.

2. The setup of the coarse levels. Some multigrids are easy to set up and some are difficult.
Of the three types above, #1 (additive corrective) is the easiest to set up and #3 (smoothed aggregation) is the most difficult.

3. Performance degradation due to communication, and very small coarse-level systems, typically with fewer equations than the number of processors used.


A few side notes in addition to the comments above.

Multigrid #1 (OpenFOAM's) generates a large number of coarse levels (typically 10 to 20, because each level has roughly Nfine/2 or Nfine/3 equations). This means that all the problems I mentioned apply at every level.
Multigrids #2 and #3 are much more difficult to implement, but the drop in the number of equations per level is much larger (a factor of 8 to 10). Here is a real example from one of my setups:

Finest level cells = 130208083
Coarse level cells = 14420920
Coarse level cells = 267305
Single-processor matrix size = 267305 at level 2

(Note that in this case I used 267305 equations at the coarsest level.) This means that for 130 million cells I only need 3 to 4 levels.


Jasak is saying that because of reason #3, communication issues and the small coarse-level systems, AMG performance will degrade.

My opinion is that the communication issue depends mainly on the communication algorithm. In MY case I have no complaints. If he can improve this, he could improve the efficiency.

The coarse-level equations SEEM to be a problem, but in practical use I do not think they matter much. Here is why (again, my opinion).

Imagine you are running a calculation on a large number of processors, say 2000. Then in a perfectly balanced setup your coarsest-level matrix would have on the order of 2000 equations.
This can be solved quickly by each processor and does not add much cost.
If the partitioning is not good, you would be looking at roughly 2000 x 10 equations, which can also be solved quickly on a single processor (I use a single-processor BiCG + AMG solver to do so).


Furthermore, one should use a #2 or #3 AMG instead of a #1 AMG for these reasons:

1. Fewer coarse levels are generated.
2. Much better convergence compared to #1.
3. It takes much less memory.
4. It can be used for difficult problems such as FEM; that is, one AMG fits all.

This is why I no longer take an interest in additive corrective AMG. It is easy, but not so rewarding.

August 22, 2011, 21:15   #48
lakeat (Daniel WEI, 老魏), Senior Member, Beijing, China
Wow, what a long post! Good, and thank you. I would like to buy you a drink the next time we meet.

So far, ICCG works better than AMG for me.

August 22, 2011, 21:52   #49
arjun (Arjun), Senior Member, Nurenberg, Germany
Quote:
Originally Posted by lakeat View Post
Wow, what a long post! Good, and thank you. I would like to buy you a drink the next time we meet.

So far, ICCG works better than AMG for me.
Thank you, but I do not drink. If you ever visit Japan, let me know; my wife is a very good cook ;-)

For me, these are the best-performing solvers I have written so far:

1. A Cartesian-grid-based DNS solver (immersed boundary).
It converges in 1-3 iterations for very large cases, like 3-4 billion cells, and uses SSOR plus a direct solver (FFT + block cyclic reduction). This is the fastest thing I have seen so far. I am happy that I designed it.

2. A full AMG for locally refined Cartesian meshes, a solver for immersed boundaries.
Again, it can converge in 2-3 iterations of the stabilized BiCG algorithm for 250-500 million cells.
Internally it uses bi-conjugate gradient, Gauss-Seidel, FFT and block cyclic reduction
(that is, it uses all types of solvers in ONE).

3. A smoothed aggregation preconditioned bi-conjugate gradient method.
Tested so far on 100-200 million points. Very fast, but slower than #1 and #2.

August 29, 2011, 15:56   #50
lakeat (Daniel WEI, 老魏), Senior Member, Beijing, China
I am still testing a case with a mesh of about 100 million cells, an unsteady problem on 1000 processors with InfiniBand support.

I am not sure which cycle is better: V, W or F? The last time I tried the F-cycle, I got an obvious floating point exception.
I am also still not clear on how to set minCoarseEqns and nMaxLevels. Could someone in the know shed some light on this?
Thanks a lot.
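
(Not an answer to the tuning question, but for readers wondering where these keywords live: minCoarseEqns, nMaxLevels and the cycle type are entries of the AMG solver dictionary in fvSolution. A rough sketch of such a block follows, assuming the foam-extend style amgSolver dictionary that uses these keyword names; the exact keywords, available cycles and sensible values differ between versions, so treat it only as a pointer.)

Code:
p
{
    solver          amgSolver;   // AMG solver, foam-extend style (assumed)
    cycle           V-cycle;     // V-cycle, W-cycle or F-cycle
    policy          AAMG;        // agglomeration policy
    nPreSweeps      0;
    nPostSweeps     2;
    groupSize       4;           // equations merged per coarse equation
    minCoarseEqns   4;           // stop coarsening below this many equations
    nMaxLevels      100;         // upper bound on the number of coarse levels
    scale           on;
    smoother        ILU;
    tolerance       1e-07;
    relTol          0.01;
}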

August 29, 2011, 22:20   #51
lakeat (Daniel WEI, 老魏), Senior Member, Beijing, China
From my tests so far on a smaller case (4M cells), the V-cycle is better, and I found that minCoarseEqns < 5 works best.

But the best performance still comes from using ICCG directly.

My goal is to reach a one-step-one-second rule with the multigrid technique. So is there any FORMULA for setting minCoarseEqns, etc.?
Thanks

August 30, 2011, 20:46   #52
arjun (Arjun), Senior Member, Nurenberg, Germany
Quote:
Originally Posted by lakeat View Post

My goal is to reach a one-step-one-second rule with the multigrid technique. So is there any FORMULA for setting minCoarseEqns, etc.?
Thanks

If you are running 100 million cells on 1000 processors, then I do not think you will reach one time step per second.

I am running a small test problem (not in OpenFOAM) with 88 million cells (SIMPLE algorithm); my timing on 32 processors is 70 seconds per time step with 3 sub-iterations, i.e. about 70/3 seconds per iteration.

With 3 sub-iterations, the estimated time for 100 million cells on 1000 processors with my solver, assuming everything scales linearly, is 70 s x (100/88) x (32/1000), i.e. about 2.5 seconds per time step.

Since my solver is not perfect, in theory you could achieve 1 second per time step, but in practice I doubt you will come even close to one time step in 5-10 seconds.

August 31, 2011, 11:17   #53
lakeat (Daniel WEI, 老魏), Senior Member, Beijing, China
Quote:
Originally Posted by arjun View Post
If you are running 100 million cells on 1000 processors, then I do not think you will reach one time step per second.

It would be desirable if, some day, a One-Second-One-Second rule could be realized, that is, one second of simulated time per second of wall-clock time: real-time simulation.

I am just surprised that in some situations using ICCG directly is better than using the multigrid technique.

September 1, 2011, 01:13   #54
arjun (Arjun), Senior Member, Nurenberg, Germany
Quote:
Originally Posted by lakeat View Post
It would be desirable if, some day, a One-Second-One-Second rule could be realized, that is, one second of simulated time per second of wall-clock time: real-time simulation.

I am just surprised that in some situations using ICCG directly is better than using the multigrid technique.

Yes, in small cases ICCG is better than multigrid.

Which preconditioner are you using with ICCG? And is there any way for you to export that matrix in text form? If so, I could try one of my multigrid routines to see whether multigrid can do it faster than what you are getting.

September 9, 2011, 04:13   #55
flavio_galeazzo (Flavio Galeazzo), Member, Karlsruhe, Germany
Quote:
Originally Posted by lakeat View Post
It would be desirable if, some day, a One-Second-One-Second rule could be realized, that is, one second of simulated time per second of wall-clock time: real-time simulation.

I am just surprised that in some situations using ICCG directly is better than using the multigrid technique.
I would like to share my latest experience with the scalability of the linear solvers in OpenFOAM. I used to run simulations with compressible solvers, using the PCG linear solver for pressure and PBiCG for the other variables. The scalability was very good up to 256 processors, and I could get one second of computational time per time step for a 14 million node grid.
Recently I moved to an incompressible solver, and in that case the GAMG linear solver was far superior to PCG for the pressure, while the other variables stayed with PBiCG. However, to my surprise, the scalability was very poor this time. I got good results up to 32 processors, with about 10 seconds of computational time per time step, but increasing the number of processors did not improve the computational time.
Reading the presentation by Dr. Jasak from lakeat's post, it became clear that the problem is actually the GAMG linear solver. This is very unfortunate, since the GAMG linear solver is indeed very helpful for the incompressible solver.

September 9, 2011, 08:53   #56
lakeat (Daniel WEI, 老魏), Senior Member, Beijing, China
Quote:
Originally Posted by arjun View Post
Yes, in small cases ICCG is better than multigrid.

Which preconditioner are you using with ICCG? And is there any way for you to export that matrix in text form? If so, I could try one of my multigrid routines to see whether multigrid can do it faster than what you are getting.
Sorry, I was busy with some other problems and proposals. If you send me an email, I can send you the case.

September 9, 2011, 09:22   #57
lakeat (Daniel WEI, 老魏), Senior Member, Beijing, China
@Flavio,

Thanks for sharing; I have some similar reports here.

-1- Good scalability is critical for us, since in our case we do hope for real-time analysis some day (with complex geometries, of course), but for now we are just trying to realize a One-Step-One-Second (OSOS) rule.
-2- I have a case with several million hex cells, an incompressible solver and a hybrid turbulence model. It showed very good scalability with PBiCG for the pressure, better than multigrid, tested on up to about 100 CPUs. But as I increased the mesh size beyond 10 million cells, the situation was no longer pleasant; items -3- and -4- below describe what I found and did.
-3- The solving process at the beginning, while the initial transient is being removed, seems to differ from the "normal" solving process (a big difference in the matrix?) and deserves attention. I need advice here. (My question: will potentialFoam really produce a better start?) I am not clear about what is happening with the matrix, but for sure you do NOT want to use PBiCG at the very beginning; it is disastrous and totally unacceptable (even giving floating point exceptions, FPE), no matter how you change relTol. A well-tuned multigrid method with a V-cycle worked better for me as a start (see the sketch after this list). (Sorry for my poor English; I hope you can understand what I am saying.)
-4- After the solution had run for a while (this is the tricky part), I tried to switch back to PBiCG, based on the experience from item -2- above, but this time to no avail: it would still run for some time and then crash again somewhere. I have no clue about this so far.
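
(To make items -3- and -4- concrete, here is a hedged sketch of the kind of two-stage pressure-solver setup being described: a multigrid solver while the initial transient is washed out, then a switch to a preconditioned conjugate-gradient solver, roughly the ICCG mentioned earlier in the thread. The values are illustrative only, not the actual settings used here, and whether running potentialFoam first gives a smoother initial field is exactly the open question in item -3-.)

Code:
// system/fvSolution, pressure entry (illustrative values only)
p
{
    // Stage 1: start-up, while the initial transient is removed.
    // A V-cycle multigrid was the more robust choice at this stage.
    solver                 GAMG;
    smoother               GaussSeidel;
    cacheAgglomeration     true;
    nCellsInCoarsestLevel  100;
    agglomerator           faceAreaPair;
    mergeLevels            1;
    tolerance              1e-06;
    relTol                 0.01;

    // Stage 2, after the transient has passed: switch to a CG-type solver
    // (e.g. PCG with a DIC preconditioner) and restart from the latest time.
    // solver           PCG;
    // preconditioner   DIC;
    // tolerance        1e-06;
    // relTol           0.01;
}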

I do not have time to dive into the mechanism behind this, so it would be great if someone could shed some light on the following:
1. WHY does PBiCG (for pressure) win over multigrid for some large cases as far as scalability is concerned?
2. What is the next move? What is the current state of the art for achieving better parallel scalability in the CFD world, apart from OpenFOAM (for solvers like those in OpenFOAM that work well for incompressible flows in complex geometries, not special-purpose solvers such as the ones arjun has written)?
3. Is anyone who has achieved better parallel efficiency willing to contribute more experience?

Thanks

September 9, 2011, 09:47   #58
flavio_galeazzo (Flavio Galeazzo), Member, Karlsruhe, Germany
Hi lakeat,

Your experience seems to be similar to mine. Unfortunately, I have no clue how to improve the scalability of my incompressible solver. One workaround is to use a compressible solver for the incompressible case; that worked well for me.

September 9, 2011, 11:43   #59
lakeat (Daniel WEI, 老魏), Senior Member, Beijing, China
Quote:
Originally Posted by flavio_galeazzo View Post
Hi lakeat,

Your experience seems to be similar to mine. Unfortunately, I have no clue how to improve the scalability of my incompressible solver. One workaround is to use a compressible solver for the incompressible case; that worked well for me.
What! Why, why did it work better?

September 12, 2011, 20:04   #60
arjun (Arjun), Senior Member, Nurenberg, Germany
Quote:
Originally Posted by lakeat View Post
Sorry, I was busy with some other problems and proposals. If you send me an email, I can send you the case.

I was also taking a break from CFD. Here is my email: arjun.yadav@yahoo.com
