May 18, 2011, 13:38 | #41
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21
I came across this website. Is anyone interested?
WSMP: Watson Sparse Matrix Package (Version 11.1.19)
And I'm curious: what's the largest case ever simulated with OpenFOAM, and how many CPUs were used?
__________________
~ Daniel WEI ------------- Boeing Research & Technology - China Beijing, China
May 18, 2011, 14:50 | #42
New Member
Join Date: Jan 2010
Posts: 23
Rep Power: 16
Quote:
But isn't this the purpose of the LSF scheduler (i.e., it will only assign processors that are available and not running other jobs)? As I mentioned, this problem doesn't occur only during heavy-load times, and I've monitored the CPU usage during the runs after the machines were assigned. But I'm certainly willing to give it a try. Do you happen to know the bsub options to reserve the nodes solely for your job? I think the bsub -R option could do it, requesting only machines that are lightly loaded. I could also try submitting to the priority queue on the cluster and see if that makes a difference.
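For what it's worth, LSF does have a flag for exclusive execution. A sketch only; the queue name and solver command are placeholders, and resource strings vary per site:

```
# Request 64 slots, exclusive use of the allocated hosts (-x),
# and at most 16 MPI ranks per host; queue "priority" is hypothetical.
bsub -n 64 -x -R "span[ptile=16]" -q priority \
     mpirun mySolver -parallel
```

With -x, LSF will not schedule other jobs onto the hosts while yours runs, which is the usual way to rule out interference from co-located jobs.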
August 22, 2011, 09:13 | #43
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21
FYI, I just saw the presentation by Dr. Jasak.
So AMG is currently not good for scalability? Then which solver are you all using?
August 22, 2011, 10:05 | #44
Senior Member
Alberto Passalacqua
Join Date: Mar 2009
Location: Ames, Iowa, United States
Posts: 1,912
Rep Power: 36
His statement is that AMG scales worse as a strategy than Krylov solvers (gradient methods, to use a term probably more common among students), but that AMG solvers require about one third as many iterations.
You see this very clearly on the pressure equation, for example, where GAMG significantly beats other methods. I generally tend to use GAMG for pressure and conjugate gradients for the other variables.
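As a concrete illustration of that split, an fvSolution block along these lines is typical. This is a sketch only: the tolerances and smoother choices are illustrative starting points, not Alberto's exact settings.

```
solvers
{
    p
    {
        solver              GAMG;
        smoother            GaussSeidel;
        agglomerator        faceAreaPair;
        nCellsInCoarsestLevel 100;
        cacheAgglomeration  true;
        mergeLevels         1;
        tolerance           1e-7;
        relTol              0.01;
    }
    U
    {
        solver              PBiCG;
        preconditioner      DILU;
        tolerance           1e-8;
        relTol              0;
    }
}
```

The point is the division of labour: the stiff, elliptic pressure equation gets the multigrid solver, while the better-conditioned transport equations stay with preconditioned (bi)conjugate gradients.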
__________________
Alberto Passalacqua GeekoCFD - A free distribution based on openSUSE 64 bit with CFD tools, including OpenFOAM. Available in both physical and virtual formats (current status: http://albertopassalacqua.com/?p=1541) OpenQBMM - An open-source implementation of quadrature-based moment methods. To obtain more accurate answers, please specify the version of OpenFOAM you are using.
August 22, 2011, 10:10 | #45
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21
Quote:
August 22, 2011, 10:19 | #46
Senior Member
Alberto Passalacqua
Join Date: Mar 2009
Location: Ames, Iowa, United States
Posts: 1,912
Rep Power: 36
Hi,
It is quite general. In my experience, using GAMG on the pressure equation for large cases leads to very nice improvements in performance (a much lower number of iterations; 1/3 is a bit on the pessimistic side, since in many cases the improvement is larger). Best,
August 22, 2011, 20:58 | #47
Senior Member
Arjun
Join Date: Mar 2009
Location: Nurenberg, Germany
Posts: 1,272
Rep Power: 34
Quote:
Here is a small note about parallel AMG (it is also a very personal opinion).

There are three major types of AMG in common use (there are many more types, actually):
1. Additive corrective multigrid, which OpenFOAM, Fluent, STAR-CCM+ etc. use.
2. Classical (Ruge-Stuben) AMG.
3. Smoothed aggregation AMG.

The major sources of problems in parallelization are:
1. The performance of the smoother degrades. For example, Gauss-Seidel behaves more like a Jacobi smoother. Here is one proposed remedy: http://citeseerx.ist.psu.edu/viewdoc...=rep1&type=pdf I will switch to this approach in the coming months as I get time to implement it.
2. Setup of the coarse levels. Some multigrids are easy to set up, some are difficult. Of the three given above, #1 (additive corrective) is the easiest to set up and #3 (smoothed aggregation) is the most difficult.
3. Performance degradation due to communication, and a very low number of coarse-level equations, typically smaller than the number of processors.

A few side notes in addition to the above. Multigrid #1 (OpenFOAM's) generates a large number of coarse levels (typically 10-20, because each level is roughly Nfine/2 or Nfine/3). That means all the problems I mentioned apply at every level. Multigrids #2 and #3 are very difficult to implement, but the drop in equations per level is much higher (a factor of 8-10). Here is a real example from one of my setups:

Finest level cells = 130208083
Coarse level cells = 14420920
Coarse level cells = 267305
Single-proc matrix size = 267305 at level 2

(Note that in this case I used 267305 equations at the coarsest level.) This means that for 130 million points I only need 3-4 levels.

Jasak is saying that AMG performance will degrade due to reason #3, the communication issues and the coarse-level equations. My opinion is that the communication issue mainly depends on the communication algorithm; in MY case I have no complaints. If he can improve this, he could improve the efficiency.

The coarse-level equations SEEM to be a problem, but in practical use I do not think they matter much. Here is why (again, my opinion). Imagine you are running a calculation on a large number of processors, say 2000. In a perfectly balanced setup your coarsest-level matrix would have ~2000 equations. This can be solved quickly by each processor and would not add much cost. If the partitioning was not good, you would be looking at 2000x10 equations, which can also be solved quickly on a single processor (I use a single-processor BiCG + AMG to do so).

Furthermore, one should use a #2 or #3 AMG instead of a #1 AMG for these reasons:
1. Fewer coarse levels are generated.
2. Much better convergence compared to #1.
3. It takes much less memory.
4. It can be used for difficult problems like FEM etc. That is, one AMG fits all.

This is why I no longer take an interest in additive corrective AMG. It is easy, but not very rewarding.
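The level-count arithmetic in this post can be checked in a few lines. A sketch (the function name and the default coarsest-level target of 2000 equations are mine, not from any AMG package): coarsening by a factor of ~2.5 per level needs 13 levels to bring 130M equations down to ~2000, while a factor of 9 needs only 6.

```python
def amg_levels(n_fine, coarsening_ratio, min_coarse=2000):
    """Count how many coarse levels are needed to shrink n_fine
    equations below min_coarse, reducing by coarsening_ratio per level."""
    levels = 0
    n = float(n_fine)
    while n > min_coarse:
        n /= coarsening_ratio
        levels += 1
    return levels

# Additive corrective MG (ratio ~2-3) vs aggressive coarsening (ratio ~8-10)
print(amg_levels(130_000_000, 2.5))  # 13 levels
print(amg_levels(130_000_000, 9))    # 6 levels
```

This is the whole argument in miniature: every extra level pays the smoother-degradation and communication costs again, so a more aggressive coarsening strategy shortens the hierarchy substantially.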
August 22, 2011, 21:15 | #48
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21
Wow, what a long post! Good, and thank you. I would like to buy you a drink next time we meet.
So far, ICCG works better than AMG for me.
August 22, 2011, 21:52 | #49
Senior Member
Arjun
Join Date: Mar 2009
Location: Nurenberg, Germany
Posts: 1,272
Rep Power: 34
Quote:
For me, these are the best-performing solvers that I have written so far:
1. A Cartesian-grid-based DNS solver (immersed boundary). It converges in 1-3 iterations for very large cases, like 3-4 billion cells. It uses SSOR + a direct solver (FFT + block cyclic reduction). This is the fastest thing I have seen so far; I am happy that I designed it.
2. A full AMG for locally refined Cartesian meshes, a solver for immersed boundary. Again, it can converge in 2-3 iterations of the stabilized BiCG algorithm for 250-500 million cells. It uses biconjugate gradient, Gauss-Seidel, FFT and block cyclic reduction internally (that is, all types of solvers in ONE).
3. A smoothed-aggregation-preconditioned biconjugate gradient method. So far tested on 100-200 million points. Very fast, but slower than #1 and #2.
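The FFT-direct-solver idea behind item 1 can be sketched in the simplest possible setting, a periodic 1-D Poisson problem. This is my own toy illustration of the technique, not Arjun's production solver:

```python
import numpy as np

def solve_poisson_fft(f, length=2.0 * np.pi):
    """Solve u'' = f on a periodic 1-D grid by dividing by -k^2 in
    Fourier space. f must have zero mean; the k=0 mode of u is pinned
    to zero (the solution is only defined up to a constant)."""
    n = f.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)  # angular wavenumbers
    fh = np.fft.fft(f)
    uh = np.zeros_like(fh)
    nonzero = k != 0
    uh[nonzero] = -fh[nonzero] / k[nonzero] ** 2
    return np.fft.ifft(uh).real

# u = sin(x) satisfies u'' = -sin(x), so feeding f = -sin(x) recovers sin(x)
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
u = solve_poisson_fft(-np.sin(x))
```

On a uniform Cartesian grid this is a direct solve: one forward transform, a pointwise division, one inverse transform. That is why such solvers can serve as the coarse-grid or preconditioning kernel inside an iteration that converges in a handful of outer steps.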
August 29, 2011, 15:56 | #50
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21
I am still testing a case of ~100 million mesh size, an unsteady problem on 1000 processors with InfiniBand support.
I am not sure which cycle is better: V, W, or F? Last time I tried the F cycle, there was an obvious floating-point exception. I am also still not clear on how to set minCoarseEqns and nMaxLevels. Could someone in the know shed some light on this? Thanks a lot.
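For reference, those keywords belong to the extend-line (OpenFOAM 1.6-ext / foam-extend) AMG solver and sit in fvSolution. A sketch only; I am assuming the extend-line amgSolver entry names here, and the values are illustrative starting points, not recommendations:

```
p
{
    solver          amgSolver;
    cycle           V-cycle;     // V-cycle, W-cycle or F-cycle
    policy          AAMG;
    nPreSweeps      2;
    nPostSweeps     2;
    groupSize       4;
    minCoarseEqns   4;           // stop coarsening below this many equations
    nMaxLevels      100;         // upper bound on the number of coarse levels
    scale           on;
    smoother        ILU;
    tolerance       1e-7;
    relTol          0.01;
}
```

minCoarseEqns bounds the coarsest matrix size and nMaxLevels caps the hierarchy depth; together they determine how far the coarsening in the earlier posts is allowed to go.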
August 29, 2011, 22:20 | #51
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21
From some tests so far on my smaller case (4M cells), the V cycle is better.
For minCoarseEqns I found that <5 is best. But the best performance still comes from using ICCG directly. My goal is to reach a one-step-one-second rule with the multigrid technique. So is there any FORMULA for setting minCoarseEqns etc.? Thanks
Last edited by lakeat; August 29, 2011 at 23:44.
August 30, 2011, 20:46 | #52
Senior Member
Arjun
Join Date: Mar 2009
Location: Nurenberg, Germany
Posts: 1,272
Rep Power: 34
Quote:
If you are running 100 million cells on 1000 processors, then I think you will not reach one time step per second. I am running a small test problem (not in OpenFOAM) with 88 million cells (SIMPLE algorithm); my timing on 32 processors is 70 s per time step (3 sub-iterations, so 70/3 s per iteration). With 3 sub-iterations, the estimated time for 100 million cells with my solver (assuming everything scales linearly) is ~2.5 seconds per time step. Since my solver is not perfect, in theory you could achieve 1 step per second, but in practice I doubt you will come even close to 1 time step in 5-10 seconds.
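The estimate above is just linear scaling in both cell count and processor count. Spelled out (the helper is mine, with Arjun's figures as defaults):

```python
def per_step_seconds(cells, procs,
                     ref_cells=88e6, ref_procs=32, ref_time=70.0):
    """Extrapolate wall time per time step assuming perfectly linear
    scaling in problem size and processor count.
    Reference point: 88M cells on 32 procs takes 70 s per step."""
    return ref_time * (cells / ref_cells) * (ref_procs / procs)

# 100M cells on 1000 processors -> ~2.5 s per time step
estimate = per_step_seconds(100e6, 1000)
```

Real scaling is of course sublinear at 1000 processors, which is exactly the point: if even the ideal bound is ~2.5 s, one second per step is out of reach.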
August 31, 2011, 11:17 | #53
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21
Quote:
It would be desirable if one day one-second-one-second could be realized, which is real-time simulation. I am just surprised that in some situations directly using ICCG is better than using the multigrid technique.
September 1, 2011, 01:13 | #54
Senior Member
Arjun
Join Date: Mar 2009
Location: Nurenberg, Germany
Posts: 1,272
Rep Power: 34
Quote:
Yes, in small cases ICCG is better than multigrid. What preconditioner are you using with ICCG? And is there any way for you to export that matrix in text form? If so, I could try it with one of my multigrid routines to see whether multigrid can do it faster than what you are getting.
September 9, 2011, 04:13 | #55
Member
Flavio Galeazzo
Join Date: Mar 2009
Location: Karlsruhe, Germany
Posts: 34
Rep Power: 18
Quote:
Recently I moved to an incompressible solver, and in that case the GAMG linear solver was far superior to PCG for the pressure; the other variables stayed with PBiCG. However, to my surprise, the scalability was very poor this time. I got good results up to 32 processors, with about 10 seconds of computational time per time step, but increasing the number of processors did not improve the computational time. Reading the presentation from Dr. Jasak now, from the post by lakeat, it becomes clear that the problem is actually the GAMG linear solver. That is very unfortunate, since the GAMG linear solver is very helpful for the incompressible solver.
September 9, 2011, 08:53 | #56
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21
Quote:
September 9, 2011, 09:22 | #57
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21
@Flavio,
Thanks for sharing; I have some similar reports here.

1. Good scalability is critical for us, since in our case we do hope for real-time analysis some day (for complex geometries, of course), but for now we are just looking to realize a one-step-one-second (OSOS) rule.

2. I have a case with several million hex cells, an incompressible solver, and a hybrid turbulence model. It showed very good scalability with PBiCG for pressure, better than MG; this was tested on up to ~100 CPUs. But as I increased the mesh size to >10 million, the situation was no longer pleasant for me; items 3 and 4 below are what I found and did.

3. The solving process at the beginning, meant to remove the initial transient, seems to be different from the "normal" solving process (a big difference in the matrix?) and deserves attention. I need advice here. (My question: will potentialFoam really produce a better start?) I am not clear on what is happening with the matrix. But for sure, you DO NOT want to use PBiCG at the very beginning; it would be disastrous and totally unacceptable (even a floating-point exception, FPE), no matter how you change relTol. Using a well-tuned MG method, with a V-cycle, worked better for me as a start. (Sorry for my poor English; I hope you can understand what I'm saying.)

4. After the solving process had continued for a while (this is the tricky part), I tried to switch back to PBiCG based on the experience from my item 2 above, but this time to no avail; it would still run for some time and then crash again somewhere. I have no clue about this so far.

I do not have time to dive into the mechanism behind this, so it would be great if someone could shed some light on the following:
1. WHY does PBiCG (for pressure) win out over MG for some large cases concerning scalability?
2. What is the next move? What is the current state of the art for achieving better scalability in the CFD world apart from OF (for solvers like those in OF that work well for incompressible complex geometries, not special solvers such as the ones arjun has written)?
3. Is anyone who has achieved better parallel efficiency willing to contribute more experience? Thanks
Last edited by lakeat; September 9, 2011 at 09:37.
September 9, 2011, 09:47 | #58
Member
Flavio Galeazzo
Join Date: Mar 2009
Location: Karlsruhe, Germany
Posts: 34
Rep Power: 18
Hi lakeat,
Your experiences seem to be similar to mine. Unfortunately, I have no clue how to improve the scalability of my incompressible solver. One workaround is to use a compressible solver for an incompressible case; that worked well for me.
September 12, 2011, 20:04 | #60
Senior Member
Arjun
Join Date: Mar 2009
Location: Nurenberg, Germany
Posts: 1,272
Rep Power: 34
Quote:
I was also taking a break from CFD. Here is my email: arjun.yadav@yahoo.com