
March 23, 2010, 18:20 
OpenFOAM and CUDA

#1 
Member

Dear All,
We are pleased to announce the release of SpeedIT Toolkit 0.9. The library has been deployed internally, tested and validated in a real scenario (blood flow in an aortic bifurcation, with data from the IT'IS Foundation, Switzerland). http://vratis.com/speedITblog/
The library contains CG and BiCGSTAB solvers and has been tested with Nvidia GTX 2xx cards and the newest Tesla. The OpenFOAM plugin is free and is licensed under the GPL. We are currently working on AMG and LB solvers, which should appear in Q2 2010.
Best regards,
Lukasz

March 24, 2010, 09:35 
Post in OpenFOAM Forum

#2 
Senior Member
Richard Smith
Join Date: Mar 2009
Location: Enfield, NH, USA
Posts: 138
Blog Entries: 4
Rep Power: 9 
Great work, you might also consider making an announcement in the
OpenFOAM Announcements from Other Sources forum category.
__________________
Symscape, Computational Fluid Dynamics for all 

March 25, 2010, 15:40 

#3 
Member

I have just done that. Thanks.


June 3, 2010, 14:36 

#4 
Member
Francois Gallard
Join Date: Mar 2010
Location: Edinburgh
Posts: 39
Rep Power: 8 
Great,
I am very interested in your work; it seems very promising. Would your libraries be usable on clusters of machines with both CPUs and GPUs?
Thanks for sharing this.
Francois

July 8, 2010, 12:18 

#5 
Member

We have finished the work and the official release is available. You can find the OpenFOAM plugin for GPU-based iterative solvers (Conjugate Gradient and BiCGSTAB) at speedit.vratis.com. The Classic version of our library and the OpenFOAM plugin are both released under the GPL. Enjoy!


October 6, 2010, 06:05 

#6 
Member
Alex
Join Date: Apr 2010
Posts: 32
Rep Power: 8 
Hi Lukasz,
I'm trying to use the SpeedIT toolkit and downloaded the free Classic version. I followed the README files and recompiled OpenFOAM in single precision. The icoFoam cavity tutorial runs with the PCG_accel solver, but it is about ten times slower than the normal PCG solver. Both are run in single precision with the diagonal preconditioner. Below are the final iterations of both runs.
Time = 0.5
Courant Number mean: 0.116925 max: 0.852129
DILUPBiCG: Solving for Ux, Initial residual = 2.4755e-07, Final residual = 2.4755e-07, No Iterations 0
DILUPBiCG: Solving for Uy, Initial residual = 4.45417e-07, Final residual = 4.45417e-07, No Iterations 0
diagonalPCG: Solving for p, Initial residual = 1.85634e-06, Final residual = 8.29721e-07, No Iterations 1
time step continuity errors : sum local = 1.37325e-08, global = 2.27462e-10, cumulative = 1.50401e-09
diagonalPCG: Solving for p, Initial residual = 1.70986e-06, Final residual = 8.12331e-07, No Iterations 1
time step continuity errors : sum local = 1.43066e-08, global = 2.99404e-10, cumulative = 1.20461e-09
ExecutionTime = 0.16 s ClockTime = 0 s
And with the SpeedIT solver:
Time = 0.5
Courant Number mean: 0.116925 max: 0.852129
DILUPBiCG: Solving for Ux, Initial residual = 2.2693e-07, Final residual = 2.2693e-07, No Iterations 0
DILUPBiCG: Solving for Uy, Initial residual = 4.88815e-07, Final residual = 4.88815e-07, No Iterations 0
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 5.64166e-07, No Iterations 1
time step continuity errors : sum local = 2.09718e-08, global = 1.48015e-10, cumulative = 1.09157e-10
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 8.90977e-07, No Iterations 0
time step continuity errors : sum local = 2.34665e-08, global = 1.11866e-10, cumulative = 2.70921e-12
ExecutionTime = 1.43 s ClockTime = 1 s
Secondly, I wanted to try it on the simpleFoam pitzDaily case, but there I get the message:
ERROR : solver function returned 1
For example, the final iteration is:
Time = 1000
DILUPBiCG: Solving for Ux, Initial residual = 1.87909e-05, Final residual = 5.75224e-08, No Iterations 2
DILUPBiCG: Solving for Uy, Initial residual = 0.000241922, Final residual = 9.22941e-06, No Iterations 1
ERROR : solver function returned 1
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 5.57732e-05, No Iterations 1000
time step continuity errors : sum local = 0.00215994, global = 1.51208e-05, cumulative = 16.2167
DILUPBiCG: Solving for epsilon, Initial residual = 4.7183e-05, Final residual = 4.28301e-06, No Iterations 1
DILUPBiCG: Solving for k, Initial residual = 8.54257e-05, Final residual = 3.38759e-06, No Iterations 1
ExecutionTime = 818.05 s ClockTime = 820 s
I have to add that the normal single-precision simpleFoam does not work at all for this tutorial. With PCG_accel the tutorial runs, although with the error message, so I'm not sure whether the error message comes from PCG_accel. Here single precision with PCG_accel is about 4 times slower than the normal double-precision PCG (179.52 s).
Can you explain why the accelerated solver is slower than the normal solver?
Best regards,
Alex.
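The solver lines in logs like these can be tallied with a few lines of script when comparing two runs. The following is a minimal Python sketch, assuming the log follows the "<solver>: Solving for <field>, Initial residual = ..., Final residual = ..., No Iterations <n>" pattern seen in this thread; the names SOLVER_LINE and sum_iterations are hypothetical and not part of OpenFOAM or SpeedIT.

```python
import re
from collections import defaultdict

# Matches one linear-solver report line. Residual values are captured as
# raw text so the pattern does not depend on their exact number format.
SOLVER_LINE = re.compile(
    r"(?P<solver>\w+):\s+Solving for (?P<field>\w+), "
    r"Initial residual = (?P<initial>\S+), "
    r"Final residual = (?P<final>\S+), "
    r"No Iterations (?P<iters>\d+)"
)


def sum_iterations(log_text):
    """Return {(solver, field): total iterations} for all solver lines found."""
    totals = defaultdict(int)
    for match in SOLVER_LINE.finditer(log_text):
        key = (match.group("solver"), match.group("field"))
        totals[key] += int(match.group("iters"))
    return dict(totals)
```

Calling sum_iterations on the text of each log gives one iteration total per solver and field, which can then be set against the reported ExecutionTime of each run.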

October 6, 2010, 09:59 

#7 
Member

Dear Alex,
Before we can comment on the performance results you obtained, we need to know your hardware configuration. Please remember that even the most powerful GPUs are only about ten times faster than modern CPUs.
Next, in your example the accelerated solver converges after 0 or 1 iterations. In this case most of the time in the solver routine is spent on data transfer between the CPU and the GPU, not on computation on the GPU side. We describe this phenomenon thoroughly in the documentation: on one of the fastest GPUs we obtained only a small performance increase when a single solver iteration was done. The performance gain was significantly larger when dozens of solver iterations were required.
The pitzDaily example shows that neither solver (OpenFOAM or SpeedIT) converges within the required number of iterations. However, it seems that our solver could converge given a larger number of iterations. I cannot comment on the performance comparison, because the OpenFOAM DOUBLE precision solver converges in far fewer iterations than our SINGLE precision solver. I think a comparison with our double precision solver should be done.
Sincerely,
SpeedIT team
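The transfer-overhead argument can be illustrated with a back-of-the-envelope model. The Python sketch below is only an illustration: it assumes a fixed transfer cost per solver call and a constant per-iteration speedup on the GPU, and the function names and any numbers fed into it are hypothetical, not measurements of SpeedIT or OpenFOAM.

```python
import math


def cpu_solve_time(n_iter, t_iter_cpu):
    """Estimated time of one solver call on the CPU."""
    return n_iter * t_iter_cpu


def gpu_solve_time(n_iter, t_iter_cpu, speedup, t_transfer):
    """Estimated time of one accelerated solver call.

    t_transfer is the assumed fixed cost of copying the matrix and vectors
    to the GPU and the solution back; it is paid even when n_iter is 0.
    """
    return t_transfer + n_iter * t_iter_cpu / speedup


def break_even_iterations(t_iter_cpu, speedup, t_transfer):
    """Smallest iteration count at which the GPU call is not slower.

    Returns None when the GPU iteration is no faster than the CPU one,
    because the transfer cost can then never be recovered.
    """
    if speedup <= 1.0:
        return None
    saved_per_iteration = t_iter_cpu * (1.0 - 1.0 / speedup)
    return math.ceil(t_transfer / saved_per_iteration)
```

Under this model a call that converges in 0 or 1 iterations always pays the full transfer cost with almost nothing to offset it, which is consistent with the cavity timings reported above.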

October 7, 2010, 03:08 

#8 
Member
Alex
Join Date: Apr 2010
Posts: 32
Rep Power: 8 
Thanks for the reply.
I did indeed think that it was overhead in the first case. Unfortunately the combination of PCG/PCG_accel and diagonal/none preconditioning doesn't converge properly for the test cases I'm interested in (airfoil computations at the moment), so a good comparison on that part is not possible. As a preconditioner for PCG I use GAMG or DIC, but I actually prefer GAMG as a solver. How is the progress in making GAMG run on the GPU?
For potentialFoam on an airfoil, I also see the error:
ERROR : solver function returned 1
I ran the cylinder tutorial of potentialFoam with PCG/PCG_accel using diagonal preconditioning. There it worked, although the accelerated version was slower. But I think it has to do with my hardware. I'm running on a Quadro FX 1700 card with 512 MB memory. The clock speed is only 400 MHz for the memory and 460 MHz for the GPU. Due to our preinstalled system, I could not run with the latest driver and CUDA version. Currently I use driver 195.36.15 with CUDA 3.0. I didn't expect a huge speedup here, but perhaps a little. Do you expect any speedup for such a configuration?
I saw something strange in the log files of potentialFoam. This is the normal PCG log:
Calculating potential flow
diagonalPCG: Solving for p, Initial residual = 1, Final residual = 9.58282e-07, No Iterations 305
diagonalPCG: Solving for p, Initial residual = 0.0126598, Final residual = 9.57773e-07, No Iterations 205
diagonalPCG: Solving for p, Initial residual = 0.00273797, Final residual = 9.74167e-07, No Iterations 188
diagonalPCG: Solving for p, Initial residual = 0.00101243, Final residual = 9.71138e-07, No Iterations 185
continuity error = 0.000682
Interpolated U error = 2.76476e-05
ExecutionTime = 0.06 s ClockTime = 0 s
And this is the PCG_accel log:
Calculating potential flow
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 9.49939e-07, No Iterations 247
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 9.0697e-07, No Iterations 240
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 9.77372e-07, No Iterations 231
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 9.58744e-07, No Iterations 223
continuity error = 0.000709772
Interpolated U error = 2.76527e-05
ExecutionTime = 0.43 s ClockTime = 0 s
Why is the initial residual reported as 0 for every pressure loop, while in the normal solver each loop starts from a lower level than the one before? It doesn't seem to affect the results much, but the number of iterations increases since it starts at a higher level.
Alex.
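For reference, in a conjugate gradient solver with a diagonal preconditioner the initial residual is evaluated from the starting guess before any iteration, so it is zero only when that guess already solves the system. The following pure-Python sketch shows where that value comes from. It is a textbook illustration on a small dense matrix, not the OpenFOAM or SpeedIT implementation; the function name diagonal_pcg is hypothetical, and the plain 2-norm used here is an assumption that differs from OpenFOAM's normalised residual.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def diagonal_pcg(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A.

    Returns (x, initial_residual, final_residual, iterations), where both
    residuals are plain 2-norms of b - A x.
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # diagonal preconditioner

    # The initial residual is computed here, before the first iteration.
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    initial_residual = math.sqrt(dot(r, r))
    residual = initial_residual

    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    iterations = 0
    while residual > tol and iterations < max_iter:
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        residual = math.sqrt(dot(r, r))
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
        iterations += 1
    return x, initial_residual, residual, iterations
```

Starting from a zero guess with a nonzero right-hand side, the returned initial residual is the norm of b and at least one iteration is needed; starting from the exact solution, it is 0 and no iterations are performed.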

October 9, 2010, 14:31 
Particle Tracking on GPU and Couple with OpenFOAM

#9 
Member
Stefan Radl
Join Date: Mar 2009
Location: Graz, Austria
Posts: 82
Rep Power: 10 
Hey,
has anybody tried this? I think it would be extremely interesting to have a simple particle tracking solver in OpenFOAM that uses the GPU.
Cheers!

December 13, 2010, 11:38 
Openfoam Plugin speedIT Installation

#10 
Member
Mohammad.R.Shetab
Join Date: Jul 2010
Posts: 49
Rep Power: 7 
Dear Friends
I successfully compiled CUDA 3.2 using the link below: Then I downloaded the files OpenFOAM_Plugin_1.1 and SpeedIT_Classic from speedit.vratis.com, but unfortunately I don't know how to compile them. The README file in OpenFOAM_Plugin_1.1 gives these steps:
1 cd $WM_PROJECT_USER_DIR
2 svn checkout https://62.87.249.40/repos/speedIT/b...SpeedIT_plugin
At the second step, however, I am asked for an ID and password that I don't have. Can anyone help? Does anyone know how to compile them?
Thank you

December 13, 2010, 12:53 

#11 
Member


December 13, 2010, 13:15 
compilation

#12 
Member
Mohammad.R.Shetab
Join Date: Jul 2010
Posts: 49
Rep Power: 7 
Dear Lukasz
I also downloaded this package, and its README contains the same step, which requires an ID and password:
svn checkout https://62.87.249.40/repos/speedIT/branches/1.0/OpenFOAM_SpeedIT_plugin
Is there any other way to compile this package?

December 13, 2010, 16:07 

#13  
Member

Quote:
No, you don't need to recompile it. It is a plugin, just follow the installation instructions in order to run it. 

December 14, 2010, 18:29 
Unsuccessful!

#14 
Member
Mohammad.R.Shetab
Join Date: Jul 2010
Posts: 49
Rep Power: 7 
Dear Lukasz
I downloaded the folder 1.2.Classic from sourceforge.net. As you said yourself, the README file (and so the installation instructions in it) seems to be out of date. I tried to install it as described in the README file, but without success. Would you please send me a note or a link with the installation steps?
Thank you,
Mohammadreza

February 23, 2011, 20:55 

#15 
Member
Mohammad.R.Shetab
Join Date: Jul 2010
Posts: 49
Rep Power: 7 
Dear Lukasz
I've compiled the Classic version. But when I test it with icoFoam I get this error:
Create time
Create mesh for time = 0
Reading transportProperties
Reading field p
Reading field U
Reading/calculating face flux field phi
Starting time loop
Time = 0.005
Courant Number mean: 0 max: 0
WARNING : Unsupported preconditioner DILU, using NO preconditioner.
--> FOAM FATAL ERROR:
Call your own solver from PBiCG_accel.C file
FOAM exiting
What should I do?

April 7, 2011, 03:28 
PCG_accel diverging?

#16  
New Member
Andreas Otto
Join Date: Sep 2009
Posts: 11
Rep Power: 8 
Dear Alex,
I'm also trying the SpeedIT solver, in my case with interDyMFoam (damBreakWithObstacle case). I've run into the same problems you experienced. The accelerated solver (PCG_accel) is much slower than the normal one (PCG), which may be a hardware problem due to my rather old graphics card. More importantly, the computation stops after a few iterations because it does not converge, whereas the normal solver runs fine. Have you found the reason for this, or a solution?
Best regards,
Andreas


April 8, 2011, 05:51 

#17 
Member
Alex
Join Date: Apr 2010
Posts: 32
Rep Power: 8 
@Andreas:
In my case the old graphics card is clearly the bottleneck. The communication to and from the card is too slow. I'm running on a Quadro FX 1700 card with 512 MB memory. The clock speed is only 400 MHz for the memory and 460 MHz for the GPU. On a newer card the performance should be better. With the free version, however, it is difficult to compare results: you have to run everything in single precision and you cannot use good preconditioners.

April 11, 2011, 06:34 

#18 
New Member
Andreas Otto
Join Date: Sep 2009
Posts: 11
Rep Power: 8 
Dear Alex,
Thank you for your quick reply! I am not too concerned about the speed. The main problem is the convergence of the results. If I use the same preconditioner (diagonal) for both calculations (PCG and PCG_accel), PCG converges while PCG_accel diverges. Do you know why?
Thanks,
Andreas

April 11, 2011, 07:43 

#19 
Member
Alex
Join Date: Apr 2010
Posts: 32
Rep Power: 8 
Dear Andreas,
Sorry, I don't have an answer to that. Actually, I observed exactly the opposite: I had a simulation where the normal PCG diverged and PCG_accel converged.
Best regards,
Alex.

April 18, 2012, 14:54 

#20 
Member

We now have an AMG preconditioner, if you are interested. According to our preliminary tests, it converges faster than DIC and DILU.
See http://wp.me/p1ZihD1V for details.
Last edited by Lukasz; April 26, 2012 at 15:27.

Tags 
cuda, gpu, speedit 