OpenFOAM and CUDA
Dear All,
We are pleased to announce the release of SpeedIT Toolkit 0.9. The library has been deployed, tested and validated internally in a real scenario (blood flow in an aortic bifurcation, with data from the IT’IS Foundation, Switzerland). See http://vratis.com/speedITblog/ for details. The library contains the CG and BiCGSTAB solvers and has been tested with NVIDIA GTX 2xx cards and the newest Tesla. The plugin for OpenFOAM is free and licensed under the GPL. We are currently working on AMG and LB solvers, which should appear in Q2 2010.
Best regards,
Lukasz
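For readers less familiar with the solvers named here, below is a textbook conjugate gradient iteration with the diagonal (Jacobi) preconditioner, the same preconditioner used in the PCG and PCG_accel runs later in this thread. It is a minimal serial CPU sketch, not SpeedIT's GPU implementation. The CSR layout and the names CsrMatrix, spmv and pcgDiagonal are hypothetical. The scalar type is a template parameter, so the same code can run in single precision, which is what the free plugin version discussed later in the thread requires.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CSR (compressed sparse row) storage for a symmetric matrix.
template <typename Real>
struct CsrMatrix {
    std::size_t n;                     // number of rows (cells)
    std::vector<std::size_t> rowStart; // size n + 1
    std::vector<std::size_t> col;      // column index of each non-zero
    std::vector<Real> val;             // value of each non-zero
};

// y = A * x
template <typename Real>
void spmv(const CsrMatrix<Real>& A, const std::vector<Real>& x, std::vector<Real>& y) {
    for (std::size_t i = 0; i < A.n; ++i) {
        Real sum = 0;
        for (std::size_t k = A.rowStart[i]; k < A.rowStart[i + 1]; ++k)
            sum += A.val[k] * x[A.col[k]];
        y[i] = sum;
    }
}

template <typename Real>
Real dot(const std::vector<Real>& a, const std::vector<Real>& b) {
    Real s = 0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Conjugate Gradient with a diagonal (Jacobi) preconditioner M = diag(A).
// x holds the initial guess on entry; returns the number of iterations used.
template <typename Real>
int pcgDiagonal(const CsrMatrix<Real>& A, const std::vector<Real>& b,
                std::vector<Real>& x, Real tol, int maxIter) {
    const std::size_t n = A.n;
    assert(b.size() == n && x.size() == n);

    // Inverse diagonal, extracted once (assumes a non-zero diagonal entry per row).
    std::vector<Real> invDiag(n, Real(1));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = A.rowStart[i]; k < A.rowStart[i + 1]; ++k)
            if (A.col[k] == i) invDiag[i] = Real(1) / A.val[k];

    std::vector<Real> r(n), z(n), p(n), Ap(n);
    spmv(A, x, Ap);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];      // r = b - A x
    for (std::size_t i = 0; i < n; ++i) z[i] = invDiag[i] * r[i]; // z = M^-1 r
    p = z;
    Real rz = dot(r, z);

    for (int it = 0; it < maxIter; ++it) {
        if (std::sqrt(dot(r, r)) < tol) return it; // absolute 2-norm test
        spmv(A, p, Ap);
        const Real alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
            z[i] = invDiag[i] * r[i];
        }
        const Real rzNew = dot(r, z);
        const Real beta = rzNew / rz;
        rz = rzNew;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
    return maxIter;
}
```

To run it in single precision, instantiate with `CsrMatrix<float>` and pass the tolerance as a float literal (for example `1e-5f`), otherwise template deduction fails.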
Post in OpenFOAM Forum
Great work! You might also consider making an announcement in the OpenFOAM Announcements from Other Sources forum category.
I have just done that. Thanks.

Great,
I am very interested in your work. It seems very promising. Would your libraries be usable on clusters of machines with CPUs and GPUs? Thanks for sharing this.
Francois
We have finished the work and the official release is out. You can find the OpenFOAM plugin for GPU-based iterative solvers (Conjugate Gradient and BiCGSTAB) at speedit.vratis.com. The Classic version of our library and the OpenFOAM plugin are both licensed under the GPL. Enjoy!
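The second solver in the plugin, BiCGSTAB, handles non-symmetric systems, which CG cannot. Below is a minimal, unpreconditioned textbook version (van der Vorst's algorithm) on a small dense matrix, meant only to show the structure of one iteration: two matrix-vector products plus several dot products. It is an illustration, not the plugin's code; Vec, Mat, matVec and bicgstab are hypothetical names, and the dense storage is chosen for brevity only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major matrix, for illustration only

Vec matVec(const Mat& A, const Vec& x) {
    Vec y(x.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Unpreconditioned BiCGSTAB for a general (non-symmetric) matrix A.
// x holds the initial guess on entry; returns the number of iterations used.
int bicgstab(const Mat& A, const Vec& b, Vec& x, double tol, int maxIter) {
    const std::size_t n = b.size();
    assert(A.size() == n && x.size() == n);

    Vec Ax = matVec(A, x);
    Vec r(n);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ax[i];
    const Vec rHat = r; // fixed shadow residual
    Vec p(n, 0.0), v(n, 0.0), s(n), t(n);
    double rho = 1.0, alpha = 1.0, omega = 1.0;

    for (int it = 0; it < maxIter; ++it) {
        if (std::sqrt(dot(r, r)) < tol) return it;
        const double rhoNew = dot(rHat, r);
        const double beta = (rhoNew / rho) * (alpha / omega);
        rho = rhoNew;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * (p[i] - omega * v[i]);
        v = matVec(A, p);
        alpha = rho / dot(rHat, v);
        for (std::size_t i = 0; i < n; ++i) s[i] = r[i] - alpha * v[i];
        if (std::sqrt(dot(s, s)) < tol) { // converged after the half step
            for (std::size_t i = 0; i < n; ++i) x[i] += alpha * p[i];
            return it + 1;
        }
        t = matVec(A, s);
        omega = dot(t, s) / dot(t, t); // stabilising (minimal residual) step
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i] + omega * s[i];
            r[i] = s[i] - omega * t[i];
        }
    }
    return maxIter;
}
```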

Hi Lukasz,
I'm trying to use the SpeedIT toolkit and have downloaded the free Classic version. I followed the README files and recompiled OpenFOAM in single precision. The icoFoam cavity tutorial runs with the PCG_accel solver; however, it is about ten times slower than the normal PCG solver. Both are run in single precision with the diagonal preconditioner. Below are the final iterations of both runs.
Time = 0.5
Courant Number mean: 0.116925 max: 0.852129
DILUPBiCG: Solving for Ux, Initial residual = 2.4755e-07, Final residual = 2.4755e-07, No Iterations 0
DILUPBiCG: Solving for Uy, Initial residual = 4.45417e-07, Final residual = 4.45417e-07, No Iterations 0
diagonalPCG: Solving for p, Initial residual = 1.85634e-06, Final residual = 8.29721e-07, No Iterations 1
time step continuity errors : sum local = 1.37325e-08, global = 2.27462e-10, cumulative = 1.50401e-09
diagonalPCG: Solving for p, Initial residual = 1.70986e-06, Final residual = 8.12331e-07, No Iterations 1
time step continuity errors : sum local = 1.43066e-08, global = 2.99404e-10, cumulative = 1.20461e-09
ExecutionTime = 0.16 s ClockTime = 0 s
And with the SpeedIT solver:
Time = 0.5
Courant Number mean: 0.116925 max: 0.852129
DILUPBiCG: Solving for Ux, Initial residual = 2.2693e-07, Final residual = 2.2693e-07, No Iterations 0
DILUPBiCG: Solving for Uy, Initial residual = 4.88815e-07, Final residual = 4.88815e-07, No Iterations 0
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 5.64166e-07, No Iterations 1
time step continuity errors : sum local = 2.09718e-08, global = 1.48015e-10, cumulative = 1.09157e-10
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 8.90977e-07, No Iterations 0
time step continuity errors : sum local = 2.34665e-08, global = 1.11866e-10, cumulative = 2.70921e-12
ExecutionTime = 1.43 s ClockTime = 1 s
Secondly, I wanted to try it on the simpleFoam pitzDaily case, but there I get the message:
ERROR : solver function returned 1
For example, the final iteration is:
Time = 1000
DILUPBiCG: Solving for Ux, Initial residual = 1.87909e-05, Final residual = 5.75224e-08, No Iterations 2
DILUPBiCG: Solving for Uy, Initial residual = 0.000241922, Final residual = 9.22941e-06, No Iterations 1
ERROR : solver function returned 1
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 5.57732e-05, No Iterations 1000
time step continuity errors : sum local = 0.00215994, global = 1.51208e-05, cumulative = 16.2167
DILUPBiCG: Solving for epsilon, Initial residual = 4.7183e-05, Final residual = 4.28301e-06, No Iterations 1
DILUPBiCG: Solving for k, Initial residual = 8.54257e-05, Final residual = 3.38759e-06, No Iterations 1
ExecutionTime = 818.05 s ClockTime = 820 s
I should add that the normal single-precision simpleFoam does not even work for this tutorial. With PCG_accel the tutorial runs, but with the error message, so I'm not sure whether the error message comes from PCG_accel. Here the single-precision PCG_accel run is about four times slower than the normal double-precision PCG (179.52 s). Can you explain why the accelerated solver is slower than the normal solver?
Best regards,
Alex.
Dear Alex,
Before we can comment on the performance results you obtained, we need to know your hardware configuration. Please remember that even the most powerful GPUs are only about ten times faster than modern CPUs. Next, in your example the accelerated solver converges after 0 or 1 iterations. In this case most of the time in the solver routine is spent on data transfer between the CPU and the GPU, not on computation on the GPU. We describe this phenomenon thoroughly in the documentation: on one of the fastest GPUs we obtained only a small performance increase when one solver iteration was done, and a significantly larger gain when dozens of solver iterations were required. The pitzDaily example shows that neither solver (OpenFOAM nor SpeedIT) converges within the required number of iterations. However, it seems that our solver could converge with a larger number of iterations. I cannot comment on the performance comparison, because the OpenFOAM DOUBLE precision solver converges in far fewer iterations than our SINGLE precision solver. I think the comparison should be made against our double precision solver.
Sincerely,
SpeedIT team
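The argument in this reply can be written as a simple timing model. The sketch below is hypothetical: it assumes a fixed transfer cost per solver call plus a constant per-iteration cost on each device, and ignores setup, preconditioner construction and iteration-to-iteration variation. The names OffloadCost, gpuSolveTime, cpuSolveTime and breakEvenIterations are illustrative, not part of SpeedIT.

```cpp
#include <cassert>

// Hypothetical per-call cost model for an offloaded iterative solver.
struct OffloadCost {
    double transferSeconds; // copy matrix and vectors to the GPU and the result back
    double gpuIterSeconds;  // one solver iteration on the GPU
    double cpuIterSeconds;  // one solver iteration on the CPU
};

double gpuSolveTime(const OffloadCost& c, int iterations) {
    return c.transferSeconds + iterations * c.gpuIterSeconds;
}

double cpuSolveTime(const OffloadCost& c, int iterations) {
    return iterations * c.cpuIterSeconds;
}

// Smallest iteration count n with n * cpu > transfer + n * gpu,
// i.e. n > transfer / (cpu - gpu). Returns -1 if the GPU never wins.
int breakEvenIterations(const OffloadCost& c) {
    assert(c.transferSeconds >= 0.0);
    const double gainPerIter = c.cpuIterSeconds - c.gpuIterSeconds;
    if (gainPerIter <= 0.0) return -1;
    return static_cast<int>(c.transferSeconds / gainPerIter) + 1;
}
```

In this model a solve that converges in 0 or 1 iterations pays the full transfer cost while gaining almost nothing, and even a per-iteration speedup of ten only pays off once the iteration count exceeds the transfer time divided by the per-iteration saving.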
Thanks for the reply.
I did suspect that the overhead was the issue in the first case. Unfortunately, the combination of PCG/PCG_accel with diagonal/none preconditioning doesn't converge properly for the test cases I'm interested in (airfoil computations at the moment), so a good comparison on that part is not possible. As the preconditioner for PCG I use GAMG or DIC, but I actually prefer GAMG as a solver. How is the progress on making GAMG run on the GPU? For potentialFoam on an airfoil I also see the error:
ERROR : solver function returned 1
I ran the cylinder tutorial of potentialFoam with PCG/PCG_accel using diagonal preconditioning. There it worked, although the accelerated version was slower. But I think this has to do with my hardware. I'm running on a Quadro FX 1700 card with 512 MB of memory. The clock speed is only 400 MHz for the memory and 460 MHz for the GPU. Because of our preinstalled system, I could not run with the latest driver and CUDA version; currently I use driver 195.36.15 with CUDA 3.0. I didn't expect a huge speedup here, but perhaps a small one. Do you expect any speedup for such a configuration? I also saw something strange in the potentialFoam log files.
This is the normal PCG log:
Calculating potential flow
diagonalPCG: Solving for p, Initial residual = 1, Final residual = 9.58282e-07, No Iterations 305
diagonalPCG: Solving for p, Initial residual = 0.0126598, Final residual = 9.57773e-07, No Iterations 205
diagonalPCG: Solving for p, Initial residual = 0.00273797, Final residual = 9.74167e-07, No Iterations 188
diagonalPCG: Solving for p, Initial residual = 0.00101243, Final residual = 9.71138e-07, No Iterations 185
continuity error = 0.000682
Interpolated U error = 2.76476e-05
ExecutionTime = 0.06 s ClockTime = 0 s
And this is the PCG_accel log:
Calculating potential flow
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 9.49939e-07, No Iterations 247
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 9.0697e-07, No Iterations 240
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 9.77372e-07, No Iterations 231
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 9.58744e-07, No Iterations 223
continuity error = 0.000709772
Interpolated U error = 2.76527e-05
ExecutionTime = 0.43 s ClockTime = 0 s
Why does the initial residual of every pressure loop start at 0 with the accelerated solver, whereas in the normal solver it starts from a lower level each time? It doesn't seem to affect the results much, but the number of iterations increases, since it starts at a higher level.
Alex.
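For reference, OpenFOAM's own lduMatrix solvers report a normalised residual rather than a raw one. The sketch below reproduces that normalisation on a small dense matrix for illustration: with the reference value xRef taken as the average of x, the normalisation factor is sum|A x - A xRef| + sum|b - A xRef| plus a small guard, and the reported residual is sum|b - A x| divided by that factor. The function names are hypothetical and the guard value of 1e-20 is an assumption. Under this definition any uniform initial field gives a residual of 1, consistent with the first stock PCG solve above. One possible explanation for the "Initial residual = 0" lines, which is an assumption and not confirmed in this thread, is that the accelerated solver never fills in this value, so a default of 0 is printed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense, for illustration only

Vec matVec(const Mat& A, const Vec& x) {
    Vec y(x.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

// Normalised residual in the style of OpenFOAM's lduMatrix solvers:
//   xRef       = average(x) (uniform vector)
//   normFactor = sum|A x - A xRef| + sum|b - A xRef| + small
//   residual   = sum|b - A x| / normFactor
double normalisedResidual(const Mat& A, const Vec& b, const Vec& x) {
    const std::size_t n = b.size();
    assert(A.size() == n && x.size() == n);

    double xAvg = 0.0;
    for (double xi : x) xAvg += xi;
    xAvg /= static_cast<double>(n);

    const Vec Ax = matVec(A, x);
    const Vec AxRef = matVec(A, Vec(n, xAvg));

    double normFactor = 1e-20; // guard against division by zero (assumed value)
    double residualSum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        normFactor += std::fabs(Ax[i] - AxRef[i]) + std::fabs(b[i] - AxRef[i]);
        residualSum += std::fabs(b[i] - Ax[i]);
    }
    return residualSum / normFactor;
}
```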
Particle Tracking on GPU and Coupling with OpenFOAM
Hey,
Has anybody tried this? I think it would be extremely interesting to have a simple particle tracking solver in OpenFOAM that uses the GPU.
Cheers!
OpenFOAM Plugin SpeedIT Installation
Dear Friends
I have successfully compiled CUDA 3.2 using the link below:
Then I downloaded OpenFOAM_Plugin_1.1 and SpeedIT_Classic from speedit.vratis.com, but unfortunately I don't know how to compile them. The README file in OpenFOAM_Plugin_1.1 says to do the following:
1. cd $WM_PROJECT_USER_DIR
2. svn checkout https://62.87.249.40/repos/speedIT/b...SpeedIT_plugin
But at this step an ID and password are required, which I don't have. Can anyone help? Does anyone know how to compile them?
Thank you

compilation
Dear Lukasz
I also downloaded this package, and its README contains the same step, which requires an ID and password:
svn checkout https://62.87.249.40/repos/speedIT/branches/1.0/OpenFOAM_SpeedIT_plugin
Is there any other way to compile this package?
Quote:
No, you don't need to recompile it. It is a plugin; just follow the installation instructions to run it.
Unsuccessful!
Dear Lukasz
I downloaded the folder 1.2.Classic from sourceforge.net. As you said yourself, the README file (and therefore the installation instructions in it) seems to be out of date. I tried to install it as described in the README, but it was unsuccessful. Would you please send me a note or a link to the installation steps?
Thank you,
Mohammadreza
Dear Lukasz
I've compiled the Classic version, but when I test it with icoFoam I get this error:
Create time
Create mesh for time = 0
Reading transportProperties
Reading field p
Reading field U
Reading/calculating face flux field phi
Starting time loop
Time = 0.005
Courant Number mean: 0 max: 0
WARNING : Unsupported preconditioner DILU, using NO preconditioner.
--> FOAM FATAL ERROR:
Call your own solver from PBiCG_accel.C file
FOAM exiting
What should I do?
PCG_accel diverging?
Dear Alex,
I'm also trying the SpeedIT solver, in my case with interDyMFoam (damBreakwithObstacle case), and I've run into the same problems you experienced. The accelerated solver (PCG_accel) is much slower than the normal one (PCG), maybe a hardware problem due to my quite old graphics card. More importantly, the computation stops after a few iterations because it does not converge, while the normal solver runs fine. Have you found the reason for this, or any solution?
Best regards,
Andreas

@Andreas:
In my case the old graphics card is clearly the bottleneck: communication to and from the card is too slow. I'm running on a Quadro FX 1700 card with 512 MB of memory. The clock speed is only 400 MHz for the memory and 460 MHz for the GPU. On a newer card the performance should be better. With the free version, however, it is difficult to compare results: you have to run everything in single precision and you cannot use good preconditioners.
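Neither post identifies why PCG_accel diverges where PCG converges. One plausible contributor, offered as an assumption rather than a diagnosis, is single-precision round-off: Krylov solvers rely on long reductions (dot products, residual norms) accumulated over every cell. The hypothetical helpers below show how a naive running sum in float drifts far more than the same sum in double; they illustrate the precision gap only and say nothing about how SpeedIT performs its reductions on the GPU.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Naive left-to-right summation, the same pattern as a serial dot product
// or residual norm accumulated in a single scalar.
template <typename Real>
Real naiveSum(Real term, std::size_t count) {
    Real sum = 0;
    for (std::size_t i = 0; i < count; ++i) sum += term;
    return sum;
}

// Relative error of the naive sum of `count` copies of `term`, measured
// against the product term * count evaluated in long double.
template <typename Real>
double relativeSumError(Real term, std::size_t count) {
    assert(count > 0);
    const long double exact = static_cast<long double>(term) * count;
    const long double got = naiveSum(term, count);
    return static_cast<double>(std::fabs((got - exact) / exact));
}
```

Comparing `relativeSumError(0.1f, 10000000)` with `relativeSumError(0.1, 10000000)` shows the float sum off by several percent, while the double sum stays accurate to well below one part in a hundred million.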
Dear Alex,
Thank you for your quick reply! I'm not too worried about the speed; the main problem is convergence. If I use the same preconditioner (diagonal) for both calculations (PCG and PCG_accel), PCG converges while PCG_accel diverges. Do you know why?
Thanks,
Andreas
Dear Andreas,
Sorry, I don't have an answer to that. Actually, I observed exactly the opposite: I had a simulation where the normal PCG diverged and PCG_accel converged.
Best regards,
Alex.
We now have an AMG preconditioner, if you are interested. According to our preliminary tests, it converges faster than DIC and DILU.
See http://wp.me/p1ZihD1V for details 