CFD Online Discussion Forums


Lukasz March 23, 2010 17:20

OpenFOAM and CUDA
 
Dear All,

We are pleased to announce that SpeedIT Toolkit 0.9 has been released. The library has been internally deployed, tested, and validated in a real scenario (blood flow in an aortic bifurcation; the data came from the IT'IS Foundation, Switzerland).

http://vratis.com/speedITblog/

The library contains CG and BiCGSTAB solvers and has been tested on Nvidia GTX 2xx cards and the newest Tesla.
The OpenFOAM plugin is free and licensed under the GPL.

We are currently working on AMG and LB solvers, which should appear in Q2 2010.

Best regards,
Lukasz

gocarts March 24, 2010 08:35

Post in OpenFOAM Forum
 
Great work! You might also consider making an announcement in the OpenFOAM Announcements from Other Sources forum category.

Lukasz March 25, 2010 14:40

I have just done that. Thanks!

fgal June 3, 2010 14:36

Great!
I am very interested in your work; it seems very promising. Would your libraries be usable on clusters of machines with CPUs and GPUs?

Thanks for sharing this

Francois

Lukasz July 8, 2010 12:18

We have finished the work and the official release is out. You can find the OpenFOAM plugin for GPU-based iterative solvers (Conjugate Gradient and BiCGSTAB) at speedit.vratis.com. The Classic version of our library and the OpenFOAM plugin are both released under the GPL. Enjoy!

aloeven October 6, 2010 06:05

Hi Lukasz,

I'm trying to use the SpeedIT toolkit and downloaded the free Classic version.
I followed the README files and recompiled OpenFOAM in single precision.
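
For reference, the pressure solver entry in my fvSolution looks roughly like this; PCG_accel is the solver name registered by the plugin, and the tolerances are just the usual tutorial values rather than anything tuned:

solvers
{
    p
    {
        solver          PCG_accel;     // GPU-accelerated CG from the SpeedIT plugin
        preconditioner  diagonal;
        tolerance       1e-06;
        relTol          0;
    }
}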

The icoFoam cavity tutorial runs with the PCG_accel solver; however, it is about ten times slower than the normal PCG solver. Both are run in single precision with a diagonal preconditioner. Below are the final iterations of both runs.
Time = 0.5

Courant Number mean: 0.116925 max: 0.852129
DILUPBiCG: Solving for Ux, Initial residual = 2.4755e-07, Final residual = 2.4755e-07, No Iterations 0
DILUPBiCG: Solving for Uy, Initial residual = 4.45417e-07, Final residual = 4.45417e-07, No Iterations 0
diagonalPCG: Solving for p, Initial residual = 1.85634e-06, Final residual = 8.29721e-07, No Iterations 1
time step continuity errors : sum local = 1.37325e-08, global = 2.27462e-10, cumulative = 1.50401e-09
diagonalPCG: Solving for p, Initial residual = 1.70986e-06, Final residual = 8.12331e-07, No Iterations 1
time step continuity errors : sum local = 1.43066e-08, global = -2.99404e-10, cumulative = 1.20461e-09
ExecutionTime = 0.16 s ClockTime = 0 s


And with the SpeedIt solver:
Time = 0.5

Courant Number mean: 0.116925 max: 0.852129
DILUPBiCG: Solving for Ux, Initial residual = 2.2693e-07, Final residual = 2.2693e-07, No Iterations 0
DILUPBiCG: Solving for Uy, Initial residual = 4.88815e-07, Final residual = 4.88815e-07, No Iterations 0
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 5.64166e-07, No Iterations 1
time step continuity errors : sum local = 2.09718e-08, global = -1.48015e-10, cumulative = -1.09157e-10
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 8.90977e-07, No Iterations 0
time step continuity errors : sum local = 2.34665e-08, global = 1.11866e-10, cumulative = 2.70921e-12
ExecutionTime = 1.43 s ClockTime = 1 s



Secondly, I wanted to try it on the simpleFoam pitzDaily case, but there I get the message:
ERROR : solver function returned -1

For example the final iteration is:
Time = 1000

DILUPBiCG: Solving for Ux, Initial residual = 1.87909e-05, Final residual = 5.75224e-08, No Iterations 2
DILUPBiCG: Solving for Uy, Initial residual = 0.000241922, Final residual = 9.22941e-06, No Iterations 1
ERROR : solver function returned -1
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 5.57732e-05, No Iterations 1000
time step continuity errors : sum local = 0.00215994, global = -1.51208e-05, cumulative = 16.2167
DILUPBiCG: Solving for epsilon, Initial residual = 4.7183e-05, Final residual = 4.28301e-06, No Iterations 1
DILUPBiCG: Solving for k, Initial residual = 8.54257e-05, Final residual = 3.38759e-06, No Iterations 1
ExecutionTime = 818.05 s ClockTime = 820 s


I have to say that the normal single-precision simpleFoam does not even work for this tutorial. With PCG_accel the tutorial can be run, although with the error message above, so I am not sure whether this error actually comes from PCG_accel. Here the single-precision run with PCG_accel is about 4 times slower than the normal double-precision PCG (179.52 s).


Can you explain why the accelerated solver is slower than the normal solver?

Best regards,
Alex.

Lukasz October 6, 2010 09:59

Dear Alex,

Before we can comment on the performance results you obtained, we need to know your hardware configuration.
Please remember that even the most powerful GPUs are only about ten times faster than modern CPUs.

Next, in your example the accelerated solver converges after 0 or 1 iterations. In this case most of the time in the solver routine is spent on data transfer between CPU and GPU, not on computations on the GPU side. We described this phenomenon thoroughly in the documentation: on one of the fastest GPUs we obtained only a small performance increase when a single solver iteration was done. The performance gain was significantly larger when dozens of solver iterations were required.
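
As a rough model (the symbols here are purely illustrative, not measurements): if copying the matrix and vectors to and from the card costs t_copy and one solver iteration on the GPU costs t_iter, a single solver call takes roughly t_copy + N * t_iter. With N = 0 or 1 the call is dominated by t_copy, so the GPU version cannot beat the CPU one; only when N reaches dozens of iterations does the N * t_iter term dominate and the per-iteration speed of the GPU show up in the total time.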

The pitzDaily example shows that both solvers (OpenFOAM and SpeedIT) fail to converge within the required number of iterations.
However, it seems that our solver could converge if given a larger number of iterations.
I cannot comment on the performance comparison, because the OpenFOAM DOUBLE precision solver converges in far fewer iterations than our SINGLE precision solver. I think the comparison should be made against our double precision solver.

Sincerely
SpeedIT team

aloeven October 7, 2010 03:08

Thanks for the reply.

I indeed suspected that it was overhead in the first case. Unfortunately the combination of PCG/PCG_accel with diagonal or no preconditioning does not converge properly for the test cases I am interested in (airfoil computations at the moment), so a good comparison on that front is not possible. As a preconditioner for PCG I use GAMG or DIC, but I actually prefer GAMG as a solver (the setup is sketched below). How is the progress in making GAMG run on the GPU?
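
For context, the GAMG setup I would prefer looks roughly like this in fvSolution (the settings are the usual tutorial-style values, nothing tuned for this case):

p
{
    solver          GAMG;                 // this is what I would like to see on the GPU
    tolerance       1e-06;
    relTol          0;
    smoother        GaussSeidel;
    nPreSweeps      0;
    nPostSweeps     2;
    cacheAgglomeration true;
    nCellsInCoarsestLevel 10;
    agglomerator    faceAreaPair;
    mergeLevels     1;
}

// the DIC-preconditioned PCG I mentioned is simply:
// p { solver PCG; preconditioner DIC; tolerance 1e-06; relTol 0; }
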
For potentialFoam on an airfoil, I also see the error:
ERROR : solver function returned -1

I ran the cylinder tutorial of potentialFoam with PCG/PCG_accel using diagonal preconditioning. There it worked, although the accelerated version was slower, but I think that has to do with my hardware.
I'm running on a Quadro FX 1700 card with 512 MB of memory. The clock speed is only 400 MHz for the memory and 460 MHz for the GPU. Due to our preinstalled system, I could not run with the latest driver and CUDA version; currently I use driver 195.36.15 with CUDA 3.0. I didn't expect a huge speedup here, but perhaps a little bit. Do you expect any speed-up for such a configuration?

I saw something strange in the log files of potentialFoam.
This is the normal PCG log:
Calculating potential flow
diagonalPCG: Solving for p, Initial residual = 1, Final residual = 9.58282e-07, No Iterations 305
diagonalPCG: Solving for p, Initial residual = 0.0126598, Final residual = 9.57773e-07, No Iterations 205
diagonalPCG: Solving for p, Initial residual = 0.00273797, Final residual = 9.74167e-07, No Iterations 188
diagonalPCG: Solving for p, Initial residual = 0.00101243, Final residual = 9.71138e-07, No Iterations 185
continuity error = 0.000682
Interpolated U error = 2.76476e-05
ExecutionTime = 0.06 s ClockTime = 0 s


And this is the PCG_accel log:
Calculating potential flow
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 9.49939e-07, No Iterations 247
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 9.0697e-07, No Iterations 240
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 9.77372e-07, No Iterations 231
diagonalPCG_accel: Solving for p, Initial residual = 0, Final residual = 9.58744e-07, No Iterations 223
continuity error = 0.000709772
Interpolated U error = 2.76527e-05
ExecutionTime = 0.43 s ClockTime = 0 s


Why does the initial residual for every pressure solve start at 0 with the accelerated solver, while in the normal solver it starts from a progressively lower level? It doesn't seem to affect the results much, but the number of iterations increases, since each solve effectively starts at a higher level.

Alex.

sradl October 9, 2010 14:31

Particle Tracking on GPU and Couple with OpenFOAM
 
Hey,

Has anybody tried this?

I think it would be extremely interesting to have a simple particle-tracking solver in OpenFOAM that uses the GPU.

cheers!

mrshb4 December 13, 2010 10:38

Openfoam Plugin speedIT Installation
 
Dear Friends

I have successfully compiled CUDA 3.2 by following the link below:



Then I downloaded the OpenFOAM_Plugin_1.1 and SpeedIT_Classic files from the site speedit.vratis.com.

Unfortunately, I don't know how to compile them. There is a README file in OpenFOAM_Plugin_1.1 that says to do the following:

1- cd $WM_PROJECT_USER_DIR

2- svn checkout https://62.87.249.40/repos/speedIT/b...SpeedIT_plugin

But at this step, an ID and password are required, which I don't know!

Can anyone help?!
Does anyone know how to compile them?!

Thank you

Lukasz December 13, 2010 11:53

Check https://sourceforge.net/projects/openfoamspeedit/

mrshb4 December 13, 2010 12:15

compilation
 
Dear Lukasz

I also downloaded this package, and its README contains the same step that requires an ID and password:


svn checkout https://62.87.249.40/repos/speedIT/branches/1.0/OpenFOAM_SpeedIT_plugin

Is there any other way to compile this package?!

Lukasz December 13, 2010 15:07

Quote:

Originally Posted by mrshb4 (Post 287254)
Dear Lukasz

svn checkout https://62.87.249.40/repos/speedIT/branches/1.0/OpenFOAM_SpeedIT_plugin

Is there any other way to compile this package?!

Thanks for finding the outdated information in our documentation.

No, you don't need to recompile it. It is a plugin; just follow the installation instructions to run it.
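
Roughly speaking, a plugin like this is hooked in at run time: you make its shared library visible (for example via LD_LIBRARY_PATH) and list it in the libs entry of system/controlDict, after which the accelerated solver names can be selected in fvSolution. The library name below is only a placeholder; the README in the package gives the actual file name.

// system/controlDict (excerpt)
libs
(
    "libSpeedIT_plugin.so"    // placeholder name, see the package README
);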

mrshb4 December 14, 2010 17:29

Unsuccessful!
 
Dear Lukasz

I downloaded the 1.2.Classic folder from sourceforge.net. As you said yourself, the README file (and so the installation instructions in it) seems to be out of date. I tried to install it as described in the README, but without success. Would you please send me a note or a link with the installation steps?

Thank you
Mohammadreza

mrshb4 February 23, 2011 19:55

Dear Lukasz

I've compiled the Classic one, but when I test it with icoFoam I get this error:

Create time

Create mesh for time = 0

Reading transportProperties

Reading field p

Reading field U

Reading/calculating face flux field phi


Starting time loop

Time = 0.005

Courant Number mean: 0 max: 0
WARNING : Unsupported preconditioner DILU, using NO preconditioner.


--> FOAM FATAL ERROR:
Call your own solver from PBiCG_accel.C file

FOAM exiting


What should I do?!!!
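
For completeness, my fvSolution solver entries currently look roughly like this (tolerances are illustrative); the run aborts as soon as the U equation reaches the accelerated BiCG solver:

solvers
{
    p
    {
        solver          PCG_accel;      // accelerated CG for the symmetric pressure matrix
        preconditioner  diagonal;
        tolerance       1e-06;
        relTol          0;
    }
    U
    {
        solver          PBiCG_accel;    // source of "Call your own solver from PBiCG_accel.C file"
        preconditioner  DILU;           // source of the "Unsupported preconditioner DILU" warning
        tolerance       1e-05;
        relTol          0;
    }
}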

aot April 7, 2011 03:28

PCG_accel diverging?
 
Dear Alex,
I'm also trying the SpeedIT solver, in my case with interDyMFoam (the damBreakWithObstacle case), and I have run into the same problems you experienced.
The accelerated solver (PCG_accel) is much slower than the normal one (PCG), which may be a hardware problem due to my quite old graphics card, and, more importantly, the computation stops after a few iterations because it does not converge. The normal solver runs fine.
Have you found the reason for this, or any solution?

Best regards

Andreas

Quote:

Originally Posted by aloeven (Post 278056)
The icoFoam cavity tutorial runs with the PCG_accel solver; however, it is about ten times slower than the normal PCG solver. [...] Can you explain why the accelerated solver is slower than the normal solver?


aloeven April 8, 2011 05:51

@Andreas:

In my case the old graphics card is clearly the bottleneck: the communication to and from the card is too slow. I'm running on a Quadro FX 1700 card with 512 MB of memory, and the clock speed is only 400 MHz for the memory and 460 MHz for the GPU.

On a newer card the performance should be better. With the free version, however, it is difficult to compare results: you have to run everything in single precision and you cannot use good preconditioners.

aot April 11, 2011 06:34

Dear Alex,
Thank you for your quick reply! I am not too bothered about the speed; the main problem is the convergence of the results. If I use the same preconditioner (diagonal) for both calculations (PCG and PCG_accel), PCG converges while PCG_accel diverges. Do you know why?

Thanks

Andreas

aloeven April 11, 2011 07:43

Dear Andreas,

Sorry, I don't have an answer to that. Actually, I observed exactly the opposite: I had a simulation where the normal PCG diverged and PCG_accel converged.

Best regards,
Alex.

Lukasz April 18, 2012 14:54

We now have an AMG preconditioner, if you are interested. According to our preliminary tests, it converges faster than DIC and DILU.
See http://wp.me/p1ZihD-1V for details.

Lukasz September 3, 2014 02:15

Full acceleration on GPU vs. partial
 
Our previous posts on accelerating only the linear part of the flow solver were not promising. The conclusion is that partial acceleration is not the way to go.

This is why we focused on implementing a RANS single-phase flow solver directly on the GPU. The results are quite promising:

Motorbike, 6.5M cells, aerodynamic flow, simpleFoam:

CPU: 9188 sec.
GPU: 2914 sec.
Acceleration: 3.1x.

See this link for more results.

kyle September 10, 2014 10:29

Lukasz,

This looks promising, but until you support LES/DES and multi-GPU across multiple computers, there are very few people who can make use of it. There aren't a whole lot of people running 7-million-cell RANS simulations anymore.

Lukasz September 11, 2014 04:20

Thanks, Kyle, for your honest opinion.
We are already working on multi-GPU support.

Which LES/DES models do you think would be most attractive to the community?

Quote:

Originally Posted by kyle (Post 509862)
Lukasz,

This looks promising, but until you support LES/DES and multi-GPU across multiple computers, there are very few people who can make use of it. There aren't a whole lot of people running 7-million-cell RANS simulations anymore.


kyle September 18, 2014 10:31

Personally I use IDDES for automotive external aerodynamics simulations, which I think is pretty common.

