CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)

- Hardware (https://www.cfd-online.com/Forums/hardware/)

- - GPU selection (https://www.cfd-online.com/Forums/hardware/92154-gpu-selection.html)

Hi

I would like to rewrite my parallel 3d CFD code using GPU computing techniques. I have doubts

Which GPU is best in terms of

Cost/performance
Easiest to start programming

What is OPENCL
what is the difference between opencl and cuda

Pls help

I think, OpenCL is a more or less C-like API. There's an implemention for ATI (AMD) graphic cards. Cuda is an API for Nvidia graphic cards which is based on OpenCL, so it should be pretty much the same (although it's not compatible as far as I know).

Which graphics card performs best depends much on the application. There are applications where ATI cards are much faster than Nvidia cards and vice versa. And it also might be possible, the same application performs better or worse on the same graphics card when running on all shaders or not. I suspect, it's impossible to give an universal statement. Purchase the card you like the best and give it a try. Or rewrite your code and ask some members here to run a benchmark (I could run it on a Radeon HD5870).

When you got it working, could you please post some performance data? I'm highly interested in performance of massive parallelization on many specialitzed gpu processors compared to cpu computing on less versatile cores.

Thank you for kind advice

I was away for sometime,
After little bit googling I found that there are two streams of GPU programming available for scientific computing.
NVIDIA CUDA and AMD's OPENCL . CUDA is well matured with lot of tutorials and tools while AMD's is emerging. Some Describe CUDA as "excellent car with bad engine and OPENCL as bad car with good engine" is that correct ?

As a beginer like me I can easily switch to CUDA . However I fear OPENCL(AMD) will soon overtake CUDA.

I am very curious about latest trends so that my switching will not fail.

Performance and precision

Hi Foamers,

I understand that it might be hard to give a general recommendation for high performance using GPUs. However, is it possible to compare performance of a GPU with a standard single core CPU? I know the question is very unprecise, but I would consider +/- 20% as comparable performance.

Secondly, I read on http://www.symscape.com/gpu-0-2-openfoam about single precision of the calculations which are performed on GPUs. Are there experiences comparing results achieved on GPUs with the ones done on CPUs?

Cheers
Stefan

The spread is very wide. Even when just looking on CPU's, there is an enormous spread in performance from the slowest to the fastest one.
The same with GPU's.
So comparing a slow GPU with a fast CPU or vice versa will make a huge difference.

Anyway, one can say GPU's are a lot faster than CPU's. I'm not aware of today's high performance CPU's, but my GPU (Radeon HD5870) delivers up to 544 GFLOPS double precision while a i7 Sandy Bridge 3,4GHz has a peak performance of only 102 GFLOPS on all 4 cores! Average performance of the most older CPU's is about half the peak performance, assuming this is on this CPU the same, it would running on all 4 cores be 10 times slower than the 1.5 years old graphics card.
In single precision, the Radeon 5870 goes up to over 2.7 TFLOPS.

GPU's are much faster than CPU's. But it is nearly impossible to make any statement like "a GPU is 10 times faster than a CPU", since for example an older Nehalem based i7 3,2GHz has only 50% peak performance of the Sandy Bridge one.

But the high performance of graphics cards is based on massive parallelization. CFD-codes usually don't scale that good, therefore you might get an other result when comparing performance of CPU and GPU in a specific application. It also depends from a lot of things like memory ultilization (memory controllers are included in the CPU, so it can access memory faster than the graphics card when using more than the graphics memory), scalability of the code, performance of the stream processors on specific operations, the code itself, compiler etc...
Especially the problem itself has a quite big impact. Solving a code where all partitions can be independently solved performs well on GPU's. In theory, a linear code which can not be independently solves performs worse. And the more non-linear it gets, the worse it gets as communication efforts rise.

Single precision operations carried out by a GPU should give the same result when carried out on a CPU, there should be no difference in the results.

Dear abdul099,

thank you for your detailed answer. I have an impression on the performance, which let me conclude, that with this regard, for some cases using GPU's can turn out to be a good way to increase computational power and is worth a try.

In terms of precision, please let me specify my question a little:
From the link http://www.symscape.com/gpu-0-2-openfoam I assume that on GPU's you can only have single precision. Usually, OpenFoam on CPUs comes with double precision. I have no feeling about the impact of the precision on a result after thousands of iterations. When I compare both results (CPU/double and GPU/single), might there be a notable difference? Here, I consider a notable difference to be > 1%.

Best regards
Stefan

There are GPU's running double precision. I know for sure for the ATI Radeon HD-5000 series (I own one of this cards) and the HD-6000 series. I'm nearly sure, all Nvidia cards with about the same age are supporting double precision as well. Maybe even older cards from both companies, but I don't know which series was the first one.

Single or double precision can have a significant impact. It can happen, a single precision run diverges while a double precision one converges well. Of course, not that often, but I did already see it on my own. How much the solution of a SP run differs from a DP run can't be determined without testing, but there will be a difference nearly for sure (except small trivial cases which can be run to a perfect convergence).

When all goes fine, a double precision run should take less iterations, but for cost of more time per iteration. It takes more memory, which can cause problems on big cases which barley fit into the memory and communication bandwidth between processes becomes more important. The result file takes more space on hard disk. And it makes only sense when using a higher order discretization scheme which are more unstable than first order schemes.
So keep in mind, there are some disadvantages as well!

Thank you for your kind answers! This helps a lot.

Have a good day!
Stefan

If you haven't already chosen a GPU and programming paradigm, I'd suggest using CUDA and buying a 2nd-tier NVIDIA GPU. CUDA is easier to program than OpenCL, and there are already a number of libraries that will help you write a 3D CFD code. I'd hold off on buying a new GPU right now, as both AMD and NVIDIA are planning to launch new models in the next few months. If you must buy now, know that the most GFLOP/s per dollar always come from the mid-to-lower-end GPUs, such as the GTX 560.

Hi Mark,

we did buy a GPU. We came to the same conclusions :-) Thank you for advice anyway!

Best regards
Stefan