GPU selection

lalupp · September 4, 2011, 00:16

Hi

I would like to rewrite my parallel 3D CFD code using GPU computing techniques, and I have some questions.

Which GPU is best in terms of

Cost/performance
Ease of getting started with programming

What is OpenCL?
What is the difference between OpenCL and CUDA?

Please help.

abdul099 · September 12, 2011, 19:58

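I think OpenCL is a more or less C-like API. It is an open standard (maintained by the Khronos Group) with implementations for ATI (AMD) graphics cards, among others. CUDA is NVIDIA's own platform for NVIDIA graphics cards. It is not based on OpenCL (CUDA is actually the older of the two), but the programming models are quite similar, although code written for one is not directly compatible with the other.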
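Both CUDA and OpenCL share the same basic programming model: you write a small kernel, many threads (work-items in OpenCL terms) execute it, and each thread uses its own index to find the data element it works on. The following is a minimal serial Python emulation of that idea, assuming a one-dimensional launch; the names `launch` and `saxpy_kernel` are hypothetical and belong to neither API. The comments show where CUDA's `blockIdx.x * blockDim.x + threadIdx.x` and OpenCL's `get_global_id(0)` would appear in a real kernel.

```python
import math


def launch(kernel, n_blocks, block_size, *args):
    """Serially emulate a 1-D launch of n_blocks blocks with block_size threads each.

    A real GPU runs these (block, thread) pairs in parallel and in no
    guaranteed order; this hypothetical helper simply runs them one after another.
    """
    for block_idx in range(n_blocks):
        for thread_idx in range(block_size):
            kernel(block_idx, block_size, thread_idx, *args)


def saxpy_kernel(block_idx, block_size, thread_idx, n, a, x, y):
    """Compute y[i] = a * x[i] + y[i] for the single element owned by this thread."""
    # CUDA:   int i = blockIdx.x * blockDim.x + threadIdx.x;
    # OpenCL: int i = get_global_id(0);
    i = block_idx * block_size + thread_idx
    if i < n:  # the last block may contain threads with no element to process
        y[i] = a * x[i] + y[i]


n = 10
block_size = 4
n_blocks = math.ceil(n / block_size)  # 3 blocks -> 12 threads for 10 elements
x = [float(k) for k in range(n)]
y = [1.0] * n
launch(saxpy_kernel, n_blocks, block_size, n, 2.0, x, y)
print(y)  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 19.0]
```

The `if i < n` guard is the usual way to handle a problem size that is not a multiple of the block size.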

Which graphics card performs best depends very much on the application. There are applications where ATI cards are much faster than NVIDIA cards, and vice versa. It is also possible that the same application performs better or worse on the same graphics card depending on whether it runs on all shaders or not. I suspect it's impossible to give a universal statement. Buy the card you like best and give it a try. Or rewrite your code and ask some members here to run a benchmark (I could run it on a Radeon HD5870).

Once you get it working, could you please post some performance data? I'm highly interested in the performance of massive parallelization on many specialized GPU processors compared to CPU computing on fewer, more versatile cores.

lalupp · September 22, 2011, 08:46

Thank you for the kind advice.

I was away for some time.
After a little googling I found that there are two streams of GPU programming available for scientific computing:
NVIDIA's CUDA and OpenCL (the route AMD supports). CUDA is mature, with lots of tutorials and tools, while AMD's offering is still emerging. Some describe CUDA as an "excellent car with a bad engine" and OpenCL as a "bad car with a good engine". Is that correct?

As a beginner, I could easily switch to CUDA. However, I fear that OpenCL (AMD) will soon overtake CUDA.

I am very curious about the latest trends, so that my switch will not turn out to be a mistake.

holodeck10 · October 25, 2011, 04:08

Hi Foamers,

I understand that it might be hard to give a general recommendation for high performance using GPUs. However, is it possible to compare the performance of a GPU with that of a standard single-core CPU? I know the question is very imprecise, but I would consider +/- 20% as comparable performance.

Secondly, I read at http://www.symscape.com/gpu-0-2-openfoam that the calculations performed on GPUs are done in single precision. Does anyone have experience comparing results obtained on GPUs with those obtained on CPUs?

Cheers
Stefan

abdul099 · October 26, 2011, 00:44

The spread is very wide. Even when looking only at CPUs, there is an enormous spread in performance between the slowest and the fastest ones.
The same holds for GPUs.
So comparing a slow GPU with a fast CPU, or vice versa, gives very different results.

Anyway, one can say that GPUs are a lot faster than CPUs in terms of peak performance. I'm not up to date on today's high-performance CPUs, but my GPU (Radeon HD5870) delivers up to 544 GFLOPS in double precision, while an i7 Sandy Bridge at 3.4 GHz has a peak performance of only 102 GFLOPS on all 4 cores! On most older CPUs, the average performance is about half the peak performance; assuming the same holds for this CPU, it would be about 10 times slower on all 4 cores than my 1.5-year-old graphics card (544 / (102 / 2) ≈ 10.7, if the card reaches its peak).
In single precision, the Radeon 5870 goes up to over 2.7 TFLOPS.
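The peak figures above follow from a simple product: processing elements × clock rate × floating-point operations per element per cycle. A minimal sketch, assuming the commonly cited HD5870 specification of 1600 stream processors at 850 MHz, each able to issue one multiply-add (2 FLOPs) per cycle, with a double-precision rate of one fifth of the single-precision rate; the CPU figure (102 GFLOPS) and the 50% sustained efficiency are taken from the post. Like the post, the estimate assumes the GPU runs at its full peak.

```python
def peak_gflops(units, clock_ghz, flops_per_unit_per_cycle):
    """Theoretical peak = processing units x clock (GHz) x FLOPs per unit per cycle."""
    return units * clock_ghz * flops_per_unit_per_cycle


# Radeon HD5870 (assumed spec): 1600 stream processors, 0.85 GHz, multiply-add = 2 FLOPs
gpu_sp = peak_gflops(1600, 0.85, 2)  # single-precision peak
gpu_dp = gpu_sp / 5                  # assumed DP rate: 1/5 of the SP rate

cpu_dp_peak = 102.0                  # Sandy Bridge i7 figure quoted in the post
cpu_efficiency = 0.5                 # "about half the peak", as stated in the post
gpu_efficiency = 1.0                 # optimistic: the GPU is assumed to reach its peak

speedup = (gpu_dp * gpu_efficiency) / (cpu_dp_peak * cpu_efficiency)
print(f"GPU SP peak: {gpu_sp:.0f} GFLOPS, DP peak: {gpu_dp:.0f} GFLOPS")
print(f"estimated DP speedup: {speedup:.1f}x")
```

In practice neither device is likely to sustain its peak in a CFD code, so this ratio is an optimistic estimate rather than a prediction.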

GPUs are much faster than CPUs. But it is nearly impossible to make a statement like "a GPU is 10 times faster than a CPU", since, for example, an older Nehalem-based 3.2 GHz i7 has only 50% of the peak performance of the Sandy Bridge one.

But the high performance of graphics cards is based on massive parallelization. CFD codes usually don't scale that well, so you might get a different result when comparing the performance of CPU and GPU in a specific application. It also depends on many other things, such as memory utilization (the memory controllers are integrated into the CPU, so it can access main memory faster than the graphics card can once a case needs more than the graphics memory), the scalability of the code, the performance of the stream processors on specific operations, the code itself, the compiler, etc.
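One common way to see why imperfect scaling matters so much on hardware with many processing elements is Amdahl's law (a standard model, not something mentioned in the post): if only a fraction p of the runtime parallelizes perfectly over N processing elements, the speedup is 1 / ((1 - p) + p / N). A minimal sketch; the fractions and element counts are illustrative assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Amdahl's law: speedup when parallel_fraction of the work spreads over n_workers."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)


# Illustrative N values: 4 CPU cores versus 1600 processing elements
# (the HD5870's stream-processor count, used here only as an example N)
for p in (0.90, 0.99, 0.999):
    cpu = amdahl_speedup(p, 4)
    gpu = amdahl_speedup(p, 1600)
    print(f"p = {p}: 4 cores -> {cpu:.2f}x, 1600 units -> {gpu:.1f}x (limit {1 / (1 - p):.0f}x)")
```

This treats every processing element as equally fast, which is a simplification; the point is only that the serial fraction caps the speedup at 1 / (1 - p), no matter how many processing elements are available.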
The problem itself has an especially large impact. A problem whose partitions can all be solved independently performs well on GPUs. In theory, a linear problem whose partitions cannot be solved independently performs worse, and the more nonlinear it gets, the worse it gets, as the communication effort rises.
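The difference between independent and dependent updates can be illustrated with two classic iterative solvers, which the post does not name: in a Jacobi sweep every unknown is updated only from the previous iterate, so all updates are independent and could run in parallel, whereas in a Gauss-Seidel sweep each update uses values already overwritten in the same sweep, which creates a sequential dependency chain. A minimal sketch for the 1-D Poisson problem -u'' = f with u = 0 at both ends; the grid size and number of sweeps are arbitrary choices.

```python
def jacobi_sweep(u, f, h):
    """One Jacobi sweep: every new value depends only on the OLD array,
    so each interior point could be handled by its own GPU thread."""
    new = u[:]  # boundary values u[0] and u[-1] are copied unchanged
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return new


def gauss_seidel_sweep(u, f, h):
    """One Gauss-Seidel sweep: u[i] uses u[i - 1] already updated in this
    sweep, so the points must be processed one after another."""
    u = u[:]
    for i in range(1, len(u) - 1):
        u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u


def max_error(u, exact):
    return max(abs(a - b) for a, b in zip(u, exact))


# -u'' = 2 on [0, 1] with u(0) = u(1) = 0 has the exact solution u(x) = x (1 - x)
n = 21
h = 1.0 / (n - 1)
xs = [i * h for i in range(n)]
f = [2.0] * n
exact = [x * (1.0 - x) for x in xs]

u_j = [0.0] * n
u_gs = [0.0] * n
for _ in range(200):
    u_j = jacobi_sweep(u_j, f, h)
    u_gs = gauss_seidel_sweep(u_gs, f, h)

print(f"Jacobi error:       {max_error(u_j, exact):.2e}")
print(f"Gauss-Seidel error: {max_error(u_gs, exact):.2e}")
```

Gauss-Seidel typically needs fewer sweeps, but each sweep is inherently sequential, while a Jacobi sweep maps naturally onto thousands of independent GPU threads.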

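In principle, single precision operations carried out by a GPU should give the same results as the same operations carried out on a CPU. Small differences can still appear if the order of operations differs (for example in a parallel summation), because floating-point addition is not associative.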
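Even at the same precision, the result of a floating-point sum can depend on the order in which the additions are performed. Python floats are double precision, so this sketch rounds every intermediate result to IEEE 754 single precision with the standard `struct` module (the helper `f32` is introduced here for illustration). It sums the same float32 values once sequentially, as a simple CPU loop would, and once as a pairwise (tree-shaped) reduction, the order a parallel GPU reduction typically uses.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sequential_sum_f32(values):
    total = f32(0.0)
    for v in values:
        total = f32(total + v)  # round after every addition, as float arithmetic does
    return total


def pairwise_sum_f32(values):
    """Tree reduction: add neighbours in pairs, level by level (length assumed a power of two)."""
    level = list(values)
    while len(level) > 1:
        level = [f32(level[k] + level[k + 1]) for k in range(0, len(level), 2)]
    return level[0]


# One large value followed by many small ones (1024 values in total)
values = [f32(1.0)] + [f32(1e-8)] * 1023

seq = sequential_sum_f32(values)
tree = pairwise_sum_f32(values)
print(f"sequential: {seq!r}")          # the small terms are lost one by one
print(f"pairwise:   {tree!r}")         # the small terms are summed first and survive
print(f"double:     {sum(values)!r}")  # reference sum in double precision
```

Both results are legitimate single-precision sums of the same numbers; only the order differs.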

holodeck10 · October 26, 2011, 01:03

Dear abdul099,

thank you for your detailed answer. I now have an impression of the performance, which lets me conclude that, for some cases, using GPUs can turn out to be a good way to increase computational power and is worth a try.

In terms of precision, let me make my question a little more specific:
From the link http://www.symscape.com/gpu-0-2-openfoam I gather that on GPUs you can only have single precision, whereas OpenFOAM on CPUs usually runs in double precision. I have no feeling for the impact of precision on a result after thousands of iterations. When I compare both results (CPU/double and GPU/single), might there be a notable difference? Here, I consider a difference of more than 1% to be notable.

Best regards
Stefan

abdul099 · October 26, 2011, 20:15

There are GPUs that support double precision. I know for sure that cards from the ATI Radeon HD 5000 series (I own one of these cards) and the HD 6000 series do. I'm nearly sure that all NVIDIA cards of about the same age support double precision as well. Maybe even older cards from both companies do, but I don't know which series was the first.

Single or double precision can have a significant impact. A single-precision run can diverge while a double-precision run converges well. Of course, this does not happen that often, but I have seen it myself. How much the solution of an SP run differs from a DP run can't be determined without testing, but there will almost certainly be a difference (except for small, trivial cases that can be run to perfect convergence).
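One possible mechanism by which single- and double-precision runs drift apart is that small updates to a much larger value are rounded away in single precision. This sketch rounds to float32 through the standard `struct` module (the helpers `f32` and `accumulate` are hypothetical, defined here) and applies 10,000 updates of 1e-8 to a value of 1.0; the numbers are purely illustrative and not taken from a CFD run.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(start, increment, steps, single_precision):
    """Apply `steps` small updates to `start`, optionally rounding to float32 each time."""
    rnd = f32 if single_precision else float
    value = rnd(start)
    inc = rnd(increment)
    for _ in range(steps):
        value = rnd(value + inc)
    return value


sp = accumulate(1.0, 1e-8, 10_000, single_precision=True)
dp = accumulate(1.0, 1e-8, 10_000, single_precision=False)
print(f"single: {sp!r}")  # each update is below half an ulp of 1.0 in float32, so nothing changes
print(f"double: {dp!r}")  # close to the exact 1.0001
```

In an iterative solver the analogous effect is that small corrections stop changing the solution in single precision long before they would in double precision.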

If all goes well, a double-precision run should need fewer iterations, but at the cost of more time per iteration. It also takes more memory, which can cause problems on big cases that barely fit into memory, and the communication bandwidth between processes becomes more important. The result files take more space on the hard disk. And it only really makes sense when using higher-order discretization schemes, which are less stable than first-order schemes.
So keep in mind that there are some disadvantages as well!
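The extra memory is easy to estimate: each stored floating-point value takes 4 bytes in single precision and 8 bytes in double precision, so the field data roughly doubles. A minimal sketch for a hypothetical mesh; the cell count and the number of stored values per cell are assumptions chosen only for illustration (a real solver also stores mesh connectivity, matrix coefficients, and so on).

```python
BYTES_PER_VALUE = {"single": 4, "double": 8}  # IEEE 754 binary32 / binary64


def field_memory_gib(n_cells, values_per_cell, precision):
    """Memory (GiB) needed to store values_per_cell floating-point values for each cell."""
    return n_cells * values_per_cell * BYTES_PER_VALUE[precision] / 2**30


# Hypothetical case: 10 million cells, 20 stored values per cell
n_cells = 10_000_000
values_per_cell = 20
for precision in ("single", "double"):
    print(f"{precision}: {field_memory_gib(n_cells, values_per_cell, precision):.2f} GiB")
```

With these assumed numbers, the double-precision fields alone would not fit on a graphics card with 1 GiB of memory, while the single-precision ones would.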

holodeck10 · October 26, 2011, 21:31

Thank you for your kind answers! This helps a lot.

Have a good day!
Stefan

markstock · November 18, 2011, 12:05

If you haven't already chosen a GPU and programming paradigm, I'd suggest using CUDA and buying a 2nd-tier NVIDIA GPU. CUDA is easier to program than OpenCL, and there are already a number of libraries that will help you write a 3D CFD code. I'd hold off on buying a new GPU right now, as both AMD and NVIDIA are planning to launch new models in the next few months. If you must buy now, know that the most GFLOP/s per dollar always come from the mid-to-lower-end GPUs, such as the GTX 560.

holodeck10 · November 19, 2011, 01:07

Hi Mark,

we did buy a GPU. We came to the same conclusions :-) Thank you for the advice anyway!

Best regards
Stefan
