CFD Online Discussion Forums > Main CFD Forum
Mixed CPU/GPU computing (https://www.cfd-online.com/Forums/main/12259-mixed-cpu-gpu-computing.html)

Joe September 19, 2006 11:16

Mixed CPU/GPU computing
 
Would be nice if this could get into the CFD mainstream!

http://www.engadget.com/2006/09/19/p...mputing-power/

andy September 19, 2006 16:06

Re: Mixed CPU/GPU computing
 
Parallel computers built using the Intel i860 (a 64-bit processor used mainly as a graphics chip) were common in the late 80s and early 90s, at least in the UK. They generally had no OS, worked well enough for hand-written CFD computation, and were reasonably competitive with the expensive computational machines from the large manufacturers. When the switch to commodity parallel computers occurred in the mid to late 90s they no longer made economic sense and disappeared.

Current CFD simulations are generally limited by memory access and not CPU speed. At a guess (I have not checked, so please correct me if I am wrong), a graphics processor is likely to be limited in the same way?
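
As a rough illustration of why CFD kernels tend to be memory-bound, consider a simple Jacobi-style sweep in C (a made-up sketch, not taken from any particular code):

    #include <stddef.h>

    /* One Jacobi-style sweep over a 1-D grid: about 2 flops per point
     * against roughly 16-24 bytes of memory traffic per point (two
     * reads and one write of 8-byte doubles, somewhat less with cache
     * reuse), i.e. on the order of 0.1 flop per byte.  A CPU can issue
     * far more flops per byte than the memory system can feed it, so
     * memory access, not arithmetic, sets the pace. */
    void jacobi_sweep(const double *u, double *u_new, size_t n)
    {
        for (size_t i = 1; i + 1 < n; ++i)
            u_new[i] = 0.5 * (u[i - 1] + u[i + 1]);
    }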


Jens September 20, 2006 03:51

Re: Mixed CPU/GPU computing
 
Hi,

Well, there is a lot of discussion going on regarding the use of the GPU as a processor, especially for LBM codes.

But general CFD can also benefit from GPU speed. The memory bandwidth is very high on the GPU, since modern cards use GDDR3 RAM. There are a number of papers describing these issues and the speed-ups achieved. (Try a Google search: GPU + CFD + LBM.)

However, since the GPU operates like a SIMD parallel computer, one has to use special programming to extract the power. More advanced solvers (Krylov and multigrid) are difficult to implement on a SIMD parallel computer.
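
To make the contrast concrete, here is a made-up sketch in plain C (standing in for whatever GPU language one would actually use): the first loop applies the same operation at every point and maps naturally onto SIMD lanes, while the dot product inside a Krylov iteration needs a global reduction, which is what makes such solvers awkward on SIMD hardware.

    #include <stddef.h>

    /* SIMD-friendly: every point applies the same operation with no
     * data-dependent branching, so all lanes can run in lockstep. */
    void axpy(double a, const double *x, double *y, size_t n)
    {
        for (size_t i = 0; i < n; ++i)
            y[i] += a * x[i];
    }

    /* Awkward on SIMD: the partial sums must be combined into a single
     * scalar, so every lane (or processor) has to synchronise at the
     * reduction before the Krylov iteration can continue. */
    double dot(const double *x, const double *y, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i)
            s += x[i] * y[i];
        return s;
    }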

Funny that you mention the i860, which was a great processor. New cards acting as extra co-processors have started to pop up; take a look at www.clearspeed.com. IBM has just released blade servers with the Cell chip, which is also a SIMD chip (but a bit more advanced). Others have reported large speed-ups in some areas.

AMD has for some time been talking about making a second socket for a co-processor. I think Cray is also involved in this approach.

The ClearSpeed processor is used in the new supercomputer at the Tokyo Institute of Technology to deliver terascale performance.

So there are a lot of possibilities, but the lack of programming tools makes things a bit tough.

Also, converting a large general-purpose CFD code to use the GPU would be a tremendous undertaking, and it would not be feasible since GPUs change very fast (every 6 months) driven by the gaming market.

Regards

Jens

andy September 20, 2006 05:30

Re: Mixed CPU/GPU computing
 
> More advanced solvers (Krylov and multigrid) are difficult to implement on
> a SIMD parallel computer.

We had no problems implementing such schemes on OSless parallel computers in the 80s and 90s. In fact it required significantly less reorganising and rewriting of the code and algorithms than the earlier vector machines from Cray, IBM and the like had required.

> Funny that you mention the i860, which was a great processor.

It may have been as a graphics chip, but neither the first nor the second version was particularly good as a parallel-processing compute engine in my experience. It relied far too much on "look ahead" information to get its speed, which meant that a reasonably implicit code writing lots of small packets of information into memory from neighbouring processors was forever invalidating the cache. The achieved speed with real codes was generally not particularly good for this reason.

> Well, there is a lot of discussion going on regarding the use of the GPU
> as a processor, especially for LBM codes.

Lattice/particle methods have very simple code but have proved over the decades to be uncompetitive without specialised hardware. It has never made economic sense to produce that specialised hardware.

> So there are a lot of possibilities, but the lack of programming tools
> makes things a bit tough.

That has not been my experience. The array processing at the heart of a CFD solver is only a few thousand lines at most. The assembly process rarely requires much attention on a parallel machine. There is no need for anything more than a compiler and some way to ensure the IO caches are flushed. All the rest is nice but does not help productivity significantly if you are considering just the CFD solver.

> Also, converting a large general-purpose CFD code to use the GPU would be
> a tremendous undertaking, and it would not be feasible since GPUs change
> very fast (every 6 months) driven by the gaming market.

Again, this is counter to my experience. What gets parallelised is the heart of the CFD program and not the whole program that would run on a sequential machine. One tries to shove as much of the organisational code as possible to the front and back, and to duplicate on all processors only the minimum that must be duplicated.

Of course, it will depend on how the particular program is organised and what tasks you choose to pick up and implement as well as the "raw" CFD solver. The first reasonably general-purpose CFD code I parallelised in the 80s took less than 3 days, and that included writing the equivalent of MPI because there were only raw links.

Changing hardware will make no difference if you program in a portable language like C or Fortran. Changing communications libraries also makes little difference because they all do essentially the same thing but with different names and parameters. Before MPI became the norm I think we supported 4 or 5 different communication libraries with #ifdef statements. Irritating, but a very minor task.
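
A minimal sketch of what such a wrapper looks like: MPI_Send is real, but the vendor header and its vendor_send call are invented stand-ins for whichever pre-MPI library was in use.

    /* Thin portability layer: the solver always calls comm_send() and
     * the preprocessor picks the real library underneath. */
    #ifdef USE_MPI
    #include <mpi.h>
    void comm_send(const double *buf, int count, int dest)
    {
        MPI_Send((void *)buf, count, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    }
    #else
    #include "vendor_comm.h"   /* hypothetical pre-MPI vendor library */
    void comm_send(const double *buf, int count, int dest)
    {
        vendor_send(dest, buf, count * sizeof(double)); /* invented call */
    }
    #endif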


Ford Prefect September 20, 2006 12:04

Re: Mixed CPU/GPU computing
 
Just adding one more reference to the discussion:

http://www.theregister.co.uk/2006/09/19/ati_gpgpu/

"Researchers at Stanford, the University of North Carolina and the University of Waterloo are just some of the folks who have hammered away at the software problems around GPGPUs for years. The computer science crowd has worked with - and in some cases convinced - ATI and Nvidia to open up their hardware and programming interfaces to make it easier to run common software on the GPUs. The University of Waterloo, for example, has a programming language called SH to ease the software translation process, while Stanford has Brook."

