CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (https://www.cfd-online.com/Forums/openfoam-solving/)
-   -   8x icoFoam speed up with Cufflink CUDA solver library (https://www.cfd-online.com/Forums/openfoam-solving/99828-8x-icofoam-speed-up-cufflink-cuda-solver-library.html)

atg November 6, 2012 05:56

You don't need cufflink to demonstrate this drop in performance with multiple GPUs.

Just fire up the nbody simulation on your 2050, note the flops readout in the lower left corner, and then try it with two GPUs. When I do that with a 2090, the 2090 alone always beats the 2090 plus a Quadro 600, by a significant margin. I think it comes down to the overhead of communicating back and forth with the CPU. Newer versions of CUDA and the Kepler architecture are supposed to alleviate this to some degree, as far as I know, which admittedly isn't very far!
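
For anyone wanting to reproduce the comparison without the GUI, the SDK nbody sample also has a benchmark mode. Flag names below are from the CUDA samples as I know them; check nbody -help on your version:

Code:
./nbody -benchmark -numbodies=131072 -numdevices=1
./nbody -benchmark -numbodies=131072 -numdevices=2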

Good Luck, and thanks for posting.

alquimista November 6, 2012 11:24

atg, I agree that there is a bottleneck, but it should be largely compensated by the use of two GPUs. I'll test the nbody simulation on systems with two Tesla 2050s and two GTX 690s.

alquimista November 6, 2012 11:37

Checking the post again, I note that you are using two different kinds of GPU card; in that case the Quadro 600 is slowing down the system, and the 2090 must wait for the Quadro 600 to finish its task. So in that case there is no benefit.

I have just run nbody with one 690 and with two 690s, and the results are:

1 device: 903.968 single-precision GFLOP/s at 20 flops per interaction
2 devices: 1649.659 single-precision GFLOP/s at 20 flops per interaction

So the scaling is reasonable (roughly 1.8x going from one device to two), and the proper use of several GPUs is justified. I would expect similar behavior in cufflink.
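
To make the waiting argument concrete, here is a minimal multi-GPU sketch of my own (not cufflink code; the saxpy kernel is just a stand-in): the host splits the work evenly across devices with cudaSetDevice, and the final blocking copies mean the wall-clock time is set by the slowest card.

Code:
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    const int N = 1 << 24;   /* total problem size */
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev < 1) { fprintf(stderr, "no CUDA device found\n"); return 1; }

    float *hx = (float *)malloc(N * sizeof(float));
    float *hy = (float *)malloc(N * sizeof(float));
    for (int i = 0; i < N; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    /* Even split: only optimal when the devices are identical. */
    int chunk = (N + ndev - 1) / ndev;
    float **dx = (float **)malloc(ndev * sizeof(float *));
    float **dy = (float **)malloc(ndev * sizeof(float *));

    for (int d = 0; d < ndev; ++d) {
        int off = d * chunk;
        int n   = (off + chunk > N) ? N - off : chunk;
        if (n <= 0) continue;
        cudaSetDevice(d);
        cudaMalloc(&dx[d], n * sizeof(float));
        cudaMalloc(&dy[d], n * sizeof(float));
        cudaMemcpy(dx[d], hx + off, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dy[d], hy + off, n * sizeof(float), cudaMemcpyHostToDevice);
        saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx[d], dy[d]);
    }

    /* The blocking copies below synchronize each device in turn, so the
       wall-clock time is set by the slowest card plus the PCIe traffic:
       a Quadro 600 paired with a 2090 gates the whole run. */
    for (int d = 0; d < ndev; ++d) {
        int off = d * chunk;
        int n   = (off + chunk > N) ? N - off : chunk;
        if (n <= 0) continue;
        cudaSetDevice(d);
        cudaMemcpy(hy + off, dy[d], n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dx[d]);
        cudaFree(dy[d]);
    }

    printf("y[0] = %.1f (expected 5.0)\n", hy[0]);
    free(hx); free(hy); free(dx); free(dy);
    return 0;
}

With matched cards, like the two halves of a 690, the even split is near-optimal, which is consistent with the numbers above; with a mismatched pair, the fast card sits idle while the host waits on the slow one.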

