GPU - GeForce GTX Titan Black?

frode86 · February 27, 2014, 03:56

Hello,

My first post in this forum will be about GPUs.
I'm setting up a new workstation to run OpenFOAM and ANSYS for CFD and mechanical simulation, and for learning purposes. I would really like a GPU to accelerate solution times. The GeForce GTX Titan Black is now available in stores. This graphics card has the GK110 core, the same as the Quadro K6000 and Tesla K20X. Its double-precision rate is 1.7 TFLOPS, which is actually higher than the Tesla K20X's 1.3 TFLOPS. I know ECC memory is not supported on the GTX card, but at the moment cost is more of an issue than ECC support.

My question: does anyone know whether the GTX card will work as a compute GPU in ANSYS and OpenFOAM?

Thanks,
Frode

kyle · February 27, 2014, 11:40

GPU-accelerated CFD still isn't ready for real work. Some codes support it, and there are several plugins for OpenFOAM, but the actual speedup is minimal (and often negative).

It's much more cost-effective to just buy a second workstation and connect the two via Ethernet. InfiniBand doesn't become necessary until you get to 4+ machines. The card you are talking about costs $1000+. Instead you could get a $200 GPU and an $800 second node, which would essentially halve your solution times.
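
To put rough numbers on that comparison, here is a minimal sketch. The dollar figures are the ones quoted above; the parallel efficiency values are hypothetical, since "essentially halve" assumes close to ideal two-node scaling, and the function names are made up for illustration.

Code:

```python
def second_node_speedup(efficiency):
    """Speedup from going from 1 to 2 nodes.

    efficiency=1.0 is the ideal case where solution time is halved.
    """
    return 2.0 * efficiency


def speedup_per_kusd(speedup, cost_usd):
    """Speedup obtained per 1000 USD spent."""
    return speedup / (cost_usd / 1000.0)


# Figures from the post: $200 GPU + $800 second node versus a $1000+ card.
two_node_cost = 200 + 800
gpu_card_cost = 1000  # lower bound, the post says "$1000+"

# Hypothetical parallel efficiencies for a two-node Ethernet setup.
for eff in (1.0, 0.9, 0.8):
    s = second_node_speedup(eff)
    value = speedup_per_kusd(s, two_node_cost)
    print(f"efficiency {eff:.1f}: two nodes give {s:.1f}x "
          f"({value:.1f}x per $1000); the card must beat {s:.1f}x to win")
```

Since both options cost about the same, the GPU only comes out ahead if it delivers more than the two-node speedup on the actual solver.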

evcelica · February 27, 2014, 14:12

I have never tried a Titan, but my GTX 680 doesn't work with ANSYS. It says that my GPU is not supported and that I must have a Tesla or Quadro card. ANSYS works with NVIDIA, so I'm almost positive they would force you to buy one of their "professional" GPU compute cards.

Zlatko · February 28, 2014, 07:58

I was also curious about Fluent and GPGPU, so I ran a case (3ddp, pbns, rke, coupled solver, 41K cells) on a Windows 7 PC (i7 3770K) with an NVIDIA GeForce GTX 670 graphics card (1344 CUDA cores).

The Fluent command /parallel/gpgpu/show returned:

Code:

CUDA visible GPUs on rocky
  CUDA runtime version 5000
  Driver version 6000
  Number of GPUs 1
    0. GeForce GTX 670 (*)
       7 SMs
       0.98 GHz
       2.14748 GBytes

The solution converged in 56 iterations, and the output from the command /parallel/timer/usage is:

Code:

Performance Timer for 56 iterations on 2 compute nodes
  Average wall-clock time per iteration:              1.145 sec
  Global reductions per iteration:                       31 ops
  Global reductions time per iteration:               0.000 sec (0.0%)
  Message count per iteration:                           62 messages
  Data transfer per iteration:                        0.232 MB
  LE solves per iteration:                                2 solves
  LE wall-clock time per iteration:                   0.011 sec (0.9%)
  LE global solves per iteration:                         2 solves
  LE global wall-clock time per iteration:            0.000 sec (0.0%)
  LE global matrix maximum size:                     41000
  AMG cycles per iteration:                           4.107 cycles
  Relaxation sweeps per iteration:                      384 sweeps
  Relaxation exchanges per iteration:                   387 exchanges

  Total wall-clock time:                             64.102 sec
  Total CPU time:                                   127.531 sec

The timings for the same computation without the GPU are:

Code:

Performance Timer for 56 iterations on 2 compute nodes
  Average wall-clock time per iteration:              0.218 sec
  Global reductions per iteration:                       44 ops
  Global reductions time per iteration:               0.000 sec (0.0%)
  Message count per iteration:                          441 messages
  Data transfer per iteration:                        0.472 MB
  LE solves per iteration:                                3 solves
  LE wall-clock time per iteration:                   0.142 sec (65.0%)
  LE global solves per iteration:                         6 solves
  LE global wall-clock time per iteration:            0.000 sec (0.1%)
  LE global matrix maximum size:                        22
  AMG cycles per iteration:                           8.286 cycles
  Relaxation sweeps per iteration:                      566 sweeps
  Relaxation exchanges per iteration:                   572 exchanges

  Total wall-clock time:                             12.210 sec
  Total CPU time:                                    24.399 sec
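
To compare the two reports side by side, here is a small sketch using only numbers copied from the timer output above. The split into "linear solver" and "everything else" is my own bookkeeping (iteration time minus LE wall-clock time), not something Fluent reports directly.

Code:

```python
# Values copied from the two /parallel/timer/usage reports above (seconds).
gpu = {"wall_per_iter": 1.145, "le_per_iter": 0.011, "total_wall": 64.102}
cpu = {"wall_per_iter": 0.218, "le_per_iter": 0.142, "total_wall": 12.210}

# How many times slower the GPU run was overall and per iteration.
total_ratio = gpu["total_wall"] / cpu["total_wall"]
iter_ratio = gpu["wall_per_iter"] / cpu["wall_per_iter"]

# Time per iteration spent outside the linear-equation (LE) solves.
non_le_gpu = gpu["wall_per_iter"] - gpu["le_per_iter"]
non_le_cpu = cpu["wall_per_iter"] - cpu["le_per_iter"]

# Speedup of the LE solves alone when the GPU is used.
le_speedup = cpu["le_per_iter"] / gpu["le_per_iter"]

print(f"GPU run slower overall by a factor of {total_ratio:.2f}")
print(f"GPU run slower per iteration by a factor of {iter_ratio:.2f}")
print(f"LE solve time per iteration: {cpu['le_per_iter']} s -> {gpu['le_per_iter']} s "
      f"({le_speedup:.1f}x faster)")
print(f"Non-LE time per iteration: {non_le_cpu:.3f} s -> {non_le_gpu:.3f} s")
```

So the linear solves themselves got faster on the GPU, but the time spent outside them grew by much more. The report does not say where that extra time goes.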

frode86 · March 1, 2014, 10:20

What is the GTX 670's double-precision rate? I think it is rather poor. Your analysis also seems rather small, given the short solve time. I've read that a GPU is only beneficial for larger analyses. Did you just activate it in ANSYS and it worked? Is it possible for you to test whether Mechanical also accepts the card?

Zlatko · March 1, 2014, 15:30

Quote:

Originally Posted by frode86

What is the GTX 670's double-precision rate? I think it is rather poor. Your analysis also seems rather small, given the short solve time. I've read that a GPU is only beneficial for larger analyses.

The case is small because I intended to run it on a Linux workstation with an NVIDIA Quadro FX 3800 GPU, which has only 1 GB of memory, and the case must fit into GPU memory. Due to the error

Code:

> it 500
  iter  continuity  x-velocity  y-velocity  z-velocity           k     epsilon     time/iter
AMG on GPGPU
NVAMG version 4
Built on Aug 21 2013, 10:28:27
NVAMG ERROR: file ../../src/amg_gpu.c line    863
NVAMG ERROR: CUDA kernel launch error

I tried a Windows machine instead.

Quote:

Originally Posted by frode86

Did you just activate it in ANSYS and it worked?

Yes, I used the command fluent 3ddp -g -t2 -gpgpu=1.

Quote:

Originally Posted by frode86

Is it possible for you to test whether Mechanical also accepts the card?

Unfortunately not. We only have an ANSYS Academic Research CFD license.

I also discovered the following: if I start Fluent, read the case, initialize, and run the calculation, it takes 59 seconds to reach the convergence criteria. When the case is then read again, initialized, and run a second time, it takes only 7.5 seconds.

evcelica · March 3, 2014, 09:24

Quote:

Originally Posted by frode86

Is it possible for you to test whether Mechanical also accepts the card?

Mechanical is where I got the error saying it must be a Tesla or a certain Quadro card. I'm glad it appears to work in Fluent, though.

vjj · July 8, 2014, 20:01

For ANSYS Fluent 15, there is a GPU User Guide available. It covers optimal settings and contains lots of other information:

http://www.nvidia.com/content/tesla/...-userguide.pdf

For tips on using GPU acceleration with ANSYS Mechanical:

http://www.nvidia.com/content/tesla/...-with-gpus.pdf

pugwash.ds · July 8, 2014, 23:55

Speaking about OpenFOAM only, I am very dubious about the potential for significant speedups from a GPU with what is available right now. I tried compiling for my Quadro graphics card without any luck. Looking into it a bit further, I think a GPU works very well for applications that are embarrassingly parallel, but OpenFOAM is memory intensive and requires significant internode communication. It is always tempting to look at the headline specs for a processor or GPU, but at least with OpenFOAM, making use of that processing power without getting mugged by Amdahl's Law is pretty difficult. You will run out of memory bandwidth or internode communication capacity first. I think the answer is to build a well-optimized cluster. All the required elements, both hardware and software, are easy to get, and you will get a known speedup.
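
For anyone who hasn't met Amdahl's Law: if a fraction p of the runtime is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). Below is a minimal sketch. As an illustration it takes p = 0.65 from the CPU-only Fluent timer report earlier in this thread (LE wall-clock time was 65.0% of an iteration) and assumes, purely for the sake of the example, that only the linear solver is offloaded. The function name and the acceleration factors are made up.

Code:

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime runs s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# Fraction of iteration time spent in the linear solver (65.0% in the
# CPU-only report above). Treating this as the only accelerated part
# is an assumption made for this example.
p = 0.65

# Hypothetical acceleration factors for the offloaded part.
for s in (2, 10, 100):
    print(f"solver {s:>3}x faster -> overall {amdahl_speedup(p, s):.2f}x")

# Upper bound as s grows without limit: the remaining 35% is untouched.
print(f"limit for an infinitely fast solver: {1.0 / (1.0 - p):.2f}x")
```

Even an infinitely fast solver cannot get past that limit, and the formula ignores any extra data transfer or communication cost, which only makes things worse.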

Daveo643 · January 26, 2015, 14:01

Following up: has anyone VERIFIED that Fluent, and only Fluent, works with the GTX Titan? Screenshots or output from the Fluent command line showing the card being exploited would be much appreciated.

Daveo643 · February 16, 2015, 16:38

I can definitively confirm that Fluent can "exploit" a GPU (at least in V16), even one that's not on the supported list published by ANSYS.

But performance is much worse with the GPU enabled than without. In my case the GPU is a GTX580M with 2GB GDDR5 RAM (GF114M, CC 2.1).

http://www.cfd-online.com/Forums/flu...pu-fluent.html

hhh · August 8, 2016, 22:36

Dear Friends,
I am trying to estimate the computational time (in days) for FSI problems (for example, flexible flapping wings) in terms of FLOPS. I have seen some literature stating that 800 teraflops is sufficient.

Teraflops, petaflops, or exaflops: which is appropriate for this kind of problem? What is the difference in simulation time (in days) between them? How do I estimate the FLOPS needed for this kind of problem? Please assist me.

Regards,
HHH
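
One way to frame this: teraflops, petaflops, and exaflops are rates (floating-point operations per second), so a time estimate also needs the total number of operations the simulation performs. Time is then total operations divided by the sustained rate. The sketch below is only illustrative: the total work and the efficiency (sustained rate as a fraction of peak) are hypothetical placeholders, not values for any real FSI case.

Code:

```python
SECONDS_PER_DAY = 86_400


def days_to_solution(total_flop, peak_flops, efficiency):
    """Estimated wall time in days.

    total_flop -- total floating-point operations for the whole run
    peak_flops -- peak machine rate in operations per second
    efficiency -- fraction of peak the solver actually sustains (0 to 1]
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    sustained = peak_flops * efficiency
    return total_flop / sustained / SECONDS_PER_DAY


# Hypothetical inputs, chosen only to show the scaling.
total_work = 1e18   # operations for the whole simulation (placeholder)
efficiency = 0.05   # fraction of peak sustained (placeholder)

for name, peak in (("1 teraflops", 1e12),
                   ("1 petaflops", 1e15),
                   ("1 exaflops", 1e18)):
    days = days_to_solution(total_work, peak, efficiency)
    print(f"{name}: {days:.6g} days")
```

Each step from tera to peta to exa divides the estimate by 1000, but only if the efficiency stays the same, which it rarely does on larger machines.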
