GPU - GeForce GTX Titan Black?

#1 frode86, February 27, 2014, 03:56
Hello,

My first post in this forum will be about GPUs.
I'm setting up a new workstation to run OpenFOAM and ANSYS for CFD and mechanical simulation, and for learning purposes. I would really like a GPU to shorten solution times. The GeForce GTX Titan Black is now available in stores. This graphics card has the GK110 core, the same one used in the Quadro K6000 and Tesla K20X. Its double-precision rate is 1.7 TFLOPS, which is actually higher than the Tesla K20X's 1.3 TFLOPS. I know ECC memory is not supported on the GTX card, but at the moment cost matters more to me than ECC support.

Does anyone know whether the GTX Titan Black will work for GPU acceleration in ANSYS and OpenFOAM?

Thanks,
Frode
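
The 1.7 and 1.3 TFLOPS figures quoted above can be roughly reproduced from spec-sheet numbers. This is a minimal sketch, not a benchmark: the core counts, base clocks, and the 1/3 double-precision ratio are assumptions taken from public GK110 specifications, not from this thread, and real solver throughput will be far below the theoretical peak.

```python
# Rough peak double-precision estimate:
#   cores x 2 FLOP per cycle (fused multiply-add) x clock x DP:SP ratio.
# Core counts, base clocks and DP ratios are ASSUMED from public spec sheets.
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    cuda_cores: int
    clock_ghz: float
    dp_ratio: float  # double-precision throughput as a fraction of single precision

    def peak_dp_tflops(self) -> float:
        sp_gflops = self.cuda_cores * 2 * self.clock_ghz
        return sp_gflops * self.dp_ratio / 1000.0


CARDS = [
    Gpu("GeForce GTX Titan Black", 2880, 0.889, 1 / 3),
    Gpu("Tesla K20X", 2688, 0.732, 1 / 3),
]

for card in CARDS:
    print(f"{card.name}: {card.peak_dp_tflops():.2f} TFLOPS (DP, theoretical peak)")
```

The estimate only explains where the headline numbers come from; it says nothing about whether a given solver will accept the card.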

#2 kyle, February 27, 2014, 11:40
GPU-accelerated CFD still isn't ready for real work. Some codes support it, and there are several plugins for OpenFOAM, but the actual speedup is minimal (and often negative).

It's much more cost-effective to buy a second workstation and connect the two via Ethernet. InfiniBand doesn't become necessary until you get to 4+ machines. The card you are talking about costs $1000+. Instead you could get a $200 GPU and an $800 second node, which would essentially halve your solution times.

#3 evcelica, February 27, 2014, 14:12
I have never tried a Titan, but my GTX 680 doesn't work with ANSYS. It says that my GPU is not supported and that I must have a Tesla or Quadro card. ANSYS works with NVIDIA, so I'm almost positive they would force you to buy their "professional" GPU compute card.

#4 Zlatko, February 28, 2014, 07:58
I was also curious about Fluent and GPGPU, so I ran a case (3ddp, pbns, rke, coupled solver, 41K cells) on a Windows 7 PC (i7 3770K) with an NVIDIA GeForce GTX 670 graphics card (1344 CUDA cores).

The Fluent command /parallel/gpgpu/show returned:
Code:
CUDA visible GPUs on rocky
  CUDA runtime version 5000
  Driver version 6000
  Number of GPUs 1
    0. GeForce GTX 670 (*)
       7 SMs
       0.98 GHz
       2.14748 GBytes
The solution converged in 56 iterations, and the output from the command /parallel/timer/usage is:
Code:
Performance Timer for 56 iterations on 2 compute nodes
  Average wall-clock time per iteration:              1.145 sec
  Global reductions per iteration:                       31 ops
  Global reductions time per iteration:               0.000 sec (0.0%)
  Message count per iteration:                           62 messages
  Data transfer per iteration:                        0.232 MB
  LE solves per iteration:                                2 solves
  LE wall-clock time per iteration:                   0.011 sec (0.9%)
  LE global solves per iteration:                         2 solves
  LE global wall-clock time per iteration:            0.000 sec (0.0%)
  LE global matrix maximum size:                     41000
  AMG cycles per iteration:                           4.107 cycles
  Relaxation sweeps per iteration:                      384 sweeps
  Relaxation exchanges per iteration:                   387 exchanges

  Total wall-clock time:                             64.102 sec
  Total CPU time:                                   127.531 sec
The timings for the computation without the GPU are:
Code:
Performance Timer for 56 iterations on 2 compute nodes
  Average wall-clock time per iteration:              0.218 sec
  Global reductions per iteration:                       44 ops
  Global reductions time per iteration:               0.000 sec (0.0%)
  Message count per iteration:                          441 messages
  Data transfer per iteration:                        0.472 MB
  LE solves per iteration:                                3 solves
  LE wall-clock time per iteration:                   0.142 sec (65.0%)
  LE global solves per iteration:                         6 solves
  LE global wall-clock time per iteration:            0.000 sec (0.1%)
  LE global matrix maximum size:                        22
  AMG cycles per iteration:                           8.286 cycles
  Relaxation sweeps per iteration:                      566 sweeps
  Relaxation exchanges per iteration:                   572 exchanges

  Total wall-clock time:                             12.210 sec
  Total CPU time:                                    24.399 sec

#5 frode86, March 1, 2014, 10:20
How is the GTX 670's double-precision rate? I think it is rather poor. It also seems like your analysis is rather small, given the short solve time. I've read that a GPU will only be beneficial in larger analyses. Did you just activate it in ANSYS and it worked? Would it be possible for you to test whether Mechanical also accepts the card?


#6 Zlatko, March 1, 2014, 15:30
Quote:
Originally Posted by frode86
How is the GTX 670's double-precision rate? I think it is rather poor. It also seems like your analysis is rather small, given the short solve time. I've read that a GPU will only be beneficial in larger analyses.
The case is small because I intended to run it on a Linux workstation with an NVIDIA Quadro FX 3800 GPU, which has only 1 GB of memory, and the case must fit into GPU memory. Due to the error
Code:
> it 500
  iter  continuity  x-velocity  y-velocity  z-velocity           k     epsilon     time/iter
AMG on GPGPU
NVAMG version 4
Built on Aug 21 2013, 10:28:27
NVAMG ERROR: file ../../src/amg_gpu.c line    863
NVAMG ERROR: CUDA kernel launch error
I switched to the Windows machine.

Quote:
Originally Posted by frode86
Did you just activate it in ANSYS and it worked?
Yes, I used the command fluent 3ddp -g -t2 -gpgpu=1.
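
For scripted runs, the launch line above can be assembled programmatically. This is a minimal sketch with a hypothetical helper name; the flags themselves are exactly those in the post, and the comments on their meaning reflect my understanding of the Fluent launcher rather than anything stated in this thread.

```python
import shlex


def fluent_gpgpu_command(solver="3ddp", processes=2, gpus=1, gui=False):
    """Build the Fluent launch line quoted in the post (hypothetical helper)."""
    args = ["fluent", solver]          # 3ddp: 3D, double precision
    if not gui:
        args.append("-g")              # assumed meaning: run without the GUI
    args.append(f"-t{processes}")      # assumed meaning: number of solver processes
    args.append(f"-gpgpu={gpus}")      # GPU count, as written in the post
    return args


cmd = fluent_gpgpu_command()
print(shlex.join(cmd))  # fluent 3ddp -g -t2 -gpgpu=1
```

The returned list can be passed to `subprocess.run` on a machine where the `fluent` launcher is on the PATH.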

Quote:
Originally Posted by frode86
Is it possible for you to test if Mechanical also accepts the card?
Unfortunately not. We only have an ANSYS Academic Research CFD license.

I also discovered the following: if I start Fluent, read the case, initialize, and run the calculation, it takes 59 seconds to reach the convergence criteria. When the case is then read again, initialized, and the calculation restarted, it takes only 7.5 seconds.

#7 evcelica, March 3, 2014, 09:24
Quote:
Originally Posted by frode86
Is it possible for you to test if Mechanical also accepts the card?

Mechanical is where I got the error that it must be a Tesla or certain Quadro card. I'm glad it appears to work in Fluent though.

#8 vjj, July 8, 2014, 20:01
For ANSYS Fluent 15, there is a GPU User Guide available. It covers optimal settings and contains lots of other information:

http://www.nvidia.com/content/tesla/...-userguide.pdf

For tips on using GPU acceleration with ANSYS Mechanical:

http://www.nvidia.com/content/tesla/...-with-gpus.pdf

#9 pugwash.ds, July 8, 2014, 23:55 (OpenFOAM GPU)
Speaking about OpenFOAM only, I am very dubious about the potential for significant speedups with a GPU using what is available right now. I tried compiling for my Quadro graphics card without any luck. Looking into it a bit further, I think a GPU will work very well for some kinds of applications that are embarrassingly parallel, but OpenFOAM is memory intensive and requires significant internode communication. It is always tempting to look at the headline specs for a processor or GPU, but at least with OpenFOAM, making use of that processing power without getting mugged by Amdahl's Law is pretty difficult. You will run out of memory bandwidth or be held back by internode communication. I think the answer is to build a well-optimized cluster. All the required elements, both hardware and software, are easy to get, and you will get a known speedup.
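
Amdahl's Law, mentioned above, can be made concrete in a few lines. The sketch below is illustrative only: the 0.65 fraction is borrowed from the linear-solver share (65.0%) reported for the CPU-only Fluent run in post #4, and applying it to OpenFOAM is purely an assumption.

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when only `accelerated_fraction` of the runtime runs `factor` times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# ASSUMPTION: 65% of the runtime is the linear solver and only that part is offloaded.
solver_share = 0.65

for factor in (2, 10, 100):
    print(f"solver {factor:>3}x faster -> overall {amdahl_speedup(solver_share, factor):.2f}x")

# Upper bound if the offloaded part took no time at all.
print(f"limit: {1.0 / (1.0 - solver_share):.2f}x")
```

Under that assumption, even an arbitrarily fast GPU solver caps the overall gain below 3x, before any transfer or communication overhead is counted.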

#10 Daveo643, January 26, 2015, 14:01
Following up: has anyone VERIFIED that Fluent (and only Fluent) works with the GTX Titan? Screenshots or results from the Fluent command line that show the card being exploited would be much appreciated.

#11 Daveo643, February 16, 2015, 16:38
I can definitively confirm that Fluent can "exploit" a GPU (at least in V16), even one that's not on the supported list published by ANSYS.

But performance is much worse with the GPU enabled (in my case a GTX 580M with 2 GB GDDR5 RAM; GF114M, CC 2.1) than without.

http://www.cfd-online.com/Forums/flu...pu-fluent.html

#12 hhh, August 8, 2016, 22:36 (teraflops vs petaflops vs exaflops: FSI problems)
Dear Friends,
I am trying to estimate the computational time (in days) for FSI problems (for example, flexible flapping wings) in terms of FLOPS. I have seen a few papers that mention 800 teraflops as sufficient.

Teraflops, petaflops, or exaflops: which is appropriate for this kind of problem? What is the difference in simulation time (in days) between them? And how do I estimate the FLOPS required for this kind of problem? Please assist me.

Regards,
HHH
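
One rough way to frame the question above is: runtime = total floating-point operations / (peak rate x sustained efficiency). The sketch below is a back-of-envelope estimate only. Every workload number in it (cell count, operations per cell per step, step count, efficiency) is a hypothetical placeholder, not a value from this thread or from the literature.

```python
SECONDS_PER_DAY = 86_400
PEAK_FLOPS = {"teraflops": 1e12, "petaflops": 1e15, "exaflops": 1e18}


def runtime_days(total_flop, peak_flops, efficiency):
    """Days needed to execute `total_flop` operations at a sustained fraction of peak."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return total_flop / (peak_flops * efficiency) / SECONDS_PER_DAY


# HYPOTHETICAL workload, for illustration only.
cells = 50e6                    # mesh size
flop_per_cell_per_step = 20e3   # operations per cell per time step
time_steps = 200e3              # number of time steps
efficiency = 0.05               # assumed sustained fraction of peak

total_flop = cells * flop_per_cell_per_step * time_steps

for label, peak in PEAK_FLOPS.items():
    days = runtime_days(total_flop, peak, efficiency)
    print(f"1 {label} peak: {days:.3g} days")
```

The efficiency term matters as much as the headline rate: CFD and FSI solvers are usually limited by memory bandwidth and communication, so the sustained fraction has to be measured on the actual code rather than assumed.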