GPU acceleration in Ansys Fluent

Posted by flotus1 (Alex) on April 28, 2017, 12:38
The topic of GPU acceleration for Ansys Fluent sometimes seems to be shrouded in mystery. So I ran a few benchmarks to answer some frequently asked questions and get a snapshot of the capability of this feature in 2017.

Flow Setup:
Benchmark case: 3D lid driven cavity in a cubical domain
Grid resolution: 64x64x64 -> 262144 cells
Reynolds number: 10000
solver type: pressure-based, steady
Turbulence model: standard k-epsilon
Number of iterations: 100, reporting interval 10
default settings whenever possible

Software/Hardware:
Operating system: openSUSE Leap 42.1
Fluent version: Ansys Fluent 18.0
CPU: Intel Xeon W3670, 6 cores, 3.2GHz, HT disabled
Memory: 24 GB DDR3-1333 ECC triple-channel
GPU: Quadro 5000 (theoretical compute performance: 722 GFLOPS single, 361 GFLOPS double, memory bandwidth: 126 GB/s, memory size: 2.5 GB GDDR5)

1) Coupled algorithm

As stated in this guide, GPU acceleration works best if the linear solver fraction (the share of run time spent in the linear solver) is high, which is usually the case when using the coupled solver. Fluent reported it to be around 60% or higher in all cases shown here. Without further ado:



So GPU acceleration clearly works under the right circumstances.
Using only one CPU core, adding the GPU results in a speed-up of roughly 50-60% in both single precision (SP) and double precision (DP). But you can already see diminishing returns at higher CPU core counts.

2) SIMPLE algorithm

With the SIMPLE algorithm, the picture is completely different. The linear solver fraction without a GPU is just below 30% for all cases, so GPU acceleration as it is currently implemented in Ansys Fluent cannot be as effective. This is a caveat that Ansys is aware of and that is clearly stated in the more in-depth reviews of this feature.



As expected, solution times are much higher with GPU "acceleration" than without it.
To be clear: this is not new information. Ansys never claimed that GPU acceleration was worth it with the SIMPLE algorithm.
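The link between linear solver fraction and achievable gain can be illustrated with Amdahl's law. The sketch below is my own illustration, not something Fluent reports: it assumes that only the linear solver is offloaded to the GPU and it ignores all transfer and synchronization overhead, so it gives an optimistic upper bound. The function names are hypothetical, and the fractions are the rounded values quoted above (60% for coupled, 30% for SIMPLE).

```python
def amdahl_speedup(solver_fraction, solver_speedup):
    """Overall speed-up when only the linear-solver share of the run time is accelerated.

    solver_fraction: share of run time spent in the linear solver (0..1)
    solver_speedup:  factor by which the GPU speeds up that share (assumed, > 0)
    """
    if not 0.0 <= solver_fraction <= 1.0:
        raise ValueError("solver_fraction must be between 0 and 1")
    if solver_speedup <= 0.0:
        raise ValueError("solver_speedup must be positive")
    return 1.0 / ((1.0 - solver_fraction) + solver_fraction / solver_speedup)


def amdahl_limit(solver_fraction):
    """Upper bound on the overall speed-up for an infinitely fast linear solver."""
    if not 0.0 <= solver_fraction < 1.0:
        raise ValueError("solver_fraction must be in [0, 1)")
    return 1.0 / (1.0 - solver_fraction)


# Rounded linear solver fractions from the benchmarks in this post.
for label, fraction in (("coupled", 0.60), ("SIMPLE", 0.30)):
    print(f"{label}: linear solver fraction {fraction:.0%}, "
          f"speed-up limit {amdahl_limit(fraction):.2f}x")
```

Under these assumptions the coupled solver could gain at most a factor of 2.5 and SIMPLE at most about 1.43. Because overhead is ignored, this simple model cannot reproduce the actual slowdown measured with SIMPLE; it only shows why the potential gain is small to begin with.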



3) Pairing "high-end" CPUs with slow GPUs

You might expect to be on the safe side as long as you are using the coupled solver. But we could already see the diminishing returns in case 1 with higher CPU core counts. To widen the gap between CPU and GPU performance, this test uses different hardware: 2x Xeon E5-2687W, 128GB (16x8GB) DDR3-1600 reg ECC, Quadro 4000 (theoretical compute performance: 486 GFLOPS SP, 243 GFLOPS DP, memory bandwidth: 89.9 GB/s, memory size: 2 GB GDDR5)



While solution times with a GPU and one CPU core are slightly lower than without a GPU, there is a huge performance penalty when using the GPU along with 14 CPU cores. This is despite the fact that the linear solver fraction is 60% without a GPU. So clearly, a low-end GPU will slow down fast CPUs even if the other criteria for using GPU acceleration are met.


4) Consumer-grade graphics cards
Let's see what a cheap consumer-grade graphics card can do for GPU acceleration. The hardware in this test: 2x Xeon E5-2650v4, 128GB (8x16GB) DDR4-2400 reg ECC, GeForce GTX 1060 6GB (theoretical compute performance: 4372 GFLOPS SP, 137 GFLOPS DP, memory bandwidth: 192 GB/s, memory size: 6 GB GDDR5). Note that a suspended computation was residing in memory during this test, so the numbers might not be representative of the absolute performance of this processor type.



The conclusion: GPU acceleration in Ansys Fluent definitely works with cheap gaming graphics cards. Even in DP, the performance gains from the GPU are quite remarkable given its low DP performance. This might indicate that the workload in this benchmark is not entirely compute-bound. Memory and PCIe transfers might also be important. However, the GPU is still a huge bottleneck as soon as we use more CPU cores.

5) Q&A

Question

When can I use GPU acceleration?
Answer
1) You need to use the right solver in the first place, for example the coupled flow solver or the DO radiation model. Switching from SIMPLE or its variants to coupled just to use GPU acceleration is probably not the best idea.
2) Your model must fit into the GPU memory. You can estimate the amount of memory needed with the formulas in section 4 of the guide mentioned earlier. The benchmark I ran used ~0.5 GB of VRAM in single precision and ~1 GB in double precision. Again: if your model does not fit in the GPU memory, you currently cannot use GPU acceleration. GPU memory from dual-GPU cards or from more than one card does stack, so you can use this to simulate larger models.
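As a rough first check before consulting the guide's formulas, the VRAM figures from this benchmark can be scaled to other cell counts. The sketch below is only a back-of-the-envelope estimate: it assumes memory use grows linearly with the number of cells and that the setup matches this benchmark (coupled solver, standard k-epsilon). It does not reproduce the formulas from the guide. The function names and the 10% headroom are hypothetical choices.

```python
BENCHMARK_CELLS = 64 ** 3  # 262144 cells, as in the benchmark case
# Approximate VRAM use observed in this benchmark, in GB.
BENCHMARK_VRAM_GB = {"single": 0.5, "double": 1.0}


def estimate_vram_gb(n_cells, precision="double"):
    """Scale the benchmark's VRAM use linearly to n_cells (assumption: linear scaling)."""
    per_cell_gb = BENCHMARK_VRAM_GB[precision] / BENCHMARK_CELLS
    return n_cells * per_cell_gb


def fits_on_gpu(n_cells, gpu_memory_gb, precision="double", headroom=0.9):
    """Return True if the estimate fits, leaving some VRAM free (headroom is an assumption)."""
    return estimate_vram_gb(n_cells, precision) <= headroom * gpu_memory_gb


# Example: a 1 million cell case in double precision on two cards from this post.
n_cells = 1_000_000
print(f"estimated VRAM: {estimate_vram_gb(n_cells):.2f} GB")
print("fits on GTX 1060 (6 GB):", fits_on_gpu(n_cells, 6.0))
print("fits on Quadro 5000 (2.5 GB):", fits_on_gpu(n_cells, 2.5))
```

Treat the result as an order-of-magnitude guide only. Other models, additional equations or a different solver will change the memory needed per cell.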

Question
Which GPUs can I use for GPU acceleration in Ansys Fluent?
Answer
Ansys only recommends Tesla compute cards for this purpose. However, you can use virtually any recent Nvidia GPU. Yes, even GeForce cards; I verified this with a GTX 1060.
That being said, not all GPUs are created equal. The main difference lies in the DP compute performance. Nearly all modern GeForce and Quadro GPUs have a DP/SP performance ratio of 1/32. A Quadro P6000, one of the most expensive GPUs you can buy right now, has a theoretical peak performance of 11758 GFLOPS SP but only 367 GFLOPS DP, just about the same as the seriously outdated Quadro 5000 I used in this test. This is not an issue if you want to compute in SP, but a colossal waste of money if you want to perform simulations in DP. In that case you will have to buy a Tesla card. Be careful though: even some of the Tesla cards now have reduced DP capabilities because their target application is deep learning.
One of the last exceptions to this rule that is still somewhat relevant today is the first generation of Titan GPUs ("Kepler"), released in 2013 and 2014 (Titan, Titan Black, Titan Z). They have a DP/SP ratio of 1/3 and can be bought used for a reasonable price.
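The DP/SP ratios mentioned above can be checked directly from the theoretical peak numbers quoted in this post. The following is a minimal sketch that uses only those figures; the dictionary layout and function name are illustrative.

```python
# Theoretical peak performance in GFLOPS (single, double), as quoted in this post.
GPUS = {
    "Quadro 5000": (722, 361),
    "Quadro 4000": (486, 243),
    "GeForce GTX 1060 6GB": (4372, 137),
    "Quadro P6000": (11758, 367),
}


def dp_sp_ratio(sp_gflops, dp_gflops):
    """Ratio of double-precision to single-precision peak performance."""
    return dp_gflops / sp_gflops


for name, (sp, dp) in GPUS.items():
    ratio = dp_sp_ratio(sp, dp)
    # Print the ratio in the usual "1/N" form, rounded to the nearest integer N.
    print(f"{name}: {dp} GFLOPS DP, DP/SP = 1/{1 / ratio:.0f}")
```

The two older Quadro cards come out at 1/2, while the GTX 1060 and the Quadro P6000 both land at 1/32. That is why the P6000's DP peak is barely higher than that of the Quadro 5000.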

Question
Should I spend extra money on a compute GPU when buying a new Fluent workstation?
Answer
For a "general purpose" workstation with a limited budget, the answer is probably no. In most cases you are better off spending excess money on more CPU performance. A compute GPU is worth considering only when you have maxed out CPU performance, or when you are sure that you mostly use the solvers that benefit from GPU acceleration and your models are small enough.

Edit: here is a nearly exhaustive list of Nvidia GPUs with high DP capabilities:
Attached Images: result1.png, result2.png, dell128_coupled_sp.png, nvidia_gpus.png, result3.png
Last edited by flotus1; April 29, 2017 at 11:25.