GPU acceleration in Ansys Fluent

Post #61 by Will Kernkamp (Senior Member) on May 11, 2023, 22:10
Quote:
Originally Posted by flotus1
I find it very commendable that you put in the effort to make it work with hardware we can actually get our hands on.
But a 25x speedup from a 2080 Ti compared to CPU begs the question: what CPU are you comparing to? And are we talking multi-threaded or single-threaded?
Please don't take this the wrong way, but such outrageously high speedups from GPU acceleration, when comparing to a reasonably modern CPU, usually get you a few raised eyebrows in the HPC community. Because it usually means that the CPU implementation simply does not have the same level of optimization as the GPU implementation.
When looking at the raw specs like theoretical FP32 operations per second, or memory bandwidth, there is not a 25x gap between CPUs and GPUs. At least leaving aside hardware accelerated operations.
It looks like the 25x factor is between the GPU and one CPU core (per the preceding message).

Post #62 by Arjun (Senior Member) on May 11, 2023, 23:08

Yes. It's compared to 1 processor.


The reason for reporting it this way is also, in my eyes, the real advantage. On my machine, with the processor I mentioned (as has also been noted in this forum), scaling drops off after 10 to 12 processes. So comparing the results against any process count other than 1 is somewhat misleading, as that would vary from machine to machine (different scaling).

This is a real advantage because I could never achieve 25x scaling on the CPU of my machine, and I assume this is the case for most people with their desktops.


Now let me clarify further: Wildkatze itself can run with multiple CPU processes (this is where the scaling info comes from). The GPU version is separate code designed around the single-processor, single-GPU idea (the easiest one to have).


At the moment the main model we have is VOF. Here, for example, a calculation that took 7.5 hours to run with -np 12 (12 processes) runs in 1 hour on the GPU. The same case takes OpenFOAM more than 2.5 days to run with 12 processes (Wildkatze can run with a larger timestep size).

This is a real advantage in my book.

Last edited by arjun; May 12, 2023 at 00:58.

Post #63 by Will Kernkamp (Senior Member) on May 12, 2023, 02:38
It is not wrong to present a comparison of 1 GPU process versus 1 CPU core, as long as we know what is being compared. There is even an advantage: we can apply the maximum achievable multicore speedup factor for a specific machine to estimate its multicore CPU performance.
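
A minimal sketch of that conversion, assuming the GPU-versus-one-core factor and the machine's best multicore speedup are both measured on the same case. The 25x figure is the one quoted in this thread; the multicore speedups in the loop are hypothetical placeholders, not measurements.

```python
def gpu_vs_multicore(gpu_vs_one_core: float, best_multicore_speedup: float) -> float:
    """GPU speedup relative to the best multicore CPU run on one machine."""
    if best_multicore_speedup <= 0:
        raise ValueError("multicore speedup must be positive")
    return gpu_vs_one_core / best_multicore_speedup


# 25x (GPU vs. one core) is the figure quoted in the thread.
# The multicore speedups below are hypothetical values for illustration only.
for multicore_speedup in (4.0, 8.0, 12.0):
    advantage = gpu_vs_multicore(25.0, multicore_speedup)
    print(f"multicore speedup {multicore_speedup:>4}: GPU is {advantage:.2f}x faster")
```

The point of the single-core baseline is that each reader can plug in the multicore speedup of their own machine.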


I gather from your second paragraph that Wildkatze has an improved solver for VOF. So in comparison to OpenFOAM, there is a factor of 8 from the code, and an additional factor of 7.5 from going to GPU, right?
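
For reference, the arithmetic behind those two factors can be sketched as follows, using only the run times quoted above. The variable names are my own, and "more than 2.5 days" is treated as exactly 2.5 days, so the OpenFOAM-related factors are lower bounds.

```python
HOURS_PER_DAY = 24

# Run times quoted in the thread for the same VOF case.
openfoam_cpu_hours = 2.5 * HOURS_PER_DAY  # OpenFOAM, 12 processes ("more than 2.5 days")
wildkatze_cpu_hours = 7.5                 # Wildkatze, -np 12
wildkatze_gpu_hours = 1.0                 # Wildkatze, single GPU

code_factor = openfoam_cpu_hours / wildkatze_cpu_hours   # solver/code advantage on CPU
gpu_factor = wildkatze_cpu_hours / wildkatze_gpu_hours   # GPU vs. 12 CPU processes
total_factor = openfoam_cpu_hours / wildkatze_gpu_hours  # product of the two

print(f"code: {code_factor:.1f}x, GPU: {gpu_factor:.1f}x, total: {total_factor:.1f}x")
```

Note that both CPU timings use 12 processes, so the GPU factor here is relative to a 12-process run, not to a single core.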

Post #64 by Arjun (Senior Member) on May 12, 2023, 02:48
Yes. Our VOF is implicit and can handle larger Courant numbers, and it runs many times faster than OpenFOAM. (The solver is stable, but for the sake of accuracy we suggest that users not go above a Courant number of 5.) (I presented it at a multiphase conference and am trying to get a paper published.)
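
To illustrate why the Courant limit matters for run time, here is a minimal sketch using the standard definition C = u * dt / dx. The velocity, cell size, and the baseline Courant number of 1 are hypothetical values chosen for illustration; they are not settings taken from Wildkatze or OpenFOAM.

```python
def timestep_for_courant(courant: float, velocity: float, cell_size: float) -> float:
    """Timestep dt that gives Courant number C = velocity * dt / cell_size."""
    if velocity <= 0 or cell_size <= 0:
        raise ValueError("velocity and cell_size must be positive")
    return courant * cell_size / velocity


# Hypothetical flow: 2 m/s through 1 mm cells.
velocity = 2.0     # m/s (assumed)
cell_size = 0.001  # m (assumed)

dt_baseline = timestep_for_courant(1.0, velocity, cell_size)  # assumed baseline limit
dt_implicit = timestep_for_courant(5.0, velocity, cell_size)  # suggested upper limit of 5

# For a fixed simulated time, the number of timesteps scales as 1/dt.
print(f"dt at C=1: {dt_baseline:.2e} s, dt at C=5: {dt_implicit:.2e} s")
print(f"timesteps needed drop by a factor of {dt_implicit / dt_baseline:.1f}")
```

Fewer timesteps do not translate one-to-one into wall time, since each implicit step may cost more, so this only bounds the possible gain.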

The 7.5 hours is with -np 12, so the factor of 7.5 is relative to almost the best I can get from the CPU on my machine (since scaling is poor beyond 12 processes); it is not a comparison against a single-process run.