GPU acceleration in Ansys Fluent

wkernkamp · May 11, 2023, 22:10

Quote:

Originally Posted by flotus1

I find it very commendable that you put in the effort to make it work with hardware we can actually get our hands on.
But a 25x speedup from a 2080 Ti compared to a CPU begs the question: what CPU are you comparing to? And are we talking multi-threaded or single-threaded?
Please don't take this the wrong way, but such outrageously high speedups from GPU acceleration, when comparing to a reasonably modern CPU, usually get you a few raised eyebrows in the HPC community, because they usually mean that the CPU implementation simply does not have the same level of optimization as the GPU implementation.
When looking at raw specs like theoretical FP32 operations per second or memory bandwidth, there is not a 25x gap between CPUs and GPUs, at least leaving aside hardware-accelerated operations.

It looks like the 25x factor is between the GPU and one CPU core (per the preceding message).

arjun · May 11, 2023, 23:08

Quote:

Originally Posted by wkernkamp

It looks like the 25x factor is between the GPU and one CPU core (per the preceding message).

Yes. It's compared to 1 processor core.

The reason for reporting it this way is also the real advantage in my eyes. On my machine, with the processor I mentioned (as also noted in this forum), the scaling drops off after 10 to 12 processes. So comparing the results against any process count other than 1 is somewhat misleading, as that would vary from machine to machine (different scaling).

This is a real advantage because I could never achieve 25x scaling on the CPU on my machine, and I assume this will be the case for most people with their desktops.

Now let me clarify further: Wildkatze itself can run with multiple CPU processes (this is where the scaling info comes from). The GPU version is separate code designed around a single-process, single-GPU idea (the easiest one to have).

At the moment the main model we have is VOF. Here, for example, a calculation that took 7.5 hours to run with -np 12 (12 processes) runs in 1 hour on the GPU. The same case takes OpenFOAM more than 2.5 days to run with 12 processes. (Wildkatze can run with a larger timestep size, hence the difference.)

This is a real advantage in my book.

wkernkamp · May 12, 2023, 02:38

Quote:

Originally Posted by arjun

Yes. It's compared to 1 processor core.

The reason for reporting it this way is also the real advantage in my eyes. On my machine, with the processor I mentioned (as also noted in this forum), the scaling drops off after 10 to 12 processes. So comparing the results against any process count other than 1 is somewhat misleading, as that would vary from machine to machine (different scaling).

This is a real advantage because I could never achieve 25x scaling on the CPU on my machine, and I assume this will be the case for most people with their desktops.

Now let me clarify further: Wildkatze itself can run with multiple CPU processes (this is where the scaling info comes from). The GPU version is separate code designed around a single-process, single-GPU idea (the easiest one to have).

At the moment the main model we have is VOF. Here, for example, a calculation that took 7.5 hours to run with -np 12 (12 processes) runs in 1 hour on the GPU. The same case takes OpenFOAM more than 2.5 days to run with 12 processes. (Wildkatze can run with a larger timestep size, hence the difference.)

This is a real advantage in my book.

It is not wrong to present a comparison of 1 GPU process versus 1 CPU core, as long as we know what is being compared. There is even an advantage: we can apply the maximum achievable multicore speedup factor of a specific machine to estimate its multicore CPU performance.
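To make that conversion concrete, here is a minimal sketch. The function name and the multicore speedup values are hypothetical placeholders; only the 25x figure comes from this thread, and the real multicore speedup has to be measured on each machine.

```python
def gpu_vs_multicore(gpu_speedup_vs_one_core: float, multicore_speedup: float) -> float:
    """Return the GPU speedup relative to the best multicore CPU run.

    gpu_speedup_vs_one_core: how much faster the GPU run is than one CPU core.
    multicore_speedup: best speedup the CPU code reaches on this machine
        relative to one core (a measured, machine-specific number).
    """
    if gpu_speedup_vs_one_core <= 0 or multicore_speedup <= 0:
        raise ValueError("speedup factors must be positive")
    return gpu_speedup_vs_one_core / multicore_speedup


# 25x is the figure quoted in the thread; the multicore speedups below are
# hypothetical placeholders, not measurements from any machine.
for assumed_multicore_speedup in (4.0, 8.0, 12.0):
    ratio = gpu_vs_multicore(25.0, assumed_multicore_speedup)
    print(f"CPU multicore speedup {assumed_multicore_speedup:4.1f}x "
          f"-> GPU is {ratio:.2f}x faster than the multicore run")
```

The point of the sketch is only that the single-core figure is a portable baseline: each reader divides it by their own machine's scaling factor.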

I gather from your second paragraph that Wildkatze has an improved solver for VOF. So in comparison to OpenFOAM, there is a factor of 8 from the code, and an additional factor of 7.5 from going to the GPU, right?
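The arithmetic behind those two factors can be checked from the run times quoted earlier in the thread. This is a minimal sketch that takes "more than 2.5 days" as exactly 2.5 days, so the OpenFOAM-related factors are lower bounds; the variable names are illustrative only.

```python
HOURS_PER_DAY = 24.0

# Run times quoted in the thread (all for the same VOF case).
openfoam_cpu_hours = 2.5 * HOURS_PER_DAY  # "more than 2.5 days" with 12 processes: lower bound
wildkatze_cpu_hours = 7.5                 # Wildkatze with -np 12
wildkatze_gpu_hours = 1.0                 # Wildkatze on a single GPU

# Gain from the code itself (both runs use 12 CPU processes).
code_factor = openfoam_cpu_hours / wildkatze_cpu_hours
# Gain from moving the same code from 12 CPU processes to the GPU.
gpu_factor = wildkatze_cpu_hours / wildkatze_gpu_hours
# Combined gain: OpenFOAM on 12 processes versus Wildkatze on the GPU.
total_factor = openfoam_cpu_hours / wildkatze_gpu_hours

print(f"code factor  >= {code_factor:.1f}")
print(f"GPU factor    = {gpu_factor:.1f}")
print(f"total factor >= {total_factor:.1f}")
```

Under that assumption the code factor is 8, the GPU factor is 7.5, and the two multiply to a combined factor of at least 60.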

arjun · May 12, 2023, 02:48

Quote:

Originally Posted by wkernkamp

It is not wrong to present a comparison of 1 GPU process versus 1 CPU core, as long as we know what is being compared. There is even an advantage: we can apply the maximum achievable multicore speedup factor of a specific machine to estimate its multicore CPU performance.

I gather from your second paragraph that Wildkatze has an improved solver for VOF. So in comparison to OpenFOAM, there is a factor of 8 from the code, and an additional factor of 7.5 from going to the GPU, right?

Yes. Our VOF is implicit and can handle larger Courant numbers, and it runs many times faster than OpenFOAM. (The solver is stable, but for the sake of accuracy we suggest that users not use Courant numbers above 5.) (I presented it at a multiphase conference and am trying to get a paper published.)
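For readers less familiar with the term, the link between Courant number and timestep size can be sketched with the standard one-dimensional definition Co = |u| Δt / Δx. The velocity and cell size below are made-up values, the function names are hypothetical, and this is not a description of how Wildkatze or OpenFOAM evaluate the Courant number internally.

```python
def courant_number(velocity: float, dt: float, dx: float) -> float:
    """1D Courant number Co = |u| * dt / dx."""
    if dt <= 0 or dx <= 0:
        raise ValueError("dt and dx must be positive")
    return abs(velocity) * dt / dx


def max_timestep(velocity: float, dx: float, courant_limit: float) -> float:
    """Largest timestep that keeps the 1D Courant number at or below the limit."""
    if velocity == 0:
        raise ValueError("velocity must be non-zero")
    if dx <= 0 or courant_limit <= 0:
        raise ValueError("dx and courant_limit must be positive")
    return courant_limit * dx / abs(velocity)


# Hypothetical flow: 2 m/s through 1 mm cells (illustrative values only).
u, dx = 2.0, 1.0e-3

dt_co1 = max_timestep(u, dx, courant_limit=1.0)  # a limit of 1, for comparison
dt_co5 = max_timestep(u, dx, courant_limit=5.0)  # the suggested upper bound of 5

print(f"dt at Co = 1: {dt_co1:.2e} s")
print(f"dt at Co = 5: {dt_co5:.2e} s")
print(f"timestep ratio: {dt_co5 / dt_co1:.1f}")
```

For a fixed mesh and velocity, the allowable timestep scales linearly with the Courant limit, which is why a solver that tolerates larger Courant numbers needs fewer timesteps to cover the same physical time.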

The 7.5 hours is with -np 12, so the 7.5x factor is measured against nearly the best CPU performance I can get on my machine (since scaling is poor beyond 12 processes); it is not a comparison against a single-process run.
