January 17, 2019, 09:29 |
GPU Parallelization in OpenFOAM
#1
Senior Member
krishna kant
Join Date: Feb 2016
Location: Hyderabad, India
Posts: 133
Rep Power: 10
Hello Foamers,
Our research group has a solver in OF 2.3.1 which needs to be parallelized on the GPU. I am new to GPU parallelization, so can anyone guide me on where to start? I did some searching on this topic and came across RapidCFD (this is good, but not open source) and GPGPU libraries (these run only the linear solver on the GPU, but I need to make the whole solver run on the GPU).
January 17, 2019, 21:45 |
#2
Senior Member
Andrew Somorjai
Join Date: May 2013
Posts: 175
Rep Power: 12
February 6, 2019, 19:41 |
#3
Senior Member
Klaus
Join Date: Mar 2009
Posts: 250
Rep Power: 22
Hello Krishna,
there are a number of technical aspects which need clarification, because the solution could be pretty simple or incredibly difficult depending on what exactly you want to do. Do you want to run the entire CFD solver (by solver I mean here something like simpleFoam or pimpleFoam) on a GPU, or just the matrix solvers like PCG or BiCGStab?

For ONE Nvidia or ONE AMD GPU, a solution is the PARALUTION library plus its OpenFoam plugin.

The foam-extend project implements the CUFFLINK library, which supports multi-GPU computations; this could possibly be ported to other OpenFoam versions. In this case, the matrix computations (PCG, BiCGStab, ...) would be moved to Nvidia GPUs. For multi-GPU computations on one node, it might be easier to implement the MAGMA library, which has options to span/compute a matrix across multiple GPUs. If you want to use AMD GPUs with CUFFLINK, you would have to port CUFFLINK using the hipify tool.

There are many more scenarios based on your requirements/plans and the specific hardware you want to use, i.e. Nvidia or AMD GPUs, which GPU generation, and how much GPU memory (often the showstopper). And depending on your programming skills, why not go for hybrid computations using the CPU and GPU cores?

Tell us a little more about what exactly you want to compute, and on which GPU models.

Klaus
February 10, 2019, 23:49 |
#4
Senior Member
krishna kant
Join Date: Feb 2016
Location: Hyderabad, India
Posts: 133
Rep Power: 10
Hello Klaus,
Sorry for the late reply. I would like to port the entire solver (icoFoam) to the Nvidia Tesla K20 and K80 GPUs. I am aware of, and have used, the linear-solver libraries, i.e. cufflink, but I also want to get rid of the time spent transferring data to the GPU. So I would like to port the whole solver onto the GPU using the CUDA programming model.
February 11, 2019, 03:57 |
#5
Member
W.T
Join Date: Oct 2012
Posts: 35
Rep Power: 13
In the RapidCFD GitHub repo there is a COPYING file which contains the GNU GPL license. Also, on the website of sim-flow (the developers of RapidCFD, https://sim-flow.com/rapid-cfd-gpu/) I've found a statement to the same effect.
February 11, 2019, 05:15 |
#6
Senior Member
Klaus
Join Date: Mar 2009
Posts: 250
Rep Power: 22
Hello Krishna,
the COPYING file in https://github.com/Atizar/RapidCFD-dev talks about GPL 3.0. Does this version include the 100,000-node limit of the free version, or is this the free version?

Here's an explanation of how icoFoam works: https://openfoamwiki.net/index.php/IcoFoam

An alternative approach could be to use MAGMA functionality to implement the above, as there are MAGMA solvers and BLAS routines working with multiple GPUs. This means you would convert the LDU matrix to CSR format as a starting point. I think MAGMA would relieve you from dealing with multiple GPUs, MPI and domain decomposition, and handle that for you.

The paper "Complete PISO and SIMPLE solvers on Graphics Processing Units" might give you some hints too. Let me know how you plan to approach it and I'll have another look into my archive.

Klaus
February 11, 2019, 05:26 |
#7
Member
W.T
Join Date: Oct 2012
Posts: 35
Rep Power: 13
To clear things up: there are two products developed by simFlow:

* simFlow - an OpenFOAM GUI; it has two licensing options, FREE (up to 100,000 nodes, up to 2 or 4 cores) and PRO (unlimited mesh size etc.)
* RapidCFD - a GPU port of OpenFOAM 2.3; open source (GNU GPL), with no limits on mesh size etc.
June 4, 2019, 12:02 |
#8
Senior Member
Agustín Villa
Join Date: Apr 2013
Location: Alcorcón
Posts: 313
Rep Power: 15
Hi,
do you have any news on this topic? I would like to do some tests on a graphics card I have. I was thinking of using RapidCFD and comparing it with standard OpenFOAM.
August 25, 2020, 16:20 |
#9
Super Moderator
Tobias Holzmann
Join Date: Oct 2010
Location: Tussenhausen
Posts: 2,708
Blog Entries: 6
Rep Power: 51
Hi Klaus, just one question as I came across your post. Have you already used OpenFOAM with the matrix calculations done on the GPU? I would be highly interested in that.
E.g. I have a 32-core machine and I am going to buy an Nvidia 30xx GPU card soon. Hence, I would be interested in using the GPU cores for the matrix operations in the Foundation version. If I got you right, I would need to check the cufflink library of the extend project and port it to the Foundation version, right? However, I cannot believe that it is that simple without any bottleneck, but I am not an expert in this topic.
__________________
Keep foaming, Tobias Holzmann

Last edited by Tobi; August 26, 2020 at 04:26.
August 26, 2020, 05:20 |
#10
Senior Member
Klaus
Join Date: Mar 2009
Posts: 250
Rep Power: 22
Hi Tobi,
let me start with some general background information. There's no need to port the cufflink library; it was one of the first implementations many years ago, hence it's well known, but it is not the latest approach. Implementations work with both mainstream OpenFOAM versions, OF7 and OF1812, the ones I last worked with. You're also NOT limited to Nvidia GPUs; AMD GPUs are fine as well! The upcoming Founders Edition from Nvidia is probably not a good choice, as it's unlikely that it will support fp64 / double-precision computations. Think of the "AMD Radeon Pro VII", or two of them.

Many people who worked on GPU implementations did it commercially, so they didn't share implementation details. With GPU computations becoming mainstream, this situation has changed.

There are three concepts for an implementation:
1: Solve the linear system on the GPU ("transfer" the matrix solvers PCG, BiCGStab, ... to the GPU; in practice, use GPU solver libraries)
2: Use the GPU as an additional "big" core (create hybrid CPU/GPU matrix solvers, as suggested by Amani AlOnazi in her 2013 thesis, again PCG, ...)
3: Move the entire solver (simpleFoam, pisoFoam, ...) to the GPU, as is done in RapidCFD, to avoid extensive, slow data transfers between CPU and GPU

Pros and cons:
1: "Easier" to implement, but often limited by GPU memory size and data transfer speed; PCIe 4.0 should help with performance, as data transfers should be less of a bottleneck.
2: Probably best for engineering workstations with many CPU cores, using the GPU as a "turbo booster"; needs a hybrid CPU/GPU solver implementation from scratch.
3: Removes most of the data transfer bottleneck, as updates are done on the GPU, but probably not worth the effort with PCIe 5.0/6.0 and CPU/GPU data coherency on the horizon.

Concept 1 is probably not worth the effort for you, as you would simply transfer computations from the 32-core CPU to a GPU, and the 32-core CPU would sit unused. If you invest in a dual-socket motherboard and another CPU, you might end up with a cheaper, faster solution.

Implementation:

Concept 1: Convert the matrix to CSR format (with global indices!), then solve the linear system with the solvers provided by one of the following LA libraries: MAGMA or hipMAGMA (for AMD), rocALUTION (for AMD, which can also be compiled for Nvidia as it's coded in HIP), AMGCL, GASPI's linear algebra solvers, or maybe AmgX. The OpenMP backends for CPU computations are usually a lot slower than using MPI; GPU computations are "automatic", as that's what the libraries are designed for. Most of your CPU cores will be wasted: usually 1 CPU core supports 1 GPU, which is why GPU servers in data centers have low-core-count CPUs. Engineering workstations are different.

Concept 2: I have started looking deeper into that, but need to upgrade my hardware to move forward. The idea is to use the GPU as a big core or "turbo booster". The above-mentioned LA libraries don't support hybrid CPU/GPU backend usage. Load balancing will be key for good performance (a performance factor comparing CPU core performance with GPU performance could be a simple solution, applied to an uneven matrix decomposition). One core per GPU will be needed to "support" the GPU. The coding challenge is how to link/integrate/interface the "big core" (supporting CPU core) <-> GPU computations with the OpenFOAM MPI communication of the remaining CPU cores.

Concept 3: Use RapidCFD. I see a high risk that development work will be a waste of time with faster PCIe and CPU/GPU data coherency on the horizon.

Klaus
August 26, 2020, 08:50 |
#11
Super Moderator
Tobias Holzmann
Join Date: Oct 2010
Location: Tussenhausen
Posts: 2,708
Blog Entries: 6
Rep Power: 51
Hi Klaus,
thank you for your comprehensive reply regarding GPU implementation and OpenFOAM (or CFD in general, or CPU/GPU in general). What is your conclusion regarding solving the momentum and pressure equations as a coupled system on the GPU rather than the CPU? Shouldn't it be much faster? Okay, I am not talking about test cases with 500,000 cells, more in the direction of > 20,000,000 cells. Out of the box, without any experience, I would expect that solving the linear system on the GPU should be faster, as the GPU architecture is designed for parallel calculations compared to CPUs.

However, you are right, the 32-core AMD TR is a good one. Probably one has more benefit using either two 32-core CPUs on one motherboard, or a 64-core one out of the box, as there is no need for new programming. I will think about that, but I will probably buy the Founders 3090 card, as I am also doing renderings and ray-tracing stuff.

Tobi
__________________
Keep foaming, Tobias Holzmann
August 26, 2020, 10:37 |
#12
Senior Member
Klaus
Join Date: Mar 2009
Posts: 250
Rep Power: 22
Hi Tobi,
I don't have any benchmarks on recent hardware with PCIe 4.0 comparing a recent 16-core (or better, 32-core) CPU with a recent PCIe 4.0 GPU like the AMD Radeon Pro VII.

Real benefits for workstations should come from concept 2. Concept 1 will be beneficial for cases of a hardware-specific size that fit into GPU memory and where the memory size is in tune with the PCIe speed. Asynchronous data transfer and other features that come with the latest LA libraries are also beneficial.

When PCIe 2.0 was standard, speedups of 1.2x-1.8x compared to the old 6-core Xeons (or two of them), in rare cases up to 3.5x, were realistic. Claims of 10x...40x speedups were comparisons between 4-core gaming processors with only 2 memory channels and a then-professional GPU like the Nvidia K80, or neglected the slow PCIe data transfer, comparing raw CPU hardware performance to GPU hardware performance rather than the time needed to solve a CFD simulation.

GPU computation should improve stability, as reductions on the GPU produce smaller errors.

There are many forms of hybridisation of CPU/GPU computations that could be beneficial, but the cost/benefit calculation for an expensive GPU in an OpenFOAM workstation is difficult. Nowadays you can buy a lot of CPU performance for the price of a GPU, and the CPU should be more versatile for a wider range of cases.

If you want to leverage the 3080, check whether your cases are suitable for mixed-precision fp64/fp32 computations, but that's a different topic.

Klaus
September 14, 2020, 09:08 |
#13
Senior Member
Klaus
Join Date: Mar 2009
Posts: 250
Rep Power: 22
@Tobi,
to use OpenFOAM with a new Nvidia card and your 32-core processor, you could try the following: install "PETSc4FOAM" (a library to plug PETSc into the OpenFOAM framework) and extend it with the "AmgXWrapper" for PETSc.

A nice feature of the AmgXWrapper is: "... when the number of MPI processes is greater than the number of GPU devices, this wrapper will do the system consolidation/data scattering/data gathering automatically. ..."

See also "Integrating OpenFOAM and GPUs using AmgX" by Rathnayake, T., where you can find details about the AmgX setup which might be helpful.

It would be great if you could provide a benchmark.

Klaus
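For orientation, with PETSc4FOAM installed, selecting PETSc from fvSolution looks roughly like the fragment below. Treat the keywords as an assumption recalled from the petscFoam examples, not as a verified reference; the exact dictionary entries (and any AmgX-specific options) should be checked against the PETSc4FOAM and AmgXWrapper documentation:

```
p
{
    solver          petsc;          // provided by the petscFoam plugin
    petsc
    {
        options
        {
            ksp_type    cg;         // PETSc Krylov solver
            pc_type     gamg;       // algebraic multigrid preconditioner
        }
    }
    tolerance       1e-06;
    relTol          0.01;
}
```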
December 24, 2020, 12:48 |
#14
New Member
kenan
Join Date: Sep 2012
Posts: 9
Rep Power: 13
Hi Klaus,
I have been experimenting with the CUDA solvers of foam-extend-4.0 (where the linear systems are solved on the GPU only). Naturally, I imagined one GPU needs one core, and I set up everything like that. As I observed that the GPU memory was far from full and the rest of the CPU cores were wasted, I wanted to try more MPI tasks per GPU. Surprisingly, it worked, and the resulting performance seemed to be better. Is this approach something to advise against? I don't know anything about the details of CPU-GPU memory copying, but I wonder whether the MPI tasks end up competing for the CPU-GPU bandwidth.

Note: this approach was still not good enough for me, because I could not manage to use the two GPU cards in the machine. All the MPI tasks happened to use the first card only, simultaneously.
December 26, 2020, 04:47 |
#15
Senior Member
Klaus
Join Date: Mar 2009
Posts: 250
Rep Power: 22
Hello,
I think foam-extend still uses the CUFFLINK library. Check here how to enable multi-GPU mode: https://code.google.com/archive/p/cu...artedPage.wiki - sections "Multi-GPU Implementation" and "Possible changes in compute mode".

If you find the time, maybe you could run and share a benchmark of a larger tutorial case, like the motorBike tutorial, on a current CPU vs. your GPU. It would be especially interesting if you have a system supporting PCIe 4.0.

Klaus
December 5, 2022, 11:46 |
"Integrating OpenFOAM and GPUs using AmgX" by Rathnayake, T.
#16
New Member
Join Date: Aug 2022
Posts: 16
Rep Power: 3
Thanks
January 16, 2023, 14:08 |
Integrating OpenFOAM and GPUs using amgX
#17
Senior Member
Klaus
Join Date: Mar 2009
Posts: 250
Rep Power: 22
See:
https://github.com/barbagroup/AmgXWrapper

AmgXWrapper library details from the author: https://on-demand.gputechconf.com/gt...-cfd-codes.pdf