OpenFOAM on AMD GPUs. Container from Infinity Hub: user experiences and performance
#1
New Member
Alexis Espinosa
Join Date: Aug 2009
Location: Australia
Posts: 20
Rep Power: 18
AMD recently provided an OpenFOAM container capable of running on AMD GPUs.
It is available in their Infinity Hub: https://www.amd.com/en/technologies/...y-hub/openfoam

My questions are:
- What have the community's experiences been with this OpenFOAM container on AMD GPUs?
- Are you seeing worthwhile performance improvements compared with CPU-only solvers?

Thanks a lot,
Alexis

(PS. I will start using it and post my experiences too.)

Last edited by alexisespinosa; March 6, 2023 at 22:00.
#2
Senior Member
M. Montero
Join Date: Mar 2009
Location: Madrid
Posts: 157
Rep Power: 18
Hi,
Were you able to launch any simulation using the GPU version? Is it 100% on the GPU, or is only the pressure solver run on the GPU? Do you know whether it could also be compatible with Nvidia GPUs, to test it there?

Best Regards,
Marcelino
#3
New Member
Tom
Join Date: Dec 2015
Location: Melbourne, Australia
Posts: 11
Rep Power: 11
Thought I'd share my experiences with this!

My finding, unfortunately, is that with my setup it remains much faster to solve on the CPU than on the GPU.

I used the HPC_Motorbike example and code provided by AMD in the docker container (no longer available at the link above, by the way) as-is on my Radeon VII. For the CPU runs, I modified the case to use a typical CPU-based set of solvers from the standard tutorial fvSolution files (roughly as sketched below). Times shown in my results are simpleFoam total ClockTime to 20 iterations, and time per iteration excluding the first time step.

Unsurprisingly, the first iteration is much longer as the model is read into VRAM, which you can see quite easily, but subsequent iterations are also slower than similar solvers on the CPU. To account for this, I have included a time per iteration over iterations 2-20 to illustrate the per-iteration slowdown.

I get that GPUs are made for large models, but I am already nearly reaching the 16 GB of VRAM even with this model (5,223,573 cells). I can't run the Medium-sized model (~9M cells, I think) because I run out of VRAM. I'm running this on my desktop PC for fun; I don't even want to know how much faster this will be on my usual solving machine (48-core Xeon).

So, in summary, based on my experiences with a Radeon VII and the Small HPC_Motorbike case: the CPU remains clearly faster per iteration, and VRAM is the limiting factor for anything larger than the Small mesh.
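For reference, a typical CPU-side pressure solver from the standard motorbike-style tutorial fvSolution files looks roughly like this; the values are a sketch of assumed tutorial defaults, not the exact file used for these runs:

Code:
p
{
    // Native OpenFOAM geometric-algebraic multigrid on the CPU,
    // as used by the standard tutorial fvSolution files.
    solver          GAMG;
    smoother        GaussSeidel;
    tolerance       1e-07;
    relTol          0.1;   // assumed to match the GPU case for a fair comparison
}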
Cheers,
Tom
#4
New Member
Tom
Join Date: Dec 2015
Location: Melbourne, Australia
Posts: 11
Rep Power: 11
Quote: Do you know whether it could also be compatible with Nvidia GPUs, to test it there?
The initial run script appears to be flexible enough to support CUDA devices too. I've not dug any deeper and don't have a suitable GPU to test with, sorry.

Code:
Available Options: HIP or CUDA
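For anyone who does try an Nvidia card: assuming the solver dictionary follows the same PETSc layout as the AMD case (see the fvSolution excerpt later in this thread), presumably only the backend types inside the 'options' sub-dictionary would need to change. A hedged sketch, untested here:

Code:
// CUDA equivalents of the HIP backend types used in the AMD config
// (MATMPIAIJCUSPARSE / VECCUDA are standard PETSc types).
mat_type mpiaijcusparse;   // instead of mpiaijhipsparse
vec_type cuda;             // instead of hip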
#5
Senior Member
Thanks for your input. Much appreciated.

1/ Can you confirm that the bulk of the compute time goes into the pressure solve (independent of CPU vs. GPU)?

2/ How do you precondition PETSc-CG for the pressure solve?

3/ Are you willing to walk the extra mile and compare two flavours of PETSc-CG? Flavour-1: use AMG to precondition PETSc-CG, allowing AMG to redo its set-up at each linear system solve. Flavour-2: use AMG to precondition PETSc-CG (so far identical to Flavour-1), but this time freeze the hierarchy that AMG constructs. A sketch of how the two flavours could be configured follows below.
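A minimal sketch of how the two flavours might be expressed with the caching dictionary of the PETSc solver interface (petsc4Foam), assuming the preconditioner 'update' entry controls when the AMG set-up is redone; entry names follow the configuration posted later in this thread:

Code:
// Flavour-1: rebuild the AMG hierarchy at every linear system solve
caching
{
    preconditioner
    {
        update  always;
    }
}

// Flavour-2: build the hierarchy once and effectively freeze it
// (approximated here with a very infrequent periodic rebuild, in case
//  an explicit "never" mode is not available in this dictionary)
caching
{
    preconditioner
    {
        update          periodic;
        periodicCoeffs
        {
            frequency   1000000;
        }
    }
}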
#6
New Member
Tom
Join Date: Dec 2015
Location: Melbourne, Australia
Posts: 11
Rep Power: 11
Quote: 1/ Can you confirm that the bulk of the compute time goes into the pressure solve? 2/ How do you precondition PETSc-CG for the pressure solve?

1) I don't have a specific ClockTime breakdown, but it would appear so, yes.

2) PETSc-CG is preconditioned using GAMG:

Code:
p
{
solver petsc;
petsc
{
options
{
ksp_type cg;
ksp_cg_single_reduction true;
ksp_norm_type none;
mat_type mpiaijhipsparse; //HIPSPARSE
vec_type hip;
//preconditioner
pc_type gamg;
pc_gamg_type "agg"; // smoothed aggregation
pc_gamg_agg_nsmooths "1"; // number of smooths for smoothed aggregation (not smoother iterations)
pc_gamg_coarse_eq_limit "100";
pc_gamg_reuse_interpolation true;
pc_gamg_aggressive_coarsening "2"; //square the graph on the finest N levels
pc_gamg_threshold "-1"; // increase to 0.05 if coarse grids get larger
pc_gamg_threshold_scale "0.5"; // thresholding on coarse grids
pc_gamg_use_sa_esteig true;
// mg_level config
mg_levels_ksp_max_it "1"; // use 2 or 4 if problem is hard (i.e stretched grids)
mg_levels_esteig_ksp_type cg; //max_it "1"; // use 2 or 4 if problem is hard (i.e stretched grids)
// coarse solve (indefinite PC in parallel with 2 cores)
mg_coarse_ksp_type "gmres";
mg_coarse_ksp_max_it "2";
// smoother (cheby)
mg_levels_ksp_type chebyshev;
mg_levels_ksp_chebyshev_esteig "0,0.05,0,1.1";
mg_levels_pc_type "jacobi";
}
caching
{
matrix
{
update always;
}
preconditioner
{
//update always;
update periodic;
periodicCoeffs
{
frequency 40;
}
}
}
}
tolerance 1e-07;
relTol 0.1;
}
#7
Senior Member
Thanks again.
It appears that by setting Code:
periodicCoeffs
{
frequency 40;
}
the preconditioner set-up appears to be frozen for 40 consecutive pressure solves before being rebuilt.

I have two follow-up questions, if you allow.

1/ How does the runtime of PETSc-GAMG compare with OpenFOAM-native GAMG (the latter used as a preconditioner, to be fair)?

2/ Do you see statistics of the PETSc-GAMG coarsening printed somewhere? It would be interesting to compare these statistics (in particular the geometric and algebraic complexities) with those of OpenFOAM-native GAMG. The latter can be easily obtained by inserting debug switches in system/controlDict.
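The quoted snippet did not survive here; a sketch of the kind of controlDict entry presumably meant, assuming the per-class GAMGAgglomeration debug switch prints the native coarsening statistics, plus a PETSc option that should report the PETSc-GAMG hierarchy:

Code:
// system/controlDict -- sketch only; the switch name is an assumption
DebugSwitches
{
    GAMGAgglomeration   1;   // print cells/levels of the native GAMG hierarchy
}

// For the PETSc side (question 2/), adding this to the petsc 'options'
// sub-dictionary should make PETSc print the full solver/preconditioner
// structure, including the GAMG levels (dictionary syntax assumed):
//     ksp_view    true;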