www.cfd-online.com

OpenFOAM benchmarks on various hardware

Post #601 by flotus1 (Alex, Super Moderator), November 13, 2022, 15:29
Join Date: Jun 2012, Location: Germany, Posts: 3,399, Rep Power: 46
@naffrancois
The charts appear to be very low resolution, at least on my end.
If that's a problem with the file limits enforced by the forum software, you can upload them to an image sharing site instead.

Quote:
So it looks like genoa is 2x faster than rome on a core for core basis?
Not from a traditional "performance per core" perspective. The cores themselves are only moderately faster than the previous generation.
Thanks to memory bandwidth increasing by more than 2x, scaling will be better. Whether that actually results in 2x maximum performance for CFD remains to be seen.
I'll probably do a small writeup/buyer's guide once I have wrapped my head around some of Genoa's intricacies.
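
As a rough check on the "more than 2x" bandwidth figure, the theoretical peak per socket can be estimated from channel count and transfer rate. The sketch below assumes the top supported memory configurations (8 channels of DDR4-3200 for Rome, 12 channels of DDR5-4800 for Genoa) and an 8-byte data bus per channel. The function name is made up for illustration, and sustained bandwidth in a real CFD run will be lower than these peak values.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth per socket in GB/s."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0

# Assumed configurations, not measurements:
rome = peak_bandwidth_gbs(8, 3200)    # 8 x DDR4-3200
genoa = peak_bandwidth_gbs(12, 4800)  # 12 x DDR5-4800

print(f"Rome:  {rome:.1f} GB/s")
print(f"Genoa: {genoa:.1f} GB/s")
print(f"Ratio: {genoa / rome:.2f}x")
```

Under these assumptions the ratio comes out slightly above 2x, which is consistent with the statement above but says nothing yet about achieved solver performance.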

Post #602 by naffrancois (Senior Member), November 13, 2022, 15:37
Join Date: Oct 2011, Posts: 239, Rep Power: 16
@flotus1

Yes, it seems there is some compression when attaching them.

Here are the links:

https://ibb.co/MsQh94V
https://ibb.co/GVnbYP5
DVSoares, oswald and Habib-CFD like this.

Post #603 by flotus1 (Alex, Super Moderator), November 13, 2022, 15:55
Thanks, I added your files at the bottom of the first post.
At some point, we might want to think about a successor to this thread. But that should be done by someone who knows more about operating OpenFOAM than I do.
naffrancois likes this.

Post #604 by hurd (Johann, New Member), November 17, 2022, 08:26
Join Date: Oct 2022, Posts: 13, Rep Power: 3
Hello, here are the first numbers from my freshly delivered system. I used the old script from the first post. Version 2 is still calculating...


OpenFOAM version 10 on Ubuntu in WSL on Windows 11


Epyc 7373X 16x3.8GHz w/ 8x16GB DDR4-3200 RAM

Code:
# cores   Wall time (s):
------------------------
1         15.8124
2         8.22782
4         5.41716
6         3.86773
8         3.1326
12        2.92129
16        2.87439
So on a short run I get ~0.35 iterations / sec.

Version 2 fits better with the single core result, but the rest seems to be the same as before - what am I missing?

Code:
# cores   Wall time (s):
------------------------
1         363.166
2         8.34771
4         4.59407
8         3.31006
16        2.943

Last edited by hurd; November 17, 2022 at 09:48.

Post #605 by flotus1 (Alex, Super Moderator), November 17, 2022, 09:44
Which exact version of OpenFOAM are you trying to run?
There are 3 scripts, two of which I added under "Moderator note" in the first post for more recent versions. Each of them is supposed to work out of the box with different versions.

And check the logs for error messages. Something isn't right here. Your CPU is certainly fast, but not that fast.

Post #606 by hurd (Johann, New Member), November 17, 2022, 10:00
I edited my post to add that I use OpenFOAM 10. I have now found the 3rd version of the script, which seems to be what everybody else used. It won't run (yet), but let's see.

Post #607 by hurd (Johann, New Member), November 17, 2022, 11:39
Sorry for being incompetent with OpenFOAM. I wanted to use it as a benchmark and to share the results, as the 7373X seems to be a new data point in the list.

Is there some kind of output file that I can run a hash on to see if the end result after 100 iterations is correct?

For reference, here is the final step in the log.simpleFoam of the plausible run (363 s single-core run):
Code:
smoothSolver:  Solving for Ux, Initial residual = 0.00119047, Final residual = 0.000102912, No Iterations 9
smoothSolver:  Solving for Uy, Initial residual = 0.022928, Final residual = 0.00183307, No Iterations 9
smoothSolver:  Solving for Uz, Initial residual = 0.0198999, Final residual = 0.00164319, No Iterations 9
GAMG:  Solving for p, Initial residual = 0.00900837, Final residual = 8.37447e-05, No Iterations 4
time step continuity errors : sum local = 0.000120813, global = -2.75464e-06, cumulative = -0.000127849
smoothSolver:  Solving for omega, Initial residual = 0.00019487, Final residual = 1.44885e-05, No Iterations 3
smoothSolver:  Solving for k, Initial residual = 0.00192524, Final residual = 0.000173624, No Iterations 3
ExecutionTime = 363.166 s  ClockTime = 364 s

End
And here is the same from one of the "superfast" runs (2.9 s, 16-core run):
Code:
smoothSolver:  Solving for Ux, Initial residual = 0.549225, Final residual = 0.297086, No Iterations 1000
smoothSolver:  Solving for Uy, Initial residual = 0.466708, Final residual = 0.0464898, No Iterations 1
smoothSolver:  Solving for Uz, Initial residual = 0.44934, Final residual = 0.0430763, No Iterations 1
GAMG:  Solving for p, Initial residual = 0.0585246, Final residual = 0.000331004, No Iterations 2
time step continuity errors : sum local = 3.17385e-15, global = 8.78889e-18, cumulative = 5.96915e-16
smoothSolver:  Solving for omega, Initial residual = 7.92638e-09, Final residual = 7.92638e-09, No Iterations 0
smoothSolver:  Solving for k, Initial residual = 6.04358e-09, Final residual = 6.04358e-09, No Iterations 0
ExecutionTime = 2.943 s  ClockTime = 3 s

End

Finalising parallel run
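
Comparing a hash of the output is unlikely to be robust, since floating-point results can differ slightly between machines and decompositions. A hedged alternative is to scan the solver log for obviously unhealthy lines, such as the 1000 iterations for Ux in the "superfast" run. The sketch below parses the residual lines in the format shown above; the function name and both thresholds are illustrative assumptions, not OpenFOAM defaults or an official validation method.

```python
import re

# Matches lines such as:
# "smoothSolver:  Solving for Ux, Initial residual = 0.549225, Final residual = 0.297086, No Iterations 1000"
LINE_RE = re.compile(
    r"^(?P<solver>\w+):\s+Solving for (?P<field>\w+), "
    r"Initial residual = (?P<initial>[-+0-9.eE]+), "
    r"Final residual = (?P<final>[-+0-9.eE]+), "
    r"No Iterations (?P<iters>\d+)"
)

def suspicious_lines(log_text, max_iters=1000, max_initial=0.1):
    """Return (field, reason) pairs for solver lines that look unhealthy.

    max_iters and max_initial are illustrative thresholds chosen for this sketch.
    """
    findings = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a linear-solver line
        field = m.group("field")
        if int(m.group("iters")) >= max_iters:
            findings.append((field, "many iterations, possibly an iteration cap"))
        if float(m.group("initial")) > max_initial:
            findings.append((field, "large initial residual at the last step"))
    return findings

plausible = """\
smoothSolver:  Solving for Ux, Initial residual = 0.00119047, Final residual = 0.000102912, No Iterations 9
GAMG:  Solving for p, Initial residual = 0.00900837, Final residual = 8.37447e-05, No Iterations 4
"""
superfast = """\
smoothSolver:  Solving for Ux, Initial residual = 0.549225, Final residual = 0.297086, No Iterations 1000
smoothSolver:  Solving for Uy, Initial residual = 0.466708, Final residual = 0.0464898, No Iterations 1
GAMG:  Solving for p, Initial residual = 0.0585246, Final residual = 0.000331004, No Iterations 2
"""

print("plausible run:", suspicious_lines(plausible))
for field, reason in suspicious_lines(superfast):
    print("superfast run:", field, "-", reason)
```

In practice the whole log.simpleFoam would be read from disk and passed in as one string; an empty result does not prove the run is correct, it only means none of these simple checks fired.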

Post #608 by flotus1 (Alex, Super Moderator), November 17, 2022, 14:57
We will have to wait for someone more knowledgeable with OpenFOAM to get to the bottom of this. In the meantime, you should upload the log files. Not only from the solver run, but especially from the meshing stage.

On the part of showing off with a brand new toy: I'm all about that. But you lose a lot of performance from WSL. If you want impressive numbers, you will have to use Linux natively.

Post #609 by oswald (Member), November 21, 2022, 02:27
Join Date: Sep 2010, Location: Leipzig, Germany, Posts: 93, Rep Power: 15
@hurd: Could you please upload the complete (compressed) logfile for one of the superfast runs or at least the complete output for the last time step?

Post #610 by hurd (Johann, New Member), November 21, 2022, 11:42
Thank you for your help. I think I solved it by using the openfoam-dev package from openfoam.org.

Now the script from the bench_template_v2 archive runs and I get these results (still using WSL Ubuntu 22.04 on a Windows 11 host):

Epyc 7373X 16x3.8GHz w/ 8x16GB DDR4-3200 RAM
Code:
#cores  time[s]    inverse[it/s]
1       398.798    0.251
2       195.208    0.512
4       107.312    0.932
6        73.4123   1.362
8        56.0352   1.785
10       45.3033   2.207
12       39.625    2.524
14       38.8043   2.577
16       34.4127   2.906
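
For anyone who wants to reproduce the third column or judge the scaling, here is a minimal Python sketch. It assumes the benchmark's 100 iterations (as mentioned earlier in the thread), so that it/s is simply 100 divided by the wall time; speedup and parallel efficiency are relative to the single-core run. The function and variable names are made up for this example.

```python
ITERATIONS = 100  # assumed from the thread: the benchmark runs 100 iterations

# (cores, wall time in seconds) from the table above
results = [(1, 398.798), (2, 195.208), (4, 107.312), (6, 73.4123), (8, 56.0352),
           (10, 45.3033), (12, 39.625), (14, 38.8043), (16, 34.4127)]

def scaling_table(results, iterations=ITERATIONS):
    """Compute it/s, speedup and parallel efficiency for each core count."""
    t1 = dict(results)[1]  # single-core wall time is the reference
    rows = []
    for cores, t in results:
        speedup = t1 / t
        rows.append({"cores": cores,
                     "it_per_s": iterations / t,
                     "speedup": speedup,
                     "efficiency": speedup / cores})
    return rows

rows = scaling_table(results)
for row in rows:
    print(f"{row['cores']:>3}  {row['it_per_s']:.3f} it/s  "
          f"speedup {row['speedup']:.2f}  efficiency {row['efficiency']:.0%}")
```

Efficiency well below 100% at 16 cores is expected for this kind of workload and does not by itself indicate a problem.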

oswald, wkernkamp and Crowdion like this.

Post #611 by moser_r (Richard Moser, New Member), December 3, 2022, 04:58
Join Date: Aug 2009, Posts: 28, Rep Power: 16
Quote:
Originally Posted by flotus1 View Post

But the usual disclaimer still applies: openbenchmarking.org
I don't trust the OpenFOAM numbers there.
Could you elaborate on this a little please (apologies if it has already been discussed earlier in the thread)? What causes you to not trust the numbers on openbenchmarking.org?

Post #612 by flotus1 (Alex, Super Moderator), December 3, 2022, 05:14
Not in this thread right here, but the topic came up from time to time.
Relative positions of CPUs make no sense, benchmark numbers are reported for systems that should not have enough memory to run it, and the "variance" in results is impossibly low. Just to name a few of the issues.
Personally, I consider this benchmark pretty much useless. And it actively does harm because it is so prevalent if you search for OpenFOAM benchmarks.

5800X3D - The new budget king of CFD?
Thoughts on Openbenchmarking.org

Post #613 by moser_r (Richard Moser, New Member), December 3, 2022, 05:45
Thanks for coming back so quickly. I can understand your points. It is disappointing, as a good benchmark comparison would be very useful for me at the moment as I am deciding on some new hardware which is specifically for OpenFOAM.

Post #614 by HenSim (New Member), December 19, 2022, 14:48
Join Date: Dec 2022, Posts: 1, Rep Power: 0
Quote:
Originally Posted by hurd (post #610, Epyc 7373X results under WSL Ubuntu 22.04)

These results seem better than others posted before in terms of scaling, don't they? Which version of WSL did you use? WSL2?

Post #615 by danbence (dab bence, Member), January 4, 2023, 17:33: Xeon Max vs EPYC 7773X
Join Date: Mar 2013, Posts: 47, Rep Power: 13
Intel has released a slide and configuration data for an OpenFOAM comparison between the Xeon Max with HBM2 and the EPYC 7773X.

This is the slide claiming a 2.5x speedup. Interesting that Fluent is only 1.2x.

http://www.nextplatform.com/wp-conte...erformance.jpg

The test setup was also published there.

https://edc.intel.com/content/www/us...rcomputing-22/

which is...

AMD EPYC 7773X: Test by Intel as of 9/2/2022. 1-node, 2x AMD EPYC HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10 rev5.22, ucode revision=0xa001224, Rocky Linux 8.6, Linux version 4.18.0-372.19.1.el8_6.crt1.x86_64, OpenFOAM 8, Motorbike 20M @ 250 iterations, Motorbike 42M @ 250 iterations

Intel® Xeon® CPU Max Series: Test by Intel as of 9/2/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, SNC4, Total Memory 128 GB (8x16GB HBM2 3200MT/s), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_next.1.x86_64+server, OpenFOAM 8, Motorbike 20M @ 250 iterations, Motorbike 42M @ 250 iterations
linuxguy123 likes this.

Post #616 by aparangement (Yan, Member), January 5, 2023, 21:48
Join Date: Dec 2013, Location: Milano, Posts: 42, Rep Power: 12
I guess the HBM is integrated with the CPU, or at least with the motherboard, and is not as large as DDR5 memory, so the speedup might depend on grid size.

Quote:
Originally Posted by danbence (post #615, Xeon Max vs EPYC 7773X)

Post #617 by wkernkamp (Will Kernkamp, Senior Member), January 6, 2023, 19:45
Join Date: Jun 2014, Posts: 316, Rep Power: 12
The selected EPYC is not the latest 9654 "Genoa", a 12-channel, high-cache CPU.

Post #618 by ERodriguez (Eduardo, New Member), January 10, 2023, 10:40
Join Date: Feb 2019, Posts: 9, Rep Power: 7
Hello,
I am having some trouble with the performance of OpenFOAM on my machine. These are the details of my setup:

Quote:
1x AMD EPYC 7742 64-Core processor
8x32 GB (=256 GB) RAM DDR4 3200
Debian 10.0
Kernel 5.10.0-10-amd64
2 NUMA nodes
OpenFOAM v2106

I have downloaded the case ‘bench_template_v02.zip’ from the first post. I had to make just one tiny modification, replacing ‘surfaceFeatures’ with ‘surfaceFeatureExtract’, since the former was only introduced in OpenFOAM v2112. Other than this, the case is the same.

My setup is rather similar to Yannick’s (same processor), reported in THIS post only a couple of months ago. The only difference is that he has 2x Epyc 7742 whilst I have only one.

Quote:
Originally Posted by ym92 View Post
Not much difference for dual 64-core setup compared to 64 cores on Epyc Rome (as can be expected). Wanted to post these results already for a while but better late than never..

Hardware: 2x AMD Epyc 7742, 16x32GB DDR4-3200
Software: Ubuntu 20.04, OpenFOAM v1812
I have two main questions:

(1) Regarding single-core time, ym92 reports 936.35 s, whereas my case runs in 598.44 s, i.e. about one third less wall time (roughly 1.56x faster). I assume this may be caused by the ‘high performance’ settings that we apply to our machine. However, just to be sure (and hoping that Yannick sees this post), I paste here the last few lines of my log file and the mesh count (given by checkMesh) to confirm that the compared cases are the same:
Log file:
Code:
Time = 99

smoothSolver:  Solving for Ux, Initial residual = 0.000910672, Final residual = 7.07286e-05, No Iterations 9
smoothSolver:  Solving for Uy, Initial residual = 0.0219921, Final residual = 0.00194097, No Iterations 8
smoothSolver:  Solving for Uz, Initial residual = 0.0192765, Final residual = 0.00173756, No Iterations 8
GAMG:  Solving for p, Initial residual = 0.0107949, Final residual = 8.33928e-05, No Iterations 4
time step continuity errors : sum local = 0.000124784, global = 6.76048e-06, cumulative = -0.000372448
smoothSolver:  Solving for omega, Initial residual = 0.000140106, Final residual = 1.01603e-05, No Iterations 3
smoothSolver:  Solving for k, Initial residual = 0.00179223, Final residual = 0.000172271, No Iterations 3
ExecutionTime = 592.51 s  ClockTime = 592 s

Time = 100

smoothSolver:  Solving for Ux, Initial residual = 0.000897164, Final residual = 6.96645e-05, No Iterations 9
smoothSolver:  Solving for Uy, Initial residual = 0.0215208, Final residual = 0.00191335, No Iterations 8
smoothSolver:  Solving for Uz, Initial residual = 0.0188435, Final residual = 0.00171037, No Iterations 8
GAMG:  Solving for p, Initial residual = 0.0106305, Final residual = 8.19107e-05, No Iterations 4
time step continuity errors : sum local = 0.000122673, global = 6.80292e-06, cumulative = -0.000365645
smoothSolver:  Solving for omega, Initial residual = 0.000139402, Final residual = 1.01096e-05, No Iterations 3
smoothSolver:  Solving for k, Initial residual = 0.00176476, Final residual = 0.000169463, No Iterations 3
ExecutionTime = 598.44 s  ClockTime = 598 s

End
Check Mesh:
Code:
Mesh stats
    points:           2113393
    faces:            5877894
    internal faces:   5691855
    cells:            1893343
    faces per cell:   6.11075
    boundary patches: 72
    point zones:      0
    face zones:       0
    cell zones:       0

Overall number of cells of each type:
    hexahedra:     1704507
    prisms:        30021
    wedges:        4131
    pyramids:      4
    tet wedges:    5828
    tetrahedra:    294
    polyhedra:     148558
    Breakdown of polyhedra by number of faces:
        faces   number of cells
            4   15702
            5   24762
            6   22859
            7   14956
            8   6138
            9   44228
           10   256
           11   77
           12   10929
           13   73
           14   54
           15   7331
           16   9
           17   11
           18   1167
           21   6

Checking topology...
    Boundary definition OK.
    Cell to face addressing OK.
    Point usage OK.
    Upper triangular ordering OK.
    Face vertices OK.
    Number of regions: 1 (OK).
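
To compare meshes between two machines without eyeballing the numbers, the checkMesh statistics can be parsed into a dictionary and compared directly. The sketch below is only an illustration under the assumption that the output follows the "name: integer" layout shown above; the function names are made up here. It also checks that the per-type cell counts add up to the total cell count.

```python
import re

def parse_mesh_stats(checkmesh_text):
    """Extract 'name: integer' pairs from checkMesh output into a dict."""
    stats = {}
    for line in checkmesh_text.splitlines():
        m = re.match(r"^\s*([A-Za-z ]+):\s+(\d+)\s*$", line)
        if m:
            stats[m.group(1).strip()] = int(m.group(2))
    return stats

CELL_TYPES = ["hexahedra", "prisms", "wedges", "pyramids",
              "tet wedges", "tetrahedra", "polyhedra"]

def cell_types_consistent(stats):
    """True if the per-type cell counts sum to the total number of cells."""
    return sum(stats[name] for name in CELL_TYPES) == stats["cells"]

# Excerpt of the checkMesh output above
sample = """\
    points:           2113393
    faces:            5877894
    internal faces:   5691855
    cells:            1893343
    hexahedra:     1704507
    prisms:        30021
    wedges:        4131
    pyramids:      4
    tet wedges:    5828
    tetrahedra:    294
    polyhedra:     148558
"""

stats = parse_mesh_stats(sample)
print("cells:", stats["cells"], "consistent:", cell_types_consistent(stats))
```

Two runs can then be compared with a plain `stats_a == stats_b`; identical counts make it very likely, though not certain, that both machines solved the same mesh.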
(2) Whilst the first point is good news, the second is not: the scalability is way off. I quote Yannick’s results for reference:
Quote:
Originally Posted by ym92 View Post
# cores Wall time (s):
------------------------
1 | 936.35
2 | 521.5
4 | 236.56
6 | 158.72
8 | 120.83
12 | 77.94
16 | 57.41
20 | 46.4
24 | 39.23
48 | 22.79
Which compares with mine:
Quote:
cores , time (s)
01 , 598.44
02 , 381.79
04 , 147.92
06 , 101.07
08 , 79.1
12 , 56.41
16 , 44.01
24 , 40.01
32 , 34.74
40 , 34.92
48 , 33.3
56 , 33.74
64 , 33.22
I have also compiled both into a single plot (PNG attached), because one image is worth a thousand words. With the current settings, the case scales rather well up to about 10 cores and then stops scaling altogether, no matter how many more cores are used.

In order to improve these results, I have tried the following changes:
  • Use option --bind-to none for mpirun
  • Use option --bind-to socket for mpirun
  • Disable hyperthreading (SMT) in the BIOS
  • Set the number of NUMA nodes to default (single node for all cores)
  • Set the number of NUMA nodes to 4

None of these changes made a significant difference. There were only minimal variations between the different runs.

My question is: is this normal? Is there any other setting we could try that would magically improve our scalability curve to more decent values?

Any suggestion is welcome, and we are happy to perform other tests or provide more information if needed.

Thank you for all your help

Best regards
Attached Images
File Type: png OpenFOAM_scalabilit_Epyc7742.png (14.6 KB, 34 views)
batdan likes this.

Post #619 by flotus1 (Alex, Super Moderator), January 11, 2023, 05:40
Part of the reason you see worse scaling with your system is the faster single-core time you got. For a more intuitive comparison, I would recommend scaling both results by the same single-core value.
There are reasons for this large difference in single-core performance, but we don't need to get into that. Your result is good, and indicates decent FP optimizations at work. These don't help at high thread counts, where the workload becomes bound by memory bandwidth.
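
To illustrate the normalization, here is a minimal Python sketch that computes speedup for both data sets against one shared single-core time. Using the slower of the two single-core results as the reference is an arbitrary choice made for this example, and the variable names are made up; the wall times are the ones posted in #618.

```python
# Wall times in seconds, taken from the two tables in post #618.
dual_7742 = {1: 936.35, 2: 521.5, 4: 236.56, 6: 158.72, 8: 120.83,
             12: 77.94, 16: 57.41, 20: 46.4, 24: 39.23, 48: 22.79}
single_7742 = {1: 598.44, 2: 381.79, 4: 147.92, 6: 101.07, 8: 79.1,
               12: 56.41, 16: 44.01, 24: 40.01, 32: 34.74, 40: 34.92,
               48: 33.3, 56: 33.74, 64: 33.22}

def speedup(times, reference_time):
    """Speedup of each run relative to one shared reference time."""
    return {cores: reference_time / t for cores, t in times.items()}

# Assumption for this sketch: the slower single-core time is the common reference.
reference = max(dual_7742[1], single_7742[1])

dual = speedup(dual_7742, reference)
single = speedup(single_7742, reference)

# Print only the core counts present in both data sets.
for cores in sorted(set(dual) & set(single)):
    print(f"{cores:>3} cores: dual {dual[cores]:5.1f}x   single {single[cores]:5.1f}x")
```

With a shared reference, the single-CPU system starts ahead at low core counts and the curves can be compared directly instead of each being flattered or penalized by its own single-core time.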

Speaking of memory bandwidth: that's what ultimately limits scaling on your single 64-core CPU. You are comparing against two CPUs, which have twice the amount of shared CPU resources, memory bandwidth and last-level cache being two of them.
Your peak performance of 33s doesn't seem too far off.

For best performance, these are the settings I would recommend:
SMT off
NPS=4
cleared caches before each run using "echo 3 > /proc/sys/vm/drop_caches" as root
and then run the simulation with
mpirun -np 64 --bind-to core --rank-by core --map-by numa

It won't change results drastically though. It's still one CPU against two.
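
For scripted benchmark runs, the recommended launch line can be assembled programmatically. This is a sketch only: it assumes the solver is simpleFoam (as in the logs above) started with the standard -parallel flag, that the case is already decomposed for the chosen rank count, and that the script runs as root when caches are actually dropped. The helper names are hypothetical, and by default nothing is executed.

```python
import subprocess

def build_mpirun_command(ranks, solver="simpleFoam"):
    """Assemble the mpirun call with the binding options recommended above."""
    return ["mpirun", "-np", str(ranks),
            "--bind-to", "core", "--rank-by", "core", "--map-by", "numa",
            solver, "-parallel"]

def drop_caches(path="/proc/sys/vm/drop_caches"):
    """Equivalent of 'echo 3 > /proc/sys/vm/drop_caches'; requires root."""
    with open(path, "w") as f:
        f.write("3\n")

def run_benchmark(ranks, dry_run=True):
    """Return the command; only drop caches and launch it when dry_run is False."""
    cmd = build_mpirun_command(ranks)
    if dry_run:
        return cmd  # show what would be executed, touch nothing
    drop_caches()
    subprocess.run(cmd, check=True)
    return cmd

print(" ".join(run_benchmark(64)))
```

Keeping the dry run as the default makes it easy to inspect the exact command line before committing a machine to a long run.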

Post #620 by ym92 (Yannick, New Member), January 11, 2023, 06:21
Join Date: May 2018, Posts: 12, Rep Power: 7
I fully agree with flotus1. Actually, if you put "number of cores used/total number of cores available" on the horizontal axis, our results would probably look very similar. The curve is almost flat when using more than ~50% of the cores.
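
A quick way to try this is to re-key the wall times by the fraction of available cores. The sketch below uses only a subset of the numbers posted in #618 and assumes 128 total cores for the dual-socket system and 64 for the single-socket one; the function name is made up for illustration.

```python
def by_core_fraction(times, total_cores):
    """Map 'cores used' to 'fraction of available cores used'."""
    return {cores / total_cores: t for cores, t in times.items()}

# Subset of the posted wall times in seconds.
dual_7742 = {16: 57.41, 24: 39.23, 48: 22.79}                        # 2 x 64 cores
single_7742 = {8: 79.1, 12: 56.41, 24: 40.01, 32: 34.74, 64: 33.22}  # 1 x 64 cores

dual = by_core_fraction(dual_7742, total_cores=128)
single = by_core_fraction(single_7742, total_cores=64)

# Compare the two systems at the utilization levels they have in common.
for fraction in sorted(set(dual) & set(single)):
    print(f"{fraction:.1%} of cores: dual {dual[fraction]} s, single {single[fraction]} s")
```

This only lines up the horizontal axis; the absolute times still differ because the dual-socket system has twice the memory channels and cache.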


Not sure why the single-core results are so different, but I might not have used adequate settings. At least I am sure I did not use core binding (binding to the cores of one CPU might be a good idea for roughly 2-10 cores).
