OpenFOAM benchmarks on various hardware
February 17, 2025, 15:38   #841
flotus1 (Alex), Super Moderator, Germany
A slight over-simplification of why more cache = more better:

When an execution unit needs data for a calculation, the cache hierarchy is searched for that data first: L1 -> L2 -> L3 -> RAM, in that order, because latency increases the further down this chain we go.
Larger caches mean a higher probability that the data already resides in one of the caches and doesn't have to come from RAM. This decreases latency, and it also frees up precious memory bandwidth for other operations and cores.
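If you want to see this on your own machine, here is a minimal pointer-chasing sketch (my illustration, not from this thread): every load depends on the previous one, so hardware prefetching can't hide anything, and the time per load jumps each time the working set outgrows a cache level.

Code:
// Pointer-chasing latency microbenchmark (illustrative sketch).
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    // Working sets from 32 KiB (fits in L1) up to 128 MiB (RAM).
    for (std::size_t bytes = 32 * 1024; bytes <= 128 * 1024 * 1024; bytes *= 4) {
        std::size_t n = bytes / sizeof(std::size_t);
        std::vector<std::size_t> next(n);
        std::iota(next.begin(), next.end(), std::size_t{0});
        std::shuffle(next.begin(), next.end(), std::mt19937_64{42});

        // Each load depends on the previous one -> measures raw latency.
        std::size_t idx = 0;
        const std::size_t steps = 20'000'000;
        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < steps; ++i) idx = next[idx];
        auto t1 = std::chrono::steady_clock::now();

        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count()
                  / double(steps);
        std::printf("%8zu KiB: %6.2f ns per load (check %zu)\n",
                    bytes / 1024, ns, idx);
    }
    return 0;
}
On typical Zen parts the jumps land roughly at the 32 KiB L1 and 512 KiB L2 boundaries and then at the L3 capacity; a solver whose working set fits in a 256 MB L3 avoids the final, largest jump.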

February 23, 2025, 16:48   #842
JBeilke (Joern Beilke), Senior Member, Dresden
Quote:
Originally Posted by aparangement
This was especially true in the Zen 2 era. A lot of posts back then in this thread show that the L3=256 MB parts (e.g. the EPYC 7532) are about 30% faster than their L3=128 MB variants if the other specs are almost the same.

I don't know where your 30% comes from. I went from 128 MB to 256 MB cache with Epyc2, and there is maybe a 3% difference for the same core count, but not more.

March 25, 2025, 20:09   #843
wkernkamp (Will Kernkamp), Senior Member
Quote:
Originally Posted by JBeilke
I don't know where your 30% comes from. I went from 128 MB to 256 MB cache with Epyc2, and there is maybe a 3% difference for the same core count, but not more.
Could you give some information on your problem, i.e. number of cells, shape of the domain, and memory configuration? That would be of interest, because flotus1 is right that we see a clear benefit from large caches, as they reduce the demand on the memory channels.

March 26, 2025, 02:24   #844
JBeilke (Joern Beilke), Senior Member, Dresden
Quote:
Originally Posted by wkernkamp
Could you give some information on your problem, i.e. number of cells, shape of the domain, and memory configuration? That would be of interest, because flotus1 is right that we see a clear benefit from large caches, as they reduce the demand on the memory channels.

There is no problem at all. Everything works as expected. I would just like to know where the 30% is mentioned. In any case, I can't remember reading such a number.

March 26, 2025, 04:47   #845
andy_ (andy), Senior Member
Quote:
Originally Posted by JBeilke
There is no problem at all. Everything works as expected. I would just like to know where the 30% is mentioned. In any case, I can't remember reading such a number.
#838 shows a cache effect greater than 30% for this benchmark.

March 26, 2025, 05:30   #846
JBeilke (Joern Beilke), Senior Member, Dresden
Quote:
Originally Posted by andy_
#838 shows a cache effect greater than 30% for this benchmark.

No way. The result there comes from post #793:

"Results for OpenFOAM 9 on a dual EPYC 9684X with 4800 MHz DDR5 RAM"

March 27, 2025, 20:59   #847
aparangement (Yan), Member, Milano
Quote:
Originally Posted by JBeilke
I don't know where your 30% comes from. I went from 128 MB to 256 MB cache with Epyc2, and there is maybe a 3% difference for the same core count, but not more.
Just search for 7532 and 7542 in this thread; there are several posts, e.g.:

[links to six earlier posts in this thread]
The 7532 finishes the benchmark run in as little as 16 s (my own test came in at 18 s), whereas the 7542 (128 MB L3, but a slightly higher frequency) takes around 22 s or more.

I do remember that there are other comparable models (similar setup but different L3); if you are really interested you could skim this thread.
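For reference, 22 s versus 16 s is a factor of 22/16 ≈ 1.38, i.e. the larger-cache part is roughly 30-40% faster, or equivalently needs about 27% less wall time. That is presumably where the ~30% figure comes from.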

March 27, 2025, 21:02   #848
aparangement (Yan), Member, Milano
This is also true for desktop CPUs (at least for Zen and Zen 2): the Ryzen 2200G (4 MB L3) is much slower than the 1500X (16 MB L3).


March 28, 2025, 04:55   #849
bigphil (Philip Cardiff), Super Moderator, Dublin, Ireland
FYI:

The 1st OpenFOAM HPC Challenge (OHC-1) at the upcoming OpenFOAM Workshop may be of interest to people here. I expect they will publicly share their results, which will be interesting.

March 28, 2025, 17:19   #850
wkernkamp (Will Kernkamp), Senior Member
Quote:
Originally Posted by bigphil
FYI:

The 1st OpenFOAM HPC Challenge (OHC-1) at the upcoming OpenFOAM Workshop may be of interest to people here. I expect they will publicly share their results, which will be interesting.
Looks like my dual E5-2696 v2 server with 128 GB RAM does an iteration every 36 seconds on the smallest test grid (67M cells). I am running OF v2212 compiled with SPDP, i.e. the matrix solve in double precision and the rest in single precision. That is maybe 30% faster than all double precision.

March 29, 2025, 14:48   #851: mixed precision SPDP option
wkernkamp (Will Kernkamp), Senior Member
I ran the benchmark with the normal double-precision (DP) compile of OF v2212 and with the mixed-precision SPDP compilation option (set in etc/bashrc). The system is a dual E5-2696 v2 server with 128 GB of DDR3-1866, eight memory channels in total.

The flow calculation is much faster, by up to 36%, for all core counts run. However, mesh generation is about 20% slower. (A sketch of the SPDP idea follows the listings below.)



Double precision (DP)

Code:
Meshing times (cores  seconds):
 1  1553.65
 2  1011.88
 4   574.82
 8   344.48
12   260.31
16   232.71
20   197.04
24   183.96
Flow calculation (cores  seconds):
 1   981.95
 2   511.16
 4   233.53
 8   130.49
12   103.84
16    92.15
20    87.93
24    87.1
Mixed precision (SPDP)

Code:
Meshing times (cores  seconds):
 1  1403.45
 2   998.06
 4   560.6
 8   367.34
12   305.32
16   263.96
20   238.15
24   225.01
Flow calculation (cores  seconds):
 1   640.32
 2   378.16
 4   173.62
 8    97.13
12    71.99
16    61.06
20    56.09
24    55.23
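For anyone curious what SPDP buys: field data is stored in single precision, roughly halving the memory traffic this benchmark is mostly bound by, while the linear solver accumulates in double precision. In OpenFOAM v2212 it is selected via WM_PRECISION_OPTION=SPDP in etc/bashrc. Below is a minimal sketch of the idea (my illustration, not actual OpenFOAM source): a CSR matrix-vector product that streams float data but accumulates in double.

Code:
// Illustrative mixed-precision sketch (not OpenFOAM source):
// matrix and vector data live in float, arithmetic accumulates in double.
#include <cstddef>
#include <cstdio>
#include <vector>

// y = A*x for a CSR matrix: float storage, double accumulation.
void spmvMixed(const std::vector<float>& vals,
               const std::vector<int>& cols,
               const std::vector<int>& rowPtr,
               const std::vector<float>& x,
               std::vector<double>& y)
{
    const std::size_t nRows = rowPtr.size() - 1;
    for (std::size_t r = 0; r < nRows; ++r)
    {
        double sum = 0.0;  // double accumulator limits rounding growth
        for (int k = rowPtr[r]; k < rowPtr[r + 1]; ++k)
        {
            sum += double(vals[k]) * double(x[cols[k]]);
        }
        y[r] = sum;
    }
}

int main()
{
    // Tiny 2x2 example: [[2,1],[0,3]] * [1,1] = [3,3].
    std::vector<float> vals{2.0f, 1.0f, 3.0f};
    std::vector<int> cols{0, 1, 1};
    std::vector<int> rowPtr{0, 2, 3};
    std::vector<float> x{1.0f, 1.0f};
    std::vector<double> y(2);
    spmvMixed(vals, cols, rowPtr, x, y);
    std::printf("y = [%g, %g]\n", y[0], y[1]);
}
Streaming half as many bytes per coefficient is consistent with the ~36% gain in the bandwidth-bound flow solve above, while snappyHexMesh, which is not solver-bound, sees no benefit.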

April 2, 2025, 09:48   #852
Simbelmynė, Senior Member
Mac Studio M4 Max, 16c CPU 40c GPU, 64 GB RAM

MacOS Sequoia 15.4

OpenFOAM v2412, Ubuntu 24.04 ARM running under OrbStack

Using gumersindu's updated version from post 808.


Code:
cores  MeshTime(s)     RunTime(s)     
-----------------------------------
12     88.9            34.52
Edit: With Gerlero's Apple Silicon native OpenFOAM v2412 I get:

Code:
cores  MeshTime(s)     RunTime(s)     
-----------------------------------
12     97.32           37.75
A bit perplexing that the VM is slightly faster.

April 6, 2025, 07:48   #853
JBeilke (Joern Beilke), Senior Member, Dresden
Quote:
Originally Posted by bigphil
FYI:

The 1st OpenFOAM HPC Challenge (OHC-1) at the upcoming OpenFOAM Workshop may be of interest to people here. I expect they will publicly share their results, which will be interesting.

A short comparison between OF and Wildkatze regarding the time up to the first iteration.

https://t.me/wildkatze_cfd/39

Update:

Some values (iteration time and drag) for the 110 million cell case:

https://t.me/wildkatze_cfd/40


April 10, 2025, 08:34   #854
JBeilke (Joern Beilke), Senior Member, Dresden
Quote:
Originally Posted by wkernkamp
Looks like my dual E5-2696 v2 server with 128 GB RAM does an iteration every 36 seconds on the smallest test grid (67M cells). I am running OF v2212 compiled with SPDP, i.e. the matrix solve in double precision and the rest in single precision. That is maybe 30% faster than all double precision.

It would be interesting to see how well the drag coefficients for double precision and mixed precision match up.

April 13, 2025, 09:42   #855
wkernkamp (Will Kernkamp), Senior Member
Quote:
Originally Posted by JBeilke
It would be interesting to see how well the drag coefficients for double precision and mixed precision match up.
The results below are the final drag tallies: each line gives the core count followed by the total, pressure, and viscous drag coefficients (the trailing zero column is as reported). The first listing contains the mixed-precision (SPDP) results, the second the double-precision (DP) results.


Code:
With SPDP:
8 Cd: 0.415865 0.398147 0.0177189 0
12 Cd: 0.414603 0.396965 0.0176377 0
16 Cd: 0.409216 0.39174 0.0174758 0
20 Cd: 0.406088 0.388682 0.0174062 0
24 Cd: 0.415135 0.397779 0.0173551 0

With DP:
8 Cd: 0.41088 0.393231 0.0176492 0
12 Cd: 0.409777 0.392361 0.0174158 0
16 Cd: 0.403686 0.386233 0.0174535 0
20 Cd: 0.408543 0.391123 0.0174206 0
24 Cd: 0.413563 0.39623 0.0173339 0
For these 100-iteration benchmarking runs the differences are small, of the same order as the differences between core counts. All solutions were obtained on the same mesh, generated with snappyHexMesh by the SPDP build.
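To put numbers on "small" (arithmetic from the listings above): at matching core counts, SPDP and DP total Cd differ by at most about 1.4% (16 cores: 0.409216 vs 0.403686), while the SPDP totals alone span 0.406088 to 0.415865 across core counts, a spread of about 2.4%. So the precision effect sits inside the decomposition-to-decomposition scatter.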

April 15, 2025, 02:14   #856
JBeilke (Joern Beilke), Senior Member, Dresden
Thank you very much, Will. I had to think for a moment about which benchmark you were using, DrivAer or motorBike, but a Cd of 0.4 only fits the motorBike :-)

Maybe one of the moderators can move all posts about DrivAer to a separate thread.

Whether a deviation of one percent is a lot or a little probably depends on the situation. However, I was not really aware that the domain decomposition, or the number of subdomains, could have such an influence on the result.

If I am trying to optimize a geometry and am already happy about a half-percent improvement, a one-percent deviation caused by the domain decomposition is a medium-sized disaster.
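A plausible mechanism, sketched below (my illustration, not from this thread): floating-point addition is not associative, so a different decomposition sums forces and residuals in a different order, and the last-digit differences can nudge an oscillating solution onto a slightly different trajectory.

Code:
// Why the number of subdomains can change the last digits of a sum.
#include <cstdio>
#include <vector>

int main()
{
    // The same million numbers, summed two ways.
    std::vector<float> v(1000000);
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = 1.0f / float(i + 1);

    // One "domain": a single running sum.
    float serial = 0.0f;
    for (float x : v) serial += x;

    // Four "domains": per-domain partial sums combined at the end,
    // which is what a parallel reduction effectively does.
    float parts[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < v.size(); ++i) parts[i % 4] += v[i];
    float combined = ((parts[0] + parts[1]) + parts[2]) + parts[3];

    // The two results differ in the last digits: harmless per step,
    // but enough to desynchronize an oscillating transient between runs.
    std::printf("serial:   %.7f\ncombined: %.7f\n", serial, combined);
}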
JBeilke is offline   Reply With Quote

April 15, 2025, 16:51   #857
wkernkamp (Will Kernkamp), Senior Member
Quote:
Originally Posted by JBeilke
Thank you very much, Will. I had to think for a moment about which benchmark you were using, DrivAer or motorBike, but a Cd of 0.4 only fits the motorBike :-)

Maybe one of the moderators can move all posts about DrivAer to a separate thread.

Whether a deviation of one percent is a lot or a little probably depends on the situation. However, I was not really aware that the domain decomposition, or the number of subdomains, could have such an influence on the result.

If I am trying to optimize a geometry and am already happy about a half-percent improvement, a one-percent deviation caused by the domain decomposition is a medium-sized disaster.
You are right that the results were for the motorBike. The motorBike is not very well streamlined, which causes the answer to oscillate a bit, and the oscillation is timed differently in each of these runs. I don't think there is a meaningful difference in the answers between them. (I ran the 24-core SPDP and DP cases out to 1000 steps, and Cd does not stabilize the way it would for an airplane.)