OpenFOAM benchmarks on various hardware

Old   March 1, 2022, 16:46
Default
  #481
New Member
 
Roland Siemons
Join Date: Mar 2021
Posts: 13
Rep Power: 5
RolandS is on a distinguished road
Quote:
Originally Posted by Simbelmynė View Post
As far as I can tell there is no difference between the basecase and the run_* decomposeParDict?


In that case it seems that the sed command fails. You should have scotch not hierarchical after the sed command.


The only suggestion I have left is to also look at the log.* files. In particular log.decomposePar, and try to find out why you execute the commands several times for each case.
sed command does do its job correctly (changed basecase to run cases).

Logs say:
Code:
decomposePar: error while loading shared libraries: libfiniteArea.so: cannot open shared object file: No such file or directory
Code:
blockMesh: error while loading shared libraries: libblockMesh.so: cannot open shared object file: No such file or directory
Code:
surfaceFeatureExtract: error while loading shared libraries: libfileFormats.so: cannot open shared object file: No such file or directory
Code:
simpleFoam: error while loading shared libraries: libfiniteVolume.so: cannot open shared object file: No such file or directory
Code:
snappyHexMesh: error while loading shared libraries: libfiniteVolume.so: cannot open shared object file: No such file or directory
However, such lib files ARE present in my OF install. Here:
/lib/openfoam/openfoam2012/platforms/linux64GccDPInt32Opt/lib/libfiniteArea.so
/lib/openfoam/openfoam2012/platforms/linux64GccDPInt32Opt/lib/libblockMesh.so
/lib/openfoam/openfoam2012/platforms/linux64GccDPInt32Opt/lib/libfileFormats.so
/lib/openfoam/openfoam2012/platforms/linux64GccDPInt32Opt/lib/libfiniteVolume.so

Could it be a PATH issue?

(I am investigating that further.) Perhaps you have a suggestion.


Greetz, R
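
On the PATH question: on Linux the dynamic loader does not consult PATH when it resolves shared libraries. It uses LD_LIBRARY_PATH (among other sources such as the ld.so cache), so that is the variable worth checking. A minimal Python sketch, assuming the four library names from the log messages above, lists which directories on a colon-separated search path contain each library. The helper name find_library is hypothetical; this is only a diagnostic aid, not part of OpenFOAM.

```python
import os


def find_library(lib_name, search_path):
    """Return the directories on a colon-separated path that contain lib_name."""
    hits = []
    for directory in search_path.split(":"):
        # Empty entries (e.g. from a trailing colon) are skipped.
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Library names taken from the error messages in the logs.
    libs = ["libfiniteArea.so", "libblockMesh.so",
            "libfileFormats.so", "libfiniteVolume.so"]
    search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for lib in libs:
        found = find_library(lib, search_path)
        print(lib, "->", found if found else "not on LD_LIBRARY_PATH")
```

If the platforms/.../lib directory listed in the post does not show up, the OpenFOAM environment was probably not sourced in the shell that launches the benchmark script; that would be the first thing to check.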

Old   March 2, 2022, 12:23
Default
  #482
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 548
Rep Power: 15
Simbelmynė is on a distinguished road
Quote:
Originally Posted by RolandS View Post
sed command does do its job correctly (changed basecase to run cases).

In post #477 you show a dict file that is identical to the basecase file. This should not be the case after the sed command. You should have "method scotch;" in the decomposeParDict file in your run_* folder.



Quote:
Originally Posted by RolandS View Post
Could it be a PATH issue?

Yes perhaps. Not sure how openfoam.com installs. Some years back it was in a Docker container if I remember correctly.

Old   March 2, 2022, 23:05
Default
  #483
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 339
Rep Power: 12
wkernkamp is on a distinguished road
Roland, the messages say that a mesh was already generated. Could it be that you ran it in the basecase directory for some reason? When you copy the basecase, these meshing files would be copied along.

Old   March 3, 2022, 22:21
Default opteron overclock on H8QG6 Motherboard
  #484
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 339
Rep Power: 12
wkernkamp is on a distinguished road
This is still the same Supermicro Opteron server with the H8QG6-F motherboard, 32x 8 GB DDR3-1600 single rank, for 4x Opteron 6376.

Since the last post I have found that the ondemand governor yields better results, because the Opterons turbo higher when some cores are idle. Furthermore, I made some changes to the default Open MPI process placement for np=2, 12, 24 and 48. The default tended to place processes together on adjacent integer cores. These cores share a single FPU as well as cache, which is good for cache reuse but bad for OpenFOAM. (The difference is ~45% for the 2 core case.)
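
The placement idea can be sketched in Python. This is not the actual Open MPI configuration used here, which the post does not show; it only illustrates choosing core ids so that no two processes share a module until every module already holds one process. It assumes that cores 2k and 2k+1 form one module (the "adjacent integer cores" above); spread_cores and module_size are hypothetical names.

```python
def spread_cores(n_procs, n_cores, module_size=2):
    """Pick core ids so that modules are filled one core at a time.

    Assumes consecutive core ids belong to the same module.
    """
    if n_procs > n_cores:
        raise ValueError("more processes than cores")
    # First the first core of every module, then the second core, and so on.
    order = [core
             for offset in range(module_size)
             for core in range(offset, n_cores, module_size)]
    return sorted(order[:n_procs])


if __name__ == "__main__":
    for np_ in (2, 12, 24, 48):
        print(np_, spread_cores(np_, 64))
```

The resulting list could be handed to whatever binding mechanism is in use (for example a rankfile); that step is left out here, and spreading across sockets is not considered.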


The baseline result before the overclock is (cores, wall time in s):
1 2161.03
2 1045.07
4 506.82
8 249.7
12 193.92
16 145.46
24 110.93
32 93.86
48 87.21
64 85.53

After overclock using a motherboard base clock of 240 MHz instead of 200 MHz, the results are:
1 2112.27
2 1026.49
4 492.64
8 241.08
12 183.19
16 134.26
24 100.11
32 84.72
48 82.74
64 79.54

This overclock was accomplished with the OCNG5.3 BIOS. It is easy to do. Follow the instructions here: https://hardforum.com/threads/ocng5-...forms.1836265/

The temperatures did not go high, so the board can still be clocked higher. The ram can also be overclocked. I will try 1866 MHz. In the past the execution time was about inversely proportional to RAM speed.
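
To put the two result lists side by side, a short Python sketch can compute the percentage reduction in wall time per core count. The numbers are copied from the lists above; percent_reduction is a hypothetical helper name.

```python
# Wall times in seconds, copied from the two result lists in this post.
baseline = {1: 2161.03, 2: 1045.07, 4: 506.82, 8: 249.7, 12: 193.92,
            16: 145.46, 24: 110.93, 32: 93.86, 48: 87.21, 64: 85.53}
overclocked = {1: 2112.27, 2: 1026.49, 4: 492.64, 8: 241.08, 12: 183.19,
               16: 134.26, 24: 100.11, 32: 84.72, 48: 82.74, 64: 79.54}


def percent_reduction(before, after):
    """Reduction in wall time, as a percentage of the original time."""
    return 100.0 * (before - after) / before


gains = {n: percent_reduction(baseline[n], overclocked[n]) for n in baseline}

for n, gain in gains.items():
    print(f"{n:3d} cores: {gain:4.1f} % less wall time")
```

The reduction stays under 10 % at every core count, compared with a 20 % higher base clock (240 MHz vs. 200 MHz).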
masb likes this.

Old   March 29, 2022, 12:35
Default
  #485
Member
 
Ron Burnett
Join Date: Feb 2013
Posts: 42
Rep Power: 13
rnburne is on a distinguished road
HP DL560 G8
4x E5-4610v2
16x 8Gb 2Rx4 PC3L-12800R
Ubuntu 20.04 and OF8

Code:
   cores      snappy      simulation    simulation
            (min:sec)        (s)           power
     1         37:21         1253              
     4         13:07          268             290
     8          7:55          143             325
    12          6:22          103             355
    16          6:04           82             390
    24          4:31           64             450
    28          4:03           59             480
    32          3:58           56             510
These older servers seem to perform well considering their cost....
this one was $520 shipped. And, while this seems satisfying at the moment,
there's always the question of finding a few more horsepower.
Any suggestions?
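
For judging where the scaling flattens, speedup and parallel efficiency can be computed from the simulation column. A minimal Python sketch, using the simulation times from the table above (seconds); the function name scaling is hypothetical.

```python
# Simulation times in seconds from the table above, keyed by core count.
sim_times = {1: 1253, 4: 268, 8: 143, 12: 103, 16: 82, 24: 64, 28: 59, 32: 56}


def scaling(times):
    """Return {cores: (speedup, efficiency)} relative to the 1-core time."""
    t1 = times[1]
    return {n: (t1 / t, t1 / (t * n)) for n, t in times.items()}


for n, (speedup, efficiency) in scaling(sim_times).items():
    print(f"{n:3d} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

Efficiency slightly above 1 at low core counts is possible on a multi-socket machine, presumably because more memory channels and cache come into play as processes spread over the sockets.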

Old   March 29, 2022, 23:59
Default
  #486
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 339
Rep Power: 12
wkernkamp is on a distinguished road
You can invest in PC3-14900R 1866 MHz memory. Your run time will drop by nearly a factor of 1600/1866.
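
As a rough worked example in Python, assuming run time is exactly inversely proportional to memory speed (the real gain will be somewhat smaller, hence "nearly"), applied to the 56 s result at 32 cores from the post above. The function name is hypothetical.

```python
def scale_by_memory_speed(time_s, old_mts, new_mts):
    """Estimate run time assuming it is inversely proportional to memory speed."""
    return time_s * old_mts / new_mts


# 32-core simulation time from the DL560 G8 post, 1600 -> 1866 MT/s.
estimate = scale_by_memory_speed(56, 1600, 1866)
print(f"estimated time: {estimate:.1f} s")
```

That puts the best case at roughly 48 s.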

Old   March 30, 2022, 00:13
Default
  #487
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 339
Rep Power: 12
wkernkamp is on a distinguished road
Quote:
Originally Posted by rnburne View Post
HP DL560 G8
4x E5-4610v2
16x 8Gb 2Rx4 PC3L-12800R
Ubuntu 20.04 and OF8


These older servers seem to perform well considering their cost....
this one was $520 shipped. And, while this seems satisfying at the moment,
there's always the question of finding a few more horsepower.
Any suggestions?

You can increase the memory speed to 1866 MHz if you replace the CPUs with one of these E5 v2 CPUs:


E5-4627v2, 4640v2, 4650v2, 4624Lv2 and 4657Lv2.


I have several servers with dual E5-4627v2. They do the benchmark in 100 s (using 1866 MHz memory speed). I think it is 116 s with 1600 MHz. On one of them I force PC3L-1333 MHz memory to 1866 MHz and 1.5 V without a problem. I think you might dip below 50 seconds if you do that.

Old   March 30, 2022, 15:07
Default AMD EPYC 7543 Performance
  #488
New Member
 
Gauteng
Join Date: Jul 2020
Posts: 6
Rep Power: 5
fromanza is on a distinguished road
Out-of-the-box Supermicro desktop tower system from Boston
Dual 32Core AMD EPYC 7543 - 256GB 3200 RAM - Running WSL2 in Windows 10 - OpenFOAM 9 - SMT disabled


Prepare case run_16...
Running surfaceFeatures on /mnt/e/BENCHMARK/run_16
Running blockMesh on /mnt/e/BENCHMARK/run_16
Running decomposePar on /mnt/e/BENCHMARK/run_16
Running snappyHexMesh in parallel on /mnt/e/BENCHMARK/run_16 using 16 processes

real 6m0.637s
user 65m32.943s
sys 1m45.140s
Prepare case run_32...
Running surfaceFeatures on /mnt/e/BENCHMARK/run_32
Running blockMesh on /mnt/e/BENCHMARK/run_32
Running decomposePar on /mnt/e/BENCHMARK/run_32
Running snappyHexMesh in parallel on /mnt/e/BENCHMARK/run_32 using 32 processes

real 6m50.431s
user 95m35.497s
sys 3m45.442s
Prepare case run_64...
Running surfaceFeatures on /mnt/e/BENCHMARK/run_64
Running blockMesh on /mnt/e/BENCHMARK/run_64
Running decomposePar on /mnt/e/BENCHMARK/run_64
Running snappyHexMesh in parallel on /mnt/e/BENCHMARK/run_64 using 64 processes

real 12m41.292s
user 268m15.494s
sys 8m57.240s
Run for 16...
Run for 32...
Run for 64...
# cores Wall time (s):
------------------------
16 42.51
32 31.18
64 25.99
Attached Files
File Type: zip BENCHMARK.zip (20.6 KB, 6 views)
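
The real/user pairs in the log above can be turned into a rough measure of how many cores were kept busy during case preparation. A minimal Python sketch: parse_time is a hypothetical helper that converts the shell's "XmY.YYYs" format (also accepting a decimal comma) to seconds, and the timings are copied from the three "Prepare case" blocks.

```python
import re


def parse_time(text):
    """Convert a string such as '6m0.637s' or '19m22,924s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:[.,]\d+)?)s?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time string: {text!r}")
    minutes = int(match.group(1) or 0)
    seconds = float(match.group(2).replace(",", "."))
    return 60 * minutes + seconds


# Preparation timings from the log: processes -> (real, user)
prep_times = {16: ("6m0.637s", "65m32.943s"),
              32: ("6m50.431s", "95m35.497s"),
              64: ("12m41.292s", "268m15.494s")}

for n, (real, user) in prep_times.items():
    ratio = parse_time(user) / parse_time(real)
    print(f"{n} processes: user/real = {ratio:.1f}")
```

The ratio is only an indication: user time also counts any busy-waiting inside MPI, and the preparation step includes serial tools (surfaceFeatures, blockMesh, decomposePar) besides the parallel snappyHexMesh run.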

Old   March 30, 2022, 18:44
Default
  #489
Member
 
Ron Burnett
Join Date: Feb 2013
Posts: 42
Rep Power: 13
rnburne is on a distinguished road
Quote:
You can increase the memory speed to 1866 MHz if you replace CPU's with either one of these E5 v2 CPU's:

E5-4627v2, 4640v2, 4650v2, 4624Lv2 and 4657Lv2.

I have several servers with dual E5-4627v2. They do the benchmark in 100 sec (Using 1866 Memory speed.) I think it is 116 seconds with 1600 MHz. On one of them I force PC3L-1333 MHz memory to 1866 MHz and 1.5V without a problem. I think you might dip below 50 seconds if you do that.
The 4610's were dirt cheap but other processors were also considered. For an additional 10% boost in performance, the added cost would have been roughly 30%. Since this machine will only see occasional use, I couldn't justify it.

Old   March 31, 2022, 08:34
Default 2x Intel Xeon 8173M 56 cores with 12*16GB DDR4 2666 RECC memory
  #490
New Member
 
CH Xu
Join Date: Jan 2013
Posts: 6
Rep Power: 13
neytirilover is on a distinguished road
Here are my results for 2x Intel Xeon Platinum 8173M (56 cores) with 12x 16 GB DDR4-2666 RECC memory on a Supermicro motherboard:

Code:
# cores   Wall time (s):
------------------------
1            1090.33
2            547.38
4            250.41
6            160.19
12          85.7
16          66.51
20          56.2
24          49.75
48          35.44
56          33.84
flotus1, wkernkamp and Crowdion like this.

Old   April 16, 2022, 11:54
Default
  #491
New Member
 
Alexander Kazantcev
Join Date: Sep 2019
Posts: 23
Rep Power: 6
AlexKaz is on a distinguished road
Quote:
Originally Posted by AlexKaz View Post
Xeon Silver 4314 Ice Lake-SP Scalable 3rd gen 10nm 2900MHz all cores, RAM 2666 8 dimms 8 channels, NUMA on, HT on (with off will be the same), bios power profile "Power (save)"

OpenFOAM v1806, openmpi 2.1.3, Puppy Linux Fossa mitigations = off (with on result ~ the same)

cores   flow (s)   mesh
1       649.63     16m40.79s
2       369.84     12m3.332s
4       204        7m5.208s
6       148.87     5m17.678s
8       123.86     4m23.601s
12      99.57      3m41.038s
16      85.1       3m18.140s
20      94.18      4m20.113s
24      89.79      3m37.573s
28      86.99      3m49.439s
32      84.81      3m50.380s
Dual Xeon Silver 4314, 16 DIMMs on 16 channels, RAM 2666
no BIOS tuning
Supermicro X12 DDW A6
same OpenFOAM version
cores   flow (s)   mesh
1       817.35     19m22.924s
4       204.79     7m10.502s
8       108.29     4m19.569s
12      80.9       3m34.434s
16      65.61      3m6.958s
24      50.72      2m23.219s
32      43.84      2m14.769s
40      48.17      3m22.607s
48      45.78      3m28.241s
56      unknown    2m31.867s
60      43.02      2m57.801s
flotus1 likes this.

Old   April 26, 2022, 12:02
Default
  #492
n10
New Member
 
Join Date: Jun 2019
Posts: 1
Rep Power: 0
n10 is on a distinguished road
Hi all,

I ran the benchmark on a few configurations I could get my hands on:

Ryzen Threadripper 3970X (32cores), 4x16GB DDR4 2133MHz, Gigabyte TRX40 AORUS PRO WIFI
Code:
# cores   Wall time (s):
------------------------
1         627.58
2         369.89
4         188.68
8         118.69
12        113.76
16        106.68
20        110.7
24        107.11
30        108.41
32        107.71

2x EPYC 7302 (2x16cores), 16x16 DDR4 3200MHz, Supermicro H11DSi-NT Rev2
Code:
# cores   Wall time (s):
------------------------
1         738.07
2         390.51
4         169.63
6         107.2
8         80.55
12        54.59
16        41
24        33.98
32        28.03
2x EPYC 7542 (2x32cores), 16x32GB DDR4 3200MHz, HPE ProLiant DL385 Gen10 Plus
Code:
# cores   Wall time (s):
------------------------
1         719.75
2         367.03
4         156.85
6         101.61
8         76.86
12        53.69
16        40.29
24        32.71
32        26.73
48        23.02
64        20.97
All of them on Ubuntu 20.04, OpenFOAM-7.
flotus1 and Crowdion like this.

Old   April 28, 2022, 13:01
Default
  #493
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 548
Rep Power: 15
Simbelmynė is on a distinguished road
5800X3D, 2 x 8 GB DDR4 Rank1 @ 3200 MT/s (14-14-14-14-28,1T)
OFv9, OpenSUSE Tumbleweed, GCC 11.2, kernel 5.17.4

The 1 core result is amazing and the 6 core result is pretty decent as well. I assume this is the fastest dual channel CPU for CFD right now. Well, at least until someone with a large wallet can post some results for Alder Lake with DDR5 @ 6400+ MT/s. EDIT: I missed the post a couple of pages back; the i5-12600 with DDR5 @ 6000 MT/s is indeed faster, and not terribly expensive with a B660 motherboard, so it is definitely a better value if buying an entirely new computer.
The single-core result is 33% faster than the 5900X (from this thread). The 5900X has a single core boost up to 4.8 GHz while the 5800X3D only boosts to 4.5 GHz. Apparently the extra V-Cache is more important than the extra single-core speed.



Code:
 cores       Simulation     Meshing
#                (s)      (min.sec)
1             314.21        12m23s
2             201.98        8m21s
4             149.98        5m05s
6             138.55        4m02s
Will update if I manage to push the memory and Infinity Fabric (IF) clock to 1800 MHz.



EDIT:
2 x 8 GB DDR4 Rank1 @3800 MT/s (16-16-16-16-32, 1T)

Code:
cores    Simulation         Meshing
#           (s)             (min.sec)
1            304              12m14
2            188              8m12
4            135              4m58
6            124              3m55
8            122              3m28
I have some results where the IF manages 2000 MHz, which permits 4000 MT/s in 1:1 mode. It is not fully stable though, so I need a few more days to learn this particular CPU. The interesting part is that higher IF speeds also decrease the L3 cache latency, so the gain is not only higher bandwidth.
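
The 1:1 relation follows from DDR memory transferring data twice per clock: 4000 MT/s corresponds to a 2000 MHz memory clock, which then matches the IF clock. A small Python sketch of that arithmetic, together with the theoretical peak bandwidth, assuming standard 64-bit (8 byte) channels and the two channels of this platform. These are theoretical peaks, not measurements, and the function names are hypothetical.

```python
def memory_clock_mhz(transfer_rate_mts):
    """DDR transfers twice per clock, so the clock is half the transfer rate."""
    return transfer_rate_mts / 2


def peak_bandwidth_gbs(transfer_rate_mts, channels=2, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s for 64-bit wide channels."""
    return transfer_rate_mts * bytes_per_transfer * channels / 1000


for rate in (3200, 3800, 4000):
    print(f"{rate} MT/s: clock {memory_clock_mhz(rate):.0f} MHz, "
          f"peak {peak_bandwidth_gbs(rate):.1f} GB/s")
```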

Last edited by Simbelmynė; April 30, 2022 at 04:21. Reason: Added some more benchmarks

Old   April 28, 2022, 14:21
Default
  #494
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,400
Rep Power: 47
flotus1 has a spectacular aura about
You must have missed this earlier post in the thread: OpenFOAM benchmarks on various hardware
Speaking of particularly deep pockets, I don't think Alder Lake is such a bad deal compared to AMD's pretty steep asking price for the 5800X3D. DDR5 is still more expensive, but the prices have come down a lot. Right now it's about even between spending more for the AMD CPU vs. DDR5 memory.

Old   April 28, 2022, 16:47
Default
  #495
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 548
Rep Power: 15
Simbelmynė is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
You must have missed OpenFOAM benchmarks on various hardware
Speaking of particularly deep pockets, I don't think Alder Lake is such a bad deal compared to AMDs pretty steep asking price for the 5800X3D. DDR5 is still more expensive, but the prices have come down a lot. Right now its about even between spending more for the AMD CPU vs. DDR5 memory.
Yeah, missed that one. So Alder Lake @ 6000 MT/s is indeed faster than 5800X3D @3200 MT/s. The 5800X3D is much faster in single core. Obviously that may be a moot point for CFD, still impressive though.

In terms of value, I agree that a complete new system with 5800X3D is not worth it. However someone with Zen or Zen2 CPU may benefit from this upgrade compared to going with a new Alder Lake system.

This V-cache is probably a test for Zen4. The 7000 series with DDR5 will be interesting!

Old   May 6, 2022, 07:03
Default
  #496
Member
 
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Kailee71 is on a distinguished road
Quote:
Originally Posted by rnburne View Post
The 4610's were dirt cheap but other processors were also considered. For an additional 10% boost in performance, the added cost would have been roughly 30%. Since this machine will only see occasional use, I couldn't justify it.
Hi rnburne,

So if you do run into a little money for upgrades, then I can warmly recommend the 4627v2. Check out post #416 for that config. And yes, I agree those old servers deliver staggering performance/$ right now.


Kai.

Old   May 7, 2022, 00:36
Default E5-2697 v2
  #497
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 339
Rep Power: 12
wkernkamp is on a distinguished road
Supermicro X9DRi-LN4+/X9DR3-LN4+ 2xE5-2697 v2 16x 8 GB DDR3-1867



Code:
Meshing Times:
1 1496.26
2 1101.94
4 597.9
8 379.49
12 294.37
16 296.97
20 249.29
24 269.57
Flow Calculation:
1 998.49
2 518.38
4 264.6
8 140.26
12 109.28
16 94.83
20 90.71
24 91.93
Faster than I thought it would be.

Old   May 13, 2022, 03:55
Default
  #498
Member
 
Kailee
Join Date: Dec 2019
Posts: 35
Rep Power: 6
Kailee71 is on a distinguished road
Quote:
Originally Posted by wkernkamp View Post
Supermicro X9DRi-LN4+/X9DR3-LN4+ 2xE5-2697 v2 16x 8 GB DDR3-1867



Code:
Meshing Times:
<SNIP>
20 249.29
24 269.57
Flow Calculation:
<SNIP>
20 90.71
24 91.93
Faster than I thought it would be.
Interesting. It confirms that a larger L3 can compensate for slightly slower core clocks, assuming identical RAM speed of course. The 2690 v2 has identical sim speed, even though the 2697 v2 has 30 MB of L3 at 2.7 GHz and the 2690 v2 has 25 MB at 3 GHz. The 2697's two extra cores add nothing in terms of speed, as RAM is the bottleneck anyway.

If you're not worried about energy consumption then Ivy Bridge is about as cost effective as it gets currently.

Kai.
wkernkamp likes this.

Old   May 13, 2022, 19:21
Default
  #499
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 339
Rep Power: 12
wkernkamp is on a distinguished road
The result for 16 cores is about 5.5 seconds faster than the same for my E5-4627v2 machines, which turbo at 3.6 GHz. So you are right that the cache is more important than the clock. (That seems to be the only explanation.)

Old   May 24, 2022, 01:55
Default Improved thermal paste application
  #500
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 339
Rep Power: 12
wkernkamp is on a distinguished road
2xE5-2697 v2 16x 8GB DDR3-1866 MHz OF v2112


Meshing Times:
1 1534.48
2 1003.27
4 584.09
6 404.03
8 345.7
10 300.07
12 272.53
16 243.57
18 239.14
20 212.94
22 201.89
24 203.04


Flow Calculation:
1 976.76
2 512.45
4 232.43
6 159.07
8 129.94
10 113.5
12 102.33
16 90.8
18 88.11
20 85.73
22 84.44
24 84.02



This may be the best Ivy Bridge result in this thread.
flotus1 likes this.