CFD Online forums (www.cfd-online.com), General Forums > Hardware

OpenFOAM benchmarks on various hardware

Post #1 (February 5, 2018, 07:10)
eric (Knut Erik T. Giljarhus), Member, Norway
** Update 2: I have created a page on the OpenFOAM wiki: https://openfoamwiki.net/index.php/Benchmarks . The updated plot will now be found there as I will eventually not be able to edit this post. But please continue to contribute further benchmarks in this thread! **

** Update: I have now added a plot with minimum time to solution for all hardware posted in this thread! I will try to keep this updated as more results are posted. Thank you for all the contributions! **

Hi,

I promised in another thread to run some OpenFOAM benchmarks on the different Intel hardware I have available, so here they are. They are based on the motorBike benchmark, but I modified it to have more grid cells, run fewer iterations and use the scotch decomposition method. You can find the full setup in the attached tar.gz file. To test your own hardware, you only need to run the run.sh script; if you want to run on a different set of core counts, change the core counts in its three for loops. It would be great if more people could contribute, so that we can build a modest database of benchmarks here.
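Each core count in the loops needs a case decomposed into the same number of subdomains. As a minimal sketch (the Python helper below is hypothetical and not taken from the attached setup, which may generate the file differently), a system/decomposeParDict for the scotch method only needs numberOfSubdomains and method:

```python
def decompose_par_dict(n_subdomains: int, method: str = "scotch") -> str:
    """Text of a minimal system/decomposeParDict (hypothetical helper, not part of run.sh)."""
    if n_subdomains < 1:
        raise ValueError("need at least one subdomain")
    return (
        "FoamFile\n"
        "{\n"
        "    version     2.0;\n"
        "    format      ascii;\n"
        "    class       dictionary;\n"
        "    object      decomposeParDict;\n"
        "}\n\n"
        f"numberOfSubdomains {n_subdomains};\n\n"
        # scotch partitions the mesh itself, so no method-specific coefficients are needed.
        f"method {method};\n"
    )

print(decompose_par_dict(8))
```

Writing this text to each run folder's system/decomposeParDict before calling decomposePar would give one decomposition per core count. Unlike the simple or hierarchical methods, scotch needs no coefficient sub-dictionary.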

The table below shows the wall-clock run time in seconds for each number of cores. There is also a graph that shows the speedup.

Some observations, most of them fairly obvious:
  • There is a clear correlation between speedup and the number of available memory controllers. The old eight-socket E7 machine is very slow on a single core but shows great speedup. The other machines show only modest speedup beyond roughly twice the number of memory controllers.
  • A fast CPU helps for single-core runs: the two processors with a 3.7 GHz turbo frequency (Gold 6148 and E5-2643 v3) are the fastest here.
  • If you are buying new hardware, note that the Gold 6148 barely scales past ~16 cores (101 s on 16 cores vs. 98 s on 20), so the 6130 or 6142 seem like better choices. Of course, this assumes you only have Intel available; if not, AMD Epyc seems like a better choice based on the other threads in this forum.
Code:
# cores  Gold 6148  8x E7-8870  2x E5-2695 v2  2x E5-2643 v3  2x E5-2695 v4
      1        874        2132           1451            883           1084
      2        435        1124            597            468            578
      4        225         476            281            215            273
      6        164         297            205            153            189
      8        136         203            178            126            146
     12        111         148            150            101            104
     16        101         104            140              -             85
     20         98          92            137              -             76
     24          -          77            137              -             71
     36          -          64              -              -             65
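To turn these run times into the speedup shown in the graph, divide the single-core time by the time on N cores, S(N) = T(1)/T(N). A short Python sketch using the numbers from the table (entries that were not run are stored as None):

```python
# Wall times in seconds from the table above; None marks core counts that were not run.
CORES = [1, 2, 4, 6, 8, 12, 16, 20, 24, 36]
TIMES = {
    "Gold 6148":     [874, 435, 225, 164, 136, 111, 101, 98, None, None],
    "8x E7-8870":    [2132, 1124, 476, 297, 203, 148, 104, 92, 77, 64],
    "2x E5-2695 v2": [1451, 597, 281, 205, 178, 150, 140, 137, 137, None],
    "2x E5-2643 v3": [883, 468, 215, 153, 126, 101, None, None, None, None],
    "2x E5-2695 v4": [1084, 578, 273, 189, 146, 104, 85, 76, 71, 65],
}

def speedup(times):
    """Speedup S(N) = T(1) / T(N) for every core count that has a measurement."""
    t1 = times[0]
    return {n: t1 / t for n, t in zip(CORES, times) if t is not None}

for name, times in TIMES.items():
    s = speedup(times)
    n_max = max(s)  # highest core count measured on this machine
    print(f"{name:14s} S({n_max}) = {s[n_max]:.1f}")
```

At the highest core count of each machine this gives roughly 33x for the 8x E7-8870 on 36 cores against roughly 9x for the Gold 6148 on 20 cores, which is the scaling difference discussed above.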


Attached images: openfoam_speedup.png, openfoam_benchmarks_all.png
Attached file: bench_template.tar.gz

Last edited by eric; February 15, 2018 at 17:25.

Post #2 (February 7, 2018, 03:29)
flotus1 (Alex), Senior Member, Germany
Edit: now with a modified controlDict to get proper results.
mpirun thoroughly disliked my attempts to pin processes to certain cores, which resulted in abysmal performance for most cases. So these results are just with plain mpirun -np N.
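For anyone who wants to experiment with pinning anyway (it hurt performance on this particular setup), the sketch below only assembles a command line and does not run it. It assumes Open MPI, where `--bind-to core` pins each rank to a core, `--map-by numa` distributes ranks round-robin over NUMA domains and `--report-bindings` prints the result; other MPI implementations use different flags, and `mpirun_command` is a hypothetical helper:

```python
import shlex

def mpirun_command(n_ranks: int, solver: str = "simpleFoam", pin: bool = True) -> list:
    """Assemble an Open MPI command line for an OpenFOAM solver (not executed here)."""
    cmd = ["mpirun", "-np", str(n_ranks)]
    if pin:
        # Open MPI options: one rank per core, ranks spread round-robin over NUMA
        # domains, and the resulting bindings printed so they can be checked in the log.
        cmd += ["--bind-to", "core", "--map-by", "numa", "--report-bindings"]
    # -parallel tells the OpenFOAM solver to run on the decomposed case.
    return cmd + [solver, "-parallel"]

print(shlex.join(mpirun_command(32)))
```

With `pin=False` the command reduces to the plain `mpirun -np N` used for the results below.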

2x AMD Epyc 7301, 16x16GB 2Rx4 DDR4-2133 reg ECC, of_v1712, Opensuse Tumbleweed, kernel 4.14.14-1
Code:
# cores   Wall time (s)   speedup:
------------------------------------------------
01        1016.6           1.0
02         480.5           2.1
04         231.9           4.4
08         125.4           8.1
12          79.9          12.7
16          66.4          15.3
20          60.5          16.8
24          52.0          19.6
28          49.1          20.7
32          42.6          23.9
__________________
Please do not send me CFD-related questions via PM

Last edited by flotus1; February 8, 2018 at 05:52.

Post #3 (February 7, 2018, 06:25)
eric (Knut Erik T. Giljarhus), Member, Norway
Thanks for running this, flotus. The error happens at the very end of the simulation, so it shouldn't affect the timings much. If you still want to fix it, see below. Impressive performance; it scales a lot better than the Intel machines. It would be nice to also have a dual-socket Gold machine to compare against.

The error happens when the streamlines are calculated at the end of the simulation. It looks like this is due to version differences: I see you are using v1712, while I use 5.x. The easiest fix is to disable the streamline calculation. Just open the file basecase/system/controlDict and remove the lines
Code:
#include streamlines
#include wallBoundedStreamlines
You should also delete all the run_* folders before rerunning the run.sh script.
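If you prefer to script this edit, here is a minimal Python sketch. The path is the one given above; the pattern deliberately accepts both the unquoted form shown here and a quoted spelling such as `#include "streamLines"` (an assumption about how other versions may write it), and `patch_control_dict` is a hypothetical helper. It does not delete the run_* folders for you.

```python
import re
from pathlib import Path

# Matches "#include streamlines", '#include "streamLines"', "#include wallBoundedStreamlines", ...
# "#includeFunc" lines are not matched because whitespace must follow "#include".
STREAMLINE_INCLUDE = re.compile(r'^\s*#include\s+"?\w*streamlines"?\s*$', re.IGNORECASE)

def strip_streamline_includes(text: str) -> str:
    """Return controlDict text without the streamline function-object includes."""
    kept = [line for line in text.splitlines(keepends=True)
            if not STREAMLINE_INCLUDE.match(line)]
    return "".join(kept)

def patch_control_dict(path: str = "basecase/system/controlDict") -> None:
    """Rewrite the controlDict in place (hypothetical helper)."""
    p = Path(path)
    p.write_text(strip_streamline_includes(p.read_text()))
```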

Post #4 (February 7, 2018, 06:34)
JBeilke (Joern Beilke), Senior Member, Dresden
Quote (originally posted by flotus1):
Bummer...
That benchmark script doesn't seem to run properly on my machine (dual AMD Epyc 7301).
I already get a "core dumped" message during the first serial run of simpleFoam, and then it halts while executing on 16 cores. I aborted this run with Ctrl+C; the rest of the cases then finished, but not all of them gave valid timing results.

I attached the log files and shell output here; maybe you can tell me what went wrong.
Extrapolating from the 99th iteration in the log files, you get:

Code:
1    1041.62
2    595
4    257
8    130
12    85
16    62
24    55
36    44
This means a superlinear speedup on 16 cores (105 % parallel efficiency) and 74 % efficiency on 32 cores. Not bad.

Only the single core performance is a bit low :-(
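The percentages above are parallel efficiencies, E(N) = T(1) / (N · T(N)); a value above 100 % means superlinear speedup, which is often attributed to each rank's share of the mesh starting to fit in cache. Note that the 74 % figure uses N = 32, the physical core count of the dual Epyc 7301, together with the 44 s in the last row of the table. A minimal sketch:

```python
def parallel_efficiency(t1: float, tn: float, n: int) -> float:
    """E(N) = T(1) / (N * T(N)); values above 1.0 mean superlinear speedup."""
    return t1 / (n * tn)

t1 = 1041.62  # extrapolated single-core time from the table above
print(f"16 cores: {parallel_efficiency(t1, 62, 16):.0%}")
print(f"32 cores: {parallel_efficiency(t1, 44, 32):.0%}")
```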

Post #5 (February 7, 2018, 06:47)
JBeilke (Joern Beilke), Senior Member, Dresden
Code:
# cores  i7-2600  i7-3960X  E5-1650 v3
      1     1085       794         824
      2      727       433         440
      4        -       253         258
      6        -       212         214
This is a bit strange, since the E5 is normally about 10% faster than the 3960X. The first two configurations (i7-2600, i7-3960X) were run in a virtual machine (Linux on top of Linux).

It is very interesting to see that the 3960X is the fastest processor in this thread so far for 1- and 2-core runs.

Post #6 (February 7, 2018, 12:20)
flotus1 (Alex), Senior Member, Germany
Thanks for your input. I will run the case again tonight and edit the results, and maybe throw in some better core-binding options; mpirun and Linux are not fully aware of which cores form a NUMA node.
Until then, here are results from the machine I used to test your suggestion: a single Xeon W3670 (6 cores) with triple-channel DDR3-1333, of_v1712, openSUSE Leap 42.2, kernel 4.4.104-39:
Code:
# cores   Wall time (s):
------------------------
1         1262.5
2          849.8
4          649.6
6          622.7
I think that adding some system and software version information might be a good idea when submitting and comparing benchmark results.
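Following the suggestion to report system and software versions, here is a minimal Python sketch that gathers some of them automatically. It assumes a Linux machine (it reads /proc/cpuinfo, with a generic fallback elsewhere) and that the OpenFOAM environment has been sourced so that WM_PROJECT_VERSION is set; memory configuration and clock speeds still have to be added by hand.

```python
import os
import platform

def cpu_model() -> str:
    """CPU model string from /proc/cpuinfo on Linux, with a generic fallback elsewhere."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("model name"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass
    return platform.processor() or platform.machine()

def system_info() -> dict:
    """Collect basic hardware/software details to post alongside benchmark results."""
    return {
        "cpu": cpu_model(),
        "logical_cpus": os.cpu_count(),
        "kernel": platform.release(),
        # Exported by OpenFOAM's etc/bashrc once the environment is sourced; None otherwise.
        "openfoam": os.environ.get("WM_PROJECT_VERSION"),
    }

for key, value in system_info().items():
    print(f"{key}: {value}")
```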
The single-core result for Epyc was to be expected: it only reaches a single-core turbo of 2.7 GHz. As I already stated in my initial review, AMD missed the mark with medium-core-count CPUs at higher clock speeds. A 16-core variant with >= 3.5 GHz, or at least a higher single-core turbo, would have been no problem from a TDP perspective. Forcing you to buy the most expensive SKU, with lots of useless cores, to get at least 3.2 GHz on a single core is what Intel would do.

Edit: AMD Epyc results now edited in the second post.
Since there were no results for Xeon E5 "v1" yet, here is a dual Xeon E5-2687W, 16x8GB DDR3-1600, of_v1712, openSUSE Leap 42.3, kernel 4.4.103-36:
Code:
# cores   Wall time (s):
------------------------
01         898.8
02         502.1
04         235.1
06         169.7
08         141.6
10         128.4
12         119.3
14         116.3
16         112.6

Last edited by flotus1; February 8, 2018 at 05:22.

Post #7 (February 9, 2018, 17:04)
Simbelmynė, Senior Member
@eric

While the speedup from adding cores is interesting, I also think the speedup relative to other hardware is of great interest. Since that data is available in this thread, perhaps you could also compile and maintain a plot in the first post (if the thread continues to grow, that is)? I guess the metric would be the lowest possible solution time on a given piece of hardware, possibly normalized against some reference system of choice.

I'll join in with a 1950X, 7940X and 8700K soon, so you get some comparison for lower-budget systems.
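One way to build the cross-hardware comparison suggested here: take the lowest time to solution per system and divide a reference system's best time by it, so values above 1 mean faster than the reference. The sketch uses a few rows from earlier posts in this thread, and the choice of 2x E5-2695 v4 as the reference is arbitrary:

```python
# Wall times (s) per core count: a subset of the rows posted earlier in this thread.
RESULTS = {
    "2x Epyc 7301":  {1: 1016.6, 16: 66.4, 32: 42.6},
    "2x E5-2695 v4": {1: 1084, 16: 85, 36: 65},
    "Gold 6148":     {1: 874, 16: 101, 20: 98},
    "i7-3960X":      {1: 794, 6: 212},
}

def best_time(times: dict) -> float:
    """Lowest time to solution over all core counts tried on one system."""
    return min(times.values())

def relative_speed(results: dict, reference: str) -> dict:
    """Speed of each system relative to the reference system (>1 means faster)."""
    t_ref = best_time(results[reference])
    return {name: t_ref / best_time(t) for name, t in results.items()}

ranking = sorted(relative_speed(RESULTS, "2x E5-2695 v4").items(), key=lambda kv: -kv[1])
for name, r in ranking:
    print(f"{name:14s} {r:.2f}")
```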

Post #8 (February 9, 2018, 19:16)
flotus1 (Alex), Senior Member, Germany
The problem with that is that you can no longer edit older posts after a few weeks. Maintaining a thread like this becomes impossible. This restriction kept me from starting one or two related threads in the past.

Post #9 (February 10, 2018, 03:22)
Simbelmynė, Senior Member
That's strange. A thread like this could definitely be made "sticky".

Oh well, browsing to the last post only requires one extra mouse click.


7940X, 32 (4x8) GB 3200 MHz RAM, CentOS 7.x, kernel 3.10.0
Code:
# cores   Wall time (s):
------------------------
1 764.36
2 419.98
4 233.26
6 188.29
8 169
12 160.28
14 168.73
Threadripper 1950X, 32 (4x8) GB 3200 MHz RAM, CentOS 7.x, kernel 4.14.5 (SMT on)
Code:
# cores   Wall time (s):
------------------------
1 827.21
2 465.01
4 235.17
6 198.81
8 170.73
12 154.26
16 154.9
8700K, 32 (4x8) GB 3200 MHz RAM, Mint 18.3, kernel 4.13.0
Code:
# cores   Wall time (s):
------------------------
1 531.44
2 312.15
4 249.55
6 247.83
It is also interesting to analyze the meshing time.

For the 8700K system we have:
Code:
# cores   real time:
------------------------
1            16m35s
2            10m56s
4            07m01s
6            05m30s
While the 1950X performs as:
Code:
# cores   real time:
------------------------
1            23m32s
2            16m01s
4            08m44s
6            06m50s
8            05m48s
12          04m38s
16          04m12s
It seems that the meshing part is not as memory bound as the CFD solver.
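The meshing times are given as minutes and seconds, so comparing their scaling with the solver's needs a small conversion. A minimal sketch that parses the "XmYs" strings used in these tables and compares meshing and solver speedup for the 8700K:

```python
import re

def to_seconds(mmss: str) -> int:
    """Convert a time written as e.g. '16m35s' (format used in the tables above) to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+)s", mmss.strip())
    if m is None:
        raise ValueError(f"unexpected time format: {mmss!r}")
    return int(m.group(1)) * 60 + int(m.group(2))

# 8700K: meshing times and solver wall times (s) from the tables above.
mesh = {1: "16m35s", 2: "10m56s", 4: "07m01s", 6: "05m30s"}
solver = {1: 531.44, 2: 312.15, 4: 249.55, 6: 247.83}

for n in mesh:
    mesh_speedup = to_seconds(mesh[1]) / to_seconds(mesh[n])
    solver_speedup = solver[1] / solver[n]
    print(f"{n} cores: meshing x{mesh_speedup:.2f}, solver x{solver_speedup:.2f}")
```

At 6 cores this gives a meshing speedup of about 3.0 against about 2.1 for the solver, in line with the observation above.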

Last edited by Simbelmynė; February 10, 2018 at 06:26.

Post #10 (February 10, 2018, 06:22)
The_Sle, New Member
7820X @ 4.6 GHz, 4x8 GB 3400 MHz RAM, Ubuntu 17.10, kernel 4.13.0-32

Code:
# cores   Wall time (s):  Speedup:
----------------------------------
1          756.42           1.0
2          376.09           2.0
4          205.46           3.7
6          168.24           4.5
8          160.05           4.7
Could it be that the quad-channel memory becomes a bottleneck past 6 cores?
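A back-of-the-envelope check for this question: the theoretical peak DRAM bandwidth is channels x transfer rate x 8 bytes per 64-bit channel, and with more active cores each one gets a smaller share. The "3400 MHz" figure is the DDR4 transfer rate in MT/s; sustained bandwidth in practice is noticeably lower than this peak, so the sketch only illustrates the trend:

```python
def peak_bandwidth_gbs(channels: int, mts: float, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: channels x transfers/s x bytes per transfer."""
    return channels * mts * 1e6 * bus_bytes / 1e9

bw = peak_bandwidth_gbs(channels=4, mts=3400)  # 7820X, quad-channel DDR4-3400
for cores in (1, 2, 4, 6, 8):
    print(f"{cores} cores: {bw / cores:5.1f} GB/s per core (theoretical peak)")
```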

Code:
# cores  Mesh time  Mesh time (s)
---------------------------------
1        19m37s     1177
2        13m3s      783
4        7m23s      443
6        5m30s      330
8        5m8s       308

Last edited by The_Sle; February 10, 2018 at 09:18. Reason: Added meshing data

Post #11 (February 10, 2018, 06:26)
wyldckat (Bruno Santos), Super Moderator, Lisbon, Portugal
Greetings to all!

Quote (originally posted by flotus1):
The problem with that is that you can no longer edit older posts after a few weeks. Maintaining a thread like this becomes impossible. This restriction kept me from starting one or two related threads in the past.
Edit: I forgot to mention that forum members can only edit their posts for 30 days; after that, the posts can only be edited by moderators.

Quote (originally posted by Simbelmynė):
That's strange. A thread like this could definitely be made "sticky".
There are a few choices to solve this:
  1. The thread can be stickied if people ask a moderator for it (use the report button on the first post if you don't feel like sending a PM to a moderator).
  2. Blog posts can be edited indefinitely by the original author, although there is a limit of 5000 characters, if I remember correctly; you can then post a link to the blog post in the first post (I or any moderator can do that for you if you want).
  3. There is also the CFD-Online wiki: https://www.cfd-online.com/Wiki/Main_Page - this could be added as its own FAQ page.
  4. And in this specific case, since OpenFOAM is being used for benchmarking, it can be documented at openfoamwiki.net: https://openfoamwiki.net/index.php/Benchmarks
And many thanks for kicking off this thread with very valuable information!
Let me know if you want this thread stickied and/or want me to start a wiki page for this!

Best regards,
Bruno

Last edited by wyldckat; February 13, 2018 at 17:53. Reason: see "edit:"

Post #12 (February 10, 2018, 06:28)
Simbelmynė, Senior Member
Quote (originally posted by The_Sle):
Could it be that the quad-channel memory becomes a bottleneck past 6 cores?

That is really interesting. It seems that the 7940X is a terrible price/performance option compared to the 7820X (this was perhaps known, but not that the 7820X is actually as fast as the 7940X regardless of the number of cores being used). Is your system overclocked on all cores?

Perhaps you have some other processes running that interfere with the simulation to some extent?

Finally, I do not understand why your system is so slow on one core compared to the 8700K, which runs at 4.7 GHz on one core (with slower memory). They should be quite similar.

Post #13 (February 10, 2018, 09:20)
The_Sle, New Member
Quote (originally posted by Simbelmynė):
That is really interesting. It seems that the 7940X is a terrible price/performance option compared to the 7820X (this was perhaps known, but not that the 7820X is actually as fast as the 7940X regardless of the number of cores being used). Is your system overclocked on all cores?

Perhaps you have some other processes running that interfere with the simulation to some extent?

Finally, I do not understand why your system is so slow on one core compared to the 8700K, which runs at 4.7 GHz on one core (with slower memory). They should be quite similar.
I reran the tests three times; the best 1-core result was 736 seconds. The parallel results showed less variance, only a few seconds either way here and there.

Yes, it's running at 4.6 GHz on all cores. I checked with turbostat during the runs, and thermals are OK as well. I suppose the newer-generation 8700K is simply that much faster in single-threaded workloads. The 8700K is really impressive, actually, and the difference between X299 and Threadripper is surprisingly small!

Post #14 (February 13, 2018, 14:10)
eric (Knut Erik T. Giljarhus), Member, Norway
Thank you for all the contributions! I have made a new plot summarizing all the results, and asked Bruno to sticky the post so that I can keep updating it.

It is interesting to see the performance of the "enthusiast" i7 and Threadripper processors; they look like good choices for workstations for testing/development and pre-/post-processing.

Post #15 (February 13, 2018, 14:28)
flotus1 (Alex), Senior Member, Germany
Now I feel kind of sorry for adding the Xeon W3670 and messing up the scaling in the diagram.
But seriously, I think the inverse (iterations per second) would be a better metric to compare in a diagram. Otherwise the huge differences in performance at the top end become indistinguishable.
On a side note: it would be helpful if new contributions gave more information about the actual setup, such as software versions and memory configuration, but most importantly clock speeds for overclockable processors.
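The argument for an inverse metric can be made concrete. Since every run performs the same fixed number of iterations, iterations per second is just a constant multiple of runs per hour, so the sketch below uses runs per hour and avoids assuming the iteration count. It measures how far apart two fast systems appear as a fraction of the plotted axis, using best times posted in this thread:

```python
def runs_per_hour(wall_time_s: float) -> float:
    """Throughput: benchmark runs completed per hour, the inverse of the wall time."""
    return 3600.0 / wall_time_s

# Best times posted so far (s): the slowest system and two of the fastest.
best = {"Xeon W3670": 622.7, "2x E5-2695 v4": 65, "2x Epyc 7301": 42.6}
rate = {name: runs_per_hour(t) for name, t in best.items()}

def share_of_axis(values: dict, a: str, b: str) -> float:
    """Gap between systems a and b as a fraction of the full range of the plotted axis."""
    span = max(values.values()) - min(values.values())
    return abs(values[a] - values[b]) / span

gap_time = share_of_axis(best, "2x E5-2695 v4", "2x Epyc 7301")
gap_rate = share_of_axis(rate, "2x E5-2695 v4", "2x Epyc 7301")
print(f"Epyc vs E5 v4 gap: {gap_time:.0%} of a time axis, {gap_rate:.0%} of a runs/hour axis")
```

On a time axis the two fast systems sit only a few percent of the range apart, while on a throughput axis the gap is roughly a third of the range, which is why the top end becomes readable.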
__________________
Please do not send me CFD-related questions via PM
flotus1 is offline   Reply With Quote

Post #16 (February 13, 2018, 17:36)
eric (Knut Erik T. Giljarhus), Member, Norway
Quote (originally posted by flotus1):
Now I feel kind of sorry for adding Xeon W3670 and messing up the scaling in the diagram
But seriously, I think the inverse (iterations per second) would be a better metric to compare in a diagram. Otherwise the huge differences in performance at the top end become indistinguishable.
On a side note: It would be helpful if new contributions gave more information about the actual setup. Software versions, memory configuration...but more importantly: clock speeds for over-clockable processors.
I agree; I have updated the plot now.

Post #17 (February 14, 2018, 03:34)
havref (Håvard B. Refvik), New Member, Norway
Thank you for starting this thread. I got my hands on a couple of Epyc 7601 processors this week, so I figured I'd run the same tests on them for comparison. I will post results for a dual Epyc 7351 when our server arrives in a couple of weeks, and for two dual Epyc 7351 machines once I've had the time to set them up with InfiniBand.

2x Epyc 7601, 16x 8GB DDR4 2666MHz, 1TB SSD, running OpenFOAM 5.0 on Ubuntu 16.04.
Code:
# cores   Wall time (s)   Speedup
---------------------------------
1         971.64          1.0
2         577.18          1.7
4         234.01          4.2
6         169.8           5.7
8         132.41          7.3
12        81.52           11.9
16        59.65           16.3
20        62.56           15.5
24        54.39           17.9
28        45.92           21.2
32        43.42           22.4
36        42.83           22.7
48        40.5            24.0
64        35              27.8
I removed streamlines and wallBoundedStreamlines from the controlDict. The rest of the case is identical to yours. Let me know if you want me to fill in the gaps between 36 and 64 cores.

Last edited by havref; February 14, 2018 at 11:26. Reason: Added speedup

Post #18 (February 14, 2018, 15:21)
eric (Knut Erik T. Giljarhus), Member, Norway
Nice, havref. Looking forward to seeing the 7351 results as well.

It's worth noting that at 64 cores there are only ~30 000 cells per core, so communication may start to become a bottleneck.
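A rough way to see why fewer cells per core means relatively more communication: the halo exchanged between ranks scales with a subdomain's surface, while the work scales with its volume. The sketch below assumes a total of about 1.9 million cells (only what "~30 000 cells per core at 64 cores" implies, not a stated cell count) and idealized cube-shaped subdomains; real scotch partitions are irregular, so the numbers only indicate the trend:

```python
# Assumption: ~1.92 million cells in total (64 x 30 000); the exact count is not stated here.
TOTAL_CELLS = 64 * 30_000

def cells_per_core(n_cores: int, total_cells: int = TOTAL_CELLS) -> int:
    """Average number of cells owned by each MPI rank."""
    return total_cells // n_cores

def halo_ratio(cells: int) -> float:
    """Cube-shaped subdomain of side L = cells**(1/3): 6 * L**2 boundary faces per L**3 cells = 6 / L."""
    return 6 / cells ** (1 / 3)

for n in (16, 32, 64):
    c = cells_per_core(n)
    print(f"{n:2d} cores: ~{c:,} cells/core, boundary faces per cell ~{halo_ratio(c):.2f}")
```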

Post #19 (February 16, 2018, 16:26)
chad (Chad), New Member
2x Intel Gold 5118, 12x 8GB DDR4 2400 MHz, M.2 SSD, OpenFOAM 4.1, Ubuntu 17.10, kernel 4.13.0-32

Code:
# cores   Wall time (s):
------------------------
1         1083.38
2          558.41
4          254.74
8          131.22
16          80.48
20          73.1
24          79.35

I'm still a novice when it comes to CFD, but these results surprised me as being a bit slow. If anyone thinks I may have missed something, let me know and I'll gladly rerun them.

Post #20 (February 18, 2018, 05:26)
flotus1 (Alex), Senior Member, Germany
You could try running a newer version of OpenFOAM. And since it is mostly the parallel performance beyond 16 cores that seems a bit low, you could check whether the RAM is configured properly. Some Skylake-SP dual-socket motherboards have more than 12 DIMM slots, so populating the memory correctly is crucial here.
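Skylake-SP has six memory channels per socket, so a dual-socket board needs 12 DIMMs, one per channel, to use all of its memory bandwidth. The sketch below (with a hypothetical helper name) only checks whether a DIMM count can fill the channels evenly; it cannot tell whether each DIMM sits in the right slot, which still requires the motherboard manual:

```python
def dimm_layout(dimms: int, sockets: int = 2, channels_per_socket: int = 6) -> str:
    """Describe how evenly a DIMM count can fill the memory channels (Skylake-SP: 6 per socket)."""
    channels = sockets * channels_per_socket
    if dimms < channels:
        return f"only {dimms} of {channels} channels can be populated"
    if dimms % channels:
        return f"unbalanced: {dimms} DIMMs cannot be spread evenly over {channels} channels"
    return f"balanced: {dimms // channels} DIMM(s) per channel"

print(dimm_layout(12))  # 12x 8GB in a dual-socket system, as in the post above
print(dimm_layout(8))
```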
