www.cfd-online.com

OpenFOAM benchmarks on various hardware

Old   July 12, 2018, 06:53
Default
  #81
Member
 
Join Date: Jan 2014
Posts: 32
Rep Power: 12
spaceprop is on a distinguished road
It's pretty amazing that 4x EPYC 7351s are roughly equivalent to (maybe slightly faster than) my whole cluster, which has 10x 10-core processors and almost perfect node scaling thanks to InfiniBand.

Memory bandwidth FTW

Old   July 12, 2018, 10:48
Default
  #82
Senior Member
 
Join Date: Oct 2011
Posts: 239
Rep Power: 16
naffrancois is on a distinguished road
Quote:
Originally Posted by spaceprop View Post
That's faster than the dual 7601's in the main chart (data from havref posted earlier in this thread). Can you give some more info about your setup?
Yes, it is a bit faster, which surprised me as well; it makes no more sense to me than to you. Maybe more background tasks were running on the 2x 7601 platform. I cannot say much more: I am running Ubuntu 16.04 LTS, and I downloaded and installed the OpenFOAM binaries following the website procedure, using Docker etc.

I just switched off SMT, nothing more on the optimization side. I only ran this one series of tests; it is not an average.

Old   July 12, 2018, 11:02
Default
  #83
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,398
Rep Power: 46
flotus1 has a spectacular aura about
A few things come to mind since the dual Epyc 7601 running 32 cores was also slower than my dual Epyc 7301 setup.
I suspect that internal memory organization plays an important role here. Mine used dual-rank modules which I highly recommend. I can only speculate that the 7601 setup used single-rank.
And of course different Linux kernels were used. From my experience, older versions can severely hurt performance for Epyc CPUs and in general for CPUs released after the kernel.
Then when benchmarking (or running heavy jobs) I make sure the system is as idle as possible, caches are cleared and turbo modes are used to the full extent.

Old   July 13, 2018, 04:28
Default
  #84
Member
 
Join Date: Jan 2014
Posts: 32
Rep Power: 12
spaceprop is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
dual-rank modules which I highly recommend.
Agreed. I saw significant performance gains from dual rank vs single rank DDR3.

Quote:
Originally Posted by flotus1 View Post
From my experience, older versions can severely hurt performance for Epyc CPUs and in general for CPUs released after the kernel.
Then when benchmarking (or running heavy jobs) I make sure the system is as idle as possible, caches are cleared and turbo modes are used to the full extent.
I never considered kernel versions...that's a good point. I make sure the system is idle and I check frequencies with turbostat to make sure they turbo, but what are the caches you mention?

Old   July 13, 2018, 04:35
Default
  #85
Member
 
Join Date: Jan 2014
Posts: 32
Rep Power: 12
spaceprop is on a distinguished road
Quote:
Originally Posted by naffrancois View Post
Yes, it is a bit faster, which surprised me as well; it makes no more sense to me than to you.
I saw you have 16x8GB 2666MHz RAM, but what rank is it? If it's 2R (dual rank), and the 2x 7601 was 1R, that might explain it.

Quote:
Originally Posted by havref View Post
2x Epyc 7601, 16x 8GB DDR4 2666MHz, 1TB SSD, running OpenFOAM 5.0 on Ubuntu 16.04.
havref: Is your RAM single rank?

I'm curious now, haha

Old   July 13, 2018, 05:10
Default
  #86
Senior Member
 
Join Date: Oct 2011
Posts: 239
Rep Power: 16
naffrancois is on a distinguished road
It should be, as I specifically asked for it. I did not check, though. Is there a terminal command to check that? I'd rather not open the case now.

Old   July 13, 2018, 05:14
Default
  #87
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,398
Rep Power: 46
flotus1 has a spectacular aura about
Quote:
Originally Posted by spaceprop View Post
I never considered kernel versions...that's a good point. I make sure the system is idle and I check frequencies with turbostat to make sure they turbo, but what are the caches you mention?
When the system is new I check if it handles sustained heavy load with the advertised turbo frequencies.
Before running a time-critical simulation I do
Code:
# free && sync && echo 3 > /proc/sys/vm/drop_caches && free
to clear caches and
Code:
# echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
to make sure the highest possible CPU frequency is always used on all cores.
Many people will try to tell you that clearing caches beforehand is not necessary because the system will do a sufficient job of organizing memory. I found that this is not always the case.
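
To confirm that the governor change above actually took effect on every core, something like the following can be used. It is only a minimal sketch: the helper names `governors` and `not_performance` are made up for illustration, and it assumes the usual Linux sysfs layout that the `tee` command above writes to.

```python
import glob
import os

def governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every core that exposes cpufreq."""
    result = {}
    # cpu[0-9]* matches cpu0, cpu1, ... but not the cpufreq/cpuidle directories.
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # .../cpuN/cpufreq/scaling_governor
        with open(path) as f:
            result[cpu] = f.read().strip()
    return result

def not_performance(govs):
    """List the cores whose governor is anything other than 'performance'."""
    return sorted(cpu for cpu, gov in govs.items() if gov != "performance")

if __name__ == "__main__":
    govs = governors()
    if not govs:
        print("no cpufreq information found (driver not loaded, or not Linux)")
    else:
        off = not_performance(govs)
        print("all cores on 'performance'" if not off else f"not on 'performance': {off}")
```

Reading these files does not need root, so the check can run at the start of a benchmark script without sudo.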

Old   July 13, 2018, 05:37
Default
  #88
Senior Member
 
Join Date: Oct 2011
Posts: 239
Rep Power: 16
naffrancois is on a distinguished road
dmidecode -t memory reports dual rank.

Old   July 13, 2018, 05:38
Default
  #89
Member
 
Join Date: Jan 2014
Posts: 32
Rep Power: 12
spaceprop is on a distinguished road
Quote:
Originally Posted by naffrancois View Post
It should be, as I specifically asked for it. I did not check, though. Is there a terminal command to check that? I'd rather not open the case now.
Code:
sudo dmidecode -t memory
I think that will give you the part number, which you can then look up.
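
On reasonably recent systems the dmidecode output also carries a "Rank:" field per memory device, so the part-number lookup may not be needed. Below is a rough sketch of pulling that field out. The function name `parse_ranks` and the sample text are invented for illustration (the sample is not output from any machine in this thread), and real output can differ between BIOS and dmidecode versions.

```python
import subprocess

def parse_ranks(dmidecode_text):
    """Return [(locator, rank)] for each populated Memory Device block."""
    modules = []
    # dmidecode separates its structures with blank lines.
    for block in dmidecode_text.split("\n\n"):
        lines = [line.strip() for line in block.splitlines()]
        if "Memory Device" not in lines:
            continue
        fields = {}
        for line in lines:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        size = fields.get("Size", "")
        if not size or size.startswith("No Module"):
            continue  # empty slot
        modules.append((fields.get("Locator", "?"), fields.get("Rank", "Unknown")))
    return modules

def read_dmidecode():
    """Run the real command; needs root, so it is not called by default."""
    return subprocess.run(["dmidecode", "-t", "memory"],
                          capture_output=True, text=True, check=True).stdout

# Illustrative sample only, trimmed to the fields used above.
SAMPLE = """\
Handle 0x0040, DMI type 17, 40 bytes
Memory Device
\tSize: 8192 MB
\tLocator: DIMM_A1
\tBank Locator: BANK 0
\tSpeed: 2666 MT/s
\tRank: 2

Handle 0x0041, DMI type 17, 40 bytes
Memory Device
\tSize: No Module Installed
\tLocator: DIMM_A2
\tRank: Unknown
"""

print(parse_ranks(SAMPLE))
```

A rank of 2 corresponds to a dual-rank module. If the field reads "Unknown", looking up the part number as suggested above is still the fallback.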

Old   July 13, 2018, 05:39
Default
  #90
Member
 
Join Date: Jan 2014
Posts: 32
Rep Power: 12
spaceprop is on a distinguished road
Quote:
Originally Posted by naffrancois View Post
dmidecode -t memory reports dual rank.
Cool, thanks.

Old   July 13, 2018, 05:40
Default
  #91
Member
 
Join Date: Jan 2014
Posts: 32
Rep Power: 12
spaceprop is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
Before running a time-critical simulation I do
Code:
# free && sync && echo 3 > /proc/sys/vm/drop_caches && free
to clear caches
Huh...never knew this.
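
For context: writing 1 to drop_caches frees the page cache, 2 frees reclaimable slab objects such as dentries and inodes, and 3 does both. The `sync` beforehand matters because dirty pages cannot be dropped. To see how much memory is tied up in caches before and after, here is a small sketch that reads the same numbers `free` reports. The function names and the sample values are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info

def cached_kib(info):
    """Page cache plus buffers: an upper-bound estimate of what a drop could free."""
    return info.get("Cached", 0) + info.get("Buffers", 0)

# Made-up numbers, used only when /proc/meminfo is not available.
SAMPLE = """\
MemTotal:       131072000 kB
MemFree:         90000000 kB
Buffers:           500000 kB
Cached:          30000000 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE
    info = parse_meminfo(text)
    print(f"cached + buffers: {cached_kib(info) / 1024**2:.1f} GiB")
```

Running it once before and once after the drop shows how much was actually released on a given machine.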

Old   July 25, 2018, 03:47
Default
  #92
New Member
 
Timothy Pearson
Join Date: Jul 2018
Location: United States
Posts: 6
Rep Power: 7
tpearson-raptor is on a distinguished road
2x IBM POWER9 Sforza 22 core CPUs [1], 8x16GB 2Rx4 DDR4-2400 registered ECC, OpenFOAM 5.x (GitHub version), Ubuntu 18.04, kernel 4.18-rc1

Code:
# Cores   Wall time [s]
-----------------------
1         677.38
2         366.04
4         180.1
6         124.17
8         96.64
12        70.16
16        56.39
20        47.47
24        41.76
44        36.71
[1] https://raptorcs.com/content/CP9M08/intro.html
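
The scaling in the table above is easier to judge as speedup and parallel efficiency relative to the single-core run. A minimal sketch, using only the wall times posted above (the function names are my own):

```python
# Wall times [s] copied from the table above (cores -> seconds).
WALL_TIME = {1: 677.38, 2: 366.04, 4: 180.1, 6: 124.17, 8: 96.64,
             12: 70.16, 16: 56.39, 20: 47.47, 24: 41.76, 44: 36.71}

def speedup(times, cores):
    """Single-core wall time divided by the wall time on `cores` cores."""
    return times[1] / times[cores]

def efficiency(times, cores):
    """Speedup per core; 1.0 would be perfect linear scaling."""
    return speedup(times, cores) / cores

for n in sorted(WALL_TIME):
    print(f"{n:>2} cores: speedup {speedup(WALL_TIME, n):5.2f}, "
          f"efficiency {efficiency(WALL_TIME, n):4.0%}")
```

With these numbers the speedup is about 16x on 24 cores but only about 18x on 44, which fits the pattern of a memory-bandwidth-bound case seen elsewhere in this thread.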

Old   July 25, 2018, 03:55
Default
  #93
Member
 
Join Date: Jan 2014
Posts: 32
Rep Power: 12
spaceprop is on a distinguished road
Quote:
Originally Posted by tpearson-raptor View Post
2x IBM POWER9 Sforza 22 core CPUs
so much want, so not enough money

Old   July 25, 2018, 04:06
Default
  #94
New Member
 
Timothy Pearson
Join Date: Jul 2018
Location: United States
Posts: 6
Rep Power: 7
tpearson-raptor is on a distinguished road
Quote:
Originally Posted by spaceprop View Post
so much want, so not enough money
Fully understood, that's a high end professional workstation we benchmarked

In general, POWER9 pricing isn't that bad compared to Intel / EPYC; while the 22 core CPUs are the top-end, rather expensive parts, take a look at the 18 core devices (basically one step down from the premium 22-core CPUs) for best value. Performance will be pretty close to the full 22 core results in practice since the 18 core can boost to higher clocks before it hits thermal limits.

Old   July 25, 2018, 04:35
Default
  #95
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,398
Rep Power: 46
flotus1 has a spectacular aura about
That is pretty impressive for only 8 memory channels. Does it have some kind of L4 cache?

Old   July 25, 2018, 05:04
Default
  #96
New Member
 
Timothy Pearson
Join Date: Jul 2018
Location: United States
Posts: 6
Rep Power: 7
tpearson-raptor is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
That is pretty impressive for only 8 memory channels. Does it have some kind of L4 cache?
No, but each module has over 100MB of L3 cache; this system had 220MB L3 in total (5MB/core, 44 cores). POWER is also traditionally very strong on I/O of all sorts including to and from DRAM.

Old   July 25, 2018, 16:46
Default
  #97
Senior Member
 
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18
kyle is on a distinguished road
Doesn't POWER9 have 8 memory channels per socket? If so you're only using half of the memory channels.

Old   July 25, 2018, 17:01
Default
  #98
New Member
 
Timothy Pearson
Join Date: Jul 2018
Location: United States
Posts: 6
Rep Power: 7
tpearson-raptor is on a distinguished road
Quote:
Originally Posted by kyle View Post
Doesn't POWER9 have 8 memory channels per socket? If so you're only using half of the memory channels.
While POWER9 supports up to 8 memory channels in general per module, it also comes in three different packages: Sforza, Monza, and LaGrange. Sforza is the smallest package and only exposes four channels per module. LaGrange is the largest module, but consumes significantly more power and is much more expensive. Its main feature is BlueLink for GPU compute, versus the more balanced features of the Sforza package.

For an example of just how large LaGrange/Monza packages are, check out the picture here: https://www.tomshardware.com/news/ib...ers,36054.html That's what's needed for 8 memory channels to be exposed alongside all the PCIe lanes, etc. Sforza's roughly 1/2 the size on each side.
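
To put the channel counts in perspective, here is a back-of-the-envelope sketch of theoretical peak DRAM bandwidth: channels times transfer rate times 8 bytes per transfer on a 64-bit channel. It assumes every channel is populated and the DIMMs run at their rated speed, which nobody in the thread has confirmed; sustained bandwidth in practice is lower. The function name is made up.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak in GB/s: channels x MT/s x bytes per transfer."""
    return channels * mega_transfers_per_s * bus_bytes / 1000.0

# (total channels across both sockets, DDR4 transfer rate in MT/s)
systems = {
    "2x POWER9 Sforza, DDR4-2400": (2 * 4, 2400),
    "2x EPYC 7601, DDR4-2666": (2 * 8, 2666),
}

for name, (channels, rate) in systems.items():
    print(f"{name}: {channels} channels, "
          f"{peak_bandwidth_gb_s(channels, rate):.1f} GB/s peak")
```

That gives 153.6 GB/s for the Sforza system against roughly 341 GB/s for a fully populated dual EPYC, so on paper the POWER9 box has less than half the memory bandwidth.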

Last edited by tpearson-raptor; July 25, 2018 at 20:39.

Old   July 26, 2018, 02:49
Default
  #99
New Member
 
Timothy Pearson
Join Date: Jul 2018
Location: United States
Posts: 6
Rep Power: 7
tpearson-raptor is on a distinguished road
After a bit of tuning....

2x IBM POWER9 Sforza 22 core CPUs, 8x16GB 2Rx4 DDR4-2400 registered ECC, OpenFOAM 5.x (GitHub version), Ubuntu 18.04, kernel 4.18-rc1, OpenFOAM modified to build with -mcpu=power9 instead of -mcpu=power8:

Code:
# Cores   Wall time [s]
-----------------------
1         659.81
2         355.5
4         176.6
6         121.2
8         94.65
12        68.4
16        55.63
20        46.81
24        41.51
44        36.3
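
For a quick read on how much the rebuild helped, the sketch below compares the table above with the untuned numbers from post #92. Both sets of wall times are copied from the thread; the function name is my own. Each figure looks like a single run, so small differences may be within run-to-run noise.

```python
# Wall times [s] by core count: post #92 (untuned) and the tuned table above.
BASELINE = {1: 677.38, 2: 366.04, 4: 180.1, 6: 124.17, 8: 96.64,
            12: 70.16, 16: 56.39, 20: 47.47, 24: 41.76, 44: 36.71}
TUNED = {1: 659.81, 2: 355.5, 4: 176.6, 6: 121.2, 8: 94.65,
         12: 68.4, 16: 55.63, 20: 46.81, 24: 41.51, 44: 36.3}

def gain_percent(before, after):
    """Reduction in wall time, as a percentage of the original time."""
    return (before - after) / before * 100.0

for n in sorted(TUNED):
    print(f"{n:>2} cores: wall time reduced by "
          f"{gain_percent(BASELINE[n], TUNED[n]):.1f}%")
```

The reduction is roughly 2 to 3% up to 12 cores and closer to 1% beyond that.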
EDIT: SMT2 results removed; it looks like the benchmark script can let an error in the meshing stage propagate without halting, thus unexpectedly feeding bad data into the analysis stage.
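
One way to guard against that kind of silent propagation is to run the stages through a wrapper that stops at the first non-zero exit code. The following is a hedged sketch, not the actual benchmark script: `run_stages` is a hypothetical helper, and the stage commands are placeholders standing in for the real meshing and solver calls.

```python
import subprocess
import sys

def run_stages(stages):
    """Run (name, argv) pairs in order and stop at the first failure."""
    completed = []
    for name, argv in stages:
        result = subprocess.run(argv)
        if result.returncode != 0:
            raise RuntimeError(
                f"stage '{name}' exited with code {result.returncode}; "
                f"stages completed before it: {completed}"
            )
        completed.append(name)
    return completed

# Placeholder commands (hypothetical), not the real OpenFOAM calls.
PY = sys.executable
demo = [
    ("mesh", [PY, "-c", "import sys; sys.exit(1)"]),  # simulated meshing failure
    ("solve", [PY, "-c", "print('solver ran')"]),
]

try:
    run_stages(demo)
except RuntimeError as err:
    print(f"benchmark aborted: {err}")
```

In a plain shell script, `set -e` at the top gives broadly similar behaviour for simple command sequences.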

Last edited by tpearson-raptor; July 27, 2018 at 05:43.

Old   July 26, 2018, 04:09
Default
  #100
Senior Member
 
Joern Beilke
Join Date: Mar 2009
Location: Dresden
Posts: 498
Rep Power: 20
JBeilke is on a distinguished road
Quote:
Originally Posted by tpearson-raptor View Post
Fully understood, that's a high end professional workstation we benchmarked

Just some questions:
  • What about the noise of the machine? Are there plans for a quiet design?
  • Did you try to run programs like StarCCM+ in a virtual machine? How is the performance?
Many Thanks
Jörn
