OpenFOAM benchmarks on various hardware

nmc1988 · October 19, 2018, 00:32

Quote:

Originally Posted by Simbelmynë

The 2690v2 is a great price/performance choice if you can accept buying a refurbished system!

Thank you. I compared some benchmarks in this thread and found that 2x E5-2695v2 is even slower than the 2690v2 and 2667v2. The reason might be the higher core count combined with a lower clock frequency.
See attached images

flotus1 · October 19, 2018, 04:30

You can't compare results from different people without using a healthy error margin. There are just too many variables involved, other than the name of the CPU.
2690v2 or 2667v2 is just a matter of personal preference. They cost about the same, the latter has slightly higher performance per core.
The 12-core variants of this generation are usually too expensive and add little value for a CFD workstation.

edomalley1 · October 19, 2018, 22:39

Just got this system up and running: 2 x AMD Epyc 7301, 16x16GB 2666 MHz 2R RAM, Ubuntu 18.04, OpenFOAM 6.

Code:

# cores   Wall time (s):
------------------------
1        1039.55
2        505.02
4        233.76
6        157.64
8        116.84
12       85.11
16       62.09
20       58.6
24       50.54
28       46.54
32       46.57

I thought it would be a little faster given the 2666 memory. I also find it puzzling that 32 cores was the same speed as 28. The screen turned off during the 32-core run and I turned it back on; not sure whether that would have affected it. I also had a few small things running in the background (system monitor and settings).
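To make the flattening at high core counts easier to see, the table can be turned into speedup and parallel efficiency figures. This is only a sketch: the function name `speedup_table` is made up for illustration, and it assumes the first data row is the 1-core run and that each row has the form "cores wall-time".

```shell
#!/bin/sh
# Print "<cores> <speedup> <efficiency>" for each "<cores> <wall time>" row.
# Assumption: the first numeric row is the 1-core baseline.
# Header and separator lines are skipped because they are not numeric.
speedup_table() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+$/ {
        if (!seen) { base = $2; seen = 1 }      # remember the baseline time
        speedup = base / $2                     # T(1) / T(n)
        printf "%d %.2f %.0f%%\n", $1, speedup, 100 * speedup / $1
    }'
}

# Values copied from the table above.
speedup_table <<'EOF'
# cores   Wall time (s):
------------------------
1        1039.55
2        505.02
4        233.76
6        157.64
8        116.84
12       85.11
16       62.09
20       58.6
24       50.54
28       46.54
32       46.57
EOF
```

With these numbers the efficiency falls from roughly 80% at 28 cores to roughly 70% at 32, which is the stall described above.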

Now if only I could get my Radeon WX 4100 driver to install without errors, and without forcing me to re-install Ubuntu whenever I dare to restart with the driver installed. Seriously, I'm thinking about installing Windows 10. It's driving me nuts.

Simbelmynë · October 20, 2018, 05:10

Seeing that you run OpenFOAM 6, did you modify the script?

Your setup is near identical to our setup (if you run Ubuntu 18.04.1). The difference is that we only have 128 GB RAM and a 1050 ti as GPU.

Linux and Radeon have traditionally been a pain, so I always go with Nvidia for our Linux setups. Not sure if this is the cause of your problems though.

It seems your results are fine until you start to go heavily threaded. Perhaps the heat-sink of one CPU is poorly mounted, making it hit thermal throttling?

Try to check this with the "top" command.

There are also some hardware monitors for Ubuntu that you could try. Not sure how well they are adapted to the current generation CPUs though.

edomalley1 · October 20, 2018, 10:59

Yes, I modified the script: I changed the geometry section in SHM to the new format, updated the location of the geometry in the allmesh files, extended the run.sh file to include runs up to 32 cores, and also removed #include "streamlines", etc. If I don't do that it can't find the geometry and runs through the whole thing in like 8 seconds!

I'm running sensors-detect now so I can monitor temps. I really don't want to reinstall a CPU!

I'm seriously considering just using Windows 10, and not just because of the graphics card. I swap back and forth between Solidworks and OpenFOAM all the time, so it would make my workflow a lot faster. Right now I'm running Solidworks on a totally different machine.

I'll run this test on Windows too and see what the speed difference is.

flotus1 · October 20, 2018, 11:55

SMT is disabled? Socket interleaving for memory is off?
You could also check if all DIMMs are detected properly. Allegedly, this can be a problem with those SP3 sockets. In which case you would have to re-install the CPUs.

edomalley1 · October 20, 2018, 16:14

Quote:

Originally Posted by flotus1

SMT is disabled? Socket interleaving for memory is off?
You could also check if all DIMMs are detected properly. Allegedly, this can be a problem with those SP3 sockets. In which case you would have to re-install the CPUs.

SMT and interleaving are both on Auto. For SMT, it's either Auto or none. But performance manager is always showing 64 cores so I think generally it is on.

Interleaving options are Auto, none, channel, die, socket. Which is best?

Also, all the DIMMs are detected.

flotus1 · October 20, 2018, 18:08

SMT needs to be set to "None". Auto just means on.

For Interleaving you can test "channel" or "die" and see what works best for you or if it makes any difference at all. But the SMT setting should already do the trick.
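Whether the BIOS setting actually took effect can be checked from Linux without rebooting into the setup screen. A minimal sketch, assuming `lscpu` from util-linux is installed and prints its usual "Thread(s) per core" field; the helper name `threads_per_core` is hypothetical.

```shell
#!/bin/sh
# Extract the "Thread(s) per core" value from lscpu output read on stdin.
# 1 means SMT is off, 2 means SMT is on.
threads_per_core() {
    awk -F: '/^Thread\(s\) per core/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# LC_ALL=C keeps the field names in English regardless of the system locale.
if command -v lscpu > /dev/null 2>&1; then
    LC_ALL=C lscpu | threads_per_core
fi
```

On the dual Epyc 7301 discussed here, a value of 2 would match the 64 logical cores reported by the system monitor, and 1 would be expected after setting SMT to "None".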

edomalley1 · October 21, 2018, 10:54

OK cool. I reran 24-32 cores same as before, just to make sure I get a similar result and the results for 32 improved by about 6 seconds. It seems like a more sensible result:

Code:

# cores   Wall time (s):
 ------------------------
 24 52.45
 28 47.36
 32 40.68

Then I turned SMT off and re-ran it:

Code:

# cores   Wall time (s):
 ------------------------
 24 50.2
 28 44.77
 32 38.87

Given flotus1's results with a very similar system, but 2133 RAM instead of 2666, this modest improvement in speed seems reasonable to me. It is still about 2 seconds slower than Simbelmynë's results with only half of the RAM at the same speed though, which is puzzling. Is it just that memory bandwidth is still the bottleneck even if there is only 128 GB of memory, so adding more memory just does not help?

flotus1 · October 21, 2018, 11:46

Since this benchmark fits comfortably into 16GB of RAM, adding more does not help at all.
For best results, clear caches before running the benchmark and set the frequency governor to performance mode.

this requires root privileges

Code:

sync; echo 3 > /proc/sys/vm/drop_caches

this should not

Code:

echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
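The two steps can be wrapped so they are applied the same way before every run. This is a sketch under assumptions: on most distributions the governor files under /sys are writable only by root as well, so `sudo tee` is used for both writes, and the function names `prepare_benchmark` and `governor_summary` are made up for illustration.

```shell
#!/bin/sh
# Drop the page cache and switch all cores to the performance governor.
# Needs sudo rights; defined here but not called automatically.
prepare_benchmark() {
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
    echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor > /dev/null
}

# Read-only check: print "<governor>=<number of CPUs using it>".
# The root directory can be overridden, which is only useful for testing.
governor_summary() {
    root=${1:-/sys/devices/system/cpu}
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2> /dev/null |
        sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Show the current state; prints nothing if cpufreq is not exposed.
governor_summary
```

Calling `prepare_benchmark` and then `governor_summary` should show a single line such as `performance=64` if every logical CPU accepted the setting.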

edomalley1 · October 21, 2018, 12:16

Even better:

Code:

# cores   Wall time (s):
 ------------------------
 24 47.49
 28 41.48
 32 37.7

spaceprop · October 25, 2018, 05:42

Quote:

Originally Posted by spaceprop

Updating my earlier table with more results

More updates and hardware upgrades.

X-axis is iter/s. The processor(s) and RAM are called out in the graph. All RAM is dual rank, and if faster than max allowed by the processor, it has been downclocked. All OF+v1712, OpenMPI 3.1.0, CentOS 7.5. I applied the following BIOS settings to all: HT disabled (if applicable), anything related to maximum performance on, but no OC.

This one includes the results from my homelab 5-node cluster: 5.29 ips, which is almost perfect node scaling. FDR Infiniband <3

Pretty much maxed out the processors for the given hardware, so this is as fast as this will get without changing to newer servers and/or adding more nodes. I might be able to eke out a few seconds by clearing caches and setting the scaling_governor (as suggested by flotus1), and maybe by using renumberMesh, but unless OpenFOAM magically gets more efficient/faster, there isn't much more I can do from the software side either.

I'm continually impressed by the performance of the EPYC's. Two dual 7351 (or bigger) servers with back-to-back FDR IB should be able to match/beat my whole cluster. I'd love to do that, but RAM prices are awful.

Has anyone seen the EPYC build by der8auer (the famous overclocker)? He used a prototype Elmor Labs EVC v2 to overclock dual 7601's to 4GHz (all cores) on a Supermicro H11DSI and an ASUS RS700a-e9. He had to use Novec submersion or dry ice for that, but he was also able to get a fairly good overclock with a water cooler. Cinebench scores are useless for us, but it'd be awesome if some people with EPYC builds could get their hands on one of the EVC v2 boards when they come out, OC, and run this benchmark.

ekrumrick · November 12, 2018, 15:07

Hello all, I did the benchmark on OpenFOAM 5.0, Ubuntu 16.04 in an i7 8700, 16 GB DDR4 2400 MHz RAM, 240 GB SSD (+1TB HDD) with the following results:

Code:

# cores   Wall time (s): 
------------------------
 1 606.66
 2 394.91
 4 335.43
 6 334.41
12 325
18 329
24 337

I disabled SMT, but did the "multicore" simulations anyway, and I am a little bit confused about the results... why does wall time diminish with more "processes" if the number of cores is just 6?

gkarlsen · November 12, 2018, 15:35

Quote:

Originally Posted by ekrumrick

Hello all, I did the benchmark on OpenFOAM 5.0, Ubuntu 16.04 in an i7 8700, 16 GB DDR4 2400 MHz RAM, 240 GB SSD (+1TB HDD) with the following results:

Code:

# cores   Wall time (s): 
------------------------
 1 606.66
 2 394.91
 4 335.43
 6 334.41
12 161.76
18 109.35
24 84.1
36 43.09

I disabled SMT, but did the "multicore" simulations anyway, and I am a little bit confused about the results... why does wall time diminish with more "processes" if the number of cores is just 6?

Probably the mpirun just fails as you are asking it to run with more cores than what is available. If you open the logfile you should find errors.

ekrumrick · November 12, 2018, 16:33

Quote:

Originally Posted by gkarlsen

Probably the mpirun just fails as you are asking it to run with more cores than what is available. If you open the logfile you should find errors.

Thanks Geir, I checked the log files and they are OK. However, what I reported as "Wall Time" was really "Execution Time". I will correct this in my previous post and report "Clock Time" instead.
Regards,

Ezequiel
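The mix-up is easy to make because the solver log prints both values on the same line. A minimal sketch for pulling out either one, assuming the log lines look like `ExecutionTime = 84.1 s  ClockTime = 337 s`, where ExecutionTime is CPU time and ClockTime is elapsed wall-clock time; the helper name `log_time` and the `run_*` directory layout are assumptions for illustration.

```shell
#!/bin/sh
# Print the last value reported for a given key ("ClockTime" or
# "ExecutionTime") in an OpenFOAM solver log.
log_time() {
    key=$1
    file=$2
    awk -v key="$key" '
        { for (i = 1; i <= NF - 2; i++)
              if ($i == key && $(i + 1) == "=") t = $(i + 2) }   # keep the latest match
        END { if (t != "") print t }
    ' "$file"
}

# Report both numbers for every run directory that has a log.
for log in run_*/log.simpleFoam; do
    if [ -f "$log" ]; then
        echo "$log: ExecutionTime=$(log_time ExecutionTime "$log") s, ClockTime=$(log_time ClockTime "$log") s"
    fi
done
```

When more processes are started than there are physical cores, the two numbers can diverge considerably, so the wall-time column of the benchmark should be filled from ClockTime.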

Simbelmynë · November 13, 2018, 05:09

Quote:

Originally Posted by ekrumrick

Hello all, I did the benchmark on OpenFOAM 5.0, Ubuntu 16.04 in an i7 8700, 16 GB DDR4 2400 MHz RAM, 240 GB SSD (+1TB HDD) with the following results:

Code:

# cores   Wall time (s): 
------------------------
 1 606.66
 2 394.91
 4 335.43
 6 334.41
12 325
18 329
24 337

I disabled SMT, but did the "multicore" simulations anyway, and I am a little bit confused about the results... why does wall time diminish with more "processes" if the number of cores is just 6?

If you have the possibility to overclock the memory you can reach much better results with your 8700k. It becomes bandwidth-limited over 4 cores so there is no need to run more processes.

fluidic_bob · November 15, 2018, 14:10

HP DL380-G6 12x4GB PC3-10600R (running at 10,000MB/s), 2xX5650 on OpenFoam 5.x with Ubuntu Server 18.04.01 LTS kernel 4.15.0-20-generic

Code:

# cores   Wall time (s):
------------------------
1 1357.41
2 766.88
4 372.52
6 297.73
8 265.43
10 251.88
12 242.74

Morlind · December 12, 2018, 16:19

I have upgraded my hardware after trying a couple of test systems to find the right package. This is an extremely cost-effective solution via my local, very helpful server refurb store.

Dell R820 quad E5-4650 V2 2.4 GHz, 128 GB RAM 1333, Ubuntu 16.04, OpenFoam 5

# cores   Wall time (s):
------------------------
1 1147.32
2 587.49
4 247.01
6 173.57
8 127.48
12 93.17
16 73.78
20 64.45
40 55.05

I am most impressed since this is literally twice as fast as my dual V2 system but only cost a tiny amount more. I paid ~$2600 for this system and it's a real powerhouse!

Simbelmynë · December 13, 2018, 04:50

That is really impressive. I would say that this is probably the most cost-efficient solution right now. Is the Dell motherboard limited to 1333 MHz memory? Most of the good 2690v2 results posted here used the much faster 1866 MHz memory, so if that works you have a lot of room for improvement as well.

Rec · December 13, 2018, 11:59

I have an AMD Ryzen 7 1800X (8 cores @ 3.6 GHz), 2 x 16 GB Samsung DDR4, SSD M.2 950 EVO 256 GB.
I have installed Ubuntu 16.04 LTS and OpenFOAM 5.0.
When I run the test I get the following result:

# cores Wall time (s):
------------------------
1 893.2
2
4
6
8

This is what I see in the log OpenFOAM/bench_template/run_2/log.simpleFoam (shown below).
I have tried this on Ubuntu 18.04 and CentOS 7.6. Can you help me?

Code:

Starting time loop

streamLine streamLines:
    automatic track length specified through number of sub cycles : 5

[1] 
[1] 
[1] --> FOAM FATAL ERROR: 
[1] Attempt to return primitive entry ITstream : IOstream.functions.streamLines.seedSampleSet, line 0, IOstream: Version 2.0, format ASCII, line 0, OPENED, GOOD
    primitiveEntry 'seedSampleSet' comprises 
        on line 0 the word 'uniform'
 as a sub-dictionary
[1] 
[1]     From function virtual const Foam::dictionary& Foam::primitiveEntry::dict() const
[1]     in file db/dictionary/primitiveEntry/primitiveEntry.C at line 189.
[1] 
FOAM parallel run aborting
[1] 
[0] 
[0] 
[0] --> FOAM FATAL ERROR: 
[0] Attempt to return primitive entry ITstream : /home/sergey/OpenFOAM/bench_template/run_2/system/controlDict.functions.streamLines.seedSampleSet, line 45, IOstream: Version 2.0, format ASCII, line 0, OPENED, GOOD
    primitiveEntry 'seedSampleSet' comprises 
        on line 45 the word 'uniform'
 as a sub-dictionary
[0] 
[0]     From function virtual const Foam::dictionary& Foam::primitiveEntry::dict() const
[0]     in file db/dictionary/primitiveEntry/primitiveEntry.C at line 189.
[0] 
FOAM parallel run aborting
[0] 
[1] #0  Foam::error::printStack(Foam::Ostream&)[0] #0  Foam::error::printStack(Foam::Ostream&) at ??:?
 at ??:?
[1] #1  Foam::error::abort()[0] #1  Foam::error::abort() at ??:?
[1] #2  Foam::primitiveEntry::dict() const at ??:?
[0] #2  Foam::primitiveEntry::dict() const at primitiveEntry.C:?
[1] #3  Foam::functionObjects::streamLine::read(Foam::dictionary const&) at primitiveEntry.C:?
[0] #3  Foam::functionObjects::streamLine::read(Foam::dictionary const&) at ??:?
[1] #4  Foam::functionObjects::streamLine::streamLine(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[0] #4  Foam::functionObjects::streamLine::streamLine(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[1] #5  Foam::functionObject::adddictionaryConstructorToTable<Foam::functionObjects::streamLine>::New(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[0] #5  Foam::functionObject::adddictionaryConstructorToTable<Foam::functionObjects::streamLine>::New(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[1] #6  Foam::functionObject::New(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[0] #6  Foam::functionObject::New(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[1] #7  Foam::functionObjects::timeControl::timeControl(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[0] #7  Foam::functionObjects::timeControl::timeControl(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[1] #8  Foam::functionObjectList::read() at ??:?
[0] #8  Foam::functionObjectList::read() at ??:?
[1] #9  Foam::Time::loop() at ??:?
[0] #9  Foam::Time::loop() at ??:?
[1] #10  Foam::simpleControl::loop() at ??:?
[0] #10  Foam::simpleControl::loop() at ??:?
[1] #11   at ??:?
[0] #11  ?? at ??:?
[1] #12  __libc_start_main at ??:?
[0] #12  __libc_start_main in "/lib/x86_64-linux-gnu/libc.so.6"
[1] #13  ? in "/lib/x86_64-linux-gnu/libc.so.6"
[0] #13  --------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
 at ??:?
? at ??:?
[kb-4:14244] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[kb-4:14244] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
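The fatal error comes from the streamLines function object, whose `seedSampleSet` entry is in a format this OpenFOAM version does not accept. Earlier in the thread edomalley1 mentioned removing `#include "streamlines"` to get the benchmark running on a newer version. A possible workaround along those lines is sketched below. It assumes that `system/controlDict` pulls in the function object through an `#include "streamLines"` line; if the entry is written inline instead, it would have to be commented out by hand. The function name `disable_streamlines` is made up for illustration.

```shell
#!/bin/sh
# Comment out an '#include "streamLines"' line in an OpenFOAM dictionary.
# OpenFOAM dictionaries accept C++-style // comments.
# Matches both "streamLines" and "streamlines"; indentation is preserved.
disable_streamlines() {
    dict=$1
    sed 's|^\([[:space:]]*\)\(#include[[:space:]]*"[sS]tream[lL]ines"\)|\1// \2|' \
        "$dict" > "$dict.tmp" && mv "$dict.tmp" "$dict"
}

# Apply to the case in the current directory, if there is one.
if [ -f system/controlDict ]; then
    disable_streamlines system/controlDict
fi
```

Since the streamlines are only post-processing output, disabling them should not change the solver timings that the benchmark measures.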
