OpenFOAM benchmarks on various hardware

Old   October 19, 2018, 00:32
Default
  #141
New Member
 
Join Date: Nov 2016
Posts: 15
Rep Power: 9
nmc1988 is on a distinguished road
Quote:
Originally Posted by Simbelmynė View Post
The 2690v2 is a great price/performance choice if you can accept buying a refurbished system!
Thank you. I have compared some benchmarks in this thread and found that 2x E5-2695v2 is even slower than 2690v2 and 2667v2. The reason might be the higher core count combined with a lower clock frequency.
See attached images
Attached Images
File Type: png e5.png (24.7 KB, 126 views)
File Type: png 2695v2.png (9.5 KB, 120 views)
nmc1988 is offline   Reply With Quote

Old   October 19, 2018, 04:30
Default
  #142
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,400
Rep Power: 47
flotus1 has a spectacular aura about
You can't compare results from different people without using a healthy error margin. There are just too many variables involved, other than the name of the CPU.
2690v2 or 2667v2 is just a matter of personal preference. They cost about the same, the latter has slightly higher performance per core.
The 12-core variants of this generation are usually too expensive and add little value for a CFD workstation.
nmc1988 likes this.
flotus1 is offline   Reply With Quote

Old   October 19, 2018, 22:39
Default
  #143
Member
 
Ed O'Malley
Join Date: Nov 2017
Posts: 30
Rep Power: 8
edomalley1 is on a distinguished road
Just got this system up and running: 2 x AMD Epyc 7301, 16x16GB 2666 MHz 2R RAM, Ubuntu 18.04, OpenFOAM 6.


Code:
# cores   Wall time (s):
------------------------
1        1039.55
2        505.02
4        233.76
6        157.64
8        116.84
12       85.11
16       62.09
20       58.6
24       50.54
28       46.54
32       46.57
I thought it would be a little faster given the 2666 memory. I also find it puzzling that 32 cores ran at the same speed as 28. The screen turned off during the 32-core run and I turned it back on; not sure if that would have affected it. I also had some small things running in the background: system monitor and settings.
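One way to read the table above is in terms of speedup and parallel efficiency relative to the single-core run. The following is a minimal Python sketch using the wall times posted above; the helper name `scaling_table` is made up for illustration and is not part of the benchmark scripts.

```python
# Wall times (s) copied from the table above: cores -> seconds.
wall_times = {
    1: 1039.55, 2: 505.02, 4: 233.76, 6: 157.64, 8: 116.84, 12: 85.11,
    16: 62.09, 20: 58.6, 24: 50.54, 28: 46.54, 32: 46.57,
}


def scaling_table(times):
    """Return (cores, speedup, efficiency) tuples relative to the 1-core run."""
    base = times[1]
    rows = []
    for cores in sorted(times):
        speedup = base / times[cores]
        rows.append((cores, speedup, speedup / cores))
    return rows


if __name__ == "__main__":
    for cores, speedup, eff in scaling_table(wall_times):
        print(f"{cores:>2} cores: speedup {speedup:5.2f}, efficiency {eff:4.0%}")
```

With these numbers the speedup is about 22.3 at both 28 and 32 cores, so the efficiency at 32 cores drops to roughly 70 %. Efficiencies slightly above 100 % at 2 and 4 cores are not unusual on multi-die systems, where spreading processes out likely gives each one more cache and memory bandwidth.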


Now if I could only get my Radeon WX 4100 driver to install without errors and without forcing me to reinstall Ubuntu whenever I dare to restart with the driver installed. Seriously, I'm thinking about installing Windows 10. It's driving me nuts.
edomalley1 is offline   Reply With Quote

Old   October 20, 2018, 05:10
Default
  #144
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 548
Rep Power: 15
Simbelmynė is on a distinguished road
Seeing that you run OpenFOAM 6, did you modify the script?


Your setup is near identical to our setup (if you run Ubuntu 18.04.1). The difference is that we only have 128 GB RAM and a 1050 ti as GPU.



Linux and Radeon have traditionally been a pain, and I always go with Nvidia for our Linux setups. Not sure if this is the cause of your problems though.



It seems your results are fine until you start to go heavily threaded. Perhaps the heat-sink of one CPU is poorly mounted, making it hit thermal throttling?


Try to check this with the "top" command.


There are also some hardware monitors for Ubuntu that you could try. Not sure how well they are adapted to the current generation CPUs though.
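As a rough complement to a hardware monitor, the per-core clock can be read from the Linux cpufreq sysfs files while the benchmark is running. This is only a sketch under assumptions: it assumes the kernel exposes `cpufreq/scaling_cur_freq` for each core, the function names are hypothetical, and the 80 % threshold is arbitrary. A low clock on some cores can also simply mean they are idle, so it hints at throttling rather than proving it.

```python
import glob
import os


def read_core_freqs_mhz(cpu_root="/sys/devices/system/cpu"):
    """Map each cpuN directory name to its current frequency in MHz.

    Reads cpufreq/scaling_cur_freq, which the Linux cpufreq subsystem
    reports in kHz. Cores without that file are skipped.
    """
    freqs = {}
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_cur_freq")
    for path in glob.glob(pattern):
        cpu = path.split(os.sep)[-3]  # .../cpuN/cpufreq/scaling_cur_freq
        with open(path) as fh:
            freqs[cpu] = int(fh.read().strip()) / 1000.0
    return freqs


def slow_cores(freqs, fraction=0.8):
    """Return cores running below `fraction` of the fastest core's clock."""
    if not freqs:
        return []
    fastest = max(freqs.values())
    return sorted(c for c, f in freqs.items() if f < fraction * fastest)


if __name__ == "__main__":
    current = read_core_freqs_mhz()
    print("cores below 80% of the fastest clock:", slow_cores(current))
```

If the slow cores all belong to one socket during a fully loaded run, that would be consistent with the poorly mounted heat sink suspected above.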
Simbelmynė is offline   Reply With Quote

Old   October 20, 2018, 10:59
Default
  #145
Member
 
Ed O'Malley
Join Date: Nov 2017
Posts: 30
Rep Power: 8
edomalley1 is on a distinguished road
Yes, I modified the script: I converted the geometry section in SHM to the new format, changed the location of the geometry in the allmesh files, extended the run.sh file to include runs up to 32 cores, and removed #include "streamlines", etc. If I don't do that it can't find the geometry and runs through the whole thing in about 8 seconds!
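The streamlines change mentioned above can be sketched as a small text edit. This is an illustration only: it assumes the include appears in system/controlDict on its own line as `#include "streamlines"` (matched case-insensitively, since the exact spelling in a given benchmark copy is not confirmed here), and `disable_streamlines` is a made-up helper name.

```python
import re

# Matches a dictionary include of a streamlines file at the start of a line,
# e.g.  #include "streamlines"  or  #include "streamLines".
_STREAMLINES_INCLUDE = re.compile(
    r'^([ \t]*)(#include\s+"streamlines")', re.IGNORECASE | re.MULTILINE
)


def disable_streamlines(control_dict_text):
    """Comment out streamlines includes, keeping indentation and spelling."""
    return _STREAMLINES_INCLUDE.sub(r"\1// \2", control_dict_text)


if __name__ == "__main__":
    sample = 'functions\n{\n    #include "streamLines"\n}\n'
    print(disable_streamlines(sample))
```

Commenting the line out with `//` rather than deleting it keeps the change easy to revert, and running the function twice leaves the text unchanged.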

I'm running sensors-detect now so I can monitor temps. I really don't want to reinstall a CPU!

I'm seriously considering just using Windows 10, and not just because of the graphics card. I swap back and forth between Solidworks and OpenFOAM all the time and it will make my workflow a lot faster. Right now I'm running Solidworks on a totally different machine.

I'll run this test on Windows too and see what the speed difference is.
edomalley1 is offline   Reply With Quote

Old   October 20, 2018, 11:55
Default
  #146
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,400
Rep Power: 47
flotus1 has a spectacular aura about
SMT is disabled? Socket interleaving for memory is off?
You could also check if all DIMMs are detected properly. Allegedly, this can be a problem with those SP3 sockets. In which case you would have to re-install the CPUs.
flotus1 is offline   Reply With Quote

Old   October 20, 2018, 16:14
Default
  #147
Member
 
Ed O'Malley
Join Date: Nov 2017
Posts: 30
Rep Power: 8
edomalley1 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
SMT is disabled? Socket interleaving for memory is off?
You could also check if all DIMMs are detected properly. Allegedly, this can be a problem with those SP3 sockets. In which case you would have to re-install the CPUs.
SMT and interleaving are both on Auto. For SMT, it's either Auto or none. But performance manager is always showing 64 cores so I think generally it is on.

Interleaving options are Auto, none, channel, die, socket. Which is best?

Also, all the DIMMs are detected.
edomalley1 is offline   Reply With Quote

Old   October 20, 2018, 18:08
Default
  #148
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,400
Rep Power: 47
flotus1 has a spectacular aura about
SMT needs to be set to "None". Auto just means on.
For Interleaving you can test "channel" or "die" and see what works best for you or if it makes any difference at all. But the SMT setting should already do the trick.
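Whether the BIOS change actually took effect can be checked from the operating system. A minimal sketch, assuming an x86 Linux system where /proc/cpuinfo reports `siblings` (logical CPUs per package) and `cpu cores` (physical cores per package); the function name `smt_enabled` is made up for illustration.

```python
def smt_enabled(cpuinfo_text):
    """Return True if the first processor entry reports more siblings
    (logical CPUs per package) than physical cores per package."""
    siblings = cores = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "siblings":
            siblings = int(value)
        elif key == "cpu cores":
            cores = int(value)
        if siblings is not None and cores is not None:
            return siblings > cores
    raise ValueError("no 'siblings' / 'cpu cores' fields found")


if __name__ == "__main__":
    with open("/proc/cpuinfo") as fh:
        print("SMT enabled:", smt_enabled(fh.read()))
```

On a dual 16-core system, 64 logical CPUs in the system monitor corresponds to 32 siblings per package against 16 cores, which this check would report as SMT enabled.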
flotus1 is offline   Reply With Quote

Old   October 21, 2018, 10:54
Default
  #149
Member
 
Ed O'Malley
Join Date: Nov 2017
Posts: 30
Rep Power: 8
edomalley1 is on a distinguished road
OK cool. I reran 24-32 cores same as before, just to make sure I get a similar result and the results for 32 improved by about 6 seconds. It seems like a more sensible result:


Code:
# cores   Wall time (s):
 ------------------------
 24 52.45
 28 47.36
 32 40.68
Then I turned SMT off and re-ran it:


Code:
# cores   Wall time (s):
 ------------------------
 24 50.2
 28 44.77
 32 38.87
Given flotus1's results with a very similar system, but with 2133 RAM instead of 2666, this modest improvement in speed seems reasonable to me. It is still about 2 seconds slower than Simbelmynė's results with only half the RAM at the same speed though, which is puzzling. Is memory bandwidth still the bottleneck even with only 128 GB of memory, so that adding more memory just does not help?
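A back-of-the-envelope estimate shows why capacity does not enter the picture: theoretical peak bandwidth depends only on the number of populated channels, the bus width, and the transfer rate. The sketch below assumes 8 memory channels per socket, all populated, with the usual 64-bit (8-byte) DDR4 channel; the function name is hypothetical, and measured bandwidth will be lower than these peak figures.

```python
def peak_bandwidth_gb_s(channels, transfer_rate_mt_s, bus_width_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second)."""
    return channels * bus_width_bytes * transfer_rate_mt_s / 1000.0


# Assumption: 8 populated channels per socket.
ddr4_2666 = peak_bandwidth_gb_s(8, 2666)
ddr4_2133 = peak_bandwidth_gb_s(8, 2133)

if __name__ == "__main__":
    print(f"DDR4-2666: {ddr4_2666:.1f} GB/s per socket")
    print(f"DDR4-2133: {ddr4_2133:.1f} GB/s per socket")
    print(f"ratio:     {ddr4_2666 / ddr4_2133:.2f}")
```

The ratio is about 1.25 regardless of whether the DIMMs are 8 GB or 16 GB each, so with all channels populated in both systems, doubling the capacity alone would not be expected to change a bandwidth-bound result.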
edomalley1 is offline   Reply With Quote

Old   October 21, 2018, 11:46
Default
  #150
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,400
Rep Power: 47
flotus1 has a spectacular aura about
Since this benchmark fits comfortably into 16GB of RAM, adding more does not help at all.
For best results, clear caches before running the benchmark and set the frequency governor to performance mode.

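This requires root privileges (run it from a root shell):
Code:
sync; echo 3 > /proc/sys/vm/drop_caches
On most systems this one also needs root, since the governor files are writable only by root:
Code:
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor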
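To confirm that the governor setting took effect on every core, the same sysfs files can be read back. A minimal sketch, assuming the standard cpufreq sysfs layout; `governor_summary` is a made-up helper name.

```python
import glob
import os
from collections import Counter


def governor_summary(cpu_root="/sys/devices/system/cpu"):
    """Count how many cores currently use each cpufreq governor."""
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    counts = Counter()
    for path in glob.glob(pattern):
        with open(path) as fh:
            counts[fh.read().strip()] += 1
    return dict(counts)


if __name__ == "__main__":
    # An empty result means no cpufreq files were found under cpu_root.
    print(governor_summary())
```

A result with a single `performance` key covering all cores indicates the setting was applied everywhere; any other key means some cores still use a different governor.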
Clément_G likes this.
flotus1 is offline   Reply With Quote

Old   October 21, 2018, 12:16
Default
  #151
Member
 
Ed O'Malley
Join Date: Nov 2017
Posts: 30
Rep Power: 8
edomalley1 is on a distinguished road
Even better:


Code:
# cores   Wall time (s):
 ------------------------
 24 47.49
 28 41.48
 32 37.7
spaceprop likes this.
edomalley1 is offline   Reply With Quote

Old   October 25, 2018, 05:42
Default
  #152
Member
 
Join Date: Jan 2014
Posts: 32
Rep Power: 12
spaceprop is on a distinguished road
Quote:
Originally Posted by spaceprop View Post
Updating my earlier table with more results
More updates and hardware upgrades.

X-axis is iter/s. The processor(s) and RAM are called out in the graph. All RAM is dual rank, and if faster than max allowed by the processor, it has been downclocked. All OF+v1712, OpenMPI 3.1.0, CentOS 7.5. I applied the following BIOS settings to all: HT disabled (if applicable), anything related to maximum performance on, but no OC.



This one includes the results from my homelab 5-node cluster:



5.29 iter/s, which is almost perfect node scaling. FDR Infiniband <3

I have pretty much maxed out the processors for the given hardware, so this is as fast as it will get without changing to newer servers and/or adding more nodes. I might be able to eke out a few seconds by clearing caches and setting the scaling_governor (as suggested by flotus1), and maybe by using renumberMesh, but unless OpenFOAM magically gets more efficient, there isn't much more I can do from the software side either.

I'm continually impressed by the performance of the EPYCs. Two dual 7351 (or bigger) servers with back-to-back FDR IB should be able to match or beat my whole cluster. I'd love to do that, but RAM prices are awful.

Has anyone seen der8auer's (famous overclocker) EPYC build? He used a prototype Elmor Labs EVC v2 to overclock dual 7601s to 4 GHz (all cores) on a Supermicro H11DSI and an ASUS RS700a-e9. He had to use Novec immersion or dry ice for that, but he was still able to get a fairly good overclock with a water cooler. Cinebench scores are useless for us, but it'd be awesome if some people with EPYC builds could get their hands on one of the EVC v2 boards when they come out, overclock, and run this benchmark.
spaceprop is offline   Reply With Quote

Old   November 12, 2018, 15:07
Default
  #153
New Member
 
Ezequiel Krumrick
Join Date: Mar 2014
Location: Argentina
Posts: 10
Rep Power: 12
ekrumrick is on a distinguished road
Hello all, I did the benchmark on OpenFOAM 5.0, Ubuntu 16.04 in an i7 8700, 16 GB DDR4 2400 MHz RAM, 240 GB SSD (+1TB HDD) with the following results:

Code:
# cores   Wall time (s): 
------------------------
 1 606.66
 2 394.91
 4 335.43
 6 334.41
12 325
18 329
24 337
I disabled SMT, but did the "multicore" simulations anyway, and I am a little bit confused about the results... why does wall time diminish with more "processes" if the number of cores is just 6?
__________________
Best regards,

Ezequiel

Last edited by ekrumrick; November 12, 2018 at 16:36. Reason: Correction in Wall time (s)
ekrumrick is offline   Reply With Quote

Old   November 12, 2018, 15:35
Default
  #154
Member
 
Geir Karlsen
Join Date: Nov 2013
Location: Norway
Posts: 59
Rep Power: 13
gkarlsen is on a distinguished road
Quote:
Originally Posted by ekrumrick View Post
Hello all, I did the benchmark on OpenFOAM 5.0, Ubuntu 16.04 in an i7 8700, 16 GB DDR4 2400 MHz RAM, 240 GB SSD (+1TB HDD) with the following results:

Code:
# cores   Wall time (s): 
------------------------
 1 606.66
 2 394.91
 4 335.43
 6 334.41
12 161.76
18 109.35
24 84.1
36 43.09
I disabled SMT, but did the "multicore" simulations anyway, and I am a little bit confused about the results... why does wall time diminish with more "processes" if the number of cores is just 6?
Probably the mpirun just fails as you are asking it to run with more cores than what is available. If you open the logfile you should find errors.
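Checking the log files for failed runs can be automated. The sketch below scans a log's text for a few failure markers; the marker strings are the ones OpenFOAM and Open MPI print on abort, while the helper name `find_failures` and the choice of markers are assumptions made for illustration.

```python
FAILURE_MARKERS = (
    "FOAM FATAL ERROR",
    "FOAM FATAL IO ERROR",
    "MPI_ABORT was invoked",
)


def find_failures(log_text):
    """Return (line_number, line) pairs for lines containing a failure marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in FAILURE_MARKERS):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python check_log.py run_12/log.simpleFoam
    with open(sys.argv[1]) as fh:
        for number, line in find_failures(fh.read()):
            print(f"{number}: {line}")
```

An empty result does not prove the run is valid (for example, the solver may have finished suspiciously fast because the mesh was missing), so it is still worth comparing the run time against neighbouring core counts.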
gkarlsen is offline   Reply With Quote

Old   November 12, 2018, 16:33
Default
  #155
New Member
 
Ezequiel Krumrick
Join Date: Mar 2014
Location: Argentina
Posts: 10
Rep Power: 12
ekrumrick is on a distinguished road
Quote:
Originally Posted by gkarlsen View Post
Probably the mpirun just fails as you are asking it to run with more cores than what is available. If you open the logfile you should find errors.

Thanks Geir, I checked the log files and they are OK. However, what I reported as "Wall Time" was really "Execution Time". I will correct this in my previous post and report "Clock Time" instead.
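The two values come from the timing line that OpenFOAM solvers print after each time step, of the form `ExecutionTime = ... s  ClockTime = ... s`. As far as I know, ExecutionTime is the CPU time of the process and ClockTime is elapsed wall-clock time, so for this benchmark ClockTime is the one to report. A minimal parsing sketch; the function name is hypothetical:

```python
import re

_TIMING = re.compile(
    r"ExecutionTime\s*=\s*([\d.]+)\s*s\s+ClockTime\s*=\s*([\d.]+)\s*s"
)


def final_timings(log_text):
    """Return (execution_time, clock_time) in seconds from the last
    timing line in the log, or None if no timing line is present."""
    matches = _TIMING.findall(log_text)
    if not matches:
        return None
    execution, clock = matches[-1]
    return float(execution), float(clock)


if __name__ == "__main__":
    sample = (
        "Time = 1\nExecutionTime = 3.21 s  ClockTime = 4 s\n"
        "Time = 2\nExecutionTime = 6.5 s  ClockTime = 7 s\n"
    )
    print(final_timings(sample))
```

A large gap between the two values can indicate time spent waiting rather than computing, for instance on I/O or on other MPI ranks.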
Regards,


Ezequiel
__________________
Best regards,

Ezequiel
ekrumrick is offline   Reply With Quote

Old   November 13, 2018, 05:09
Default
  #156
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 548
Rep Power: 15
Simbelmynė is on a distinguished road
Quote:
Originally Posted by ekrumrick View Post
Hello all, I did the benchmark on OpenFOAM 5.0, Ubuntu 16.04 in an i7 8700, 16 GB DDR4 2400 MHz RAM, 240 GB SSD (+1TB HDD) with the following results:

Code:
# cores   Wall time (s): 
------------------------
 1 606.66
 2 394.91
 4 335.43
 6 334.41
12 325
18 329
24 337
I disabled SMT, but did the "multicore" simulations anyway, and I am a little bit confused about the results... why does wall time diminish with more "processes" if the number of cores is just 6?



If you have the possibility to overclock the memory you can reach much better results with your 8700k. It becomes bandwidth-limited over 4 cores so there is no need to run more processes.
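The saturation point can be estimated from the quoted table by looking at the relative gain from each step to the next. This sketch uses only the runs up to the 6 physical cores; the 5 % threshold and the function name `saturation_point` are arbitrary choices for illustration.

```python
# Wall times (s) from the quoted i7 8700 post: processes -> seconds.
wall_times = {1: 606.66, 2: 394.91, 4: 335.43, 6: 334.41}


def saturation_point(times, min_gain=0.05):
    """Return the process count after which the next step in the table
    improves wall time by less than `min_gain` (as a fraction)."""
    counts = sorted(times)
    for current, nxt in zip(counts, counts[1:]):
        gain = (times[current] - times[nxt]) / times[current]
        if gain < min_gain:
            return current
    return counts[-1]


if __name__ == "__main__":
    print("scaling saturates at", saturation_point(wall_times), "processes")
```

For these numbers the step from 4 to 6 processes gains only about 0.3 %, so the function returns 4, in line with the bandwidth limit described above.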
ekrumrick likes this.
Simbelmynė is offline   Reply With Quote

Old   November 15, 2018, 14:10
Default Hp dl380-g6
  #157
New Member
 
Bob B
Join Date: Oct 2018
Posts: 1
Rep Power: 0
fluidic_bob is on a distinguished road
HP DL380 G6, 12x4GB PC3-10600R (running at 10,000MB/s), 2x X5650, OpenFOAM 5.x on Ubuntu Server 18.04.1 LTS, kernel 4.15.0-20-generic
Code:
# cores   Wall time (s):
------------------------
1 1357.41
2 766.88
4 372.52
6 297.73
8 265.43
10 251.88
12 242.74
fluidic_bob is offline   Reply With Quote

Old   December 12, 2018, 16:19
Default
  #158
New Member
 
Rob
Join Date: Apr 2018
Posts: 18
Rep Power: 8
Morlind is on a distinguished road
I have upgraded my hardware after trying a couple of test systems to find the right package. This is an extremely cost-effective solution from my local, very helpful server refurb store.



Dell R820, quad E5-4650 v2 2.4 GHz, 128 GB RAM at 1333 MHz, Ubuntu 16.04, OpenFOAM 5


# cores   Wall time (s):
------------------------
1 1147.32
2 587.49
4 247.01
6 173.57
8 127.48
12 93.17
16 73.78
20 64.45
40 55.05


I am most impressed since this is literally twice as fast as my dual V2 system but only cost a tiny amount more. I paid ~$2600 for this system and it's a real powerhouse!

Last edited by Morlind; December 13, 2018 at 10:14.
Morlind is offline   Reply With Quote

Old   December 13, 2018, 04:50
Default
  #159
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 548
Rep Power: 15
Simbelmynė is on a distinguished road
That is really impressive. I would say this is probably the most cost-efficient solution right now. Is the Dell motherboard limited to 1333 MHz memory? Most of the good 2690v2 results posted here used much faster 1866 MHz memory, so if that works in your system you have a lot of room for improvement as well.
Simbelmynė is offline   Reply With Quote

Old   December 13, 2018, 11:59
Default
  #160
Rec
New Member
 
Sergey
Join Date: Jan 2018
Posts: 18
Rep Power: 8
Rec is on a distinguished road
I have an AMD Ryzen 7 1800X (8 cores @ 3.6 GHz), 2 x 16 GB Samsung DDR4, and an M.2 SSD (950 EVO, 256 GB).
I have installed Ubuntu 16.04 LTS and OpenFOAM 5.0.
When I run the test I get the following result:

# cores Wall time (s):
------------------------
1 893.2
2
4
6
8

This is what I see in the log file OpenFOAM/bench_template/run_2/log.simpleFoam.
I have also tried this on Ubuntu 18.04 and CentOS 7.6. Can you help me?

Code:
Starting time loop

streamLine streamLines:
    automatic track length specified through number of sub cycles : 5

[1] 
[1] 
[1] --> FOAM FATAL ERROR: 
[1] Attempt to return primitive entry ITstream : IOstream.functions.streamLines.seedSampleSet, line 0, IOstream: Version 2.0, format ASCII, line 0, OPENED, GOOD
    primitiveEntry 'seedSampleSet' comprises 
        on line 0 the word 'uniform'
 as a sub-dictionary
[1] 
[1]     From function virtual const Foam::dictionary& Foam::primitiveEntry::dict() const
[1]     in file db/dictionary/primitiveEntry/primitiveEntry.C at line 189.
[1] 
FOAM parallel run aborting
[1] 
[0] 
[0] 
[0] --> FOAM FATAL ERROR: 
[0] Attempt to return primitive entry ITstream : /home/sergey/OpenFOAM/bench_template/run_2/system/controlDict.functions.streamLines.seedSampleSet, line 45, IOstream: Version 2.0, format ASCII, line 0, OPENED, GOOD
    primitiveEntry 'seedSampleSet' comprises 
        on line 45 the word 'uniform'
 as a sub-dictionary
[0] 
[0]     From function virtual const Foam::dictionary& Foam::primitiveEntry::dict() const
[0]     in file db/dictionary/primitiveEntry/primitiveEntry.C at line 189.
[0] 
FOAM parallel run aborting
[0] 
[1] #0  Foam::error::printStack(Foam::Ostream&)[0] #0  Foam::error::printStack(Foam::Ostream&) at ??:?
 at ??:?
[1] #1  Foam::error::abort()[0] #1  Foam::error::abort() at ??:?
[1] #2  Foam::primitiveEntry::dict() const at ??:?
[0] #2  Foam::primitiveEntry::dict() const at primitiveEntry.C:?
[1] #3  Foam::functionObjects::streamLine::read(Foam::dictionary const&) at primitiveEntry.C:?
[0] #3  Foam::functionObjects::streamLine::read(Foam::dictionary const&) at ??:?
[1] #4  Foam::functionObjects::streamLine::streamLine(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[0] #4  Foam::functionObjects::streamLine::streamLine(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[1] #5  Foam::functionObject::adddictionaryConstructorToTable<Foam::functionObjects::streamLine>::New(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[0] #5  Foam::functionObject::adddictionaryConstructorToTable<Foam::functionObjects::streamLine>::New(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[1] #6  Foam::functionObject::New(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[0] #6  Foam::functionObject::New(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[1] #7  Foam::functionObjects::timeControl::timeControl(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[0] #7  Foam::functionObjects::timeControl::timeControl(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[1] #8  Foam::functionObjectList::read() at ??:?
[0] #8  Foam::functionObjectList::read() at ??:?
[1] #9  Foam::Time::loop() at ??:?
[0] #9  Foam::Time::loop() at ??:?
[1] #10  Foam::simpleControl::loop() at ??:?
[0] #10  Foam::simpleControl::loop() at ??:?
[1] #11   at ??:?
[0] #11  ?? at ??:?
[1] #12  __libc_start_main at ??:?
[0] #12  __libc_start_main in "/lib/x86_64-linux-gnu/libc.so.6"
[1] #13  ? in "/lib/x86_64-linux-gnu/libc.so.6"
[0] #13  --------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
 at ??:?
? at ??:?
[kb-4:14244] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[kb-4:14244] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Rec is offline   Reply With Quote
