www.cfd-online.com

OpenFOAM benchmarks on various hardware

Old   October 25, 2024, 15:54
Default Performance of Epyc Turin
  #801
Member
 
dab bence
Join Date: Mar 2013
Posts: 48
Rep Power: 13
danbence is on a distinguished road
An AMD performance brief compares the Turin 9755 (2x128 cores) with the Genoa 9654 (2x96 cores) and shows a 43% uplift on the composite of OpenFOAM benchmarks they chose.

http://www.amd.com/content/dam/amd/e...b-openfoam.pdf

Old   October 29, 2024, 02:41
Default
  #802
New Member
 
Marc
Join Date: Mar 2022
Posts: 6
Rep Power: 4
gumersindu is on a distinguished road
Hi all,

I'm trying to run the benchmark attached to the original post, "bench_template.tar.gz", on my PC: 2 x Intel Xeon E5-2690 v4 (14 cores, 2.6 GHz, 35 MB L3) | 8 x 16GB DDR4 ECC | 1TB HDD | Ubuntu 24.04 LTS | OpenFOAM v2312

It looks like snappyHexMesh is failing to create the mesh. Is there an updated version maybe?

Old   October 29, 2024, 04:18
Default
  #803
Senior Member
 
andy
Join Date: May 2009
Posts: 314
Rep Power: 18
andy_ is on a distinguished road
Quote:
Originally Posted by gumersindu View Post
It looks like snappyHexMesh is failing to create the mesh. Is there an updated version maybe?
It didn't work for me either (Ubuntu 24.04) but the test case seemed to be one of the tutorials with, if memory serves, an increased grid density so I ran that. I'm not in the office at the moment but will look for the script I used when I am back.

Old   October 29, 2024, 06:53
Default
  #804
Senior Member
 
andy
Join Date: May 2009
Posts: 314
Rep Power: 18
andy_ is on a distinguished road
OK, some of what I did is coming back, but I am not an OpenFOAM user and was hacking to get something working rather than carefully setting up a benchmark.

Versions 11 and 12 of OpenFOAM are organised significantly differently and require different scripts. I don't know about earlier versions.

I ran version 12 something like this (adjust the lists of processor counts for meshing, solving and timing extraction to taste):

Code:
#!/bin/bash
# PREPROCS="1 2 4 8 16"
# RUNPROCS="1 2 4 8 16"
PREPROCS=""
RUNPROCS="1"
TIMPROCS="1 2 4 8 16"

# Prepare cases
# One case is meshed per core count in PREPROCS (leave it empty to skip meshing)
for i in $PREPROCS; do
    d=run_$i
    echo "Prepare case ${d}..."
    cp -r basecase $d
    cd $d

    cp $FOAM_TUTORIALS/resources/geometry/motorBike.obj.gz constant/geometry/
    surfaceFeatures > log.surfaceFeatures 2>&1
    blockMesh > log.blockMesh 2>&1

    if [ $i -eq 1 ] 
    then
        snappyHexMesh -overwrite > log.snappyHexMesh 2>&1
    else
        sed -i "s/numberOfSubdomains.*/numberOfSubdomains ${i};/" system/decomposeParDict
        decomposePar -copyZero > log.decomposePar 2>&1
        mpirun -np ${i} snappyHexMesh -overwrite -parallel > log.snappyHexMesh 2>&1
    fi
    cd ..
done

# Run cases
for i in $RUNPROCS; do
    echo "Run for ${i}..."
    cd run_$i
    if [ $i -eq 1 ] 
    then
        potentialFoam > log.potentialFoam 2>&1
        foamRun -solver incompressibleFluid > log.incompressibleFluid 2>&1
    else
        # mpirun -np ${i} patchSummary  -parallel > log.patchSummary 2>&1
        mpirun -np ${i} potentialFoam -parallel > log.potentialFoam 2>&1
        mpirun -np ${i} foamRun -solver incompressibleFluid -parallel > log.incompressibleFluid 2>&1
        reconstructPar -latestTime > log.reconstructPar 2>&1

        # foamRun -solver incompressibleFluid -parallel
        #mpirun -np ${i} foamRun -solver incompressibleFluid -parallel > log.simpleFoam 2>&1
    fi
    cd ..
done

# Extract times
echo "# cores   Wall time (s):"
echo "------------------------"
for i in $TIMPROCS; do
    echo "$i $(grep Execution run_${i}/log.incompressibleFluid | tail -n 1 | cut -d " " -f 3)"
done
and version 11 something like this:

Code:
#!/bin/bash

# Prepare cases
# This example runs on 1, 2 and 4 cores
for i in 1 2 4; do
    d=run_$i
    echo "Prepare case ${d}..."
    cp -r basecase $d
    cd $d
    if [ $i -eq 1 ] 
    then
        mv Allmesh_serial Allmesh
    fi
    sed -i "s/method.*/method scotch;/" system/decomposeParDict
    sed -i "s/numberOfSubdomains.*/numberOfSubdomains ${i};/" system/decomposeParDict
    time ./Allmesh
    cd ..
done

# Run cases
for i in 1 2 4; do
    echo "Run for ${i}..."
    cd run_$i
    if [ $i -eq 1 ] 
    then
        simpleFoam > log.simpleFoam 2>&1
    else
        mpirun -np ${i} simpleFoam -parallel > log.simpleFoam 2>&1
    fi
    cd ..
done

# Extract times
echo "# cores   Wall time (s):"
echo "------------------------"
for i in 1 2 4; do
    echo "$i $(grep Execution run_${i}/log.simpleFoam | tail -n 1 | cut -d " " -f 3)"
done
The basecase was the motorBikeSteady tutorial with the following changes:

system/controlDict:
< endTime 500;
> endTime 100;

system/blockMeshDict:
< hex (0 1 2 3 4 5 6 7) (20 8 8) simpleGrading (1 1 1)
> hex (0 1 2 3 4 5 6 7) (40 16 16) simpleGrading (1 1 1)

system/decomposeParDict:
< numberOfSubdomains 16;
> numberOfSubdomains 2;

The first two change the number of iterations and the mesh size to match the original benchmark. I'm not sure about the third, but I may have been fiddling. I am not an OpenFOAM user and was making guesses at the likely meaning of parameters. What is really needed is for an OpenFOAM user to diff the earlier benchmark and the current tutorial and keep the parameter changes that are relevant. In any case, my benchmark results are broadly in line with expectations, so if there are differences they are small.
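
A note on the timing step in both scripts: `cut -d " " -f 3` picks the third field of the last line containing `Execution`. Assuming the solver log prints lines of the form `ExecutionTime = 12.3 s  ClockTime = 13 s` (an assumption about the log format, worth checking against your own logs), that field is the ExecutionTime value, while ClockTime is the elapsed wall-clock value. A minimal sketch that reports both, where `last_times` is a hypothetical helper name and not part of the original scripts:

```shell
#!/bin/bash
# Hypothetical helper: print ExecutionTime and ClockTime from the last
# timing line of a solver log. Assumes lines of the form
#   ExecutionTime = 12.3 s  ClockTime = 13 s
last_times() {
    awk '/^ExecutionTime/ { et = $3; ct = $7 }
         END { if (et != "") print et, ct }' "$1"
}
```

For example, `last_times run_4/log.simpleFoam` would print the two values for the 4-core run, which makes it easy to see whether they differ much on a given machine.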

Last edited by andy_; October 29, 2024 at 12:41.

Old   October 29, 2024, 19:56
Default
  #805
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14
wkernkamp is on a distinguished road
Quote:
Originally Posted by andy_ View Post
OK some of what I did is coming back but I am not an openfoam user and was hacking to get something working rather than carefully setting up a benchmark.

Version 11 and 12 of openfoam are organised significantly differently and require different scripts. Don't know about earlier versions.

I ran version 12 something like this .....

The first two change the number of iterations and the mesh size hints to match. Not sure about the 3rd but I may have been fiddling. I am not an openfoam user and was making guesses at the likely meaning of parameters. What is really needed is for an openfoam user to diff the earlier benchmark and the current tutorial and keep the parameter changes that are relevant. Whatever, my benchmark results are broadly in line with expectations and so if there are differences they are small.
I did not see your result. From the looks of it you made the correct changes.


I find it extremely annoying that the basic call for the simpleFoam solution has been changed by openfoam.org. It has remained the same in the OpenFOAM.com releases (OpenFOAM v2312). I remember losing all interest in the Ruby language because they kept changing the language so that you had to rewrite your code for every version. Developers that don't know that better is the enemy of good should be shot to save us all a lot of time.
Crowdion likes this.

Old   October 30, 2024, 06:24
Default
  #806
Senior Member
 
andy
Join Date: May 2009
Posts: 314
Rep Power: 18
andy_ is on a distinguished road
Quote:
Originally Posted by wkernkamp View Post
I did not see your result. From the looks of it you made the correct changes.
My results are a few posts up on the previous page and thanks for the confirmation.

Old   October 30, 2024, 22:24
Default
  #807
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14
wkernkamp is on a distinguished road
Quote:
Originally Posted by andy_ View Post
My results are a few posts up on the previous page and thanks for the confirmation.
I looked at your previous posts. I was confused because some are quotes of other people's results. Is your current system the one doing just over 100 seconds? If it is, you can do ~64 seconds with two CPUs of 16+ cores each. My fastest one does 60 seconds. They have the same total of 8 memory channels at 2400 MT/s as you. However, they have much more L3 cache and that is a factor too. Upgrade your BIOS to the latest version before installing the high-core-count CPUs!
andy_ likes this.

Old   October 31, 2024, 05:35
Default
  #808
New Member
 
Marc
Join Date: Mar 2022
Posts: 6
Rep Power: 4
gumersindu is on a distinguished road
Hi all,

I finally modified the motorbike tutorial to match the same configuration as in the benchmark from the original post. I've attached the modified code which worked for v2312.

These are the results I got for this PC config: HP Z840 | 2 x Intel Xeon E5-2690 v4 (14 cores, 2.6 GHz, 35 MB L3) | 8 x 16GB DDR4 ECC | 1TB HDD | Ubuntu 24.04 LTS

Code:
cores  MeshTime(s)     RunTime(s)     
-----------------------------------
1      1403.79         1098.68        
2      949.89          551.16         
4      495.73          246.11         
6      361.35          163.72         
8      293.58          128.46         
12     244.06          99.28          
16     229.99          84.12          
20     186.59          78.14          
24     183.3           74.44          
28     177.25          72.7
Attached Files
File Type: gz benchmark_v2312.tar.gz (9.6 KB, 11 views)
Kolan, wkernkamp and Crowdion like this.
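
The scaling in a table like the one above can be summarised as speedup and parallel efficiency relative to the 1-core run. A minimal sketch, where `scaling` is a hypothetical helper and the input is a subset of the RunTime column above:

```shell
#!/bin/bash
# Speedup and parallel efficiency relative to the first input line,
# which is assumed to be the 1-core run.
# Input: lines of "cores time"; output: "cores speedup efficiency%".
scaling() {
    awk 'NR == 1 { base = $2 }
         { s = base / $2; printf "%d %.2f %.0f%%\n", $1, s, 100 * s / $1 }'
}

# RunTime(s) values copied from the table above.
scaling <<'EOF'
1 1098.68
2 551.16
4 246.11
8 128.46
16 84.12
28 72.7
EOF
```

On these numbers the speedup on 28 cores is roughly 15x, i.e. a parallel efficiency of a little over 50%, which would be consistent with the run being limited by something other than core count, such as memory bandwidth.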

Old   November 10, 2024, 15:56
Default Mac M4 Clusters ?
  #809
Member
 
Guy
Join Date: Jun 2019
Posts: 44
Rep Power: 7
linuxguy123 is on a distinguished road
The new Mac Mini M4 is very fast and really cheap and can be purchased with a 10 Gb Ethernet port.

How would a cluster (8 or 16) of Minis perform using a 10 Gb Ethernet backbone ?

The plain Mini also has a Thunderbolt 4 port that can transfer data at 40 Gb/s while the M4 Pro Mini has a Thunderbolt 5 port that can transfer data at up to 120 Gb/s. I bet that a special router could be designed to give these machines incredible backbone bandwidth. Thunderbolt encapsulates PCIe. https://en.wikipedia.org/wiki/Thunderbolt_(interface)
bigphil likes this.

Old   November 10, 2024, 17:40
Default
  #810
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14
wkernkamp is on a distinguished road
Quote:
Originally Posted by linuxguy123 View Post
The new Mac Mini M4 is very fast and really cheap and can be purchased with a 10 Gb Ethernet port.

How would a cluster (8 or 16) of Minis perform using a 10 Gb Ethernet backbone ?

The plain Mini also has a Thunderbolt 4 port that can transfer data at 40Gb/s while the Pro Mini has a Thunderbolt 5 port that can transfer data at 120Gb/s. I bet that a special router could be designed to give these machines incredible backbone bandwidth. Thunderbolt encapsulates PCIe. https://en.wikipedia.org/wiki/Thunderbolt_(interface)
10 Gb Ethernet is plenty fast for a small cluster. Get one and publish a benchmark result?
bigphil likes this.

Old   November 11, 2024, 07:01
Default
  #811
Senior Member
 
andy
Join Date: May 2009
Posts: 314
Rep Power: 18
andy_ is on a distinguished road
Quote:
Originally Posted by linuxguy123 View Post
The new Mac Mini M4 is very fast and really cheap and can be purchased with a 10 Gb Ethernet port.
About 15 years ago or so Apple brought out and advertised their "really fast" consumer chip when I was about to buy a cluster for the department I was working in at the time. I think it was a PowerPC chip (?) but cheapened for the consumer market. Despite my skepticism, the head of department was a decades-long Apple fan and I felt obliged to benchmark it.

So I contacted Apple to get some values for relevant benchmarks rather than the irrelevant ones PC publications tended to use and Apple was using in their advertising to demonstrate how much "faster" their chip was compared to the Intel chips of the time. They didn't have any. So I asked to be put through to their internal technical support. Extraordinarily (to naive me) they didn't have that either. Technical support was provided by third parties, so they put me through to a chain of shops which did indeed have a small technical support department. Unfortunately it was technical support for what Apple customers tend to want to do with Apple computers (e.g. generating media using point and click) rather than crunching numbers. They were happy to give me access to the hardware, but they had little idea what I was on about, and when I sat down to compile and run some benchmarks the Apple development environment had not even been installed. As expected, the benchmarks ran fast on tiny problems but slowly on normal-sized problems. The department's cluster ended up using fairly expensive motherboards with fast memory support and the cheapest Intel chips (i.e. lowest clock speed) that supported it.

Given how Apple operates, their target market and how they price things, the possibility of any Apple hardware offering generally high technical performance for the money is pretty low. It is not zero though, and given the effectiveness of their marketing, people looking to purchase clusters to crunch numbers will benefit from relevant hard evidence (unless they are fanboys of course). Clusters of ARM chips may well be about to become a good choice for CFD, but I rather doubt Apple will be the supplier because of their pricing.

Perhaps I should add that Apple may be a reasonable choice for a desktop if CFD is only part of what is done with the machine. Indeed, for 6 years I used an Apple laptop for office, lab and presenting but less so for software development or running simulations like CFD.

Old   November 11, 2024, 12:43
Default
  #812
Member
 
Guy
Join Date: Jun 2019
Posts: 44
Rep Power: 7
linuxguy123 is on a distinguished road
Quote:
Originally Posted by andy_ View Post
About 15 years ago
A lot has changed since then.

Quote:
So I contacted Apple to get some values for relevant benchmarks rather than the irrelevant ones PC publications tended to use and Apple was using in their advertising to demonstrate how much "faster" their chip was compared to current intel chips.
It is very easy to compare computers on various benchmarks these days without relying on the manufacturer to do so. Geekbench, for example.

Quote:
They didn't have any. So I asked to be put through to their internal technical support. Extraordinarly (to naive me) they didn't have that either. Technical support was provided by 3rd parties and so they put me through to a chain of shops which indeed had a small technical support department. Unfortunately it was technical support for what Apple customers tend to want to do with Apple computers (e.g. generating media using point and click) rather than crunching numbers. They were happy to give me access to the hardware but they had little idea what I was on about and when I sat down to compile and run some benchmarks the Apple development environment had not even been installed. As expected the benchmarks ran fast on tiny problems but slowly on normal sized problems.
What does this have to do with anything today ?

Quote:
Given how Apple operates, their target market and how they price things the possibility of any Apple hardware offering a general high technical performance for the money is pretty low.
Let me introduce you to the Mac Mini. https://www.apple.com/mac-mini/

10 cores, 16 GB RAM, 256 GB SSD, 3 Thunderbolt 4 ports, US$600. You can add 10 Gb Ethernet for $125. Please show me a faster unit of computing for less money.

Quote:
It is not zero though and given the effectiveness of their marketing people looking to purchase clusters to crunch numbers will benefit from relevant hard evidence (unless they are fanboys of course). Clusters of ARM chips may well be about to become a good choice for CFD but I rather doubt Apple will be the supplier because of their pricing.

I am not an Apple fan. I don't own a single Apple product. I'm just looking for the cheapest way to run CFD cases fast. If you can show why an M4 Mac Mini won't do that then I am all ears. Otherwise you are adding nothing to this conversation.

Old   November 11, 2024, 12:54
Default
  #813
Member
 
Guy
Join Date: Jun 2019
Posts: 44
Rep Power: 7
linuxguy123 is on a distinguished road
Quote:
Originally Posted by wkernkamp View Post
10 Gb ethernet is plenty fast for a small cluster. Get one and publish benchmark result?
I have never run a cluster. How does one predict the performance of a cluster given the performance of a single machine within that cluster ?

How big is a "small" cluster ? 10 nodes ? 20 ? 32 ?

M4 Mac Minis supposedly have a memory bandwidth of "120 GB/s". M4 Pro Mac Minis are listed at 273 GB/s; the "over half a terabyte/sec" figure applies to the M4 Max. https://en.wikipedia.org/wiki/Apple_M4

AMD EPYC Rome (Zen 2, 7002) has a memory bandwidth of ~200 GB/s per socket (8 channels of DDR4-3200).
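
The ~200 GB/s figure follows from the usual theoretical peak estimate: channels x transfer rate x 8 bytes per transfer, assuming a 64-bit wide channel. A minimal sketch, where `peak_bw_gbs` is a hypothetical helper name:

```shell
#!/bin/bash
# Theoretical peak memory bandwidth in GB/s.
# Arguments: number of channels, transfer rate in MT/s.
# Assumes 8 bytes (64 bits) per transfer per channel.
peak_bw_gbs() {
    awk -v ch="$1" -v mts="$2" 'BEGIN { printf "%.1f\n", ch * mts * 8 / 1000 }'
}

peak_bw_gbs 8 3200   # 8 channels of DDR4-3200: prints 204.8
peak_bw_gbs 8 2400   # 8 channels at 2400 MT/s, as mentioned earlier in the thread: prints 153.6
```

These are theoretical peaks; the bandwidth a solver actually sustains will be lower.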

Old   November 12, 2024, 20:01
Default
  #814
Senior Member
 
Join Date: Jun 2016
Posts: 102
Rep Power: 10
xuegy is on a distinguished road
M4 Mac mini base model 4P+6E

# cores Wall time (s):
------------------------
1 315.54
2 191.29
4 118.64
8 111.61

The efficiency cores are kind of useless. Can't wait to see M4 Pro/M4 Max results.

Old   November 12, 2024, 20:10
Default
  #815
Member
 
Guy
Join Date: Jun 2019
Posts: 44
Rep Power: 7
linuxguy123 is on a distinguished road
Quote:
Originally Posted by xuegy View Post
M4 Mac mini base model 4P+6E

# cores Wall time (s):
------------------------
1 315.54
2 191.29
4 118.64
8 111.61

The efficiency core is kind of useless. Can't wait to see M4 Pro/M4 Max results.
111 seconds is pretty good for a $600 off the shelf box. The Pro is supposed to have a lot more bandwidth.

Would 10 M4 Minis in a cluster manage about 15 seconds ?
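
A rough way to sanity-check a guess like this is to divide the single-node time by the node count times an assumed parallel efficiency. A minimal sketch: `cluster_time` is a hypothetical helper, and the 0.75 efficiency is an arbitrary illustrative value, not a measurement. Real scaling over Ethernet depends on the case size and the interconnect and has not been tested here.

```shell
#!/bin/bash
# Estimated cluster run time = single-node time / (nodes * efficiency).
# Arguments: single-node time (s), node count, assumed efficiency (0 to 1).
cluster_time() {
    awk -v t="$1" -v n="$2" -v e="$3" 'BEGIN { printf "%.1f\n", t / (n * e) }'
}

cluster_time 111.61 10 1.0    # ideal scaling: prints 11.2
cluster_time 111.61 10 0.75   # assumed 75% efficiency: prints 14.9
```

Only a measured multi-node run would show which efficiency is realistic for a case of this size.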

Old   November 12, 2024, 21:05
Default
  #816
Senior Member
 
Join Date: Jun 2016
Posts: 102
Rep Power: 10
xuegy is on a distinguished road
Quote:
Originally Posted by linuxguy123 View Post
111 seconds is pretty good for a $600 off the shelf box. The Pro is supposed to have a lot more bandwidth.

10 M4 Minis in a cluster would be 15 seconds ?
Better wait for the new Mac Studio with M4 Max.

Old   November 12, 2024, 23:04
Default
  #817
Member
 
Guy
Join Date: Jun 2019
Posts: 44
Rep Power: 7
linuxguy123 is on a distinguished road
Quote:
Originally Posted by xuegy View Post
Better wait for the new Mac Studio with M4 Max.

Not sure it will be more cost competitive. It will be faster but also probably 4-5x more expensive.


We'll see. We live in interesting times.

Old   November 12, 2024, 23:07
Default
  #818
Senior Member
 
Join Date: Jun 2016
Posts: 102
Rep Power: 10
xuegy is on a distinguished road
Quote:
Originally Posted by linuxguy123 View Post
Not sure it will be more cost competitive. It will be faster but also probably 4-5x more expensive.


We'll see. We live in interesting times.
I would value it based on memory bandwidth. $100 per 20 GB/s is OK.
DVSoares likes this.
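
The price-per-bandwidth yardstick is easy to apply to figures quoted earlier in the thread. A minimal sketch, where `usd_per_gbs` is a hypothetical helper and the inputs are the US$600 price and 120 GB/s bandwidth mentioned above for the base M4 Mac Mini:

```shell
#!/bin/bash
# Price per unit of memory bandwidth.
# Arguments: price in USD, memory bandwidth in GB/s.
usd_per_gbs() {
    awk -v p="$1" -v bw="$2" 'BEGIN { printf "%.2f\n", p / bw }'
}

usd_per_gbs 600 120   # prints 5.00, i.e. $100 per 20 GB/s
```

By this measure the base model sits right at the $100 per 20 GB/s threshold, assuming the quoted bandwidth figure is what the CPU can actually use.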

Old   November 15, 2024, 05:31
Default
  #819
Member
 
Yan
Join Date: Dec 2013
Location: Milano
Posts: 43
Rep Power: 12
aparangement is on a distinguished road
Quote:
Originally Posted by xuegy View Post
Better wait for the new Mac Studio with M4 Max.
It seems that the M3 Max enlarges the L3 (LLC?) cache to 64 MB and that its practical (CPU-only?) memory bandwidth is higher than the M2 Max's.

I really hope the M4 does the same.

It's a pity Asahi Linux only works on chips up to the M2.

Old   November 17, 2024, 18:32
Default
  #820
New Member
 
Kevin Nolan
Join Date: Nov 2012
Posts: 13
Rep Power: 14
Kolan is on a distinguished road
So I've gotten my hands on a dual E5-2630-v3 Xeon Workstation with 128 (8x16) gigs of ECC 2133 MHz RAM. It's an ASUS Z10PA-D8 motherboard.

I've installed Ubuntu 24.04 and OpenFOAM v2406 and ran gumersindu's updated benchmark.

Code:
cores  MeshTime(s)     RunTime(s)     
-----------------------------------
1      1692.75         1095.52        
2      1161.71         561.19         
4      575.49          252.13         
6      449.91          172.21         
8      371.42          140.73         
12     296.43          111.46         
16     272.53          98.67

I've also got an M4 Pro (12 Core 48GB) Mac Mini on the way.

For reference here is my M3 Max again.


Code:
cores  MeshTime(s)     RunTime(s)
-----------------------------------
1      510.43          377.13
2      311.33          209.7
4      195.35          110.33
6      145.09          77.5
8      124.87          63.6
12     125.53          81.98
I'll post the M4 Pro when it arrives in a few weeks.
bigphil, aparangement and Crowdion like this.

Last edited by Kolan; November 17, 2024 at 19:11. Reason: updated 6 and 12 core runs for the M3 Max
