
New 128 mini cluster - Cascade Lake SP or EPYC Rome?

Old   November 21, 2019, 16:28
Default New 128 mini cluster - Cascade Lake SP or EPYC Rome?
  #1
SLC
Member
 
Join Date: Jul 2011
Posts: 53
Rep Power: 13
SLC is on a distinguished road
I'm moving up from a 30ish core setup to a 120ish core setup.

I've already got 2 Skylake SP nodes (Dual Xeon 6146) with a total of 44 cores. I've had licenses to run on 36 of these cores.

I can now spend ca. €40,000 on expanding my compute setup (am in Norway, stuff is expensive here!). In addition I'll be expanding my HPC license so that I can run on up to 132 cores (3 ANSYS HPC Packs). Am running Windows Server.

I have two options
  • Keep my existing Skylake nodes, and add an additional 4 nodes each consisting of 24 cores:
    • 2 x Intel Xeon Gold 6246 12c, all core turbo 4.1 GHz
    • Six channel memory @ 2933 MHz, theoretical bandwidth 131.13 GiB/s
    • 12 x 8 GB DDR4 2933 single rank

  • "Ditch" my Skylake nodes, and purchase 4 nodes each consisting of 32 cores:
    • 2 x EPYC 7302 16c, turbo 3.3 GHz
    • Eight channel memory @ 3200 MHz, theoretical bandwidth 190.7 GiB/s
    • 16 x 16 GB DDR4 3200 dual rank
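The bandwidth figures for the two options follow directly from channels × transfer rate × 8 bytes per transfer. A quick sketch to reproduce them:

```python
# Back-of-the-envelope check of the theoretical memory bandwidth figures
# quoted above: channels * transfer rate (MT/s) * 8 bytes per 64-bit transfer.

def theoretical_bandwidth_gib(channels, mts):
    """Peak DDR4 bandwidth per socket in GiB/s."""
    bytes_per_s = channels * mts * 1e6 * 8  # 8-byte bus width per channel
    return bytes_per_s / 2**30

print(f"Xeon 6246 (6ch @ 2933): {theoretical_bandwidth_gib(6, 2933):.2f} GiB/s")
print(f"EPYC 7302 (8ch @ 3200): {theoretical_bandwidth_gib(8, 3200):.2f} GiB/s")
```

Both values match the spec-sheet numbers above (131.13 and 190.7 GiB/s) to rounding.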


If I stick with Intel I'll end up with 144 cores spread across 6 nodes. If I ditch Intel and go with EPYC I'll end up with 128 cores spread across 4 nodes. Licensing wise I'll be able to run on 132. Infiniband interconnect.

The two options above cost approximately the same. If I go with EPYC I'll probably keep one of my existing Skylake nodes to use as a head/storage node.

The Skylake SP CPUs only really scale decently up to 9 - 10 cores per CPU. So my 6 node Intel setup will probably only scale decently up to 120 cores.

The Epyc Rome benchmarks in the OpenFOAM thread are pretty spectacular, and indicate pretty good scaling up to 32 cores on a dual cpu single node. So the EPYC setup would most likely scale well all the way to 128 cores in my 4 node setup.

If you take some artistic liberties with the numbers in that thread, I'd say that a compute setup based on the EPYC 7302 is approx. 20 % faster than a Skylake SP setup for an equivalent number of cores.

ctd's post here: OpenFOAM benchmarks on various hardware
Code:
2X EPYC 7302, 16x16GB 2Rx8 DDR4-3200 ECC, Ubuntu 18.04.3, OF v7
# cores   Wall time (s):
------------------------
 1            711.73
 2            345.65
 4            164.97
 8             84.15
12             55.90
16             47.45
20             38.14
24             34.21
28             30.51
32             26.89
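The "pretty good scaling" claim can be checked directly from ctd's wall times, taking the single-core run as the baseline:

```python
# Parallel speedup and efficiency from the dual EPYC 7302 wall times above,
# relative to the 1-core run.

walltimes = {1: 711.73, 2: 345.65, 4: 164.97, 8: 84.15, 12: 55.9,
             16: 47.45, 20: 38.14, 24: 34.21, 28: 30.51, 32: 26.89}

for cores, t in walltimes.items():
    speedup = walltimes[1] / t
    print(f"{cores:2d} cores: speedup {speedup:5.2f}x, "
          f"efficiency {100 * speedup / cores:5.1f} %")
```

At 32 cores the speedup is about 26.5x, i.e. roughly 83 % parallel efficiency across the full node.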
Hrushi's post here: OpenFOAM benchmarks on various hardware
Code:
2 x Intel Xeon Gold 6136, 12 x 16 GB DDR4 2666MHz, Ubuntu 16.04 LTS
# cores   Wall time (s):
------------------------
 1            874.54
 2            463.34
 4            205.23
 6            137.95
 8            106.04
12             74.63
16             61.09
20             53.26
24             49.17
If we crudely scale the dual Xeon Gold 6136 results (3.6 GHz all-core turbo) by clock speed to an equivalent Xeon Gold 6146 (3.9 GHz), you get benchmark times of 56.4 s and 45.4 s for 16 and 24 cores respectively.

This compares to times of 47.45 s and 34.21 s for 16 and 24 cores on the dual EPYC. Thus the 2 x EPYC 7302 takes ca. 16 - 25 % less wall time than the 2 x Xeon 6146 for the same number of cores on a single node.
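The scaling estimate can be reproduced in a few lines (the 3.6 and 3.9 GHz all-core turbo clocks and the measured wall times are the ones quoted above):

```python
# Scale the measured Xeon 6136 times by the all-core turbo clock ratio
# (3.6 -> 3.9 GHz) to estimate a Xeon 6146, then compare against the EPYC times.

xeon_6136 = {16: 61.09, 24: 49.17}   # measured wall times (s)
epyc_7302 = {16: 47.45, 24: 34.21}

for cores in (16, 24):
    scaled = xeon_6136[cores] * 3.6 / 3.9        # estimated 6146 time
    reduction = (1 - epyc_7302[cores] / scaled) * 100
    print(f"{cores} cores: est. 6146 {scaled:.1f} s, "
          f"EPYC {epyc_7302[cores]:.2f} s, {reduction:.0f} % less wall time")
```

Note this assumes the benchmark scales linearly with clock speed, which is optimistic for a memory-bandwidth-bound case, so the estimated Xeon times are if anything a lower bound.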


I haven't found any numbers indicating the improvement of Cascade Lake vs Skylake Xeons, so it's hard to say where exactly Cascade Lake stands.


Appreciate your thoughts! What would you guys do with the €40,000?

Last edited by SLC; November 22, 2019 at 12:49. Reason: Mixed up cores/nodes in a couple of places :-)

Old   November 22, 2019, 09:12
Default
  #2
Senior Member
 
Micael
Join Date: Mar 2009
Location: Canada
Posts: 156
Rep Power: 17
Micael is on a distinguished road
The Epyc system should be faster and cheaper. Faster as you have shown, cheaper because those EPYC CPUs are much cheaper (I am surprised you found both options are about the same price).

Also, I think you mixed up the words "node" and "core" in a few places in your post.

Old   November 22, 2019, 12:54
Default
  #3
SLC
Member
 
Join Date: Jul 2011
Posts: 53
Rep Power: 13
SLC is on a distinguished road
Quote:
Originally Posted by Micael View Post
The Epyc system should be faster and cheaper. Faster as you have shown, cheaper because those EPYC CPUs are much cheaper (I am surprised you found both options are about the same price).

Also, I think you mixed up the words "node" and "core" in a few places in your post.
Yes, yes I did. Fixed now, thanks.

The CPUs are cheaper, but part of the difference is made up by the extra sticks of RAM for the EPYC systems.

Got an updated price offer today on the systems as described above, and the EPYC machines end up approx. 12.5 % cheaper than the Intel Xeon systems.

One thing that could be troubling is EPYC performance on Windows Server (will be running Windows OS on bare metal). Linux just isn't an option for me, I would have no idea what I was doing.

Old   November 22, 2019, 14:24
Default
  #4
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,308
Rep Power: 44
flotus1 has a spectacular aura about
The difference is more than just a few more DIMMs for Epyc. You configured 12x8GB vs 16x16GB. A mistake? Intel CPUs will also benefit from 2 ranks per channel.
Epyc 2nd gen should be much easier to run on Windows, compared to first gen. At least when configured with only one NUMA domain per CPU.
Tread with caution when using the results in our OpenFOAM benchmark thread. There were huge variances between similar setups, just based on who ran them. And there were outliers which turned out to be invalid results. So with only one result for Epyc 2nd gen, take it with a grain of salt.

Old   November 23, 2019, 09:43
Default
  #5
Senior Member
 
Micael
Join Date: Mar 2009
Location: Canada
Posts: 156
Rep Power: 17
Micael is on a distinguished road
I would consider building around "Supermicro A+ Server 2124BT-HTR - 2U BigTwin". Might be the most cost effective.

Old   November 23, 2019, 14:45
Default
  #6
SLC
Member
 
Join Date: Jul 2011
Posts: 53
Rep Power: 13
SLC is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
The difference is more than just a few more DIMMs for Epyc. You configured 12x8GB vs 16x16GB. A mistake? Intel CPUs will also benefit from 2 ranks per channel.
Epyc 2nd gen should be much easier to run on Windows, compared to first gen. At least when configured with only one NUMA domain per CPU.
Tread with caution when using the results in our OpenFOAM benchmark thread. There were huge variances between similar setups, just based on who ran them. And there were outliers which turned out to be invalid results. So with only one result for Epyc 2nd gen, take it with a grain of salt.
You’re right, it was just the way the Intel nodes were specced by my sales rep.

If I put in 12 x 16 GB dual rank sticks in the Intel nodes then they become 23 % more expensive than the Epyc nodes. So it's quite a significant saving. By going for 4 EPYC nodes I've saved enough money to pay for my infiniband setup (switch + NICs).

Would you personally go for Xeon or Epyc nodes, flotus? I see that the max memory bandwidth is achieved using NPS4; I guess that will present 8 NUMA domains to the Windows OS. Not sure if that's problematic for Fluent/CFX.

It’s a fair warning to treat the benchmark result with care, but there’s not much else out there.

There is this benchmark from AMD on Fluent, where they state that a 2 x EPYC Rome 7542 32-core (64 core total) is 62 % faster than a 2 x Xeon Gold 6248 20-core (40 core total) setup. Of course they gleefully ignore the fact that the EPYC system has 60 % more cores and so “ought” to be 60 % faster https://www.amd.com/system/files/doc...SYS-FLUENT.pdf

Last edited by SLC; November 24, 2019 at 02:20.

Old   November 24, 2019, 02:19
Default
  #7
SLC
Member
 
Join Date: Jul 2011
Posts: 53
Rep Power: 13
SLC is on a distinguished road
Quote:
Originally Posted by Micael View Post
I would consider building around "Supermicro A+ Server 2124BT-HTR - 2U BigTwin". Might be the most cost effective.
Fair recommendation. Unfortunately I have to purchase from Dell.

Old   November 24, 2019, 11:55
Default
  #8
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,308
Rep Power: 44
flotus1 has a spectacular aura about
Quote:
Would you personally go for Xeon or Epyc nodes flotus? I see that the max memory bandwidth is achieved using NPS4, I guess that will present 8 NUMA domains to Windows OS. Not sure if that’s problematic for Fluent/CFX
I personally am glad that I don't have to make that decision.
With a Linux operating system, it would be much easier to decide. Current Xeons can't hold a candle to Epyc 2nd gen for CFD. Due to the lack of benchmarks on Windows, I can't make a clear recommendation. Other than switching to Linux of course.
For maximum performance in NUMA-Aware applications like Fluent, Epyc Rome CPUs need to be configured in NPS4 mode. Which will result in 4 NUMA nodes per CPU presented to the OS.
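As a quick sanity check on what NPS4 implies for a dual-socket Rome node (assuming the memory channels split evenly across the domains, as in AMD's NPS description):

```python
# NPS4 splits each Rome socket into 4 NUMA domains, so a dual-socket node
# presents 8 domains to the OS, each owning a quarter of that socket's
# 8 memory channels.

sockets, nps, channels_per_socket = 2, 4, 8
numa_domains = sockets * nps
channels_per_domain = channels_per_socket // nps
print(f"{numa_domains} NUMA domains, {channels_per_domain} memory channels each")
```

So a NUMA-aware solver has to keep each rank's working set local to one of 8 small domains, rather than one of 2 big ones.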

Old   December 16, 2019, 16:25
Default
  #9
SLC
Member
 
Join Date: Jul 2011
Posts: 53
Rep Power: 13
SLC is on a distinguished road
Little update.

I am waiting to receive one of each of the following machines for testing and benchmarking purposes:

Dell PowerEdge R640
  • 2 x Intel Xeon Gold “Cascade Lake” 6246 12c, all core turbo 4.1 GHz
  • 12 x 16GB RDIMM, 2933MT/s, Dual Rank

Dell PowerEdge R6525
  • 2 x AMD EPYC 7302 16c, turbo 3.3 GHz
  • 16 x 16GB RDIMM, 3200MT/s, Dual Rank

I'll post Fluent benchmark results (on Windows Server 2019) as soon as I can. Will probably be in about a month's time, as the lead times on the machines from Dell are several weeks at this point.
