
Epyc vs Xeon Skylake SP


July 11, 2017, 15:59   #1
hpvd (New Member, joined May 2013, 12 posts)
Epyc vs Xeon Skylake SP
- which one to choose?

First (non-CFD) benchmarks are available:

full article:
http://www.anandtech.com/show/11544/...-of-the-decade

Deep link to the start of the benchmarks:
http://www.anandtech.com/show/11544/...-the-decade/11

What do you think??

July 11, 2017, 16:58   #2
hpvd (New Member, joined May 2013, 12 posts)
Just copying this over:

Quote:
Originally Posted by kyle:
Found a Euler3d benchmark for Skylake SP:

https://hothardware.com/reviews/inte...-review?page=6

Still nothing for EPYC that I see

July 13, 2017, 08:14   #3
hpvd (New Member, joined May 2013, 12 posts)
Skylake-SP's AVX-512 units seem to have some advantages in Fluent (v18.1) as well. Or is it just the 6-channel memory?

Quote:
In Fluent, we’ve added support for Intel® Advanced Vector Extensions 2 (AVX2) optimized binary, so that we can take better advantage of the advanced vector processing capabilities of Intel Xeon processors. Our benchmark results also show the Intel Xeon Gold 6148 processor boosts performance for ANSYS Fluent 18.1 by up to 41% versus a previous-generation processor — and provides up to 34% higher performance per core.
In detail:
Xeon Gold 6148: 20 cores @ 2.4 GHz, 2 AVX-512 units
vs
E5-2697 v4: 18 cores @ 2.3 GHz


http://www.ansys-blog.com/boost-ansy...-technologies/

July 13, 2017, 10:35   #4
flotus1 (Alex, Senior Member, Germany, joined Jun 2012, 1,488 posts)
I hardly think these performance gains are due to AVX. Last time I checked AVX2 instruction performance for one of our LBM codes, the gains were in the low single digits, which is to be expected in a memory-bound workload.
6x DDR4-2666 vs 4x DDR4-2400 is 67% more theoretical memory bandwidth. That's all there is to these performance gains. But it would not be clever to highlight this in a marketing publication, because AMD Epyc is even better in exactly this metric. So they might try to pin the performance gains on AVX instead.
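For reference, the 67% figure follows directly from channel count and transfer rate. A quick sanity check (peak bandwidth is channels × transfers/s × 8 bytes per 64-bit channel; the speed grades are the ones from this post):

```python
def ddr_bandwidth_gbs(channels, mt_per_s):
    """Theoretical peak bandwidth in GB/s: channels * MT/s * 8 bytes per transfer."""
    return channels * mt_per_s * 1e6 * 8 / 1e9

skylake = ddr_bandwidth_gbs(6, 2666)    # Skylake-SP: 6-channel DDR4-2666
broadwell = ddr_bandwidth_gbs(4, 2400)  # Broadwell-EP: 4-channel DDR4-2400
print(f"Skylake-SP: {skylake:.1f} GB/s, Broadwell-EP: {broadwell:.1f} GB/s")
print(f"Ratio: {skylake / broadwell:.2f}")  # ~1.67, i.e. 67% more
```

Theoretical peak only, of course; sustained bandwidth is always lower.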
__________________
Please do not send me CFD-related questions via PM

July 13, 2017, 13:19   #5
hpvd (New Member, joined May 2013, 12 posts)
Just what I was thinking when reading this...

July 14, 2017, 03:24   #6
Kaskade (Onno, Senior Member, Germany, joined Jan 2012, 101 posts)
Sorry if I'm hijacking the topic, but I've been looking into getting a new server/cluster and I have two quick questions you seem able to answer.

1) If I have a multi-socket server, do the CPUs share the memory bandwidth?

2) Is there some rule to determine whether the interconnect or the memory bandwidth will be the bottleneck? Can two servers connected by InfiniBand be faster than a single server, given the same number of cores?

July 14, 2017, 07:14   #7
Kaskade (Onno, Senior Member, Germany, joined Jan 2012, 101 posts)
Some more research and I answered my own question: each socket has its own memory access.

Which means I have to change the second question: would you expect a two-socket board with two 16-core CPUs to be faster than a single 32-core CPU? Communication between the cores of the two CPUs would be slowed down by the interconnect, but the memory bandwidth is doubled.

July 14, 2017, 10:13   #8
flotus1 (Alex, Senior Member, Germany, joined Jun 2012, 1,488 posts)
Yes, a dual-socket setup with two 16-core CPUs is much faster for CFD workloads than a single 32-core CPU. The reason is that latencies are still fairly low on a dual-socket board, so communication overhead is usually negligible. At least that effect is far less important than having twice the memory bandwidth.

July 14, 2017, 15:04   #9
Kaskade (Onno, Senior Member, Germany, joined Jan 2012, 101 posts)
Thank you for the quick response.

That brings me back to my original question: is there a rule to determine the bottleneck? For example, four 8-core CPUs installed in a 4-socket system will be faster than two 16-core CPUs, but four 8-core CPUs split across two two-socket systems connected by InfiniBand won't be?

July 14, 2017, 15:37   #10
flotus1 (Alex, Senior Member, Germany, joined Jun 2012, 1,488 posts)
Quote:
For example, four 8-core CPUs installed in a 4-socket system will be faster than two 16-core CPUs
Absolutely
Quote:
but four 8-core CPUs installed in two two-socket systems connected by InfiniBand won't be?
InfiniBand interconnects do not really slow things down with only two nodes.
If the code you use supports distributed-memory parallelization, you don't need to pay for expensive quad-socket hardware.

Let me share a recent experience I had with memory "bottlenecks". I refurbished an old workstation with 2x Xeon E5-2643. These are 4-core CPUs with 4 memory channels each. Sounds like enough memory bandwidth per core, right? Yet replacing the DDR3-1333 DIMMs with otherwise identical DDR3-1600 still increased performance in Ansys Fluent by 12% when solving on 8 cores. The point is, you really cannot have too much memory performance for CFD.
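To put that in perspective: DDR3-1600 over DDR3-1333 is 1600/1333 ≈ 20% more theoretical bandwidth, and a 12% solver speedup from memory speed alone shows how bandwidth-bound the solver is. For anyone who wants a rough bandwidth number for their own machine, here is a crude STREAM-style sketch using numpy (not from the post; it times a single vector-add kernel, ignores threading and NUMA placement, so treat it as a ballpark, not a substitute for the real STREAM benchmark):

```python
import time
import numpy as np

def vector_add_gbs(n=50_000_000, reps=5):
    """Crude STREAM-like estimate: a = b + c streams 3 arrays of n doubles."""
    b = np.random.rand(n)
    c = np.random.rand(n)
    a = np.empty_like(b)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        np.add(b, c, out=a)  # reads b and c, writes a: 3 arrays touched
        best = min(best, time.perf_counter() - t0)
    return 3 * n * 8 / best / 1e9  # 3 arrays * n elements * 8 bytes each

if __name__ == "__main__":
    # small n for a quick run; increase n for a steadier number
    print(f"~{vector_add_gbs(n=5_000_000):.1f} GB/s")
```

Comparing the result before and after a DIMM upgrade should show roughly the same ratio as the theoretical one.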

July 14, 2017, 15:57   #11
Kaskade (Onno, Senior Member, Germany, joined Jan 2012, 101 posts)
Unfortunately, the extra cost of buying two servers, four CPUs and InfiniBand cards instead of a single server with one 32-core CPU is easily quantifiable, while the actual performance benefit depends on the case.

At least I now know that I was wrong in assuming that the interconnect would be the main issue.

For anyone who hasn't seen it yet: computerbase.de (German) compiled a nice, comprehensive list of the current-generation server CPUs.

July 15, 2017, 06:14   #12
flotus1 (Alex, Senior Member, Germany, joined Jun 2012, 1,488 posts)
You should at least consider the solution in between: a dual-socket board with 2x 16 cores. A single 32-core CPU is just pointless for any CFD application.

July 17, 2017, 00:44   #13
Kaskade (Onno, Senior Member, Germany, joined Jan 2012, 101 posts)
Last Friday I asked our hardware vendor to give us quotes for 2x16 systems based on AMD and Intel to get an idea of the difference in price. I assume you would argue that the greater memory bandwidth of AMD will outweigh the higher clock rates and (potential) benefits of AVX-512, correct?

July 17, 2017, 02:32   #14
flotus1 (Alex, Senior Member, Germany, joined Jun 2012, 1,488 posts)
That might be a possible outcome. But with no real-world benchmarks or available hardware in sight, I am hesitant to draw that conclusion.
There are some details in the architecture of Epyc that have their drawbacks, like high latency for far cache accesses and lower bandwidth for memory access outside of the CCX. But then again, the latest iteration of Intel processors does not seem to be without flaws either. I just can't say with certainty without any CFD benchmarks.

July 17, 2017, 02:59   #15
Kaskade (Onno, Senior Member, Germany, joined Jan 2012, 101 posts)
Our computationally most intensive cases are multi-phase simulations using sliding meshes. Since most, if not all, benchmark cases come down to single-phase flow with an inlet and an outlet condition, I would take benchmark results with a grain of salt anyway.

Currently we are running our simulations on two virtual machines, so any bare-metal system would likely be a vast improvement.

July 20, 2017, 01:19   #16
Kaskade (Onno, Senior Member, Germany, joined Jan 2012, 101 posts)
The first vendor got back to me. For now they only sell Intel, so the lack of benchmarks might turn out not to be the main issue.

They are quoting me various configurations based on the Xeon Gold 6134, since it has the best per-core performance. One benefit of Skylake-SP is that even the "Gold" CPUs can be used in four-socket systems, meaning I need only one machine (although the price compared to two systems with InfiniBand might end up the same). One drawback of 6 memory channels combined with an 8-core CPU is that I end up with either less than the 8 GB per core that I wanted or significantly more.

I am also looking at the Xeon Gold 6136, which is 10% more expensive but offers 50% more cores. I hope that the difference in base frequency is offset by the turbo boost when only using 8 out of 12 cores.

Edit: If the information on WikiChip is correct, the difference between the 6134 and the 6136 when using 8 cores is only 100 MHz.
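The GB-per-core mismatch is easy to enumerate. A small sketch (the DIMM sizes below are typical RDIMM capacities assumed for illustration, one DIMM per channel; they are not from the vendor quote):

```python
def gb_per_core(channels, dimm_gb, dimms_per_channel, cores):
    """Total installed memory of one CPU divided across its cores."""
    return channels * dimm_gb * dimms_per_channel / cores

# Xeon Gold 6134: 8 cores, 6 memory channels, 1 DIMM per channel
for dimm_gb in (8, 16, 32):
    print(f"{dimm_gb} GB DIMMs -> {gb_per_core(6, dimm_gb, 1, 8):.0f} GB/core")
# 8 GB DIMMs fall short of the 8 GB/core target (6 GB/core);
# 16 GB DIMMs overshoot it (12 GB/core)
```

With 6 channels and 8 cores there is simply no DIMM size that lands exactly on 8 GB/core with balanced channel population.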

Last edited by Kaskade; July 20, 2017 at 05:35.
