
CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   Hardware (https://www.cfd-online.com/Forums/hardware/)
-   -   Epyc vs Xeon Skylake SP (https://www.cfd-online.com/Forums/hardware/190354-epyc-vs-xeon-skylake-sp.html)

hpvd July 11, 2017 15:59

Epyc vs Xeon Skylake SP
 
Epyc vs Xeon Skylake SP
- which one to choose?

First (non-CFD) benchmarks are available:

full article:
http://www.anandtech.com/show/11544/...-of-the-decade

Deep link to the start of the benchmarks:
http://www.anandtech.com/show/11544/...-the-decade/11

What do you think??

hpvd July 11, 2017 16:58

just copied this

Quote:

Originally Posted by kyle (Post 656697)
Found a Euler3d benchmark for Skylake SP:

https://hothardware.com/reviews/inte...-review?page=6

Still nothing for EPYC that I see


hpvd July 13, 2017 08:14

Skylake SP's AVX-512 units seem to have some advantages in Fluent (v18.1) as well.
- or is it just the 6-channel memory??

Quote:

In Fluent, we’ve added support for Intel® Advanced Vector Extensions 2 (AVX2) optimized binary, so that we can take better advantage of the advanced vector processing capabilities of Intel Xeon processors. Our benchmark results also show the Intel Xeon Gold 6148 processor boosts performance for ANSYS Fluent 18.1 by up to 41% versus a previous-generation processor — and provides up to 34% higher performance per core.
In detail:
Xeon Gold 6148: 20 cores @ 2.4 GHz, 2 AVX-512 units
vs
E5-2697 v4: 18 cores @ 2.3 GHz


http://www.ansys-blog.com/boost-ansy...-technologies/

flotus1 July 13, 2017 10:35

I hardly think these performance gains are due to AVX. The last time I checked AVX2 instruction performance for one of our LBM codes, the gains were in the low single digits, which is to be expected for a memory-bound workload.
6x DDR4-2666 vs 4x DDR4-2400 is 67% more theoretical memory bandwidth. That's all there is to these performance gains. But it would not be clever to highlight this in a marketing publication, because AMD Epyc is even better in exactly this metric. So they might try to pin the performance gains on AVX instead.
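For anyone who wants to check the 67% figure: the theoretical peak is just channels x transfer rate x 8 bytes per transfer. A quick back-of-the-envelope sketch (ignoring real-world efficiency, so only the ratio matters):

Code:

# Rough peak memory bandwidth in GB/s: one 64-bit (8-byte) transfer per channel per MT
def peak_bw_gbs(channels, mt_per_s):
    return channels * mt_per_s * 8 / 1000

skylake_sp = peak_bw_gbs(6, 2666)  # ~128 GB/s
broadwell = peak_bw_gbs(4, 2400)   # ~77 GB/s
print(skylake_sp / broadwell)      # ~1.67, i.e. the 67% quoted above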

hpvd July 13, 2017 13:19

That's just what I was thinking when reading this....

Kaskade July 14, 2017 03:24

Sorry if I'm hijacking the topic, but I've been looking into getting a new server/cluster and I have two quick questions you seem able to answer.

1) If I have a multi-socket server, do the CPUs share the memory bandwidth?

2) Is there some rule to determine whether the interconnect or the memory bandwidth will be the bottleneck? Can two servers connected by InfiniBand be faster than a single server, given the same number of cores?

Kaskade July 14, 2017 07:14

Some more research and I answered my own question: each socket has its own memory access.

Which means I have to change the second question: would you expect a two-socket board with two 16-core CPUs to be faster than a single 32-core CPU? The communication between the cores of the two CPUs would be slowed down by the interconnect, but the memory bandwidth is doubled.

flotus1 July 14, 2017 10:13

Yes, a dual-socket setup with two 16-core CPUs is much faster for CFD workloads than a single 32-core CPU. The reason is that latencies are still fairly low on a dual-socket board, so the communication overhead is usually negligible. At least the effect is much less important than having twice the memory bandwidth.
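As a rough illustration of the bandwidth-per-core argument (assuming six channels of DDR4-2666 per socket, Skylake-SP-like; the exact numbers are only placeholders):

Code:

# Theoretical peak bandwidth divided evenly over all cores, in GB/s
def per_core_bw_gbs(sockets, channels_per_socket, mt_per_s, total_cores):
    return sockets * channels_per_socket * mt_per_s * 8 / 1000 / total_cores

single_socket = per_core_bw_gbs(1, 6, 2666, 32)  # ~4 GB/s per core
dual_socket = per_core_bw_gbs(2, 6, 2666, 32)    # ~8 GB/s per core
print(single_socket, dual_socket)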

Kaskade July 14, 2017 15:04

Thank you for the quick response.

That brings me back to my original question: is there a rule to determine the bottleneck? For example, four 8-core CPUs installed in a 4-socket system will be faster than two 16-core CPUs, but four 8-core CPUs installed in two two-socket systems connected by InfiniBand won't be?

flotus1 July 14, 2017 15:37

Quote:

For example, four 8-core CPUs installed in a 4-socket system will be faster than two 16-core CPUs
Absolutely
Quote:

but four 8-core CPUs installed in two two-socket systems connected by InfiniBand won't be?
InfiniBand interconnects do not really slow things down for only two nodes.
If the code you use supports distributed-memory parallelization, you don't need to pay for the expensive quad-socket hardware.
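To make "distributed-memory parallelization" a bit more concrete, here is a minimal mpi4py sketch (only an illustration of the concept, assuming MPI and mpi4py are installed; commercial solvers like Fluent ship their own MPI launchers, and the file names below are hypothetical):

Code:

# Each MPI rank owns its share of the cells and only exchanges reduced/boundary data.
# Launched across two nodes with something like:
#   mpirun -np 32 --hostfile hosts python decomp_demo.py   (hostfile name is made up)
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # id of this process
size = comm.Get_size()   # total number of processes across all nodes

total_cells = 1_000_000
local_cells = total_cells // size   # each rank works on its own partition
local_result = float(local_cells)   # stand-in for some per-partition quantity

# only the reduced value crosses the interconnect, not the whole field
global_result = comm.allreduce(local_result, op=MPI.SUM)
if rank == 0:
    print(size, "ranks,", local_cells, "cells each, total =", global_result)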

Let me share a recent experience I had with memory "bottlenecks". I refurbished an old workstation with 2x Xeon E5-2643. These are 4-core CPUs with 4 memory channels each. Sounds like enough memory bandwidth per core, right? Yet replacing the DDR3-1333 DIMMs with otherwise identical DDR3-1600 still increased performance in Ansys Fluent by 12% when solving on 8 cores. The point is: you really cannot have too much memory performance for CFD.
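Put differently: the DIMM swap raised theoretical bandwidth by about 20%, and 12% of that showed up as solver speedup, which is a strong hint that the case is largely bandwidth-bound. Rough arithmetic:

Code:

bw_gain = 1600 / 1333 - 1          # ~0.20 -> 20% more theoretical bandwidth
observed_speedup = 0.12            # the 12% measured in Fluent on 8 cores
print(observed_speedup / bw_gain)  # ~0.6 -> a large share of the runtime scales with bandwidth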

Kaskade July 14, 2017 15:57

Unfortunately, the extra cost of buying 2 servers, 4 CPUs and InfiniBand cards instead of a single server with one 32-core CPU is easily quantifiable, while the actual performance benefit depends on the case.

At least I know now that I was wrong in assuming that the interconnect would be the main issue.

For anyone who hasn't seen it yet: computerbase.de (German) compiled a nice, comprehensive list of the current-gen server CPUs.

flotus1 July 15, 2017 06:14

You should at least consider the solution in between: a dual-socket board with 2x 16 cores. A single 32-core CPU is just pointless for any CFD application.

Kaskade July 17, 2017 00:44

Last Friday I asked our hardware vendor to give us quotes for 2x16-core systems based on AMD and Intel to get an idea of the difference in price. I assume you would argue that the greater memory bandwidth of AMD will outweigh the higher clock rates and the (potential) benefits of AVX-512, correct?

flotus1 July 17, 2017 02:32

That might be a possible outcome. But with no real-world benchmarks or available hardware in sight, I am hesitant to draw this conclusion.
There are some details in the architecture of Epyc that have their drawbacks, like high latency for far cache accesses and lower bandwidth for memory accesses outside the local CCX. But then again, the latest iteration of Intel processors does not seem to be without flaws either. I just can't say with certainty without any CFD benchmarks.

Kaskade July 17, 2017 02:59

Our computationally most intensive cases would be multi-phase simulations using sliding meshes. Since most, if not all, benchmark cases come down to single-phase flow with an inlet and an outlet condition, I would take benchmark results with a grain of salt anyway.

Currently we are running our simulations on two virtual machines, so any bare-metal system would likely be a vast improvement.

Kaskade July 20, 2017 01:19

The first vendor got back to me. For now they only sell Intel, so the lack of benchmarks might turn out not to be the main issue.

They are quoting me various configurations based on the Xeon Gold 6134, since it has the best per-core performance. One benefit of Skylake-SP is that even the "Gold" CPUs can be used in four-socket systems, meaning I need only one machine (although the price compared to 2 systems with InfiniBand might stay the same). One drawback of 6 memory channels in combination with an 8-core CPU is that I end up with either less than the 8 GB per core that I wanted or with significantly more.
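For reference, the per-core capacity arithmetic behind that (assuming one identical DIMM per channel on all six channels; the DIMM sizes are just examples):

Code:

# Memory per core for an 8-core CPU with one DIMM on each of the six channels
cores = 8
for dimm_gb in (8, 16):
    total_gb = 6 * dimm_gb
    print(dimm_gb, "GB DIMMs ->", total_gb, "GB total,", total_gb / cores, "GB per core")
# 8 GB DIMMs  -> 48 GB total, 6.0 GB per core (below the 8 GB/core target)
# 16 GB DIMMs -> 96 GB total, 12.0 GB per core (significantly more)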

I am also looking at the Xeon Gold 6136, which is 10% more expensive but offers 50% more cores. I hope that the difference in base frequency is offset by the turbo boost when only using 8 out of 12 cores.

Edit: If the information on WikiChip is correct, the difference between the 6134 and the 6136 when using 8 cores is only 100 MHz.

Kaskade August 9, 2017 05:06

Say I were comparing two one-socket systems: one 8-core, one 16-core. Both run at the same clock rate, and both have the same memory configuration. Even under the assumption that memory bandwidth is the bottleneck, the 16-core system would still be almost twice as fast, wouldn't it?

flotus1 August 9, 2017 05:28

Nope, that's the whole point of the term "bottleneck".
The actual speedup you get depends on how "tight" the bottleneck is. Somewhere between a factor of 1 and 2.
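A crude way to picture it is a toy model where the runtime is set by whichever of compute or memory traffic takes longer; doubling the cores halves the compute part but leaves the memory part unchanged (all numbers are made up for illustration):

Code:

# Toy bottleneck model: runtime = max(compute time, memory time)
def runtime(cores, flop, bytes_moved, flops_per_core=1e9, mem_bw=1e10):
    return max(flop / (cores * flops_per_core), bytes_moved / mem_bw)

flop, bytes_moved = 1e12, 8e11
print(runtime(8, flop, bytes_moved) / runtime(16, flop, bytes_moved))  # ~1.56 here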

chad August 10, 2017 12:46

Just out of curiosity, when will both of these CPUs be available on the "consumer"-friendly market, such as Newegg or Amazon?

I see AMD links to Supermicro and a few other distributors for the Epycs (the new Xeons have similar availability), but I was wondering if someone had a bit more insight on purchasing them.

Thanks.

Kaskade August 10, 2017 12:53

I think that question would be better directed at a retailer. (As a consumer I would rather just get a very nice used car instead of a Xeon Platinum.)

Right now even a lot of server vendors still list the old models on their websites.

