CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   Hardware (https://www.cfd-online.com/Forums/hardware/)
-   -   8 channel vs 4 channel memory bandwidth using only 8 cores! (https://www.cfd-online.com/Forums/hardware/228338-8-channel-vs-4-channel-memory-bandwidth-using-only-8-cores.html)

Habib-CFD June 27, 2020 21:18

8 channel vs 4 channel memory bandwidth using only 8 cores!
 
Hi friends,
I am very curious about the effect of high memory bandwidth on low-core-count CPUs in CFD applications.

Can an Epyc 7252 match the performance of a 7302P limited to 8 cores (assuming all memory channels are populated)?
The 7252 benefits from a 100 MHz higher base clock; however, it has half the L3 cache and half the memory bandwidth.


I found these results in the OpenFOAM benchmark:
2x7302 total 8 Cores 81.4s
1x3960X total 8 Cores 101.5s


How is it possible? :confused: In an 8-core benchmark, the 4-channel bandwidth must be enough!!

Allen_ June 28, 2020 01:24

Quote:

Originally Posted by Habib-CFD (Post 776246)
2x7302P total 8 Cores 81.4s
1x3960X total 8 Cores 101.5s


How is it possible? :confused: In an 8-core benchmark, the 4-channel bandwidth must be enough!!

This is a pretty apples-and-oranges comparison. The 7302P is a single-socket-only chip, so any test with two of them is also going to include an interconnect. I think a safe assumption here is that each 7302P is running four cores, with an InfiniBand connection between them. As fast as InfiniBand is, it's not as fast as the memory on a single socket, so these Rome chips are being saddled with a pretty big disadvantage from the slower message passing they're being forced to do. I suspect the results of running 8 cores on a single 7302P would be better, and that might give a better idea of 4 vs. 8 memory channels. Of course the best comparison would be to test a 7302 against one of the 64 MB cache models. If I had the funds to do such a comparison I would, but alas...

Also, the TR has a decent clock speed advantage, and it should have pretty similar IPC to the 7302 since they're both Zen 2 chips, so that's two big advantages it's got going for it here, yet it's still ~20 seconds slower. I might be making some incorrect assumptions, but to me this comparison is pretty good evidence of 8 channels being better than 4, even at these small core counts.
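
To put rough numbers on the 4 vs. 8 channel question, here is a minimal sketch of theoretical peak memory bandwidth. It assumes DDR4-3200 on every channel and 64-bit (8-byte) channels, and it ignores real-world efficiency, so the figures are upper bounds and not measured values. The function names are made up for illustration.

```python
# Theoretical peak DRAM bandwidth. Assumptions (not from the benchmark posts):
# DDR4-3200 on every channel, 8 bytes moved per transfer per channel.

BYTES_PER_TRANSFER = 8  # one 64-bit DDR4 channel


def peak_bandwidth_gbs(channels, mega_transfers_per_s=3200):
    """Peak bandwidth in GB/s for a given number of populated channels."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


def peak_bandwidth_per_core_gbs(channels, active_cores, mega_transfers_per_s=3200):
    """Peak bandwidth share per active core, assuming an even split."""
    return peak_bandwidth_gbs(channels, mega_transfers_per_s) / active_cores


if __name__ == "__main__":
    # 4 = quad-channel platform, 8 = one Epyc socket, 16 = two Epyc sockets
    for channels in (4, 8, 16):
        total = peak_bandwidth_gbs(channels)
        share = peak_bandwidth_per_core_gbs(channels, active_cores=8)
        print(f"{channels:2d} channels: {total:6.1f} GB/s total, "
              f"{share:5.1f} GB/s per core at 8 cores")
```

Under these assumptions, 8 active cores get about 12.8 GB/s of peak bandwidth each on a quad-channel platform, versus about 51.2 GB/s each when 16 channels are available.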

Habib-CFD June 28, 2020 01:34

Thank you Allen.

I checked the results again: the benchmark comes from 2x7302 (typing the P was my mistake). But again, it is difficult to accept such a difference at 8 cores.

Allen_ June 28, 2020 03:22

Quote:

Originally Posted by Habib-CFD (Post 776251)
the benchmark comes from 2x7302

Ahh ok, that makes more sense.

Quote:

Originally Posted by Habib-CFD (Post 776251)
But, again it is difficult to accept the differences in 8 cores.


We can try to isolate the difference caused by the 8-channel (actually 16 channels available in this case) vs. 4-channel chips, at least to some degree, by comparing the amount of work done on a per-core, per-clock basis, though we'll have to make some assumptions:
1) Both chips are running their memory at the rated speed of 3200 MT/s (DDR4-3200), i.e. the TR is not overclocked in any way.

2) The chips have similar IPC since they are both Zen 2, though this may be a moot point since we already have the product of their work.

7302: 8 cores running at 3.3 GHz do the work in 81.4 seconds, so with perfect scaling 1 core at 3.3 GHz would do it in 81.4s * 8 = 651.2 seconds, and at 1 GHz it would do it in 651.2s * 3.3 = 2149 seconds.

3960X: 101.5s * 8 = 812s, 812s * 4.5 = 3654 seconds at 1 GHz.

3654/2149 = ~1.7


OK, so in this case a 7302 core is 1.7x faster per GHz than a 3960X core, but the 7302s had 4x the memory bandwidth. There obviously is some difference, but going any further with the available information is beyond me. I'd also be interested to see how the results here would differ with all 8 of the 7302's cores on a single socket. Hopefully someone with more knowledge will chime in.
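
For anyone who wants to redo this arithmetic with other clock speeds, here is a minimal sketch of the same per-core, per-GHz normalization. It assumes perfect scaling across cores and linear scaling with clock speed, as in the estimate above; the function name is just illustrative.

```python
# Normalize a benchmark run to "seconds on one core at 1 GHz".
# Assumes perfect core scaling and linear clock scaling (a rough model).

def core_ghz_seconds(wall_seconds, cores, clock_ghz):
    """Wall time multiplied by core count and clock speed."""
    return wall_seconds * cores * clock_ghz


epyc_2x7302 = core_ghz_seconds(81.4, cores=8, clock_ghz=3.3)
tr_3960x = core_ghz_seconds(101.5, cores=8, clock_ghz=4.5)

# A larger value means more core-GHz-seconds were needed for the same work.
ratio = tr_3960x / epyc_2x7302

if __name__ == "__main__":
    print(f"2x7302: {epyc_2x7302:.0f} core-GHz-seconds")
    print(f"3960X:  {tr_3960x:.0f} core-GHz-seconds")
    print(f"ratio:  {ratio:.2f}")
```

The clock speeds are plain parameters, so the ratio can be recomputed if the chips were actually running at different frequencies under an 8-core load.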

Habib-CFD June 28, 2020 10:39

What about the effect of the CPU cache, and not only the RAM bandwidth?


Cores   2x7302            3960X             7252
1       3.3 GHz (723s)    4.5 GHz (550s)    3.2 GHz (?)
2       3.3 GHz (328s)    4.0 GHz (299s)    3.1 GHz (?)
4       3.0 GHz (164s)    3.8 GHz (162s)    3.1 GHz (?)
8       3.0 GHz (81s)     3.8 GHz (102s)    3.1 GHz (effective, ??)

flotus1 June 29, 2020 11:26

Quote:

Originally Posted by Habib-CFD (Post 776246)
How is it possible? :confused: In an 8-core benchmark, the 4-channel bandwidth must be enough!!

Any particular reason for that assumption? I would be inclined to disagree with it.

Anyway, we can just dissect the results at hand further, and do our own strong scaling analysis:
Attachment 78730
It clearly shows that the 3960X is well below ideal scaling at 8 threads, while the 2x Epyc 7302 even shows super-linear scaling with up to 16 threads.
My conclusion: that TR 3960X is being held back by the performance of its memory subsystem. Even at a thread count of 8, memory bandwidth is not enough to keep those fast cores busy. The effect might be more pronounced than it should be; the person who posted that result never responded to the comments he got.
And those Epyc 7302 perform much better thanks to an abundance of available memory bandwidth. The super-linear effect might be explained by effectively doubling the available L3 cache with two CPUs. Other 7302 results are closer to the ideal line, so some caution is required when drawing conclusions like these.
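
A strong scaling analysis like this can be reproduced from the timings posted earlier in this thread. The following is a minimal sketch, not the script behind the attached plot: speedup is the 1-core time divided by the n-core time, and parallel efficiency is speedup divided by n. The names `TIMES` and `strong_scaling` are made up for illustration.

```python
# Strong scaling from wall-clock times (seconds) posted earlier in this thread.
# Speedup S(n) = T(1) / T(n); efficiency E(n) = S(n) / n. Ideal scaling has E = 1.

TIMES = {
    "2x7302": {1: 723, 2: 328, 4: 164, 8: 81},
    "3960X": {1: 550, 2: 299, 4: 162, 8: 102},
}


def strong_scaling(times):
    """Return {cores: (speedup, efficiency)} relative to the 1-core run."""
    t1 = times[1]
    result = {}
    for cores, seconds in sorted(times.items()):
        speedup = t1 / seconds
        result[cores] = (speedup, speedup / cores)
    return result


if __name__ == "__main__":
    for system, times in TIMES.items():
        print(system)
        for cores, (speedup, efficiency) in strong_scaling(times).items():
            print(f"  {cores} cores: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

On these numbers the 2x7302 reaches a speedup of about 8.9 on 8 cores, above the ideal 8, while the 3960X reaches about 5.4, i.e. roughly 67% efficiency. Keep in mind that the 1-core runs boost to higher clocks than the 8-core runs, so part of the lost efficiency is clock speed and not memory.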

Habib-CFD June 29, 2020 20:13

Thank you flotus1.
Actually, I started looking into the 8-core Epyc lineup because of the poor scaling of my CFD software. Based on your explanation, the 7262 should provide noticeably better performance than the 7252 in CFD applications. The 10900X is also available for almost the same total budget as the 7252, but I would rather not have to deal with VRM and CPU temperatures.

flotus1 June 30, 2020 09:20

If it is memory bandwidth that is holding your code back -and that's a big if- then the Epyc 7262 might be an option.
I can't remember each discussion we had on that topic in detail: have you already tried a monolithic CPU, with low latency and high clock speeds, such as the Intel 10900X?
But at this point, I think no CPU would give you a significant performance improvement.

Wait a minute... you already tried an Epyc 7302P, right? If that's the case, forget about the 7262. It's the same CPU, just with fewer active cores.

Habib-CFD June 30, 2020 10:55

Quote:

Originally Posted by flotus1 (Post 776442)
Wait a minute... you already tried an Epyc 7302P, right? If that's the case, forget about the 7262. It's the same CPU, just with fewer active cores.


You are right, you taught me how to scale Flow 3D on the 7302P :rolleyes:. We concluded that Flow 3D's scaling falls off rapidly beyond 4 cores, and now they have announced a new HPC release for in-house clusters!! I am still using the previous version due to license issues.


Anyway, the reason I am interested in the EPYC lineup is its low operating temperature. The 7302P reaches a maximum of 42 degrees with all 16 cores at 100 percent load, really awesome. I am planning a budget PC that I can use for a long time without worrying about VRM temperature, liquid coolers, etc. Something like a used 7252 bundle. Best performance per budget (total 1000 to 1200$), not the fastest. I have not tested the 10900X, and there is no chance of additional budget if it cannot handle a 24-hour workload. Every time I look at X299 motherboards, the thick heatsinks warn me that something might get very hot :(.

flotus1 June 30, 2020 17:45

Where do I start... ;)

A decent sized heatsink is not a sign that a component runs hot. It is a sign that the manufacturer put some thought into the thermal design. At least if we are talking about actual heatsinks, and not those slabs of massive metal that some manufacturers started to put on their boards. This was one of the reasons why the first batch of X299 boards was universally torn apart by reviewers. Those were all form over function.
But still: these catastrophic results were mostly obtained on open-air test benches, with CPU water cooling, meaning absolutely zero airflow for the VRMs. In a case with case fans and an air cooler on the CPU, the situation would have been much less severe.

VRM temperatures on boards for AMD Epyc are also a "hot" topic. VRMs on these boards are not as over-engineered as on most X299 boards, meaning they need to operate closer to their maximum capacity and thus run hotter. The tiny heatsinks, which mostly rely on plenty of airflow for adequate temperatures, don't help either.
VRMs on my H11DSi in a well-ventilated workstation case regularly went over 90°C before I went with water cooling. That's not an issue, but it shows that the VRM design and cooling solution is inferior to many consumer boards.

Which finally brings us to the CPU. Intel CPUs run hot, right? Well most of them don't, at least when they are not overclocked. In order to get an edge in marketing over their competitors, some motherboard manufacturers have resorted to overclocking CPUs out of the box. "Look, using our board, you will get 2% more performance than with brand x". And they do so with borderline insane voltage settings. With power limits enforced and voltages in a normal range, modern Intel CPUs can be quite easy to cool.
This also ties in with VRM temperatures. Of course the 18-core X299 SKU will run hot, and cause more heat in the VRMs. But you are looking at 8-10 cores, where temperatures are much easier to keep in check.

With all that out of the way, the go-to option for X299 is ASRock X299 Extreme 4. A sub-200$ part with adequate VRM cooling, capable of handling an 18-core CPU, so 8-10 cores will be no challenge. And pretty much all the features one would ever need.
Since you are searching for 8-core Epyc CPUs, the closer match would be the i7-9800X for ~350$. A decent air cooler for 50$ or above will get the job done, along with some additional fans in the case.

Habib-CFD June 30, 2020 20:15

Dear Alex, Thank you for your precise statement.

