8 channel vs 4 channel memory bandwidth using only 8 cores! |
|
June 27, 2020, 21:18 |
8 channel vs 4 channel memory bandwidth using only 8 cores!
|
#1 |
Member
Join Date: Oct 2019
Posts: 63
Rep Power: 6 |
Hi friends,
I am very curious about the effect of high memory bandwidth on low-core-count CPUs in CFD applications. Can an Epyc 7252 provide performance similar to a 7302P limited to 8 cores (assuming all memory channels are populated)? The 7252 benefits from a 100 MHz higher base clock, but its L3 cache and memory bandwidth are half. I found these results in the OpenFOAM benchmark thread:
2x 7302, 8 cores total: 81.4 s
1x 3960X, 8 cores total: 101.5 s
How is that possible? In an 8-core benchmark, 4-channel bandwidth should be enough!
Last edited by Habib-CFD; June 28, 2020 at 01:28. |
|
June 28, 2020, 01:24 |
|
#2 | |
New Member
Allen
Join Date: Jan 2020
Posts: 8
Rep Power: 6 |
Quote:
Also, the TR has a decent clock speed advantage, and it should have pretty similar IPC to the 7302 since they are both Zen 2 chips. So it has two big advantages going for it here, yet it is still ~20 seconds slower. I might be making some incorrect assumptions, but to me this comparison is pretty good evidence that 8 channels are better than 4, even at these small core counts. |
||
June 28, 2020, 01:34 |
|
#3 |
Member
Join Date: Oct 2019
Posts: 63
Rep Power: 6 |
Thank you Allen.
I checked the results again; the benchmark actually comes from 2x 7302 (my mistake to type "P"). But again, it is difficult to accept such a difference at only 8 cores. |
|
June 28, 2020, 03:22 |
|
#4 | |
New Member
Allen
Join Date: Jan 2020
Posts: 8
Rep Power: 6 |
Ahh ok, that makes more sense.
Quote:
We can try to isolate the difference caused by the 8-channel (actually 16 channels available in this case) vs. 4-channel chips, at least to some degree, by comparing the amount of work done on a per-core, per-clock basis, though we'll have to make some assumptions:
1) Both chips run at their rated memory speed of 3200 MHz, i.e. the TR is not overclocked in any way.
2) The workload scales perfectly with core count and clock speed.

7302: 8 cores running at 3.3 GHz do the work in 81.4 seconds, so with perfect scaling 1 core at 3.3 GHz would do it in 651.2 seconds, and at 1 GHz it would take ~2149 seconds.
3960X: 101.5 s * 8 = 812 s; 812 s * 4.5 GHz = 3654 seconds.
3654 / 2149 ≈ 1.7

So in this case a 7302 core is about 1.7x faster per GHz than a 3960X core, but it also had 4x the memory bandwidth. There obviously is some difference, but going any further with the available information is beyond me. I'd also be interested to see how the results would differ with all 8 of the 7302's cores on a single socket. Hopefully someone with more knowledge will chime in. |
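The normalization described above can be written out as a short calculation. This is just a sketch of the arithmetic in the post (the helper name `core_ghz_seconds` is made up here), under the same assumptions of perfect scaling and stock memory speeds:

```python
# Normalize the quoted 8-core timings to "core-GHz-seconds": the time a
# hypothetical single core at 1 GHz would need, assuming perfect scaling.

def core_ghz_seconds(time_s, cores, clock_ghz):
    """Total work expressed as single-core seconds at 1 GHz."""
    return time_s * cores * clock_ghz

epyc_7302 = core_ghz_seconds(81.4, 8, 3.3)   # ~2149, matching the post
tr_3960x = core_ghz_seconds(101.5, 8, 4.5)   # ~3654, matching the post

ratio = tr_3960x / epyc_7302
print(f"7302 per-core per-clock advantage: {ratio:.2f}x")  # ~1.70
```

A lower core-GHz-seconds value means more work done per core per clock, which is why the ratio comes out in the 7302's favor despite its lower clock.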
||
June 28, 2020, 10:39 |
|
#5 |
Member
Join Date: Oct 2019
Posts: 63
Rep Power: 6 |
What about the CPU cache effect, not only the RAM bandwidth?

Cores | 2x 7302         | 3960X           | 7252
1     | 3.3 GHz (723 s) | 4.5 GHz (550 s) | 3.2 GHz (?)
2     | 3.3 GHz (328 s) | 4.0 GHz (299 s) | 3.1 GHz (?)
4     | 3.0 GHz (164 s) | 3.8 GHz (162 s) | 3.1 GHz (?)
8     | 3.0 GHz (81 s)  | 3.8 GHz (102 s) | 3.1 GHz (effective, ??) |
|
June 29, 2020, 11:26 |
|
#6 | |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
Quote:
Anyway, we can dissect the results at hand further and do our own strong scaling analysis (see the attached scaling.png).

It clearly shows that the 3960X is well below ideal scaling at 8 threads, while the 2x Epyc 7302 even shows super-linear scaling with up to 16 threads.

My conclusion: the TR 3960X is being held back by the performance of its memory subsystem. Even at a thread count of 8, memory bandwidth is not enough to keep those fast cores busy. The effect might be more pronounced than it should be; the OP of that benchmark never responded to the comments he got. The Epyc 7302 systems perform much better thanks to an abundance of available memory bandwidth. The super-linear effect might be explained by effectively doubling the available L3 cache with two CPUs. Other 7302 results are closer to the ideal line, so some caution is required when drawing conclusions like these. |
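The strong-scaling analysis above boils down to computing speedup and parallel efficiency from the timings in post #5. A minimal sketch, using those numbers (the dictionary layout is my own, not from the benchmark thread):

```python
# Strong scaling from the timings in post #5: speedup relative to the
# single-core run, and parallel efficiency (speedup / cores).
# Efficiency above 1.0 indicates super-linear scaling.

timings = {
    "2x Epyc 7302": {1: 723, 2: 328, 4: 164, 8: 81},
    "TR 3960X":     {1: 550, 2: 299, 4: 162, 8: 102},
}

for cpu, runs in timings.items():
    t1 = runs[1]  # single-core reference time
    for cores, t in sorted(runs.items()):
        speedup = t1 / t
        efficiency = speedup / cores
        print(f"{cpu:>12} {cores:2d} cores: "
              f"speedup {speedup:5.2f}, efficiency {efficiency:.2f}")
```

At 8 cores this gives an efficiency above 1.0 for the dual 7302 and well below 1.0 for the 3960X, which is exactly the super-linear vs. below-ideal behavior described above.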
||
June 29, 2020, 20:13 |
|
#7 |
Member
Join Date: Oct 2019
Posts: 63
Rep Power: 6 |
Thank you flotus1.
Actually, I started looking into the Epyc 8-core lineup because of the poor scaling of my CFD software. Following your explanation, the 7262 should provide noticeably better performance than the 7252 in CFD applications. The 10900X is also available for almost the same total budget as the 7252, but I don't want to deal with VRM and CPU temperature issues. |
|
June 30, 2020, 09:20 |
|
#8 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
If it is memory bandwidth that is holding your code back (and that's a big if), then the Epyc 7262 might be an option.
I can't remember every discussion we had on this topic in detail: have you already tried a monolithic CPU with low latency and high clock speeds, such as the Intel 10900X? At this point, though, I don't think any CPU would give you a significant performance improvement. Wait a minute... you already tried an Epyc 7302P, right? If that's the case, forget about the 7262. It's the same CPU, just with fewer active cores. |
|
June 30, 2020, 10:55 |
|
#9 | |
Member
Join Date: Oct 2019
Posts: 63
Rep Power: 6 |
Quote:
You are right, you taught me how to scale Flow-3D using the 7302P. We concluded that Flow-3D scaling falls off rapidly beyond 4 cores, and now they announce a new HPC release for in-house clusters!! I am still using the previous version due to license issues. Anyway, the reason I am interested in the Epyc lineup is the low operating temperatures. The 7302P reaches a maximum of 42°C with 16 cores at 100% load, which is really awesome. I am planning a budget PC I can use for a long time without worrying about VRM temperatures, liquid coolers, etc. Something like a used 7252 bundle: best performance per budget (1000 to 1200$ total), not the fastest. I have not tested the 10900X; there is no chance of an additional budget if it fails under 24-hour workloads. Every time I look at X299 motherboards, the thick heatsinks warn me that something might get very hot. |
||
June 30, 2020, 17:45 |
|
#10 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
Where do I start...
A decent-sized heatsink is not a sign that a component runs hot. It is a sign that the manufacturer put some thought into the thermal design. At least if we are talking about actual heatsinks, and not those slabs of solid metal that some manufacturers started putting on their boards. This was one of the reasons why the first batch of X299 boards was universally torn apart by reviewers: those were all form over function. Still, these catastrophic results were mostly obtained on open-air test benches with CPU water cooling, meaning absolutely zero airflow for the VRMs. In a case with case fans and an air cooler on the CPU, the situation would have been much less severe.

VRM temperatures on boards for AMD Epyc are also a "hot" topic. VRMs on these boards are not as over-engineered as on most X299 boards, meaning they need to operate closer to their maximum capacity and thus run hotter. The tiny heatsinks, which mostly rely on plenty of airflow for adequate temperatures, don't help either. The VRMs on my H11DSi in a well-ventilated workstation case regularly went over 90°C before I switched to water cooling. That's not an issue per se, but it shows that the VRM design and cooling solution is inferior to many consumer boards.

Which finally brings us to the CPU. Intel CPUs run hot, right? Well, most of them don't, at least when they are not overclocked. To get an edge in marketing over their competitors, some motherboard manufacturers have resorted to overclocking CPUs out of the box: "Look, using our board, you will get 2% more performance than with brand X". And they do so with borderline insane voltage settings. With power limits enforced and voltages in a normal range, modern Intel CPUs can be quite easy to cool. This also ties in with VRM temperatures. Of course the 18-core X299 SKU will run hot and cause more heat in the VRMs, but you are looking at 8-10 cores, where temperatures are much easier to keep in check.

With all that out of the way, the go-to option for X299 is the ASRock X299 Extreme4: a sub-200$ board with adequate VRM cooling, capable of handling an 18-core CPU, so 8-10 cores will be no challenge, and with pretty much all the features one would ever need. Since you are looking at 8-core Epyc CPUs, the closer match would be the i7-9800X for ~350$. A decent air cooler for 50$ or more will get the job done, along with some additional fans in the case. |
|
June 30, 2020, 20:15 |
|
#11 |
Member
Join Date: Oct 2019
Posts: 63
Rep Power: 6 |
Dear Alex, Thank you for your precise statement.
|