Memory for AMD Epyc CPUs

Old   January 18, 2022, 11:52
Default Memory for AMD Epyc CPUs
  #1
Member
 
Guy
Join Date: Jun 2019
Posts: 39
Rep Power: 7
linuxguy123 is on a distinguished road
Which memory setup would have higher bandwidth on a Naples EPYC machine: 2666 MHz dual rank or 3200 MHz single rank? Assume the motherboard will allow the memory to be overclocked.


What about on a Rome EPYC machine?


Thanks

Old   January 18, 2022, 12:31
Default
  #2
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,406
Rep Power: 48
flotus1 has a spectacular aura about
You cannot run memory beyond the official spec with these platforms. Even if the motherboard allows you to enter a higher frequency, the CPUs themselves are locked. Maybe there are some ES/QS CPUs floating around with unlocked memory controllers, but I haven't heard of anyone pulling this off.

Since this locks us to DDR4-3200 for Rome, and DDR4-2666 for Naples, the only free parameter is memory ranks per channel. 2 is better than 1 for bandwidth.

Old   January 18, 2022, 12:49
Default
  #3
Member
 
Guy
Join Date: Jun 2019
Posts: 39
Rep Power: 7
linuxguy123 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
You cannot run memory beyond the official spec with these platforms. Even if the motherboard allows you to enter a higher frequency, the CPUs themselves are locked. Maybe there are some ES/QS CPUs floating around with unlocked memory controllers, but I haven't heard of anyone pulling this off.

Wendell of Level1 Tech overclocked the memory on a 7551 without any issues. It ran at 3200MHz even though the official spec on Naples is 2666MHz.



Quote:
Since this locks us to DDR4-3200 for Rome, and DDR4-2666 for Naples, the only free parameter is memory ranks per channel. 2 is better than 1 for bandwidth.

How much of a difference does memory rank make to the bandwidth?

Old   January 18, 2022, 13:06
Default
  #4
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,406
Rep Power: 48
flotus1 has a spectacular aura about
That's neat, do you have a link to the video/article?
The difference in maximum bandwidth is usually around 15% between 1 and 2 ranks per channel.

Old   January 18, 2022, 14:23
Default
  #5
Member
 
Guy
Join Date: Jun 2019
Posts: 39
Rep Power: 7
linuxguy123 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
That's neat, do you have a link to the video/article?
The difference in maximum bandwidth is usually around 15% between 1 and 2 ranks per channel.

https://www.youtube.com/watch?v=1ZwRYprMF0w @ 12:00


It turns out that Naples EPYC runs the memory fastest on 1 DIMM per channel of SINGLE rank memory. See Table 4 on page 8. https://developer.amd.com/wp-content.../56301_1.0.pdf


1 DIMM per channel of Dual Rank has a memory bandwidth of 154 GB/s.

1 DIMM per channel of Single Rank has a memory bandwidth of 170 GB/s.

Last edited by linuxguy123; January 18, 2022 at 20:27.

Old   January 18, 2022, 16:26
Default
  #6
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,406
Rep Power: 48
flotus1 has a spectacular aura about
I'm not entirely convinced by that. He briefly mentions that the option is there in the BIOS, and suggests that it will run at DDR4-3200. But that's never shown. I remain skeptical.
I have a modded BIOS on my Supermicro H11DSi that exposes tons of additional options, higher memory frequency being one of them. I was never able to get it to work, and still haven't seen anyone else doing it. Professional overclockers trying to set world records could not do it. Maybe it's a different story on single-socket systems, I don't know about that.

Now for the easy part:
The PDF you linked shows two things: Epyc Naples has staggered memory frequency specs depending on the memory population. That's nothing new, the more ranks per channel, the lower the officially supported memory speed. I am still running DDR4-2666 with 2 ranks per channel on Epyc 7551. As do a lot of other people.
The more important thing to note here is: the tables list maximum theoretical memory bandwidth, calculated via (frequency x 64 bit x 2 x number of memory channels).
The whole point here is that one rank per channel cannot get close to that theoretical maximum, even in synthetic benchmarks like STREAM. You need at least 2 ranks per channel for that.
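As a rough check of that formula, here is a minimal Python sketch. It assumes "frequency" means the memory clock (half the DDR transfer rate, so about 1333 MHz for DDR4-2666), a 64-bit data bus per channel, and the 8 memory channels per socket that Epyc has. The function name and parameters are made up for illustration.

```python
def theoretical_bandwidth_gb_s(transfer_rate_mt_s: float, channels: int = 8,
                               bus_width_bits: int = 64) -> float:
    """Peak theoretical bandwidth in GB/s (decimal) for one socket."""
    clock_mhz = transfer_rate_mt_s / 2        # DDR: two transfers per clock
    bytes_per_transfer = bus_width_bits / 8   # 64 bit -> 8 bytes
    # frequency x 64 bit x 2 x number of memory channels, expressed in bytes/s
    bytes_per_second = clock_mhz * 1e6 * 2 * bytes_per_transfer * channels
    return bytes_per_second / 1e9


for rate in (2400, 2666):
    print(f"DDR4-{rate}: {theoretical_bandwidth_gb_s(rate):.1f} GB/s per socket")
```

Under these assumptions DDR4-2400 and DDR4-2666 come out at roughly 153.6 and 170.6 GB/s, which lines up with the 154 and 170 GB/s figures quoted from the table. That suggests the gap in the table is just the difference in supported speed grade, not a measured rank effect.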

Edit: they even state that right above those tables:
Quote:
While it may seem that a decreased operational frequency with two DIMMs populated is not ideal for memory intensive workloads, the additional chip selects being used, or ranks of memory, can outweigh the change in operating memory speed in certain workloads.
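To put rough numbers on that trade-off, here is a small worked sketch. It assumes, purely for illustration, that the "around 15%" rank advantage mentioned earlier in the thread applies as a flat multiplier on achievable bandwidth, and that achievable bandwidth otherwise scales linearly with transfer rate. Neither assumption is a measurement, and the function name is hypothetical.

```python
def relative_bandwidth(rate_dual_mt_s: float, rate_single_mt_s: float,
                       rank_gain: float = 0.15) -> float:
    """Ratio of dual-rank to single-rank achievable bandwidth.

    Assumes bandwidth scales linearly with transfer rate and that two ranks
    per channel add a flat `rank_gain` (a rule of thumb, not a measured value).
    """
    return (1.0 + rank_gain) * rate_dual_mt_s / rate_single_mt_s


ratio = relative_bandwidth(2400, 2666)
print(f"dual rank @ 2400 vs single rank @ 2666: {ratio:.3f}x")

# Rank gain at which the two configurations would tie.
break_even = 2666 / 2400 - 1.0
print(f"break-even rank gain: {break_even:.1%}")
```

With these assumptions, dual rank at DDR4-2400 still ends up slightly ahead of single rank at DDR4-2666, and the two would tie at a rank gain of about 11%. If dual rank also runs at DDR4-2666, the full assumed gain applies.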

Old   January 18, 2022, 18:07
Default
  #7
Member
 
Guy
Join Date: Jun 2019
Posts: 39
Rep Power: 7
linuxguy123 is on a distinguished road
Yes, but that paper specifically states this:


Quote:
Memory Bandwidth Sensitive Workloads
Memory bound workloads will benefit from the maximum available memory speed. This can be
achieved with one 2666 MHz DIMM per channel in a single slot per channel platform. This
configuration would be beneficial in memory bound high-performance computing (HPC)
workloads such as: computational fluid dynamics (CFD), weather modeling, crash simulation,
and oil and gas (O&G) exploration.
The only way one can achieve a memory speed of 2666 MHz is with single rank memory. It makes sense that single rank memory will be faster because dual rank memory will need 2 chip selects to read both ranks: one chip select for the first rank and a second chip select for the second rank.


I think the quote you gave refers to computations on large data sets that benefit from keeping the entire dataset in RAM versus accessing a disk. Dual rank memory allows more memory per DIMM, all else being equal. Notice the description of that situation is labelled "memory intensive". CFD is bandwidth intensive, not memory intensive.


Quote:
While it may seem that a decreased operational frequency with two DIMMs populated is not ideal for memory intensive workloads, the additional chip selects being used, or ranks of memory, can outweigh the change in [they mean reduction in] operating memory speed in certain workloads.

Last edited by linuxguy123; January 18, 2022 at 20:41.

Old   January 19, 2022, 02:30
Default
  #8
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,406
Rep Power: 48
flotus1 has a spectacular aura about
Quote:
The only way one can achieve a memory speed of 2666 MHz is with single rank memory
No, it isn't. As I already told you, my personal workstation is running 2R DDR4-2666 just fine on Naples. As do the other 9 Naples systems I bought for work. And the thousands of systems that were bought with such memory configurations, because neither the seller nor the buyer knew or cared about this rather obscure limitation.

What's our goal here? Arguing for argument's sake?
Your first post was about how much performance can be gained by actually overclocking memory on Naples.
And now you want to let some obscure "official" spec prevent you from having your cake, and eating it too?
Pick a side, mate

Last edited by flotus1; January 19, 2022 at 06:49.

Old   January 19, 2022, 11:41
Default
  #9
Member
 
Guy
Join Date: Jun 2019
Posts: 39
Rep Power: 7
linuxguy123 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
No, it isn't. As I already told you, my personal workstation is running 2R DDR4-2666 just fine on Naples.
Then, according to AMD, you are overclocking your memory. Which is fine, I'm not saying it isn't.

As do the other 9 Naples systems I bought for work. And the thousands of systems that were bought with such memory configurations, because neither the seller nor the buyer knew or cared about this rather obscure limitation.

Quote:
What's our goal here? Arguing for argument's sake?
Let's not get personal here. I have no dog in this fight. I'm just trying to figure out which memory to buy. You say 2 rank. AMD says single rank.

Have you ever benchmarked the same machine back to back with single rank memory versus 2 rank memory?
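For anyone wanting to try such a back-to-back comparison, here is a minimal sketch of a timing harness in Python. It is an assumption-laden stand-in, not STREAM: it times one large single-threaded memory copy, and a single thread cannot saturate eight memory channels, so the absolute numbers will be far below the platform maximum. The function name and default sizes are made up for illustration.

```python
import time


def copy_bandwidth_gb_s(size_mb: int = 256, repeats: int = 5) -> float:
    """Best-of-N single-threaded copy bandwidth in GB/s (read + write traffic)."""
    n = size_mb * 1024 * 1024
    src = bytearray(b"\x01") * n          # non-zero fill so the pages are really backed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)                  # one large memcpy-style copy
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        del dst
    # Each copy reads n bytes and writes n bytes.
    return 2 * n / best / 1e9
```

Calling `copy_bandwidth_gb_s()` with the same buffer size on the same machine, once per DIMM configuration, gives a crude relative comparison. For numbers that mean anything for CFD, a multi-threaded STREAM run with one thread per core is the better tool.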

Old   January 19, 2022, 13:05
Default
  #10
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,406
Rep Power: 48
flotus1 has a spectacular aura about
That's hardly a personal attack, and definitely was not intended as such. I just don't like it when these discussions keep running in circles, to a point where I have to start doubting the motivation.
It's just not a clear-cut technical decision. Either you are fine with running technically out of spec, or you are not. Then again, the obscure nature of this particular specification means that tons of people are violating it, without knowing about it, and without problems.

No, I have not personally run the type of benchmark you suggest.

Old   January 19, 2022, 13:24
Default
  #11
Member
 
Guy
Join Date: Jun 2019
Posts: 39
Rep Power: 7
linuxguy123 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
That's hardly a personal attack, and definitely was not intended as such. I just don't like it when these discussions keep running in circles, to a point where I have to start doubting the motivation.
It's just not a clear-cut technical decision. Either you are fine with running technically out of spec, or you are not. Then again, the obscure nature of this particular specification means that tons of people are violating it, without knowing about it, and without problems.

No, I have not personally run the type of benchmark you suggest.
I'm fine running Dual Rank memory at 2666MHz. Regardless of rank, I'll attempt to overclock the memory on my system when I get it running.

Are you absolutely sure that your memory is running at 2666 MHz? Is there a POST message in the IPMI that states so?
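One way to check this from a running Linux system, rather than from POST messages, is to look at the SMBIOS memory device records that `dmidecode --type 17` prints. The sketch below parses that kind of text in Python. The `SAMPLE` string is an invented, trimmed example and not output from anyone's machine in this thread, and the helper name is hypothetical. It assumes the "Rank", "Speed" and "Configured Memory Speed" fields are present (older dmidecode versions label the last one "Configured Clock Speed").

```python
import re

# Illustrative, trimmed sample only; real output has many more fields.
SAMPLE = """\
Memory Device
    Locator: DIMMA1
    Speed: 2666 MT/s
    Rank: 2
    Configured Memory Speed: 2666 MT/s
Memory Device
    Locator: DIMMB1
    Speed: 2666 MT/s
    Rank: 2
    Configured Memory Speed: 2400 MT/s
"""


def parse_dimms(text: str) -> list:
    """Return one dict per 'Memory Device' block with rank and speed fields."""
    dimms = []
    for block in text.split("Memory Device")[1:]:
        # Collect "Key: value" pairs line by line.
        fields = dict(re.findall(r"^[ \t]*([^:\n]+):[ \t]*(.*)$", block, flags=re.M))
        configured = fields.get("Configured Memory Speed",
                                fields.get("Configured Clock Speed", "Unknown"))
        dimms.append({
            "locator": fields.get("Locator", "?"),
            "rank": fields.get("Rank", "?"),
            "rated": fields.get("Speed", "?"),
            "configured": configured,
        })
    return dimms


for dimm in parse_dimms(SAMPLE):
    print(dimm)
```

On a real machine the input text would come from running `sudo dmidecode --type 17` and passing its output to `parse_dimms`. Comparing "rated" against "configured" per DIMM shows whether the platform has downclocked the memory.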

All I want to know is what is faster, Single Rank or Dual Rank, so I can buy memory. AMD says Single Rank.

If dual rank memory runs at 2666 MHz, they are probably equal. I'm guessing that not all Dual Rank memory runs at 2666MHz.

What evidence do you have that says Dual Rank memory runs faster than Single Rank?

Old   January 19, 2022, 15:09
Default
  #12
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,406
Rep Power: 48
flotus1 has a spectacular aura about
Yeah, I'm pretty sure about the memory frequency in my systems

About that evidence...
For lack of a better term, you will have to take my word for it. I have been doing this for quite a while now. Reading technical documentation, blog posts, news articles, benchmarks from professionals and amateurs, discussing with experts, and sporadically running my own benchmarks. What I didn't do is keep a database or list with links to prove what I learned over the years.

To wrap this up from my side: for maximum performance in CFD workloads, use dual-rank memory. One DIMM per channel. And if possible, a motherboard with one DIMM slot per channel.

Old   January 19, 2022, 15:16
Default
  #13
Member
 
Guy
Join Date: Jun 2019
Posts: 39
Rep Power: 7
linuxguy123 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
To wrap this up from my side: for maximum performance in CFD workloads, use dual-rank memory.

That is exactly the opposite of what AMD says. And I am pretty sure they have tested it.

Old   January 19, 2022, 16:04
Default
  #14
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,406
Rep Power: 48
flotus1 has a spectacular aura about
It is the opposite of how you interpreted what AMD states in their document. That's a difference.
Screenshot_20220119_214734.png
Please note how the text does not mention ranks. And also note that the image caption reads "1 DIMM Per Channel SR/DR RDIMM or LRDIMM operating at 2666 MHz"
This of course leaves the ambiguity between SR and DR. That's where I come in, sharing my expertise with you.

Listen, this topic isn't some dubious bs that I convinced myself of. It is a relatively well-known fact among tech enthusiasts and experts. If you want a different opinion, you will have to ask someone who isn't me.

Old   January 19, 2022, 16:17
Default
  #15
Member
 
Guy
Join Date: Jun 2019
Posts: 39
Rep Power: 7
linuxguy123 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
It is the opposite of how you interpreted what AMD states in their document. That's a difference.
Attachment 88052
Please note how the text does not mention ranks. And also note that the image caption reads "1 DIMM Per Channel SR/DR RDIMM or LRDIMM operating at 2666 MHz"
This of course leaves the ambiguity between SR and DR. That's where I come in, sharing my expertise with you.

Listen, this topic isn't some dubious bs that I convinced myself of. It is a relatively well-known fact among tech enthusiasts and experts. If you want a different opinion, you will have to ask someone who isn't me.

According to that diagram, it doesn't matter if the memory is SR or DR. Which is what I said a few posts above.


Worst case, as far as I can tell, there is no performance penalty for using single rank memory on memory bandwidth bound processes, such as CFD.

Old   January 19, 2022, 17:23
Default
  #16
New Member
 
Francisco
Join Date: Sep 2018
Location: Portugal
Posts: 27
Rep Power: 7
ships26 is on a distinguished road
Some food for thought: I'm very far from an expert on this topic, so don't ask me for many details on this, but I think that saying that 1R is exactly the same as 2R is not fully accurate.
There is at least one potential advantage of 2R at the same frequency, which is rank interleaving: https://en.wikipedia.org/wiki/Interleaved_memory.

I have no idea how much of an improvement (if any) this would provide to an Epyc based build regarding CFD workloads in particular, or if this technology is even included in Epyc systems, but it could help @flotus1's case.
Maybe you already knew about this too. Still, I thought it could be a lead worth investigating.
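To illustrate why a second rank could help at the same frequency, here is a deliberately crude toy model in Python. It is not how any real memory controller schedules accesses. It only assumes that a rank spends some time unable to deliver data (for example while opening a new row), and that this dead time can overlap with a data burst from another rank on the same channel. The timings are arbitrary, hypothetical units.

```python
def bus_utilization(ranks: int, t_burst: float, t_overhead: float) -> float:
    """Fraction of time the shared data bus is busy in a toy steady-state model.

    Each rank alternates between `t_overhead` (it cannot deliver data) and
    `t_burst` (it drives the bus). Overhead in one rank may overlap with a
    burst from another rank, but bursts never overlap with each other.
    Rank-to-rank switching penalties are ignored.
    """
    per_rank = t_burst / (t_burst + t_overhead)
    return min(1.0, ranks * per_rank)


# Hypothetical timings, chosen only for illustration.
for ranks in (1, 2):
    util = bus_utilization(ranks, t_burst=4, t_overhead=1)
    print(f"{ranks} rank(s): bus busy {util:.0%} of the time")
```

With these made-up timings a single rank leaves the bus idle a fifth of the time, while two ranks can in principle keep it fully busy. How much of that shows up in practice depends on the access pattern and the controller, which this model does not capture.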

Old   January 19, 2022, 19:37
Default
  #17
Member
 
Guy
Join Date: Jun 2019
Posts: 39
Rep Power: 7
linuxguy123 is on a distinguished road
EPYC processors do not use interleaved memory. If they did, the dual rank and dual DIMM setups would be faster than the single rank.

Old   January 20, 2022, 03:17
Default
  #18
Member
 
Erik Andresen
Join Date: Feb 2016
Location: Denmark
Posts: 35
Rep Power: 10
ErikAdr is on a distinguished road
At https://www.spec.org/cpu2017/results/ various computer brands present their systems for a fixed suite of test cases. They want their systems to perform well compared to the competition. The memory they use is (nearly) always either dual rank or quad rank. For Epycs I think quad rank performs the best, but only with a slight edge over dual rank. Some years ago, I saw two Epyc systems that were identical (except for memory), where one system had single rank and the other dual rank memory. The dual rank system was about 5 to 10% faster in some test cases. It was long ago, so I think it was a system with an Epyc 7??1.

Old   January 20, 2022, 05:47
Default
  #19
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,406
Rep Power: 48
flotus1 has a spectacular aura about
Moved the discussion to its own thread, because it got a bit too long and doesn't quite match the topic of the thread where it originated.

Old   January 20, 2022, 18:15
Default
  #20
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 549
Rep Power: 16
Simbelmynė is on a distinguished road
While I agree with the general consensus in the community regarding rank2 vs rank1 explained by @flotus1, I think this is a bit muddy as well. In normal consumer (enthusiast) systems you will get vastly different memory results from memory kits that are similar on paper, due to the - sometimes - large difference in memory sub-timings applied by the motherboard and XMP.


As such, it is very difficult to make a straight up comparison. If you push the memory controller to the maximum, then I suspect that you will reach similar results regardless of configuration. This comes from the observation that you can usually run single rank memory at a more than 10% higher frequency compared to dual rank (the same goes for 1 kit per memory channel compared to 2 kits per memory channel).


If two different kits (one single rank and one dual rank) are tested at the same timings and frequency, then I would put my money on the dual rank system performing better. But as stated above, if both kits are pushed to the maximum, then I am not sure.


Since server systems will not run above a certain memory frequency, this usually becomes a moot point, unless dual rank configurations on a specific motherboard force the sub-timings to be really poor. Most server motherboard BIOSes will not give you any options to change sub-timings.



Anyways, I have never benchmarked server systems myself. I have just listened to the collected wisdom and left the benchmarks to my consumer platform at home.

All times are GMT -4.