CFD Online (www.cfd-online.com), Hardware forum

Memory bandwidth and memory interleaving

#1, January 25, 2015, 17:10
Sly (Sylvain Boulanger)
New Member
Join Date: Nov 2014
Posts: 17
I’m currently assessing the hardware requirements for a small cluster I want to build. This will be my first build, and I would like some input on a few questions. From what I’ve gathered in this forum, a CFD machine is generally bottlenecked by memory bandwidth. With that in mind, I decided to evaluate CPUs against their memory bandwidth using the following equation:

Index = ((DP FLOPS x 8 bytes per operation) / CAS latency) / memory bandwidth
DP FLOPS (double-precision floating-point operations per second) are determined by the CPU architecture.
All the CPUs analysed are 64-bit, hence 8 bytes.
Dividing by the CAS latency is meant to ensure full use of the memory modules.
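As a rough illustration, the index above can be evaluated for one configuration. The inputs below (8 cores at 2.5 GHz, 16 DP FLOPs per cycle per core, four channels of DDR4-2133 at CL15) are hypothetical values picked for the example, not figures from the post. Note that 8 bytes per operation also happens to be the size of an IEEE 754 double.

```python
# Minimal sketch of the index proposed in the post, using assumed example inputs.

def peak_dp_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak double-precision FLOP/s for one CPU."""
    return cores * clock_hz * flops_per_cycle

def peak_bandwidth(channels: int, transfers_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in bytes/s (64-bit = 8-byte channels)."""
    return channels * transfers_per_s * bus_bytes

def proposed_index(dp_flops: float, cas_latency: int, bandwidth: float,
                   bytes_per_op: int = 8) -> float:
    """((DP FLOPS x 8 bytes) / CAS latency) / memory bandwidth, as written in the post."""
    return (dp_flops * bytes_per_op / cas_latency) / bandwidth

# Hypothetical configuration (assumed, not taken from the thread).
flops = peak_dp_flops(cores=8, clock_hz=2.5e9, flops_per_cycle=16)  # 320 GFLOP/s
bw = peak_bandwidth(channels=4, transfers_per_s=2133e6)            # ~68.3 GB/s
index = proposed_index(flops, cas_latency=15, bandwidth=bw)
print(f"index = {index:.4f}")
```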

I know this is all theoretical, but it should give at least a rough estimate of a system’s efficiency. While researching all this, I stumbled upon something called memory interleaving. Some Supermicro motherboard user guides state that interleaved memory is 128 bits wide instead of 64 bits. See page 2-8 of the following motherboard manual, under “Support”:
http://www.supermicro.com/aplus/moth...dgt-hlibqf.cfm

If I understand this correctly, it would hurt about 20% of the CPUs I analysed, because they would not be able to max out their memory bandwidth. So here are my questions:
1. Is the equation sound, or is there a major flaw in it that I’m not aware of?
2. Does interleaved memory really yield 128 bits per cycle per channel?
3. If the answer to question 2 is no, how can the effect of interleaving be quantified (in bits per cycle per channel), whether it is 2-, 4- or 8-way rank interleaving or node interleaving (on multi-socket boards)?
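For intuition on questions 2 and 3, here is a deliberately simplified model of channel interleaving: consecutive blocks of the address space are assigned to channels in round-robin order, so a sequential stream keeps every channel busy. The 64-byte granularity and the plain modulo mapping are assumptions; real memory controllers use vendor-specific (often hashed) mappings. In this model each channel stays 64 bits wide; interleaving only changes which channel serves a given address, which helps a workload approach the theoretical peak rather than raising it.

```python
# Simplified, hypothetical model of channel interleaving (not any real controller's mapping).

LINE_BYTES = 64  # assumed interleave granularity: one 64-byte cache line

def channel_for(addr: int, n_channels: int, granularity: int = LINE_BYTES) -> int:
    """Round-robin: consecutive granularity-sized blocks go to consecutive channels."""
    return (addr // granularity) % n_channels

def lines_per_channel(start: int, n_bytes: int, n_channels: int) -> list[int]:
    """Count how many cache lines of a sequential read land on each channel."""
    counts = [0] * n_channels
    for addr in range(start, start + n_bytes, LINE_BYTES):
        counts[channel_for(addr, n_channels)] += 1
    return counts

# A 4 KiB sequential read (64 lines) over 4 interleaved channels spreads evenly.
print(lines_per_channel(0, 4096, 4))
# If a single channel serves the whole range, all 64 lines hit that one channel.
print(lines_per_channel(0, 4096, 1))
```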

#2, February 18, 2015, 15:55
Daveo643
New Member
Join Date: Mar 2013
Location: Canada
Posts: 22
Interesting question. I can't say I can answer it, but maybe we can bounce ideas off one another and exchange notes offline.

Prompted by this thread, I quickly made a spreadsheet with your equation and played with various memory frequencies and CAS latencies. I'd like to know where you got the equation and what parameter it is actually meant to describe (I'm a mechanical engineer, not a computer scientist). If I understand it correctly, a dimensional analysis of its terms suggests it is something like the bandwidth, or the time, needed to service each FP operation. I suppose you want to maximize the bandwidth and minimize the time to optimize the system. But I can't reconcile this interpretation of your equation with the parameters commonly used to describe memory.
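Written out with units, and taking the CAS latency as a count of memory clock cycles, the dimensional analysis of the equation goes as follows:

```latex
\frac{\dfrac{\text{DP FLOPS}\,[\mathrm{FLOP/s}] \times 8\,[\mathrm{B/FLOP}]}{\mathrm{CL}\,[\mathrm{cycles}]}}
     {\text{memory bandwidth}\,[\mathrm{B/s}]}
\;\longrightarrow\;
\frac{\mathrm{B/s}}{\mathrm{cycles}\cdot\mathrm{B/s}}
= \frac{1}{\mathrm{cycles}}
```

So the result reads as a dimensionless ratio of CPU data demand to memory supply, divided by a cycle count, rather than as a bandwidth or a time.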

I am familiar with MHz/CL (performance index)
http://www.anandtech.com/print/8959/...ta-and-crucial

and 1/MHz (the clock period; multiplied by CL, it gives the latency in time)
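Both quantities are easy to compute. This sketch assumes DDR memory, where the quoted "MHz" is the data rate in MT/s (as in module names such as DDR4-2133) and the I/O clock that CL is counted in runs at half that rate. The module speeds and CAS values are only examples.

```python
def performance_index(data_rate_mts: float, cl: int) -> float:
    """MHz/CL index: higher is better."""
    return data_rate_mts / cl

def cas_latency_ns(data_rate_mts: float, cl: int) -> float:
    """Absolute CAS latency: CL cycles of the I/O clock, which runs at half the data rate."""
    clock_period_ns = 1000.0 / (data_rate_mts / 2)  # 1/MHz, expressed in nanoseconds
    return cl * clock_period_ns

# Example modules (assumed values, for illustration only).
for rate, cl in [(2133, 15), (2400, 16), (2800, 17)]:
    print(f"DDR4-{rate} CL{cl}: index {performance_index(rate, cl):.1f}, "
          f"latency {cas_latency_ns(rate, cl):.2f} ns")
```

A higher index and a lower absolute latency both point the same way (better memory), which is the consistency a figure of merit needs.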

To optimise performance, it is desirable to increase the frequency and reduce the CAS latency where possible. In the index defined by your equation, memory bandwidth (the accepted definition being frequency x word size, i.e. bus width) is in the denominator, so increasing it makes the result go down. In the same vein, increasing the CAS latency also makes the index go down. An improvement (more bandwidth) and a degradation (higher latency) therefore move the index in the same direction, which contradicts its use as a performance measure. I think your equation needs some reworking.
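The directional problem can be checked numerically. This sketch applies the original post's formula to hypothetical inputs (a 320 GFLOP/s peak CPU, four memory channels) and takes bandwidth as data rate x 8-byte bus width x channel count, per the definition above.

```python
def bandwidth(data_rate_mts: float, channels: int = 4, bus_bytes: int = 8) -> float:
    """Peak bandwidth in B/s = transfers/s x bus width x channels."""
    return data_rate_mts * 1e6 * bus_bytes * channels

def proposed_index(dp_flops: float, cl: int, bw: float) -> float:
    """The index from the original post: ((FLOPS x 8) / CL) / bandwidth."""
    return (dp_flops * 8 / cl) / bw

FLOPS = 320e9  # hypothetical peak DP FLOP/s

base = proposed_index(FLOPS, 15, bandwidth(2133))
faster = proposed_index(FLOPS, 15, bandwidth(2400))  # more bandwidth: an improvement
slower = proposed_index(FLOPS, 17, bandwidth(2133))  # higher CL: a degradation
print(base, faster, slower)
# Both changes lower the index, so it cannot tell a better setup from a worse one.
print(faster < base and slower < base)
```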

See the below picture and ignore the stuff in the upper part.

#3, February 19, 2015, 13:41
Sly (Sylvain Boulanger)
New Member
Join Date: Nov 2014
Posts: 17
OK, when I wrote this it made a lot of sense in my head, but after reading your reply I reread my post and realised that what I said doesn’t work at all. The initial idea was to compare, via a ratio, the amount of data the CPU could generate against the amount the memory could supply. The (very wrong) reason I put the CAS latency in there was that I thought the memory wasn’t fully used until the data generated by the CPU exceeded the data supplied by the memory by a factor equal to the CAS latency. That would have implied some sort of multiphase memory controller, with a number of phases equal to the CAS latency, and, well, that was all a figment of my imagination.
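With the CAS latency dropped, the comparison described here reduces to the CPU's peak data demand over the memory's peak supply. A minimal sketch with hypothetical inputs (8 cores at 2.5 GHz with 16 DP FLOPs per cycle, four channels of DDR4-2133). The assumption of 8 bytes of memory traffic per FLOP is a worst case, since real CFD kernels reuse some data from cache.

```python
def demand_to_supply(dp_flops: float, bandwidth_bps: float,
                     bytes_per_flop: float = 8.0) -> float:
    """CPU data demand (B/s) divided by memory supply (B/s); dimensionless."""
    return dp_flops * bytes_per_flop / bandwidth_bps

cpu_flops = 8 * 2.5e9 * 16  # hypothetical: 320 GFLOP/s peak
mem_bw = 4 * 2133e6 * 8     # hypothetical: 4 x DDR4-2133 channels, ~68.3 GB/s
ratio = demand_to_supply(cpu_flops, mem_bw)
# A ratio above 1 means memory, not the cores, caps throughput under these assumptions.
print(f"demand/supply = {ratio:.1f}")
```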