January 25, 2015, 17:10 |
Memory bandwidth and memory interleaving
|
#1 |
New Member
Sylvain Boulanger
Join Date: Nov 2014
Posts: 17
Rep Power: 11 |
I’m currently in the process of assessing the hardware requirements for a small cluster I want to build. This will be my first build and I would like some input on questions I have. From what I’ve gathered in this forum, a CFD computer will generally have its bottleneck on memory bandwidth. That being said, I decided to evaluate CPUs against their memory bandwidth with the following equation:
((DP FLOPS x 8 bytes of information per operation) / CAS latency) / Memory bandwidth

DP FLOPS are determined by the CPU architecture. All CPUs analysed are 64-bit, hence 8 bytes per operation. The division by the CAS latency is meant to ensure full use of the memory modules. I know this is all theoretical, but to a certain degree it should give an estimate of the system’s efficiency.

During my research to learn about all this, I stumbled upon something called memory interleaving. In some Supermicro motherboard user guides, I read that interleaved memory will be 128 bits wide instead of 64 bits. See page 2-8 of the motherboard manual, found under "support" at the following link: http://www.supermicro.com/aplus/moth...dgt-hlibqf.cfm

If it works the way I understand it, it would negatively affect about 20% of the CPUs I analysed, because they would not be able to max out their memory bandwidth.

So here are my questions:
1. Is the equation sound, or is there a major flaw in it that I’m not aware of?
2. Does interleaved memory really yield 128 bits per cycle per channel?
3. If the answer to question 2 is no, how can we quantify (in bits per cycle per channel) the effect of interleaving, whether it is 2-, 4- or 8-way rank interleaving or node interleaving (for multi-socket boards)? |
|
February 18, 2015, 15:55 |
|
#2 |
New Member
Join Date: Mar 2013
Location: Canada
Posts: 22
Rep Power: 13 |
Interesting question. I cannot say I can help answer your question, but maybe we can bounce ideas off one another and exchange notes offline.
As a result of this thread, I quickly made a spreadsheet where I put your equation and also played with various memory frequencies and CAS latencies. I'd like to know where you got the equation from and to understand what parameter it is in fact trying to describe (I'm a mechanical engineer, not a computer scientist).

If I understand it correctly and analyze the dimensions of the terms in your equation, it's sort of like the bandwidth or time it takes to service each FP operation. I suppose you want to maximize the bandwidth and minimize the time to optimize the system. But I can't reconcile the quantity your equation defines with the parameters commonly used to describe memory. I am familiar with MHz/CL (performance index) http://www.anandtech.com/print/8959/...ta-and-crucial and 1/MHz (latency time). To optimise performance, it is desirable to increase the frequency and reduce the CAS latency where possible.

With the index defined by your equation, memory bandwidth (the accepted definition is frequency x word size or bus width) is in the denominator, so if you increase it the result goes down. But in the same vein, if you increase the CAS latency the index value also goes down, which directionally contradicts an improvement in performance. I think some reworking of your equation is in order. See the below picture and ignore the stuff in the upper part. |
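For reference, the memory figures mentioned in this post can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the function names are made up for this example, "frequency" is taken to be the module's data rate in MT/s (e.g. 1600 for DDR3-1600), the bus is assumed to be 64 bits wide per channel, and the latency in nanoseconds assumes DDR memory, where the clock runs at half the data rate.

```python
def peak_bandwidth_gb_s(data_rate_mt_s, bus_width_bits=64, channels=1):
    """Theoretical peak bandwidth in GB/s: data rate x bus width x channels."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


def performance_index(data_rate_mt_s, cas_latency):
    """The MHz/CL index from the post: higher is better."""
    return data_rate_mt_s / cas_latency


def cas_latency_ns(data_rate_mt_s, cas_latency):
    """CAS latency as a time in ns (assumes DDR: clock = data rate / 2)."""
    clock_mhz = data_rate_mt_s / 2
    return cas_latency / clock_mhz * 1000


if __name__ == "__main__":
    # Hypothetical module: DDR3-1600 with CL9, single channel.
    print(peak_bandwidth_gb_s(1600))   # GB/s
    print(performance_index(1600, 9))  # dimensionless index
    print(cas_latency_ns(1600, 9))     # ns
```

Note that raising the data rate improves all three figures, while raising the CAS latency worsens the index and the latency time, which is the directional behaviour the reworked equation would need to reproduce.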
|
February 19, 2015, 13:41 |
|
#3 |
New Member
Sylvain Boulanger
Join Date: Nov 2014
Posts: 17
Rep Power: 11 |
Ok, when I wrote this, it made a lot of sense in my head, but now that you have answered I had to reread my post and realised that what I said doesn’t work at all. The initial idea was to compare, via a ratio, the amount of data the CPU could generate against the amount the memory could supply. The (very wrong) reason I put the CAS latency in there was that I thought the memory wasn’t fully used until the data generated by the CPU exceeded the data supplied by the memory by a factor equal to the CAS latency. That would have implied some sort of multiphase memory controller with the number of phases equal to the CAS latency and, well, this was all a figment of my imagination.
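The initial idea, with the CAS latency term dropped, might be sketched as follows. This is only an illustration: the function names and the CPU and memory figures are hypothetical, and the assumption of 8 bytes of memory traffic per double-precision operation comes from the first post (it ignores caches and data reuse, so it is an upper bound on demand rather than a prediction).

```python
def cpu_data_demand_gb_s(cores, clock_ghz, dp_flops_per_cycle, bytes_per_flop=8):
    """Data the CPU could consume or produce at peak, in GB/s.

    Assumes every DP operation moves bytes_per_flop bytes to or from memory.
    """
    peak_gflops = cores * clock_ghz * dp_flops_per_cycle
    return peak_gflops * bytes_per_flop


def demand_to_bandwidth_ratio(demand_gb_s, memory_bandwidth_gb_s):
    """Ratio > 1 means the CPU can ask for more data than memory can supply."""
    return demand_gb_s / memory_bandwidth_gb_s


if __name__ == "__main__":
    # Hypothetical CPU: 8 cores at 2.5 GHz, 8 DP FLOPs per cycle per core.
    demand = cpu_data_demand_gb_s(cores=8, clock_ghz=2.5, dp_flops_per_cycle=8)
    # Hypothetical memory: 4 channels of DDR3-1600 at 12.8 GB/s each.
    bandwidth = 4 * 12.8
    print(demand, bandwidth, demand_to_bandwidth_ratio(demand, bandwidth))
```

Under these assumptions the ratio only ranks CPUs against one another: the larger it is, the more the memory side limits the CPU.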