Memory bandwidth and memory interleaving

Sly · January 25, 2015, 17:10

I’m currently assessing the hardware requirements for a small cluster I want to build. This will be my first build and I would like some input on a few questions. From what I’ve gathered in this forum, a CFD computer will generally be bottlenecked by memory bandwidth. With that in mind, I decided to evaluate CPUs against their memory bandwidth with the following equation:

((DP FLOPS x 8 bytes of information per operation)/CAS Latency)/Memory bandwidth
DP FLOPS are determined by the CPU architecture.
All CPUs analysed are 64 bit so 8 bytes.
Division by the CAS latency is to ensure a full use of the memory modules.
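
To make the equation concrete, here is a minimal Python sketch that evaluates it exactly as written above. The function name and the sample numbers are placeholders chosen for illustration; they are not measurements of any real CPU or memory kit.

```python
def cpu_memory_index(dp_flops, cas_latency, mem_bandwidth_bytes_per_s, bytes_per_op=8):
    """Index as written in the post:
    ((DP FLOPS x 8 bytes per operation) / CAS latency) / memory bandwidth.
    """
    data_rate = dp_flops * bytes_per_op  # bytes/s the CPU could touch at peak
    return data_rate / cas_latency / mem_bandwidth_bytes_per_s


# Hypothetical inputs, for illustration only:
# 100 GFLOPS double precision, CAS latency 10, 50 GB/s memory bandwidth.
example = cpu_memory_index(dp_flops=100e9, cas_latency=10,
                           mem_bandwidth_bytes_per_s=50e9)
print(example)
```

With these made-up inputs the index evaluates to 1.6. Whether that number means anything depends entirely on whether the equation itself is sound, which is question 1 below.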

I know this is all theoretical, but to a certain degree it should give an estimate of the system’s efficiency. While researching all this, I stumbled upon something called memory interleaving. In some Supermicro motherboard user guides, I read that interleaved memory will be 128 bits instead of 64 bits. See page 2-8 of the motherboard manual, under Support, at the following link:
http://www.supermicro.com/aplus/moth...dgt-hlibqf.cfm

If it works the way I understand it, it would negatively affect about 20% of the CPUs I analysed because they would not be able to max out their memory bandwidth. So here are my questions:
1. Is the equation sound or is there a major flaw that I’m not aware of in it?
2. Does interleaved memory really yield 128 bits per cycle per channel?
3. If the answer to question 2 is no, how can we quantify (in bits per cycle per channel) the effect of interleaving, whether it is 2-, 4-, or 8-way rank interleaving or node interleaving (for multisocket boards)?

Daveo643 · February 18, 2015, 15:55

Interesting question. I cannot say I can help answer your question, but maybe we can bounce ideas off one another and exchange notes offline.

As a result of this thread, I quickly made a spreadsheet where I put your equation and also played with various memory frequencies and CAS latencies. I'd like to know where you got the equation and understand what parameter it is in fact trying to describe (I'm a mechanical engineer, not a computer scientist). If I understand it correctly and analyze the dimensions of the terms in your equation, it's sort of like the bandwidth or time it takes to service each FP operation. I suppose you want to maximize the bandwidth and minimize the time to optimize the system. But I can't reconcile your equation with the parameters commonly used to describe memory.

I am familiar with MHz/CL (performance index)
http://www.anandtech.com/print/8959/...ta-and-crucial

and 1/MHz (latency time)

To optimise performance, it is desirable to increase the frequency and reduce CAS latency where possible. In the index defined by your equation, memory bandwidth (the accepted definition is frequency x word size or bus width) is in the denominator, so if you increase it the result goes down. But CAS latency is also in the denominator, so if you increase it the index value goes down as well. A change that improves performance and a change that hurts it therefore move the index in the same direction, which is contradictory. I think some reworking of your equation is in order.
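
The directional problem can be checked numerically. The sketch below is only an illustration with made-up values: the function names are mine, bandwidth is taken as frequency x bus width as defined above (treating the effective transfer rate as the frequency), and the DP FLOPS figure is a placeholder.

```python
def memory_bandwidth(transfers_per_s, bus_width_bytes):
    """Peak bandwidth as frequency x bus width, in bytes/s."""
    return transfers_per_s * bus_width_bytes


def performance_index(freq_mhz, cas_latency):
    """MHz/CL index: higher is better."""
    return freq_mhz / cas_latency


def thread_index(dp_flops, cas_latency, bandwidth, bytes_per_op=8):
    """Index from the first post, evaluated as written."""
    return dp_flops * bytes_per_op / cas_latency / bandwidth


DP_FLOPS = 100e9                      # hypothetical CPU, illustration only
bw_slow = memory_bandwidth(1600e6, 8)  # hypothetical 64-bit bus at 1600 MT/s
bw_fast = memory_bandwidth(1866e6, 8)  # same bus, higher transfer rate

base = thread_index(DP_FLOPS, 9, bw_slow)
more_bandwidth = thread_index(DP_FLOPS, 9, bw_fast)   # an improvement
higher_cl = thread_index(DP_FLOPS, 11, bw_slow)       # a degradation

# Both changes push the thread's index down, although one helps and one hurts.
print(more_bandwidth < base, higher_cl < base)

# The MHz/CL index, by contrast, moves in opposite directions for the two changes.
print(performance_index(1866, 9) > performance_index(1600, 9),
      performance_index(1600, 11) < performance_index(1600, 9))
```

Both lines print `True True`: the index from the first post drops for the faster memory and for the slower CAS latency alike, while MHz/CL rises for the former and falls for the latter.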

See the below picture and ignore the stuff in the upper part.

Sly · February 19, 2015, 13:41

Ok, when I wrote this, it made a lot of sense in my head, but now that you have answered I had to reread my own post and realised that what I said doesn’t work at all. The initial idea was to compare, via a ratio, the amount of data the CPU could generate against the amount the memory could deliver. The (very wrong) reason I put the CAS latency in there was that I thought the memory wasn’t fully used until the data generated by the CPU was bigger than the data generated by the memory by a factor equal to the CAS latency. That would have implied some sort of multiphase memory controller with the number of phases equal to the CAS latency and, well, this was all a figment of my imagination.
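
For what it’s worth, the ratio I originally had in mind, with the CAS latency term dropped, might be sketched like this in Python. The function name and the numbers are hypothetical, and the model still assumes 8 bytes of memory traffic per floating-point operation, which is a simplification.

```python
def demand_to_supply_ratio(dp_flops, mem_bandwidth_bytes_per_s, bytes_per_op=8):
    """Data the CPU could consume at peak divided by data the memory can deliver.

    Assumes every floating-point operation moves bytes_per_op bytes to or
    from memory, which ignores caches and data reuse.
    """
    return dp_flops * bytes_per_op / mem_bandwidth_bytes_per_s


# Hypothetical inputs, for illustration only: 100 GFLOPS DP, 50 GB/s memory.
ratio = demand_to_supply_ratio(dp_flops=100e9, mem_bandwidth_bytes_per_s=50e9)
print(ratio)
```

Under this simple model, a ratio above 1 would suggest the CPU can ask for more data than the memory can supply, and a lower ratio would mean a better-balanced system. With the made-up inputs above the ratio is 16.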


