August 1, 2018, 11:19
Latency vs bandwidth

#1
Senior Member
Join Date: May 2012
Posts: 546
Rep Power: 15
So I have gone balls to the wall with my old 7600K (@ 4.7 GHz) and purchased a 4000 MHz Samsung B-die memory kit. Obviously I forgot to check whether my ageing motherboard could manage that frequency (it could not). The positive aspect is that I managed to run it at very aggressive timings @ 3466 MHz.

@ 3333 MHz (13-13-13-33) my system ran the benchmark in 265 seconds. @ 3466 MHz (14-14-14-34) it finished in 264 seconds. (Both a rather sizable difference compared to the 2400 MHz (15-15-15-36) memory I had before, which finished in 321 seconds.)

My question is what we should aim for in terms of CFD: is bandwidth king, or do we have enough random reads that latency matters as well? I am a bit confused by the results right now.
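For reference, the relative speedups from the quoted runtimes can be computed in a couple of lines (a minimal sketch using only the numbers given above; the kit labels are just for printing):

```python
# Benchmark runtimes quoted above, in seconds.
runtimes = {
    "DDR4-2400 CL15": 321,
    "DDR4-3333 CL13": 265,
    "DDR4-3466 CL14": 264,
}

baseline = runtimes["DDR4-2400 CL15"]
for kit, t in runtimes.items():
    speedup = baseline / t - 1.0  # fractional speedup vs the 2400 MHz kit
    print(f"{kit}: {t} s ({speedup:+.1%} vs DDR4-2400)")
```

So both faster kits are roughly 21% faster than the old memory, while they differ from each other by well under 1%.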
August 1, 2018, 13:48

#2
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
Remember that access time - the quantity we are interested in here - is determined by both memory frequency and latency.
For DDR4-3333 CL13 the access time is: 13 cycles / 3333e6 cycles/s x 2 = 7.8 ns
For DDR4-3466 CL14: 14 cycles / 3466e6 cycles/s x 2 = 8.1 ns
The factor 2 comes from DDR memory clocks running at half the advertised transfer rate.

So the faster memory has only about 3.5% slower access times and about 4% more bandwidth.

You could run your own tests: fix the memory frequency and vary the latencies over a wider range. In my experience, latency is less important than frequency, at least with optimized codes. You will see some scaling with latency, but not nearly as much as with memory frequency. A good CFD code should be optimized towards sequential memory access. So when doing manual overclocking, given the choice between tight latencies at lower frequency vs higher frequency with looser timings, I would opt for the latter. Again, higher frequencies mean that looser timings still result in similar access times.

Then again, memory latencies are a rabbit hole. There are the primary - most important - latencies you see printed on the DIMMs, but there are also secondary and tertiary timings that can affect performance and depend on each other. Dozens of them... So without controlling them explicitly, setting very aggressive primary latencies could result in some very poor secondary and tertiary latencies that cancel out any improvement, or even result in worse overall performance.

Btw, it is probably not so much the motherboard that is limiting your overclocking results but the IMC (integrated memory controller). See if you can find some overclocking guides for your specific platform; I am sure it can be tweaked further if you adjust the right voltages, probably VCCIO and VCCSA.
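The access-time arithmetic above can be put into a small helper (a sketch; the function name is just for illustration):

```python
def access_time_ns(cl, ddr_rate_mhz):
    """First-word access time: CL cycles divided by the actual clock,
    which is half the advertised DDR transfer rate."""
    clock_mhz = ddr_rate_mhz / 2      # DDR: two transfers per clock cycle
    return cl / clock_mhz * 1e3       # cycles / MHz -> ns

print(f"DDR4-3333 CL13: {access_time_ns(13, 3333):.1f} ns")  # 7.8 ns
print(f"DDR4-3466 CL14: {access_time_ns(14, 3466):.1f} ns")  # 8.1 ns
print(f"DDR4-2400 CL15: {access_time_ns(15, 2400):.1f} ns")  # 12.5 ns
```

This also shows why the old 2400 MHz CL15 kit loses on both fronts: much lower bandwidth and much higher access time.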
August 2, 2018, 10:57

#3
Senior Member
Join Date: May 2012
Posts: 546
Rep Power: 15
AnandTech lists an aggregate performance index combining latency and frequency. So for general usage I think 3333 with lower latency is better (I might even go with 3200 CL12). It seems CFD cases need more testing though. Right now it looks like latency does have an impact, considering the two cases (perhaps a structured mesh would give a different result?).

I will keep trying to squeeze more out of the memory kit, but so far it has been impossible to go above 3466, even with extra voltage on VCCIO etc. The problematic part is that I have a remote setup, with the computer in the basement and my monitor etc. two floors up. While it is very nice to have a completely silent workspace, it is less nice to do overclocking when you constantly need to run up and down several flights of stairs to reboot a crashed machine.
August 2, 2018, 11:01

#4
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
And their performance index is based on what? Seems like they applied a pretty arbitrary formula here: frequency divided by CL, thus assuming linear scaling for both latency and frequency. It may be convenient, but it is also pretty irrelevant and misleading. I am trying hard to avoid calling BS on that, but that's what it is.
August 2, 2018, 11:16

#5
Senior Member
Join Date: May 2012
Posts: 546
Rep Power: 15
lol, yes perhaps it is. However, given that the access time differed by 3.5% and the bandwidth by 4%, we would expect roughly a 0.5% difference in favor of the faster memory. That is in fact about what the benchmark showed. Obviously I need to run many more experiments before I can state anything with confidence.
August 2, 2018, 13:25

#6
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,167
Rep Power: 23
I've always seen bandwidth being more important than latency, so frequency is more important than CL timings. Especially when you are memory-bandwidth limited (overclocked processors, or a high ratio of CPU cores to memory channels).

One test was with ANSYS Mechanical on a 3930K overclocked to 4.4 GHz, running 2, 3, and 4 processes:
Going from 1600 MHz CL 11-11-11 to CL 9-9-9 gave me an increase of 2%, 2.7%, and 1.2%.
Going from 1600 MHz CL 9-9-9 to 2133 MHz CL 9-11-10-28 gave me an increase of 8.9%, 9.9%, and 11.5%.
August 2, 2018, 16:39

#7
Senior Member
Join Date: May 2012
Posts: 546
Rep Power: 15
Access time 1600 MHz @ CL11 = 13.75 ns
Access time 1600 MHz @ CL9 = 11.25 ns
Access time 2133 MHz @ CL9 = 8.44 ns

In your first test you decrease the access time by 18% with no change in bandwidth. In your second test you decrease the access time by 25% while also increasing the bandwidth by 33%. To make a fair comparison of bandwidth vs access time, I think you should keep the access time constant (2133 MHz @ CL12 is close to 11.25 ns). Anyway, it seems clear that the access time improvement is not of the same magnitude as the benefit of increasing the frequency. In your test at least.
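The percentages above follow directly from the same access-time formula (a quick check; nothing here beyond the numbers already quoted):

```python
# Access time = CL cycles / (DDR rate / 2), in nanoseconds.
t_cl11 = 11 / (1600 / 2) * 1e3   # 13.75 ns
t_cl9  = 9 / (1600 / 2) * 1e3    # 11.25 ns
t_2133 = 9 / (2133 / 2) * 1e3    # ~8.44 ns

print(f"first test:  {1 - t_cl9 / t_cl11:.0%} lower access time")   # 18%
print(f"second test: {1 - t_2133 / t_cl9:.0%} lower access time")   # 25%
print(f"bandwidth:   {2133 / 1600 - 1:.0%} higher")                 # 33%
```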
August 2, 2018, 16:58

#8
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
Whenever you write a CFD code and try to optimize for execution speed, one of the priorities has to be sequential memory access, i.e. avoiding latency-bound execution as much as possible.
One of the reasons: while memory bandwidth has improved over the past decades, latency has remained more or less stagnant. So if your code were latency-bound, you would be stuck with roughly the same performance that 10-20 year-old hardware could deliver. Advanced prefetching techniques can hide some poor coding, but they can only do so much.
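The sequential-vs-random distinction can be illustrated with a small pure-Python sketch: the same elements are summed twice, once in order and once in a shuffled, cache-hostile order. Absolute timings are machine-dependent and inflated by interpreter overhead, but on typical hardware the shuffled traversal is noticeably slower once the data outgrows the caches:

```python
import array
import random
import time

N = 1_000_000
data = array.array("q", range(N))   # contiguous buffer of int64

seq_idx = list(range(N))
rnd_idx = seq_idx[:]
random.shuffle(rnd_idx)             # same indices, cache-hostile order

def traverse(indices):
    """Sum data in the given index order, returning (sum, elapsed seconds)."""
    t0 = time.perf_counter()
    total = sum(data[i] for i in indices)
    return total, time.perf_counter() - t0

seq_sum, seq_t = traverse(seq_idx)
rnd_sum, rnd_t = traverse(rnd_idx)

# Same arithmetic, same element count; only the access pattern differs.
assert seq_sum == rnd_sum
print(f"sequential: {seq_t:.3f} s, random: {rnd_t:.3f} s")
```

This is the same effect Alex describes: hardware prefetchers can stream the sequential traversal from memory at full bandwidth, while the shuffled traversal pays the access-time penalty on the misses.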