March 31, 2016, 06:29 |
Underperforming HW
|
#1 |
New Member
Join Date: Mar 2016
Posts: 4
Rep Power: 10 |
Hi guys, I'm testing my new hardware configuration for CFX (32-core license), which is:
Dual E5-2697 v3 processors (2 x 14 = 28 cores)
128 GB DDR4 RAM (4 x 32 GB, 2 DIMMs per CPU)
SSD
I applied the recommended tuning options: Hyper-Threading off, NUMA on, Turbo on, QPI home snoop. In the first tests it is underperforming and is slower than the old machine. What am I doing wrong? Could it be a RAM bandwidth issue, since only 2 slots per CPU are populated instead of 4? Could it be a software setting issue in CFX? I'm using the default configuration. Thank you |
|
April 1, 2016, 03:34 |
|
#2 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
I cannot tell whether the memory misconfiguration is the root cause of the low performance, but it is definitely part of the problem. According to your mainboard manual, did you populate the DIMMs correctly so that at least 2 memory channels per CPU are in use?
What are the specs of the "old machine" you are referring to? And what are the parameters of the benchmark you are using, especially the number of cells in the mesh and the number of cores used? |
|
April 1, 2016, 05:30 |
|
#3 |
New Member
Join Date: Mar 2016
Posts: 4
Rep Power: 10 |
I populated 2 channels for each processor (32 GB per slot). The mesh has about 20kk (20 million) objects, and 26 of the 28 cores are used.
|
|
April 1, 2016, 05:54 |
|
#4 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
Sorry if this is what you meant, but I have to ask one more time very specifically:
Are you aware that even with 2 DIMM slots populated per processor, you can still end up with only one memory channel if the DIMMs are not distributed correctly? Check the mainboard manual to see which slots belong to which memory channel. But even with dual-channel memory working correctly, you could get about twice the processing speed in the CFD benchmark you describe by using the full quad-channel interface instead. A CFX simulation on 24 cores is definitely bandwidth-limited. Let me ask once more about the specifications of the old machine you are referring to. And your new workstation: did you build it yourself or did you buy it off the shelf? If you built it yourself, it would be interesting to know the rest of the components: case, PSU, CPU coolers... |
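To put rough numbers on the channel argument: theoretical peak memory bandwidth scales linearly with the number of populated channels. The sketch below assumes DDR4-2133 (the fastest speed the E5-2697 v3 officially supports; the thread does not state the actual DIMM speed) and a 64-bit data bus per channel. The function name and the per-core figure are illustrative only, and sustained bandwidth in practice is lower than these peak values.

```python
# Theoretical peak memory bandwidth per CPU socket as a function of the
# number of populated memory channels.
# ASSUMPTIONS (not stated in the thread): DDR4-2133, i.e. 2133 MT/s,
# and a 64-bit (8-byte) data bus per channel.

CORES_PER_CPU = 14  # E5-2697 v3, as given in the first post


def peak_bandwidth_gb_s(channels, mega_transfers_per_s=2133, bus_bytes=8):
    """Return the theoretical peak bandwidth in GB/s for one CPU socket."""
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


if __name__ == "__main__":
    for channels in (1, 2, 4):
        bw = peak_bandwidth_gb_s(channels)
        print(
            f"{channels} channel(s): {bw:6.1f} GB/s per CPU, "
            f"{bw / CORES_PER_CPU:4.2f} GB/s per core"
        )
```

Going from 2 to 4 channels doubles the peak figure, which is where the "about twice the processing speed" estimate for a bandwidth-limited solver comes from.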
|
April 1, 2016, 10:43 |
|
#5 |
New Member
Join Date: Mar 2016
Posts: 4
Rep Power: 10 |
It's an HP Z640, and it should have 4 channels per CPU. The old setup was improvised from 2 machines running in parallel, with an old first-generation E5-2670 and an even older X5675.
So, for the future, if I fill all 4 RAM slots for each CPU, the bandwidth will almost double and speed up the calculation. Thank you |
|
April 1, 2016, 12:14 |
|
#6 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
It would not be the first time that a pre-built workstation from one of the big computer manufacturers was shipped with misconfigured memory.
If you are running a Windows operating system you can use the freeware tool "CPU-Z" to find out if it is at least running in dual-channel mode. And it even comes with a rudimentary CPU benchmark option. One of our workstations with 2 Xeon E5-2697v2 scored around 30000 points in the multi-threaded benchmark, just in case you want to compare the results. |
|
April 4, 2016, 02:18 |
|
#7 |
New Member
Join Date: Mar 2016
Posts: 4
Rep Power: 10 |
Thank you. What tool did you use for the benchmark? I can try the same.
|
|
April 4, 2016, 05:10 |
|
#8 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
CPU-Z. I am not implying that it is a proper benchmarking tool, but it is a very quick way to find out if there is something completely wrong with the system other than the memory configuration.
|
|
April 4, 2016, 16:06 |
|
#9 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,167
Rep Power: 23 |
I use Aida64 for benchmarking / troubleshooting. There are memory benchmarks (read/write/copy), and you can see which memory slots are filled for each CPU, to check whether they are populated correctly.
Only populating 2 memory channels for each CPU, even if done correctly, is going to hurt performance a lot. Yes, populating all 4 channels should theoretically double your performance. The "old" E5-2670 (if dual CPU with all 4 memory channels populated) would be much faster than your new machine. |
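If no dedicated tool is at hand, a very crude copy test can be improvised in plain Python. This is only a sketch under stated assumptions: it is single-threaded, so it will not saturate all memory channels and is no substitute for the Aida64 read/write/copy benchmarks or a multi-threaded test. The function name, buffer size, and repeat count are hypothetical choices, and the buffer must be much larger than the CPU caches for the number to mean anything.

```python
import time


def copy_bandwidth_gb_s(size_mb=64, repeats=5):
    """Time a large in-memory copy and return the best observed rate in GB/s.

    Single-threaded and approximate: useful for spotting a grossly
    misbehaving system, not for measuring peak bandwidth.
    """
    src = bytearray(size_mb * 1024 * 1024)
    dst = bytearray(len(src))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # slice assignment copies the whole buffer
        best = min(best, time.perf_counter() - start)
    # A copy reads and writes every byte, so 2 * size bytes are moved.
    return 2 * len(src) / best / 1e9


if __name__ == "__main__":
    print(f"copy rate: {copy_bandwidth_gb_s():.1f} GB/s (single thread)")
```

Comparing the result between two machines, or before and after adding DIMMs, is more meaningful than the absolute number.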
|