CFD Online Discussion Forums

Manuelo September 15, 2013 04:00

Workstation with new E5 Xeon Ivy-bridge
 
I'm working on a coal combustion simulation in Fluent. Since Intel released its new Ivy Bridge Xeon E5 v2 processors last week, I thought it would be a good time to upgrade my hardware. But what I have read in this forum has confused me a bit, and I'm not sure I'm selecting the best hardware configuration for my money.

I would appreciate it if someone could give me some advice. I apologize in advance, because there may be other threads in the forum that answer my question, but I've done some searching and couldn't fully resolve my doubts.

Here is the workstation I initially selected:

- 2 x Intel Xeon E5-2697 v2 (12 cores per processor, 24 cores total, 30 MB cache). This CPU is the new flagship of Intel's two-socket Xeon line and has only been available for a few days.
- 8 x 8 GB = 64 GB RAM at 1866 MHz (the maximum speed supported by Ivy Bridge Xeons). Eight memory channels in total, as there are two sockets.
- Nvidia 650 Ti graphics card.
- ASUS Z9PE-D8 WS motherboard.
- A good cooling setup.

This computer would cost around 8.000 €. For that money or even less I could probably buy two of the following computers, linked by Gigabit Ethernet to run in distributed parallel:

- Intel i7-4960X (15 MB of cache, 6 cores). This was also released last week, and I think it will probably beat Intel's previous top desktop i7 (3970X).
- 8 x 4 GB = 32 GB RAM at 2400 MHz (overclocked). Four memory channels.
- The rest would be similar to the Xeon configuration.

The first configuration has more cores (24 vs 12, and each Xeon core may be better than an i7 core), the same cache size (30 vs (15 + 15)), the same number of memory channels (4 per processor) but lower memory speed (1866 vs 2400). So in terms of processing power I think the Xeon is the best choice, but in terms of memory bandwidth it would probably be the other way round. As for communication between processes, the Xeon (with its high-speed QPI link) would probably beat a Gigabit Ethernet link.

I thought the Xeon was the best choice, but after reading some threads on this forum I've seen that the most important parameter from the CFD point of view is memory bandwidth, so I wonder whether the pair of i7 machines would beat the Xeon workstation. I'm asking because the Xeon can't go beyond 1866 MHz, while the i7 can easily go up to 2400 MHz and run stably at that speed.
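
To put rough numbers on the bandwidth comparison above, here is a minimal sketch that computes theoretical peak DRAM bandwidth as channels x transfer rate x 8 bytes (a 64-bit DDR3 channel). It assumes the quoted "MHz" figures are DDR3 transfer rates in MT/s, and it gives upper bounds only; sustained bandwidth in a real solver run will be lower. The function and dictionary names are illustrative, not from any tool mentioned in this thread.

```python
# Theoretical peak DRAM bandwidth per socket:
#   channels * transfers per second * bytes per transfer (8 for a 64-bit channel).
# These are upper bounds, not measured values.

def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Peak bandwidth of one socket in GB/s (1 GB = 1e9 bytes)."""
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9

# name: (sockets, channels per socket, MT/s, cores per socket)
configs = {
    "2 x E5-2697 v2, DDR3-1866": (2, 4, 1866, 12),
    "2 x i7-4960X, DDR3-2400 (overclocked)": (2, 4, 2400, 6),
}

for name, (sockets, channels, rate, cores) in configs.items():
    per_socket = peak_bandwidth_gb_s(channels, rate)
    print(f"{name}: {per_socket:.1f} GB/s per socket, "
          f"{sockets * per_socket:.1f} GB/s total, "
          f"{per_socket / cores:.1f} GB/s per core")
```

On these assumptions the i7 pair has more peak bandwidth per core, but the two i7 machines can only share data over the network link, which this estimate does not capture.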

Another point is network speed and latency. Do you think conventional Gigabit Ethernet would perform adequately as the interconnect, or should I link them with 10 Gb Ethernet?

Here is an outline of the simulation I'm working on:
- Coal combustion in a multi-burner boiler with DPM, eddy-dissipation/finite-rate and DO radiation (around 10 million cells).
- Steady simulation.
- RSM turbulence model.

I have a license for up to 32 processes, so there wouldn't be any problem with the software.

Thanks.

Manuelo September 15, 2013 12:23

Another question. Is it worth buying the top E5 Ivy Bridge processor, given that the system could be bottlenecked by memory bandwidth? Would I get roughly the same performance from a processor with fewer cores (e.g. the E5-2687W v2 with 8 cores instead of 12)?

The 12-core processor costs 50% more than the 8-core one.

jdrch September 15, 2013 18:51

Quote:

Originally Posted by Manuelo (Post 451766)
The most important parameter from the CFD point of view is the memory bandwidth

True, but only if all of said memory is on the same machine. Otherwise you run into latency and bandwidth bottlenecks from having to transfer data over Ethernet.

I would get the Xeon machine, but I would get a Quadro Tesla GPU instead of the 650 Ti. That way, you'll have ECC (error correction) at the CPU, RAM, and GPU levels. This will prevent corruption of data in RAM, the likelihood of which increases with RAM size.

Judging from the fact that your license covers 32 cores, you should optimize for performance, not cost :)

Manuelo September 16, 2013 01:38

Thanks jdrch,

I thought there was no point in investing in such a powerful graphics card because my software (Fluent) doesn't support GPU computing.

And the memory isn't ECC because I couldn't find ECC memory rated at 1866 MHz. At least my hardware supplier says they can't find it.

jdrch September 16, 2013 02:01

Quote:

Originally Posted by Manuelo (Post 451916)
I thought there was no point in investing in such a powerful graphics card because my software (Fluent) doesn't support GPU computing.

Fair enough. I guess I'm biased because Quadros are the only GPUs I've ever run FLUENT on. Quadros are tested and certified for engineering analysis applications, which means you're less likely to run into rendering problems with them. Given the nonstandard graphics implementations found in many engineering packages, a certified card is a good thing to have.

Quote:

Originally Posted by Manuelo (Post 451916)
And the memory isn't ECC because I couldn't find ECC memory rated at 1866 MHz. At least my hardware supplier says they can't find it.

This is totally your decision. I'm just saying that with that much RAM, memory errors actually become a significant risk, which is problematic given CFD's large in-memory data requirements.

bindesboll September 16, 2013 06:03

I have been working for some months now on specifying a hardware solution for optimal utilisation of a 32-core license. We are about to order the system below:

2 machines:
Dual-CPU motherboard
2 x Xeon E5-2667 v2, 3,3 GHz, 8 cores
32 GB: 8 x 4 GB RAM, 1866 MHz
HDD: 2 x 140 GB, 15.000 rpm, SATA-600, RAID 1
InfiniBand network adapter (cluster interconnect)
Gigabit Ethernet network adapter (system network)

One machine, used for data storage, additionally has:
3 x 500 GB 10.000 rpm HDD, RAID 5.

As the 32 cores cannot be in a single machine, two dual-socket machines were chosen.
This ends up being 8 cores per CPU (2 x 2 x 8 = 32). There is no benefit in more cores per CPU, as the system is most likely memory-bandwidth limited.

InfiniBand is chosen for the cluster interconnect because its latency is low. Bandwidth is not that important, as the amount of data exchanged between the cluster nodes should not be that large once the solver is up and running. But the speed of the individual data requests is important. InfiniBand switches are costly, but the switch can be omitted: as there are only 2 nodes in the system, they can be connected directly.

Pre- and post-processing will be done on my present CFD workstation (AMD FirePro 7900), so no special graphics are required for the cluster nodes.

Manuelo September 16, 2013 07:07

Hi bindesboll,

Thanks for your advice. Just a few quick questions:

- Are you sure that more powerful processors would lead to a memory-bandwidth-limited system?

- I've never dealt with InfiniBand parts. Approximately how much does an InfiniBand adapter cost?

- What about 10 Gb Ethernet? Could it also be considered for the two-machine cluster you've proposed?

- Why don't you go for SSDs?

bindesboll September 16, 2013 07:58

Quote:

Originally Posted by Manuelo (Post 451988)
- Are you sure that more powerful processors would lead to a memory-bandwidth-limited system?

As the scaling efficiencies below show, the scaling from 4 to 8 cores is poor. Thus more CPU power will not improve the speed, which indicates that the system is limited elsewhere. That is: if the data cannot get in and out of the CPU fast enough, it does not matter how fast the CPU performs (how many cores it has).
Actually, you could even consider quad-core or six-core CPUs to get better utilization of each core. However, you would then need more than 2 cluster nodes (increased cost), which would also decrease the scaling efficiency and require a costly InfiniBand switch (5-6.000).

Quote:

Originally Posted by Manuelo (Post 451988)
- I've never dealt with InfiniBand parts. Approximately how much does an InfiniBand adapter cost?

A 40 Gb/s InfiniBand network adapter (PCI Express) costs 300-400.

Quote:

Originally Posted by Manuelo (Post 451988)
- What about 10 Gb Ethernet? Could it also be considered for the two-machine cluster?

My ANSYS reseller has done some testing for me on a 32-core system based on the Xeon E5-2670 (8-core CPU). This system showed the following scaling efficiencies (1 = linear scaling):
From 2 to 4 cores: 0.91
From 4 to 8 cores: 0.62 (clearly memory-bandwidth limited; more cores will not increase speed).
From 8 to 16 cores (motherboard CPU interconnect): 0.92
From 16 to 32 cores (cluster interconnect, GigE): 0.82
As the figures show, the efficiency of using 8 cores instead of 4 is poor, indicating that the performance of 8 cores is limited by memory bandwidth.
The efficiency of using 32 cores instead of 16 is also not optimal, indicating that the cluster interconnect (in this case 1 Gb/s Gigabit Ethernet) is the limiting factor. Hence our decision to go for InfiniBand.
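
The stepwise efficiencies above can be chained to estimate the overall speedup from 2 to 32 cores. The sketch below assumes each efficiency is the measured speedup of that step divided by the ideal speedup (the ratio of core counts), which is how efficiency is used elsewhere in this thread; the reseller's exact definition is not stated, so treat the result as indicative only.

```python
# Stepwise scaling efficiencies quoted in the post (1.0 = linear scaling).
# Each entry: (cores before, cores after, efficiency of that step).
steps = [
    (2, 4, 0.91),
    (4, 8, 0.62),
    (8, 16, 0.92),
    (16, 32, 0.82),
]

def cumulative_speedup(steps):
    """Return (cores, speedup relative to the first core count) after each step."""
    speedup = 1.0
    results = []
    for n_from, n_to, efficiency in steps:
        # Assumed definition: actual step speedup = efficiency * ideal step speedup.
        speedup *= efficiency * (n_to / n_from)
        results.append((n_to, speedup))
    return results

base = steps[0][0]
for cores, speedup in cumulative_speedup(steps):
    ideal = cores / base
    print(f"{cores:2d} cores: {speedup:.2f}x vs {base} cores "
          f"(ideal {ideal:.0f}x, overall efficiency {speedup / ideal:.2f})")
```

Under that assumption the single weak step from 4 to 8 cores accounts for most of the loss in the chain, which is consistent with the memory-bandwidth argument.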

Quote:

Originally Posted by Manuelo (Post 451988)
- Why don't you go for SSDs?

An SSD will only increase the calculation speed if heavy disk-write operations are done during solving (frequent writes of transient data). Otherwise an SSD will make no difference to the solving time. The load time of CFD cases might improve by a few seconds, but as we have 10.000 rpm disks in RAID 5 this will be insignificant.

Manuelo September 16, 2013 10:38

OK, thanks for your explanations. Really interesting.

As memory speed has been increased in Ivy Bridge (+16.6%), there will be higher memory bandwidth. Hopefully this will help us.

Why didn't you choose the 2687W? It also has 8 cores. Maybe for power saving?

bindesboll September 17, 2013 02:48

Quote:

Originally Posted by Manuelo (Post 452037)
Why didn't you choose the 2687W? It also has 8 cores. Maybe for power saving?

I would expect the performance of the 2687W v2 and the 2667 v2 to be very similar. Depending on the degree to which memory bandwidth limits the system, even the 2650 v2 could be interesting, especially as the CPU costs only half as much as the other two.

Manuelo September 25, 2013 17:19

Bindesboll, thanks for your comments.

I've found an Intel report stating a significant performance increase between the 2687W v2 and 2697 v2 processors. They don't give many details...

Here is the link:

http://www.intel.com/content/www/us/...ys-fluent.html

What's your opinion about it?

bindesboll September 26, 2013 02:12

It scales surprisingly well. The scaling efficiency is 0.77: for linear scaling it ought to scale by 12/8 (cores) = 1.5, but it only scales by 1.41/1.22 = 1.16 (performance), thus 1.16/1.5 = 0.77. This is better than the scaling efficiency of 0.62 going from 4 to 8 cores on the Xeon E5-2670 I presented above. So it seems that scaling to many cores is significantly better on the new Xeon E5-2600 v2 generation.
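
The arithmetic in this paragraph can be written out as a small helper. This is only a sketch of the calculation as described in the post: the relative performance values 1.41 and 1.22 are the figures quoted here from the Intel report, and the function name is made up for illustration.

```python
def scaling_efficiency(perf_ratio, resource_ratio):
    """Measured performance gain divided by the ideal (linear) gain."""
    return perf_ratio / resource_ratio

# Relative performance figures as quoted in the post.
perf_2697_v2 = 1.41    # 12-core E5-2697 v2
perf_2687w_v2 = 1.22   # 8-core E5-2687W v2

perf_ratio = perf_2697_v2 / perf_2687w_v2   # about 1.16
core_ratio = 12 / 8                         # 1.5 for linear scaling

efficiency = scaling_efficiency(perf_ratio, core_ratio)
print(f"performance ratio: {perf_ratio:.2f}")
print(f"core ratio:        {core_ratio:.2f}")
print(f"scaling efficiency: {efficiency:.2f}")
```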

Still, I would look into price vs. performance before buying the 2697 v2, as the price usually doesn't scale well on the top-model CPUs :-)

And if your license is limited to 32 cores, you would still end up with a system of 2 nodes, each with a dual-CPU motherboard and 8-core CPUs (2687W v2 or 2667 v2). Using the 2697 v2 you could configure a single node with a dual-CPU motherboard and 12-core CPUs, but that would only give you 24 cores. I wouldn't expect that to outperform a 32-core system.

jdrch September 26, 2013 02:23

FWIW, don't forget about Opteron options: http://www.padtinc.com/blog/the-focu...-vs-intel-xeon <- Article here has a 16 core dual Xeon machine being absolutely trounced in FLUENT benchmarks by a 16 core Opteron one costing much less.

Manuelo September 26, 2013 03:41

Bindesboll, I agree with you regarding the license aspect. But I'm not sure about the pricing point. The price of a workstation with 2 x E5-2687W v2 is about 7000 € and the one with 2 x E5-2697 v2 is 8000 €. So the ratio is 8/7 = 1.14. That fits the performance ratio pretty well.

jdrch, thanks for your advice, I'll have a look at it. But it's not easy for me to find AMD Opteron, while Xeon is everywhere.

jdrch September 26, 2013 03:53

Quote:

Originally Posted by Manuelo (Post 453648)
But it's not easy for me to find AMD Opteron, while Xeon is everywhere.

I was just saying so for argument's sake. In reality given your situation you really should pick a Xeon machine. Opteron only if you're budget conscious.

jmcentee September 26, 2013 03:56

I thought I would add in some information of interest, as I have been heavily researching a CFD cluster recently.

InfiniBand switches vary widely in price: I have quotes ranging from 2,000 (Intel) to 18,000 (re-badged Mellanox) for a 36-port switch.

InfiniBand cables also range from a sensible 50 to 550.

InfiniBand cards also range from 250 to 950.

The latency of QDR and FDR is very similar, so for CFD QDR is more cost effective.

Intel also makes a compute server platform, the H2312XXKR, which is a 2U rackmount chassis. It takes 4 x dual-CPU server nodes and, with the right model, can have onboard InfiniBand for about 12,000 + VAT (80 cores using 10-core CPUs). So just 2 nodes in it for 32 cores would be about 6,000, but you would need a good desktop, e.g. 1,000, for pre- and post-processing.

What looks like a good guide to setting up InfiniBand is at:
http://pkg-ofed.alioth.debian.org/ho...owto.html#toc4

John

Manuelo September 26, 2013 05:02

I've never dealt with servers but I'll check it because it looks quite interesting.

Thanks for your contribution.

Manuelo September 26, 2013 12:09

bindesboll, check these figures:

2697 v2: 12 * 3.0 GHz (turbo boost on 12 cores) = 36.0 GHz
2687W v2: 8 * 3.6 GHz (turbo boost on 8 cores) = 28.8 GHz

So the ratio between the two processors is 36/28.8 = 1.25

So the scaling efficiency is 1.16/1.25 = 0.928

Am I right? What's your opinion? Do you trust that benchmark?

bindesboll September 27, 2013 03:18

Yes, you can calculate the scaling efficiency that way. However, it doesn't change the conclusion that you gain 16% performance by using 50% more licenses (number of cores).
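
Both views in this exchange can be put side by side in a short sketch: efficiency normalized by aggregate clock (cores x all-core turbo), and the raw gain per added core license. The clock and performance figures are the ones quoted in the thread, and the aggregate-GHz measure is only a crude proxy for compute capacity. The unrounded efficiency comes out slightly below the 0.928 quoted earlier, because that figure used the rounded ratio 1.16.

```python
# Figures as quoted in the thread: all-core turbo clocks from the poster,
# relative performance values as read from the Intel report.
cpus = {
    "E5-2697 v2": {"cores": 12, "turbo_ghz": 3.0, "relative_perf": 1.41},
    "E5-2687W v2": {"cores": 8, "turbo_ghz": 3.6, "relative_perf": 1.22},
}

def aggregate_ghz(cpu):
    """Sum of core clocks: a crude proxy for raw compute capacity."""
    return cpu["cores"] * cpu["turbo_ghz"]

big, small = cpus["E5-2697 v2"], cpus["E5-2687W v2"]

perf_ratio = big["relative_perf"] / small["relative_perf"]
clock_ratio = aggregate_ghz(big) / aggregate_ghz(small)
license_ratio = big["cores"] / small["cores"]

clock_efficiency = perf_ratio / clock_ratio      # efficiency per aggregate GHz
perf_gain_pct = (perf_ratio - 1) * 100           # extra performance
license_gain_pct = (license_ratio - 1) * 100     # extra core licenses used

print(f"aggregate clock ratio: {clock_ratio:.2f}")
print(f"clock-normalized efficiency: {clock_efficiency:.3f}")
print(f"gain: {perf_gain_pct:.0f}% performance for {license_gain_pct:.0f}% more licenses")
```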

RodriguezFatz November 19, 2013 09:18

Quote:

Originally Posted by bindesboll (Post 452000)
As the scaling efficiencies below show, the scaling from 4 to 8 cores is poor. Thus more CPU power will not improve the speed, which indicates that the system is limited elsewhere. That is: if the data cannot get in and out of the CPU fast enough, it does not matter how fast the CPU performs (how many cores it has).

What about one motherboard with two CPUs? Does it have twice the memory bandwidth of a single-CPU board? Do you get any improvement with an 8-core license when using two CPUs in one workstation?


All times are GMT -4.