CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   Hardware (https://www.cfd-online.com/Forums/hardware/)
-   -   Xeon workstation: suggestions needed (https://www.cfd-online.com/Forums/hardware/172148-xeon-workstation-suggestions-needed.html)

Blanco May 24, 2016 19:59

Xeon workstation: suggestions needed
 
Hi all,

I'm looking for a new workstation to speed up my CFD simulations, which are very processor-intensive (software used: Star-CCM+, Converge) as well as RAM-intensive. Assume I can use up to 30 cores simultaneously, as parallel licenses will be available.

I already have an E5-2667 v3 workstation (8 cores, 3.2/3.6 GHz, 8x4 GB RAM at 2133 MHz) and I would add another workstation in order to run both distributed and single-machine simulations. Standard Ethernet will be used for this purpose (any suggestions or advice about this? I think Infiniband is unnecessary since the total core count is quite low... please correct me if I'm wrong).

I am considering the following alternative configuration for the new workstation:

1- 2x E5-2667 v4 (8 cores each, 3.2/3.6 GHz, 25 MB L3 cache, 16x4 GB RAM 2400 MHz). Pro: high peak frequency. Cons: no upgrade possible, most expensive.

2- 2x E5-2687W v4 (12 cores each, 3.0/3.5 GHz, 30 MB L3 cache, 16x8 GB RAM 2400 MHz). Pros: more cores available, cheaper than config 1 (even though the RAM is doubled). Cons: slightly lower peak frequency than config 1, no upgrade possible.

3- 1x E5-2683 v4 (16 cores, 2.1/3.0 GHz, 40 MB L3 cache, 8x8 GB RAM 2400 MHz). Pros: same total core count as config 1, cheaper, and with a single processor a future upgrade to dual CPU is possible. Con: lower peak frequency.

4- 1x E5-2697 v4 (18 cores, 2.3/3.6 GHz, 45 MB L3 cache, 8x8 GB RAM 2400 MHz). Pros: highest core count on a single processor, good for a future upgrade. Cons: low base frequency (though peak frequency is comparable to config 1), more expensive than config 3, and there is less than one memory channel per core... so I suppose this config will behave only "slightly" better than config 3.

Personally I think config 2 is the right trade-off, because I'm a little concerned about peak frequency, but I have no experience with the Xeon "W" series: are there any known issues with its higher power consumption/thermal dissipation? Do you have any suggestions, considering that distributed simulations will also be performed (alongside single-machine simulations)?

Any advice is welcome.
Thank you in advance.

Best

flotus1 May 25, 2016 10:51

Just a hint: many dual-socket motherboards have serious limitations or might not work at all if only one CPU is installed. Have a close look at the manual or, better, ask the manufacturer whether your setup runs with only one CPU.

From your 4 alternatives, option 2 will be fastest in CFD simulations. I don't see how it could be cheaper than option 1, but if it is that is even better for you. The high core count of options 3 and 4 may seem like a good idea, but with low clock speed and only 4 memory channels in total they will be significantly slower.
If you buy your workstations from one of the usual suspects like DELL, HP, Fujitsu... they have already taken care of the high TDP of the processors. If you are building the workstation yourself, make sure to use a proper aftermarket CPU-cooler (e.g. Noctua NH-D15), a high-quality power-supply (e.g. BeQuiet dark power pro P11 550W or more depending on the graphics card) and provide enough ventilation in the case.

BTW: you might find this site helpful. I did.
https://www.microway.com/knowledge-c...ep-processors/

Blanco May 25, 2016 18:49

Thanks a lot, Alex, for your suggestions and for the useful link, very interesting! I think I'll go ahead with option 2, since you also confirmed it will be the fastest. Do you have any other advice on the Ethernet/Infiniband connection?

bindesboll May 26, 2016 05:21

No doubt that solution 2 is the best for CFD - actually the configuration I would buy if I had to buy a new cluster tomorrow.
The trick is to have the right balance between CPU performance and RAM bandwidth, as there is no need for more CPU performance than the RAM bandwidth can handle.
I have a system of nodes with 2x E5-2667 v2 3.3 GHz and 8x8 GB DDR3 1867 MHz RAM. At the time, that was the right balance between CPU performance and RAM bandwidth.
If you compare CPU performance (CFP2006 Rates - Base, http://www.spec.org/cpu2006/results/rfp2006.html) you will find that the step up to 2400 MHz RAM matches the performance of the E5-2687W v4. This means that the E5-2683 v4 and E5-2697 v4 would be a waste of CPU cores, while the E5-2667 v4 would be a waste of RAM bandwidth.
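As a rough sanity check of that balance, one can compare how much of the platform's memory bandwidth each core gets for the candidate CPUs. A minimal sketch with approximate figures (4 channels per socket, DDR4-2400 at roughly 19.2 GB/s per channel):

```python
# Approximate memory bandwidth available per core for each candidate
# (per socket: 4 DDR4-2400 channels at ~19.2 GB/s each).
bw_per_socket = 4 * 19.2  # ~76.8 GB/s

candidates = {
    "E5-2667 v4": 8,    # cores per socket
    "E5-2687W v4": 12,
    "E5-2683 v4": 16,
    "E5-2697 v4": 18,
}

for name, cores in candidates.items():
    print(f"{name}: {bw_per_socket / cores:.1f} GB/s per core")
```

The more cores share the four channels, the less bandwidth each core can stream, which is exactly the trade-off discussed above.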

CPU peak frequency is not relevant for CFD, as it is only used when not all cores are loaded.

BR
Kim

RobertB May 26, 2016 12:20

Have you considered the 2680/2683?

SPECfp_rate shows them a bit faster, and my experience with CCM+ is that it scales with this benchmark. They are also a little cheaper and use less power.

Blanco May 26, 2016 17:50

Hi all, thanks Kim for the useful hints and the clear explanation. So option 2 really is the best among the options considered; I'll proceed with that one.
Robert, the E5-2683 was option 3, but its base frequency seems low compared to the others, even though it has a higher core count. In the SPEC results I see the 2687W gives better numbers...

RobertB May 26, 2016 18:55

SPECfp_rate

https://www.spec.org/cpu2006/results...2/#SPECfp_rate

E5-2667 v4: 724
E5-2687W v4: 888
E5-2683 v4: 933
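Dividing those rates by the core count makes the per-core comparison explicit. A small sketch, assuming the quoted figures are dual-socket results (as most SPEC submissions are):

```python
# SPECfp_rate2006 (base) per core, assuming the quoted figures are
# dual-socket results.
results = {
    "E5-2667 v4": (724, 2 * 8),    # (rate, total cores)
    "E5-2687W v4": (888, 2 * 12),
    "E5-2683 v4": (933, 2 * 16),
}

for name, (rate, cores) in results.items():
    print(f"{name}: {rate / cores:.1f} per core")
```

The 2683 wins on total throughput, but each of its cores contributes the least, which matters when a per-core license caps how many cores you can use.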

Why were you buying only one? They are cheaper than the 2687W on a per-processor basis.

http://ark.intel.com/products/family...v4-Family#@All

CPU clock speed is irrelevant; it is throughput that counts, at least for the solver.

Blanco May 26, 2016 19:39

Hi Robert, yes, you're right about the SPEC results and throughput. I keep thinking about clock speed because throughput is proportional to both clock frequency and memory bandwidth, isn't it? That's why I was concerned about the low clock of the E5-2683.

In any case, yes, the E5-2683 is cheaper than the 2687W, which in turn is cheaper than the 2667 on a per-processor basis. I was considering only one processor because of the 30-core license constraint, where the 30 licenses have to be shared between two workstations. In that case I would prefer a future upgrade of the 2683 workstation.

Best

Blanco May 26, 2016 20:07

In any case, from the SPEC benchmarks it seems to me that Kim is right. Looking at the differences in the base performance index and considering the number of cores in each processor, the E5-2687W seems the best-balanced choice: only 4 more cores per processor than the E5-2667 give a big improvement in the performance index, compared to the E5-2683.

bindesboll May 27, 2016 04:06

The most feasible choice of CPU is highly dependent on license costs. Typically for commercial licenses, the license costs are significantly higher than the hardware costs, which means you should buy the highest-performing hardware for the number of cores your licenses allow.

So if you have licenses for running 32 cores, for example, you should compare the price per performance of one workstation with 2 x 16-core E5-2690 v4, against two workstations with 2 x 8-core E5-2667 v4 plus an Infiniband interconnect.

If you are running OpenFOAM and have no license cost, it is simple to compare the price/performance of each solution.
If you pay per core and are not fixed to a core count, you should simply include the license cost in the total cost of the solution.
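That comparison can be sketched in a few lines; all prices and the single-box SPEC rate below are hypothetical placeholders to be replaced with real quotes:

```python
# Total cost per unit of throughput; lower is better.
# All prices and the 2x16-core SPEC rate are HYPOTHETICAL placeholders.
def cost_per_performance(hw_cost, license_cost, specfp_rate):
    return (hw_cost + license_cost) / specfp_rate

license_cost = 50_000  # placeholder: same 32-core license either way

# 1x (2x16-core) box vs 2x (2x8-core) boxes plus an interconnect.
one_box = cost_per_performance(9_000, license_cost, 1_100)
two_boxes = cost_per_performance(2 * 7_000 + 1_500, license_cost, 2 * 724)

print(f"one 2x16-core box:  {one_box:.1f}")
print(f"two 2x8-core boxes: {two_boxes:.1f}")
```

Because the license cost dominates, the ranking is driven almost entirely by throughput, which is the point made above.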

BR
Kim

Blanco May 27, 2016 08:28

Yes, license cost is higher than hardware cost, therefore it is straightforward to buy the best hardware configuration for a new workstation or cluster. I would probably go with a 2x 16-core workstation if I were starting from scratch, but my case is different: I already have a 1x E5-2667 v3 workstation, which actually gives good performance, and I have to add another one to reach 30 cores max.

Option 2, 2x E5-2687W v4, will give me 24 cores on a single workstation, and I would reach 32 cores if needed; moreover, I could add another processor to the current workstation in a future upgrade. Option 1 is undersized, but I could add another 2667 v3 to the current workstation. Options 3 and 4 are more difficult to match with the license constraint... though they would increase the core count for the future. Considering that not all simulations will be run in a distributed configuration, I still think option 2 is the way to go for performance and scalability in my case.

Micael June 2, 2016 15:43

How many solver licences do you have? If I understand correctly, you will end up with 2 systems, allowing you to run 2 simulations simultaneously. Or do you want to connect them together? When connecting nodes, it is best practice to have them identically configured, which would not be your case if you connect your old system with a new one. In any case, I agree that the E5-2687W v4 should be the most balanced and cost-effective system; I'm sure of that for FLUENT, but I guess the other software you mentioned will behave similarly.

Blanco June 3, 2016 04:25

Yes, Micael, thanks for your feedback.
I will have 3 "standard" licenses available plus parallel licenses, so I could run 3 multicore simulations simultaneously on single machines, but sometimes I will also run 1 simulation in a cluster configuration using both workstations. I know that a heterogeneous cluster is not ideal, but I think the "old" workstation can still speed up the simulation when used in a cluster configuration with the new workstation; am I wrong? Do you have any experience with cluster computing, e.g. whether Infiniband is REALLY needed for such a small number of cores? Thanks

Micael June 3, 2016 12:24

With only 2 nodes I don't know what the gain of Infiniband (IB) over 10GbE would be, but IB might not be that expensive if you can connect both nodes directly together, without a switch. Something to verify.

Also, you may be able to upgrade your old system with a v4 CPU and 2400 MHz RAM, so you get 2 quite similar nodes with higher performance. To be verified as well for your specific system.

digitalmg July 29, 2016 13:05

Quote:

Originally Posted by bindesboll (Post 601913)
No doubt that solution 2 is the best for CFD - actually the configuration I would buy if I had to buy a new cluster tomorrow.
The trick is to have the right balance between CPU performance and RAM bandwidth, as there is no need for more CPU performance than the RAM bandwidth can handle.
I have a system of nodes with 2x E5-2667 v2 3.3 GHz and 8x8 GB DDR3 1867 MHz RAM. At the time, that was the right balance between CPU performance and RAM bandwidth.
If you compare CPU performance (CFP2006 Rates - Base, http://www.spec.org/cpu2006/results/rfp2006.html) you will find that the step up to 2400 MHz RAM matches the performance of the E5-2687W v4. This means that the E5-2683 v4 and E5-2697 v4 would be a waste of CPU cores, while the E5-2667 v4 would be a waste of RAM bandwidth.

CPU peak frequency is not relevant for CFD, as it is only used when not all cores are loaded.

BR
Kim

Dear Kim
Would you please explain why choosing the Xeon E5-2667 v4 is a waste of memory bandwidth?
In that CPU, the maximum memory bandwidth is 76.8 GB/s for 8 cores, so we would have 9.6 GB/s per core when fully utilized. That is close to the bandwidth of most DDR4 RAM modules on the market.
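The arithmetic in question, spelled out (nothing software-specific, just the channel math):

```python
# E5-2667 v4: four DDR4-2400 channels shared by 8 cores.
per_channel = 8 * 2.4    # GB/s: 8-byte bus x 2400 MT/s = 19.2 GB/s per channel
total = 4 * per_channel  # 76.8 GB/s, the figure Intel quotes
per_core = total / 8     # 9.6 GB/s per core

print(total, per_core)
```

Note that 9.6 GB/s per core is only half of one channel's 19.2 GB/s peak, so a single module's headline figure is not directly comparable to the per-core share.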
Please explain more.
Best Regards

Micael July 29, 2016 13:28

Putting aside theoretical calculations, I did some benchmarks with FLUENT, which I guess behaves similarly to Star-CCM+, and I found that the best-balanced system was with the "W" Xeon, currently the E5-2687W v4 (though I didn't benchmark that specific one, only the v3 parts). The scaling was quite good (not perfect) from the 4-core E5-2637 v3 up to the 10-core E5-2687W v3, and that held for different physics and mesh sizes. So I would agree that the Xeon E5-2667 v4 is somewhat a waste of memory bandwidth, at least for FLUENT. That being said, a 3-node E5-2637 v4 setup (24 cores) would still be faster than a dual E5-2687W v4 (24 cores). Given a limited number of HPC licences and an industrial application, the E5-2637 v4 up to the E5-2667 v4 still make sense.

Yanni August 3, 2016 18:31

Similar decision to make
 
Hi everybody,

I have a similar decision to make. My applications are mainly ANSYS Mechanical, but also some Fluent simulations.
I just configured a workstation and would like to get your opinion on my selection. My ANSYS license supports 16 HPC cores. I'm especially interested in whether you have any experience with a Tesla GPU; I have heard different opinions about using a GPU versus spending the extra money on the CPUs :confused:.

- CPU: (2x) Intel Xeon E5-2630 V4 2.2GHz (3.1GHz Turbo) 50MB 20-Cores/40-Threads Total
- Motherboard: SuperMicro Dual-Socket Motherboard - Supports 3 GPUs, 1TB ECC DDR4 RAM, USB 3.0. Intel C612 Chipset
- RAM: 256GB DDR4 Registered ECC RAM - 2133MHz - CL15 - Samsung - Lifetime Warranty 8x32GB
- Video Card: Nvidia Quadro M2000 4GB GDDR5 768 CUDA Cores
- Solid State Drive: Intel 750 400GB NVME SSD - 2,200MB/S
- Hard Drive (HDD): Seagate Barracuda HDD 4TB 5400rpm 64MB Cache
- Tesla GPGPU: Optional K40

Thanks for your help! :)

RobertB August 7, 2016 11:08

You would almost certainly do better with a faster 8-core part, the 2667, although it is significantly more expensive.

It might be worth looking at a third-party cooling solution, since if you can keep it cool enough it may run all the cores at turbo speed.

Unless Ansys claim good GPU performance for the physics you need, I would put the money into the CPUs.

Yanni August 7, 2016 15:43

Thank you Robert:)! That sounds reasonable; the 2667 also seems to have a high clock speed. Do you recommend air or water cooling?

RobertB August 7, 2016 16:42

We have always bought clusters so I've never looked at the cooling issue. I assume that if the processor is rated at a certain level then the stock cooler should be able to keep it cool.

Looking at the first processor you listed, it was an 85 W part, so I assume it probably couldn't run all its cores at turbo speed. The 2667 is rated at 135 W.

Since you will have two of these, adequate case ventilation is necessary. From experience, you don't want to put the machine under the desk you are sitting at.

Another thing to do, if possible in ANSYS, is to lock the core affinity, i.e. pin each thread to a specific CPU core. This helps speed things up.
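On Linux the same pinning can be done at the OS level, independent of the solver's own options; a minimal sketch using Python's `os.sched_setaffinity` (Linux-only):

```python
import os

# Pin the current process to CPU core 0, then restore the original mask.
# Solvers usually expose their own affinity switches; this just shows the
# OS-level mechanism they rely on.
original = os.sched_getaffinity(0)  # set of cores we may run on now
os.sched_setaffinity(0, {0})        # restrict to core 0
print(os.sched_getaffinity(0))      # -> {0}
os.sched_setaffinity(0, original)   # undo the restriction
```

From the shell, `taskset` or `numactl` achieve the same effect without touching the application.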

