CFD Online Forums > Hardware

128-core cluster for Ansys Fluent: E5-2697A vs E5-2667

November 30, 2017, 10:30   #1
mihei (New Member; Join Date: Nov 2017; Posts: 4)
Hello dear fellows,

I'm a member of the aerodynamics department at an engineering company. We perform CFD calculations using Ansys Fluent. Our computational models are typically 10-40 million cells, but occasionally up to 80 million. We are currently looking for a new server to speed up our calculations using up to 128 cores (i.e. 3 HPC packs).
I have chosen the following configuration for our request:
CPU: 4 nodes with 2x Xeon E5-2697A v4 (16 cores, 2.6 GHz each)
RAM: 4 x 32 GB per node, 512 GB total
Storage: 480 GB SSD per node + 2 TB HDD
Interconnect: 56 Gb/s InfiniBand
Price: around 50k USD

I then found a thread on this forum about a similar problem: 128 core cluster E5-26xx V4 processor choice for Ansys FLUENT
The final choice there was an 8-node cluster with E5-2667 CPUs (8 cores, 3.2 GHz each). As far as I understand, that choice was based on the fact that each core then gets a larger share of the memory channels than with the 16-core 2697A. The price of an 8-node cluster with E5-2667 is around 68k USD, i.e. 36% more expensive.
So I'm wondering: is the speed-up of the 8-core configuration really worth the extra money? Are there any test results, or does anyone have first-hand experience with different clusters?
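To put rough numbers on the memory-channel argument, here is a quick back-of-the-envelope sketch. It only computes theoretical peak bandwidth (both CPUs drive 4 channels of DDR4-2400, 8 bytes per transfer); real sustained Fluent throughput is lower, but the per-core ratio is the point:

```python
# Back-of-the-envelope: theoretical peak memory bandwidth per core for the
# two Broadwell options. Both have 4 channels of DDR4-2400 (64-bit bus,
# i.e. 8 bytes per transfer); sustained bandwidth in practice is lower.
def peak_socket_bw_gbs(channels=4, mt_per_s=2400, bus_bytes=8):
    """Theoretical peak bandwidth of one socket in GB/s."""
    return channels * mt_per_s * bus_bytes / 1000.0

for name, cores in [("E5-2697A v4", 16), ("E5-2667 v4", 8)]:
    bw = peak_socket_bw_gbs()
    print(f"{name}: {bw:.1f} GB/s per socket, {bw / cores:.1f} GB/s per core")
```

Per socket both give 76.8 GB/s, so each E5-2667 core gets roughly twice the bandwidth of an E5-2697A core.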

Any help would be appreciated indeed. Thank you in advance!
Kind regards,
Mike.

November 30, 2017, 12:02   #2
flotus1 (Alex; Senior Member; Join Date: Jun 2012; Location: Germany; Posts: 1,621)
No need to spend extra money, at least not on "outdated" CPUs. Intel has already released its new "Skylake-SP" processors, and they should be available from all vendors.
Their main advantage for CFD over their "Broadwell" predecessors: 6 memory channels instead of 4, with support for DDR4-2666 instead of DDR4-2400. The price premium for quad-socket capability has also been lowered: all Xeon Gold 6xxx processors have 3 UPI links, which makes them suitable for quad-socket setups.
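To put rough numbers on that advantage, a quick sketch of the theoretical peak per-socket bandwidth (channels x transfer rate x 8 bytes per transfer; sustained figures are lower):

```python
# Theoretical peak memory bandwidth per socket: Broadwell-EP (4 channels of
# DDR4-2400) vs Skylake-SP (6 channels of DDR4-2666), 8 bytes per transfer.
def peak_bw_gbs(channels, mt_per_s):
    return channels * mt_per_s * 8 / 1000.0  # GB/s

broadwell = peak_bw_gbs(4, 2400)   # ~76.8 GB/s
skylake = peak_bw_gbs(6, 2666)     # ~128 GB/s
print(f"Broadwell-EP: {broadwell:.1f} GB/s, Skylake-SP: {skylake:.1f} GB/s "
      f"({skylake / broadwell:.0%} of Broadwell)")
```

Roughly a 1.67x jump in peak bandwidth per socket, which is what a memory-bound solver like Fluent cares about.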

The current "high-end" recommendation for an Intel-based 128-core cluster would be:
Four nodes with 4x Intel Xeon Gold 6144 (8 cores)

Cheaper (slower) solutions are always possible. E.g.
Xeon Gold 6134 (8 cores) instead of 6144.
Three nodes with 4x 12-core processors
Two nodes with 4x 16-core processors
Four nodes with 2x 16-core processors...

You get the picture. Which one is best for you depends on your budget and the prices your hardware vendor quotes. But to make my point once more: do not buy Xeon v4 processors anymore.
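To compare the layouts above on the metric that matters most for Fluent, here is a sketch of total core count and theoretical per-core peak bandwidth, assuming 6 channels of DDR4-2666 on every socket (theoretical peaks only):

```python
# Total cores and theoretical peak bandwidth per core for the Skylake-SP
# cluster layouts above (6 channels of DDR4-2666 per socket, 8 B/transfer).
SOCKET_BW_GBS = 6 * 2666 * 8 / 1000.0  # ~128 GB/s per socket

layouts = [  # (description, nodes, sockets per node, cores per CPU)
    ("4 nodes, 4x Gold 6144 (8-core)", 4, 4, 8),
    ("3 nodes, 4x 12-core", 3, 4, 12),
    ("2 nodes, 4x 16-core", 2, 4, 16),
    ("4 nodes, 2x 16-core", 4, 2, 16),
]
for desc, nodes, sockets, cores in layouts:
    total = nodes * sockets * cores
    print(f"{desc}: {total} cores, {SOCKET_BW_GBS / cores:.1f} GB/s per core")
```

The 8-core option delivers roughly twice the per-core bandwidth of the 16-core ones, which is why it tops the list despite needing more sockets.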

Btw: Ansys publishes benchmark data for various test cases and hardware: http://www.ansys.com/solutions/solut...ent-benchmarks
__________________
Please do not send me CFD-related questions via PM

December 1, 2017, 07:54   #3
mihei (New Member)
flotus1, thank you for your reply.

Unfortunately, hardware suppliers in my region (I live in Russia) do not yet sell servers with 4x Xeon Gold CPUs. So I think I will look for 2x Xeon Gold 6144 servers; that looks like a good, scalable solution in my case.

December 1, 2017, 08:26   #4
flotus1 (Senior Member)
Since you have a decent InfiniBand interconnect, dual-socket nodes will work just fine. In that case you could even settle for the cheaper Xeon Gold 5xxx processors, since you don't need that many UPI links. The only downside is that these CPUs also have lower clock speeds.
Make sure to get a correct memory configuration for those Skylake-SP processors, i.e. 6 DIMMs per CPU. Apparently, not all vendors recommend this by default.
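One way to verify the population on a delivered Linux node: feed the output of `dmidecode -t memory` (run as root) to a small counter like this sketch. On a correctly configured dual-socket Skylake-SP node it should report 12 installed DIMMs (assumes the standard dmidecode Memory Device output format):

```python
# Count installed DIMMs in the output of `dmidecode -t memory`.
# Each memory slot appears as a "Size:" line; empty slots read
# "Size: No Module Installed".
def count_dimms(dmidecode_output: str) -> int:
    return sum(
        1
        for line in dmidecode_output.splitlines()
        if line.strip().startswith("Size:")
        and "No Module Installed" not in line
    )

# On a live node (requires root):
#   import subprocess
#   out = subprocess.run(["dmidecode", "-t", "memory"],
#                        capture_output=True, text=True).stdout
#   print(count_dimms(out), "DIMMs installed")  # expect 12 per dual-socket node
```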

December 1, 2017, 09:20   #5
mihei (New Member)
Quote (Originally Posted by flotus1):
Yet in this case you could settle for the cheaper Xeon gold 5xxx processors since you don't need that many UPI links. The only downside is that these CPUs also have lower clock speeds.
The Xeon Gold 5xxx series has significantly lower frequencies. Maybe the 6134 model (8 cores, 3.2 GHz) is worth considering, because the 6144 offers only an extra 0.3 GHz while being significantly more expensive (about 800 USD more per CPU).

Quote (Originally Posted by flotus1):
Make sure to get a correct memory configuration for those Skylake-SP processors, so 6 DIMMs per CPU. Apparently, not all vendors recommend this by default.
Concerning the memory configuration: as far as I understood, it's better to have as many memory controllers as possible to increase total memory bandwidth. For example, with two Xeon Gold 61xx I would need to buy 24 DDR4-2666 modules (8 GB each; 192 GB per node would be sufficient, I guess). Please correct me if I'm wrong.

December 1, 2017, 09:47   #6
flotus1 (Senior Member)
You are partly correct: for CFD you want as many memory channels as possible ("memory channels" is the more precise term than "memory controllers"). A single Skylake-SP CPU provides 6 memory channels, and populating more than one DIMM per channel adds no bandwidth. In fact, it can even lower the maximum supported memory speed, and thus bandwidth, compared to one DIMM per channel, depending on the type of memory and motherboard you use.
So for a dual-socket node you want 12 DIMMs in total, preferably dual-rank RDIMMs if possible.

December 1, 2017, 10:59   #7
mihei (New Member)
flotus1, Thank you! It's much clearer after your explanation.
