Dual cpu workstation VS 2 node cluster single cpu workstation

Old   April 27, 2011, 06:53
Default Dual cpu workstation VS 2 node cluster single cpu workstation
  #1
New Member
 
anonymous
Join Date: Apr 2011
Posts: 8
Rep Power: 15
Verdi is on a distinguished road
I want to run OpenFOAM, ANSYS CFX and ANSYS Mechanical on Linux. I have two options for the hardware setup. Which of the two will perform best?

For pre-processing and post-processing I have another workstation available. The main goal of the new machines is to provide computational power.

The two options I have are:
-Option 1
Dual-CPU workstation with Xeon X5690.

-Option 2
Two workstations with i7-990X or W3690 in a 2-node cluster.

For the price I think it does not make a big difference. But how about the performance?

Looking at the CPU benchmark (http://www.cpubenchmark.net/high_end_cpus.html), the W3690 seems to be the fastest CPU. But if I use two of these in a 2-node cluster, will that still be faster than dual X5690s in a single workstation? My feeling is that a dual-CPU board has more efficient communication between the two CPUs. But will there be a significant difference in a real-life scenario?

Old   April 27, 2011, 06:57
Default
  #2
Senior Member
 
Joern Beilke
Join Date: Mar 2009
Location: Dresden
Posts: 516
Rep Power: 20
JBeilke is on a distinguished road
The i7-2600 is probably a lot faster than the xeons. So get 4 of them and just use 2 cores per processor.

Old   April 28, 2011, 19:29
Default
  #3
New Member
 
anonymous
Join Date: Apr 2011
Posts: 8
Rep Power: 15
Verdi is on a distinguished road
Quote:
Originally Posted by JBeilke View Post
The i7-2600 is probably a lot faster than the xeons. So get 4 of them and just use 2 cores per processor.
According to the PassMark benchmark the i7-2600 is slower... But the price is also much lower. I think I can get a 6-node cluster with the 2600 for the same price as a dual Xeon X5690 workstation. Could this be option 3?

But I would still like to know which is the better choice of the two options from my first post. My feeling is that 2 CPUs on one motherboard, with a direct connection between them, are more efficient when the CPUs have to work together. Is my feeling correct? And if there is a difference, would you notice it in a real-world case running parallel CFD applications?

Old   April 29, 2011, 04:18
Default
  #4
Senior Member
 
Joern Beilke
Join Date: Mar 2009
Location: Dresden
Posts: 516
Rep Power: 20
JBeilke is on a distinguished road
Forget the usual benchmarks. CFD calculations require a lot of memory bandwidth. Even if you use only two cores of a modern Intel CPU, your speedup is not linear, and it gets worse as you use more cores.

And when you use only 2 cores per processor the i7-2600 is the fastest one:
http://www.xbitlabs.com/articles/cpu...600k-990x.html

I'm not sure about 2 cpus on one board.
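
The point about non-linear speedup can be illustrated with a very crude model, in which every solver process streams data from the same memory and the socket can only deliver a fixed total bandwidth. This is only a sketch: the function name and the bandwidth figures below are hypothetical and were not measured on any of the CPUs discussed in this thread.

```python
def bandwidth_limited_speedup(cores, core_demand_gbs, socket_bandwidth_gbs):
    """Ideal speedup on one socket when all cores share the memory bus.

    Speedup grows linearly with the core count until the cores together
    ask for more bandwidth than the socket can deliver, then stays flat.
    """
    saturation_cores = socket_bandwidth_gbs / core_demand_gbs
    return min(float(cores), saturation_cores)


if __name__ == "__main__":
    # Hypothetical numbers: each process wants 12 GB/s, the socket delivers 20 GB/s.
    for n in (1, 2, 4, 6):
        s = bandwidth_limited_speedup(n, core_demand_gbs=12.0, socket_bandwidth_gbs=20.0)
        print(f"{n} cores: speedup {s:.2f}, parallel efficiency {s / n:.0%}")
```

Real solvers do not saturate this abruptly, since caches and compute-bound phases soften the curve, but the model shows why adding cores beyond the saturation point buys little.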

Old   May 1, 2011, 11:26
Default
  #5
Senior Member
 
Join Date: Oct 2009
Location: Germany
Posts: 636
Rep Power: 22
abdul099 is on a distinguished road
It depends a lot on the connection between the workstations when using the i7. For communication, the main issue is not the link bandwidth but latency. With plain Ethernet, my feeling is that it will be slower, but I don't have any measurements.

Old   May 2, 2011, 23:57
Default
  #6
Senior Member
 
Martin Hegedus
Join Date: Feb 2011
Posts: 500
Rep Power: 19
Martin Hegedus is on a distinguished road
I agree with Joern Beilke: the main bottleneck is the memory bandwidth, especially for unstructured solvers, where the memory access pattern is close to random. Structured solvers have an advantage. If possible, you should size the CPUs to your memory bandwidth. Then decide how many machines you need.

Old   May 4, 2011, 04:00
Default
  #7
Senior Member
 
Join Date: Oct 2009
Location: Germany
Posts: 636
Rep Power: 22
abdul099 is on a distinguished road
I agree only partially. You can't compare requirements for a stand-alone machine and a machine which will be part of a small "cluster".

Memory bandwidth is important when running the whole case on a single machine, because the full case needs data from the same memory and processes might be blocking each other while reading or writing.

But when running the case in parallel on more than one machine, every machine has to handle only a part of the full model and needs less memory. And much of the memory access can be performed at the same time on different machines. Therefore the memory bandwidth becomes less critical with an increasing number of machines, but the latency of the communication interface becomes very important. That's how every cluster is built: InfiniBand connection to get low latency, but no special memory.
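
The latency argument can be made concrete with the usual "latency plus size over bandwidth" model of message cost. The sketch below is only illustrative: the function name is hypothetical, and the latency and bandwidth values are assumed order-of-magnitude figures for gigabit Ethernet and InfiniBand, not measurements from anyone in this thread.

```python
def message_time_us(size_bytes, latency_us, bandwidth_gbit_s):
    """Time to deliver one message in microseconds: fixed latency plus transfer time."""
    bits = size_bytes * 8
    bits_per_us = bandwidth_gbit_s * 1e3  # 1 Gbit/s equals 1000 bits per microsecond
    return latency_us + bits / bits_per_us


if __name__ == "__main__":
    # Assumed, order-of-magnitude link parameters (not measured values).
    links = {
        "gigabit Ethernet": {"latency_us": 50.0, "bandwidth_gbit_s": 1.0},
        "InfiniBand": {"latency_us": 2.0, "bandwidth_gbit_s": 32.0},
    }
    for size in (100, 10_000, 1_000_000):
        for name, link in links.items():
            t = message_time_us(size, **link)
            print(f"{size:>9} bytes over {name:<16}: {t:10.1f} us")
```

For small messages the fixed latency term dominates the total, which is the case abdul099 describes; only for large messages does the link bandwidth start to matter.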

Old   May 4, 2011, 22:58
Default
  #8
New Member
 
VLKOH
Join Date: Mar 2009
Location: Malaysia
Posts: 20
Rep Power: 17
lalula2 is on a distinguished road
I agree with what abdul099 said. You will need a very good communication interface between the nodes.
My dual Xeon E5620 workstation runs faster than my small cluster (2 PCs with AMD X6 1090T), even though the AMDs have a higher clock speed than the Xeons. I suspect the communication between the 2 PCs is not good enough to handle such a large bandwidth.
I think a cluster is effective when you want to solve a case with a very large mesh, where the memory of a single workstation is insufficient for the calculation. If the mesh is small enough for a single workstation to handle, I still prefer to run it on a single machine.

Old   May 6, 2011, 17:38
Default
  #9
Senior Member
 
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18
kyle is on a distinguished road
lalula2, the speed difference is not because of limited bandwidth between the two machines; it is because of limited bandwidth from the AMD CPUs to their memory. Your Xeon processors are much, much better for CFD than your AMD processors simply because they have faster access to the system memory.

Unless your decomposition method is very poor... with just 2 nodes, your simulation is not going to be bottlenecked by gigabit network speeds.
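
A back-of-the-envelope estimate shows why the data volume between two nodes tends to be small. The sketch below assumes a cube-shaped mesh cut into two halves by a single plane, one layer of interface cells exchanged per iteration, and double-precision values; all function names, the mesh size and the number of variables are hypothetical and chosen only for illustration.

```python
def interface_cells(total_cells):
    """Cells on the cut plane when a cube-shaped mesh is split into two halves."""
    return round(total_cells ** (2.0 / 3.0))


def halo_bytes_per_exchange(total_cells, variables, bytes_per_value=8):
    """Bytes one node sends per exchange: one layer of interface cells."""
    return interface_cells(total_cells) * variables * bytes_per_value


def transfer_time_ms(size_bytes, bandwidth_gbit_s=1.0):
    """Pure transfer time in milliseconds, ignoring latency and protocol overhead."""
    return size_bytes * 8 / (bandwidth_gbit_s * 1e9) * 1e3


if __name__ == "__main__":
    cells = 1_000_000  # hypothetical mesh size
    size = halo_bytes_per_exchange(cells, variables=5)  # hypothetical variable count
    print(f"interface cells: {interface_cells(cells)}")
    print(f"bytes per exchange: {size}")
    print(f"transfer time at 1 Gbit/s: {transfer_time_ms(size):.2f} ms")
```

Whether this is negligible depends on how long one iteration takes and on how many exchanges the solver performs per iteration, so treat it as a rough order-of-magnitude check rather than a prediction.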

Old   May 11, 2011, 07:24
Default
  #10
New Member
 
anonymous
Join Date: Apr 2011
Posts: 8
Rep Power: 15
Verdi is on a distinguished road
Thank you all for your replies! Things are a bit clearer to me now.

So I have two different cases with different potential bottlenecks... For the dual CPU single workstation the memory bandwidth is the important factor.
For a cluster the memory bandwidth per machine is less important, but the network connection will determine the overall performance.

When I look at the two options from my first post, I think I will go for the dual CPU workstation. This is easier to set up and to maintain.
Only if I want to scale up the number of CPUs could a cluster with cheap CPUs and memory and a good network connection be cheaper for the same performance.

Old   May 22, 2011, 08:17
Default
  #11
Senior Member
 
Join Date: Oct 2009
Location: Germany
Posts: 636
Rep Power: 22
abdul099 is on a distinguished road
kyle, the amount of data to be exchanged is not very large. So you are right, the network bandwidth is not the bottleneck. But don't forget the poor latency of Ethernet. There is a good reason why nearly all clusters have an InfiniBand connection between the nodes.

And to be honest, the Xeon E-series CPUs are just crap compared to the X-series, because the E-series does NOT have fast memory access. I haven't tested it, but judging from the CPU architecture alone, the Phenom X6 should have faster memory access than a Xeon E (and of course much slower than a Xeon X).

Old   May 22, 2011, 21:17
Default
  #12
Senior Member
 
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18
kyle is on a distinguished road
abdul,

I was talking strictly about network bandwidth. Of course both memory bandwidth and network latency are extremely important. The highest memory bandwidth per core, as well as the lowest memory latency, comes from a Sandy Bridge i5 or i7 with 2133 MHz memory. If you run a dual-socket Xeon X-series system, you get much higher memory bandwidth per system, but that isn't really meaningful. Memory bandwidth per dollar is the much more important number.

Check out the CPU benchmarks on http://techreport.com. They run memory bandwidth and latency tests, as well as a CFD benchmark, for every CPU right after it comes out.

Old   May 31, 2011, 03:41
Default
  #13
New Member
 
VLKOH
Join Date: Mar 2009
Location: Malaysia
Posts: 20
Rep Power: 17
lalula2 is on a distinguished road
abdul,

There isn't much difference between the X and E series of Xeon processors (E5620 and above). Both are triple channel, and the X series is more expensive. What you get is a higher clock speed; the rest is pretty much the same. You call it crap just because it is 100-200 MHz slower in clock speed? Have you compared them in terms of MHz per dollar?

The Phenom X6 only runs at 21 GB/s dual-channel memory bandwidth, compared to the Xeon E series, which runs 25.6 GB/s triple-channel memory.
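
Figures like these follow from the standard formula for theoretical peak memory bandwidth: channels times transfer rate times bytes per transfer. The sketch below reproduces the two numbers under the assumption that the Xeon runs DDR3-1066 and the Phenom runs DDR3-1333 on 64-bit channels; the memory speeds are an assumption, since the post does not state them, and the function name is hypothetical.

```python
def peak_bandwidth_gbs(channels, transfers_mt_s, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s.

    channels        number of memory channels
    transfers_mt_s  transfer rate per channel in megatransfers per second
    bus_width_bits  width of one channel (64 bits for DDR3)
    """
    bytes_per_transfer = bus_width_bits / 8
    return channels * transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Assumed memory speeds: DDR3-1066 (triple channel) and DDR3-1333 (dual channel).
    print(f"triple-channel DDR3-1066: {peak_bandwidth_gbs(3, 1066.67):.1f} GB/s")
    print(f"dual-channel DDR3-1333:   {peak_bandwidth_gbs(2, 1333.33):.1f} GB/s")
```

These are peak values; the bandwidth a solver actually sustains is lower and depends on the access pattern.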
lalula2 is offline   Reply With Quote

Old   June 3, 2011, 13:37
Default
  #14
Senior Member
 
Join Date: Oct 2009
Location: Germany
Posts: 636
Rep Power: 22
abdul099 is on a distinguished road
lalula,

you're right. I've mixed it up because in our company, all E-series Xeon processors are based on the Core architecture, while the X-series processors are the newer ones based on the Nehalem architecture. So there is a huge difference between the integrated memory controller of the Nehalem CPUs and the memory access through a front-side bus of the Core CPUs.
As the Phenom is more similar to a Nehalem CPU (AMD had an integrated memory controller much earlier than Intel), the old Core-based Xeons should be easily beaten even by a Phenom X6, as I wrote before.

Anyway, MHz per $ isn't everything. If I pick 200 Pentium I 120 MHz chips out of the trash bin, I get a lot of "MHz / $", but the result is not fast and not a good choice even though it's cheap.
Whether one system beats another depends very much on the specific case, and the comparison should be "performance / $" rather than "MHz / $".

Old   September 26, 2011, 20:13
Default
  #15
Senior Member
 
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,183
Rep Power: 23
evcelica is on a distinguished road
Somewhat off topic but also interesting: I did some benchmarking with my i7-2600K system overclocked to 4.8 GHz and a dual Xeon X5675 system, both running two cores.
I was running a non-linear buckling analysis.
The i7 system ran nearly twice as fast as the dual Xeon system, so the per-core performance of the i7 proved to be much better than that of the Xeons.

Old   March 9, 2012, 08:27
Default
  #16
Member
 
Join Date: Mar 2009
Posts: 90
Rep Power: 17
aerogt3 is on a distinguished road
This thread has been very helpful so far. Does anyone know the difference between Sandy Bridge and Sandy Bridge-E as far as CFD goes? For example, these two similarly priced CPUs:

http://www.newegg.com/Product/Produc...82E16819115082

http://www.newegg.com/Product/Produc...82E16819117270

Is the first one a 1P CPU and the second intended for 2P system use?

Old   March 21, 2012, 20:16
Default
  #17
Senior Member
 
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,183
Rep Power: 23
evcelica is on a distinguished road
The number at the pound sign in EX-#XXX states how many CPUs can be put on a single board. Sandy Bridge CPUs are all single socket; Sandy Bridge-E has both single- and dual-socket processors, with 4-socket ones to come.
Sandy Bridge has dual-channel memory, Sandy Bridge-E has quad-channel.
Between those two CPUs, the E3 would probably blow away the E5 in most everyday tasks, but the quad-channel memory might make it a closer race for CFD. I would still put my money on the E3 since it has a much higher clock speed.
If you're going dual socket, then the E5 would be the way to go.

There may be more differences but those are the ones I know of.
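
The naming rule described above (the first digit after the dash gives the maximum socket count) can be written down as a small helper. This is only a sketch of that rule as stated in the post: the function name is hypothetical, and the model strings in the usage lines are given purely as examples of the EX-#XXX pattern.

```python
import re

# Matches the EX-#XXX pattern, e.g. "E5-2620"; the captured digit is the '#'.
_XEON_MODEL = re.compile(r"E\d-(\d)\d{3}")


def max_sockets(model):
    """Return the maximum socket count encoded in a Xeon EX-#XXX model name."""
    match = _XEON_MODEL.search(model)
    if match is None:
        raise ValueError(f"not an EX-#XXX style model name: {model!r}")
    return int(match.group(1))


if __name__ == "__main__":
    for name in ("Xeon E3-1240", "Xeon E5-2620"):
        print(f"{name}: up to {max_sockets(name)} socket(s)")
```

Older names such as X5690 or W3690 do not follow this pattern, so the helper raises an error for them rather than guessing.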
evcelica is offline   Reply With Quote

Old   March 22, 2012, 04:43
Default
  #18
Member
 
Join Date: Mar 2009
Posts: 90
Rep Power: 17
aerogt3 is on a distinguished road
Quote:
Originally Posted by evcelica View Post
The number at the pound sign in EX-#XXX states how many CPUs can be put on a single board. Sandy Bridge CPUs are all single socket; Sandy Bridge-E has both single- and dual-socket processors, with 4-socket ones to come.
Sandy Bridge has dual-channel memory, Sandy Bridge-E has quad-channel.
Between those two CPUs, the E3 would probably blow away the E5 in most everyday tasks, but the quad-channel memory might make it a closer race for CFD. I would still put my money on the E3 since it has a much higher clock speed.
If you're going dual socket, then the E5 would be the way to go.

There may be more differences but those are the ones I know of.
Great info! I need a dual socket processor, so you've settled it for me. Thanks a bunch!
aerogt3 is offline   Reply With Quote

Old   September 2, 2013, 04:09
Default Xeon X5690 has a problem.
  #19
Member
 
Jinwhan Ryuk
Join Date: Feb 2013
Location: South Korea
Posts: 91
Rep Power: 13
Whitebear is on a distinguished road
ANSYS Workbench is very slow on the Xeon X5690 CPU.