Dual cpu workstation VS 2 node cluster single cpu workstation |
|
April 27, 2011, 06:53 |
Dual cpu workstation VS 2 node cluster single cpu workstation
|
#1 |
New Member
anonymous
Join Date: Apr 2011
Posts: 8
Rep Power: 15 |
I want to run OpenFOAM, ANSYS CFX and ANSYS Mechanical with Linux as the operating system. I have two options for the hardware setup. Which of the two will perform best?
For pre-processing and post-processing I have another workstation available. The main goal of the new machine is to provide computational power. The two options are:
- Option 1: a dual-CPU workstation with two Xeon X5690.
- Option 2: two workstations, each with an i7-990X or W3690, in a 2-node cluster.
Price-wise I don't think it makes a big difference. But how about performance? Looking at the CPU benchmark (http://www.cpubenchmark.net/high_end_cpus.html), the W3690 seems to be the fastest CPU. But when I use two of them in a 2-node cluster, will that still be faster than dual X5690 in a single workstation? My feeling tells me a dual-CPU board would have more efficient communication between the two CPUs. But would there be a significant difference in a real-life scenario? |
|
April 27, 2011, 06:57 |
|
#2 |
Senior Member
Joern Beilke
Join Date: Mar 2009
Location: Dresden
Posts: 516
Rep Power: 20 |
The i7-2600 is probably a lot faster than the Xeons. So get 4 of them and just use 2 cores per processor.
|
|
April 28, 2011, 19:29 |
|
#3 | |
New Member
anonymous
Join Date: Apr 2011
Posts: 8
Rep Power: 15 |
Quote:
But I would still like to know which is the best way to go, looking at the two options from my first post. My feeling tells me that two CPUs on one motherboard, with a direct connection between them, are more efficient when the CPUs have to work together. Is my feeling correct? And if there is a difference, would you notice it in a real-world case running parallel CFD applications? |
|
April 29, 2011, 04:18 |
|
#4 |
Senior Member
Joern Beilke
Join Date: Mar 2009
Location: Dresden
Posts: 516
Rep Power: 20 |
Forget the usual benchmarks. CFD calculations require a lot of memory bandwidth. Even if you use only two cores of a modern Intel CPU, the speedup is not linear, and it gets worse as you use more cores.
And when you use only 2 cores per processor, the i7-2600 is the fastest one: http://www.xbitlabs.com/articles/cpu...600k-990x.html I'm not sure about 2 CPUs on one board. |
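The sub-linear speedup described here can be sketched with a toy model: cores scale ideally until their combined memory traffic saturates the socket's bandwidth, after which adding cores gains nothing. The per-core demand and socket-bandwidth figures below are illustrative assumptions, not measurements of any specific CPU.

```python
# Toy model of memory-bandwidth-limited parallel speedup (illustrative
# only; the numbers are assumptions, not benchmark results).

def speedup(n_cores, per_core_demand_gbs, socket_bandwidth_gbs):
    """Ideal speedup equals the core count, but once the cores'
    combined memory traffic exceeds the socket's peak bandwidth,
    the memory bus caps the speedup."""
    bandwidth_cap = socket_bandwidth_gbs / per_core_demand_gbs
    return min(n_cores, bandwidth_cap)

# Example: assume each CFD process streams ~8 GB/s and the socket
# delivers ~25.6 GB/s (a nominal triple-channel DDR3 figure).
for n in (1, 2, 4, 6):
    print(n, "cores ->", speedup(n, 8.0, 25.6), "x")
```

With these assumed numbers the speedup flattens at 3.2x regardless of how many more cores are added, which is the behavior the post describes.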
|
May 1, 2011, 11:26 |
|
#5 |
Senior Member
Join Date: Oct 2009
Location: Germany
Posts: 636
Rep Power: 22 |
When using the i7s, much depends on the connection between the workstations. For communication, the main issue is not the link bandwidth but the latency. With plain Ethernet, my feeling tells me it would be slower - but I don't have any measurements.
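The latency-over-bandwidth point can be illustrated with the usual first-order model, transfer time = latency + size / bandwidth. The latency and bandwidth numbers below are rough ballpark assumptions for gigabit Ethernet and InfiniBand of that era, not measured values.

```python
# First-order model of one message exchange between nodes:
#   time = latency + message_size / bandwidth
# Numbers are ballpark assumptions, not measurements.

def transfer_time_us(msg_bytes, latency_us, bandwidth_gb_per_s):
    """Time in microseconds to move msg_bytes over a link."""
    return latency_us + msg_bytes / (bandwidth_gb_per_s * 1e9) * 1e6

msg = 10_000  # an assumed small halo-exchange message, in bytes
gige = transfer_time_us(msg, latency_us=50.0, bandwidth_gb_per_s=0.125)
ib   = transfer_time_us(msg, latency_us=2.0,  bandwidth_gb_per_s=4.0)
print(f"GigE: {gige:.1f} us, InfiniBand: {ib:.1f} us")
```

For small messages the fixed latency term dominates: under these assumptions the gigabit Ethernet exchange takes 130 us against 4.5 us for InfiniBand, even though the bandwidth gap alone would predict a much smaller difference.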
|
|
May 2, 2011, 23:57 |
|
#6 |
Senior Member
Martin Hegedus
Join Date: Feb 2011
Posts: 500
Rep Power: 19 |
I agree with Joern Beilke, the main bottleneck is memory bandwidth, especially for unstructured solvers, where the memory access pattern is more or less random. Structured solvers have an advantage here. If possible, size the number of CPUs to your memory bandwidth, then decide how many machines you need.
|
|
May 4, 2011, 04:00 |
|
#7 |
Senior Member
Join Date: Oct 2009
Location: Germany
Posts: 636
Rep Power: 22 |
I agree only partially. You can't compare the requirements of a stand-alone machine with those of a machine that will be part of a small "cluster".
Memory bandwidth is important when running the whole case on a single machine, because the full case reads from the same memory and processes may block each other while reading or writing. But when running the case in parallel on more than one machine, every machine handles only a part of the full model and needs less memory, and much of the memory access happens simultaneously on different machines. So memory bandwidth becomes less critical as the number of machines grows, while the latency of the communication interface becomes very important. That's how every cluster is built: an InfiniBand connection for low latency, but no special memory. |
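The scaling argument here can be sketched geometrically: each node's share of the cells shrinks as nodes are added, while the partition surface it must exchange does not. The sketch below assumes a cubic mesh and a simple slab decomposition, with no solver specifics.

```python
# Why per-node memory pressure falls with more nodes while communication
# does not: for an N^3-cell mesh split into k slabs, each node stores
# about N^3/k cells, but every interior partition boundary is an N*N
# plane that must be exchanged each iteration. Purely geometric sketch.

def cells_per_node(n, k):
    """Local cell count for an n^3 mesh split evenly across k nodes."""
    return n**3 // k

def halo_cells_per_node(n, k):
    """Slab decomposition along one axis: up to two n*n interface
    planes per node; a single node exchanges nothing."""
    return 0 if k == 1 else 2 * n * n

n = 300  # an assumed ~27 million cell case
for k in (1, 2, 4):
    print(f"{k} node(s): {cells_per_node(n, k)} local cells, "
          f"{halo_cells_per_node(n, k)} halo cells")
```

The local work per node halves each time the cluster doubles, while the halo traffic per node stays flat, so the communication interface carries a growing share of the total cost.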
|
May 4, 2011, 22:58 |
|
#8 |
New Member
VLKOH
Join Date: Mar 2009
Location: Malaysia
Posts: 20
Rep Power: 17 |
I agree with what abdul099 said. You need a very good communication interface between the nodes.
My dual Xeon E5620 machine runs faster than my small cluster (2 PCs with an AMD X6 1090T each), even though the AMDs have a higher clock speed than the Xeons. I suspect the connection between the two PCs is not good enough to handle the traffic. I think a cluster is effective when you have to solve a case with a very large mesh, where the memory of a single workstation is insufficient for the calculation. If the mesh is small enough for a single workstation to handle, I still prefer to run it on a single machine. |
|
May 6, 2011, 17:38 |
|
#9 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
lalula2, the speed difference is not because of limited bandwidth between the two machines, it is because of limited bandwidth from the AMD CPUs to their memory. Your Xeon processors are much, much better for CFD than your AMD processors simply because they have faster access to system memory.
Unless your decomposition method is very poor, with just 2 nodes your simulation is not going to be bottlenecked by gigabit network speeds. |
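This argument boils down to memory bandwidth per core. A back-of-the-envelope comparison, using nominal peak-bandwidth figures (triple-channel for the Xeon, dual-channel for the Phenom) that should be treated as rough assumptions rather than measured numbers:

```python
# Back-of-the-envelope memory bandwidth per core. Peak figures are
# nominal: triple-channel DDR3 for a Xeon 56xx socket, dual-channel
# DDR3 for a Phenom II X6. Treat them as rough assumptions.

def bandwidth_per_core(peak_gbs, cores_used):
    """Peak socket bandwidth divided evenly among the active cores."""
    return peak_gbs / cores_used

xeon   = bandwidth_per_core(25.6, cores_used=4)  # one E5620, 4 cores
phenom = bandwidth_per_core(21.0, cores_used=6)  # one 1090T, 6 cores
print(f"Xeon: {xeon:.1f} GB/s per core, Phenom: {phenom:.1f} GB/s per core")
```

Under these assumptions the Xeon feeds each core almost twice the memory bandwidth of the Phenom, which is consistent with the speed difference reported above despite the Phenom's higher clock.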
|
May 11, 2011, 07:24 |
|
#10 |
New Member
anonymous
Join Date: Apr 2011
Posts: 8
Rep Power: 15 |
Thank you all for your replies! It is becoming clearer to me.
So I have two different setups with different potential bottlenecks: for the dual-CPU single workstation, memory bandwidth is the important factor; for a cluster, the memory bandwidth per machine is less important, but the network connection determines the overall performance. Looking at the two options from my first post, I think I will go for the dual-CPU workstation. It is easier to set up and to maintain. Only when scaling up the number of CPUs can a cluster with cheap CPUs, cheap memory and a good network connection become the cheaper option for the same performance. |
|
May 22, 2011, 08:17 |
|
#11 |
Senior Member
Join Date: Oct 2009
Location: Germany
Posts: 636
Rep Power: 22 |
kyle, the amount of data to be exchanged is not very large, so you are right that network bandwidth is not the bottleneck. But don't forget the poor latency of Ethernet. There is a good reason why nearly all clusters have an InfiniBand connection between the nodes.
And to be honest, the Xeon E-series CPUs are just crap compared to the X-series, because the E-series does NOT have fast memory access. I haven't tested it, but judging from the CPU architecture alone, the Phenom X6 should have faster memory access than a Xeon E (and of course be much slower than a Xeon X). |
|
May 22, 2011, 21:17 |
|
#12 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
abdul,
My earlier point was strictly about network bandwidth. Of course both memory bandwidth and network bandwidth are extremely important. The highest memory bandwidth per core, as well as the lowest memory latency, comes from a Sandy Bridge i5 or i7 with 2133 MHz memory. With a dual-socket Xeon X-series system you can get much higher memory bandwidth per system, but that isn't really meaningful; memory bandwidth per dollar is the more important number. Check out the CPU benchmarks on http://techreport.com. They run memory bandwidth and latency tests, as well as a CFD benchmark, for every CPU right after it comes out. |
|
May 31, 2011, 03:41 |
|
#13 |
New Member
VLKOH
Join Date: Mar 2009
Location: Malaysia
Posts: 20
Rep Power: 17 |
abdul,
There isn't much difference between the X and E series (E5620 and up) Xeon processors. Both are triple-channel, and in terms of price the X series is more expensive. What you get is a faster clock speed; everything else is pretty much the same. You call it crap just because it is 100-200 MHz slower in clock speed?? Have you compared them in terms of MHz per dollar? The Phenom X6 runs at only 21 GB/s with dual-channel memory, compared to the Xeon E series running at 25.6 GB/s with triple-channel memory. |
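The bandwidth figures quoted here follow directly from channels x transfer rate x 8 bytes per 64-bit transfer. A quick check, using nominal DDR3 transfer rates (1066 MT/s for the triple-channel Xeon platform, 1333 MT/s for the dual-channel Phenom, as assumed values):

```python
# Peak theoretical memory bandwidth:
#   channels * transfer rate (MT/s) * 8 bytes per 64-bit transfer.
# Transfer rates are nominal assumed values for these platforms.

def peak_bandwidth_gbs(channels, mt_per_s):
    """Peak bandwidth in GB/s (decimal) for a given channel count
    and DDR transfer rate in megatransfers per second."""
    return channels * mt_per_s * 8 / 1000

xeon   = peak_bandwidth_gbs(3, 1066)  # triple-channel DDR3-1066
phenom = peak_bandwidth_gbs(2, 1333)  # dual-channel DDR3-1333
print(f"Xeon: {xeon:.1f} GB/s, Phenom: {phenom:.1f} GB/s")
```

This reproduces the 25.6 GB/s and roughly 21 GB/s figures from the post, and it shows why the third channel outweighs the Phenom's faster memory clock.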
|
June 3, 2011, 13:37 |
|
#14 |
Senior Member
Join Date: Oct 2009
Location: Germany
Posts: 636
Rep Power: 22 |
lalula,
you're right, I mixed it up. In our company, all E-series Xeon processors are based on the Core architecture, while the X-series processors are the newer ones based on the Nehalem architecture. So there is a huge difference between the integrated memory controller of the Nehalem CPUs and the memory access through a front-side bus on the Core CPUs. As the Phenom is more similar to a Nehalem CPU (AMD had an integrated memory controller much earlier than Intel), the old Core-based Xeons should be easily beaten even by a Phenom X6, like I wrote before. Anyway, MHz per dollar doesn't mean everything. If I pick 200 old 120 MHz Pentiums out of the trash bin, I get a lot of "MHz / $" - but it's not fast and no good choice, however cheap it is. Whether one system beats another depends very much on the specific case when you compare not "MHz / $" but "performance / $". |
|
September 26, 2011, 20:13 |
|
#15 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,183
Rep Power: 23 |
Somewhat off topic, but also interesting: I did some benchmarking with my 4.8 GHz overclocked i7-2600K system and a dual Xeon X5675 system, both running on two cores.
I was running a non-linear buckling analysis. The i7 system ran nearly twice as fast as the dual Xeon system, so the per-core performance of the i7 turned out to be much better than the Xeons'. |
|
March 9, 2012, 08:27 |
|
#16 |
Member
Join Date: Mar 2009
Posts: 90
Rep Power: 17 |
This thread has been very helpful so far. Does anyone know the difference between Sandy Bridge and Sandy Bridge-E as far as CFD goes? For example, these two similarly priced CPUs:
http://www.newegg.com/Product/Produc...82E16819115082 http://www.newegg.com/Product/Produc...82E16819117270 Is the first one a 1P CPU and the second intended for 2P systems? |
|
March 21, 2012, 20:16 |
|
#17 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,183
Rep Power: 23 |
The digit right after the dash in the model number (E#-XXXX) states how many CPUs can be put on a single board. Sandy Bridge parts are all single-socket; Sandy Bridge-E has both single- and dual-socket processors, with 4-socket parts to come.
Sandy Bridge has dual-channel memory, Sandy Bridge-E has quad-channel. Between those two CPUs, the E3 would probably blow away the E5 in most everyday tasks, but the quad-channel memory might make it a closer race for CFD. I would still put my money on the E3, since it has a much higher clock speed. If you're going dual-socket, then the E5 is the way to go. There may be more differences, but those are the ones I know of. |
|
March 22, 2012, 04:43 |
|
#18 | |
Member
Join Date: Mar 2009
Posts: 90
Rep Power: 17 |
Quote:
|
|
September 2, 2013, 04:09 |
Xeon X5690 has a problem.
|
#19 |
Member
Jinwhan Ryuk
Join Date: Feb 2013
Location: South Korea
Posts: 91
Rep Power: 13 |
ANSYS Workbench runs very slowly on a Xeon X5690 CPU.
|
|