CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   Hardware (https://www.cfd-online.com/Forums/hardware/)
-   -   Dual cpu workstation VS 2 node cluster single cpu workstation (https://www.cfd-online.com/Forums/hardware/87708-dual-cpu-workstation-vs-2-node-cluster-single-cpu-workstation.html)

Verdi April 27, 2011 05:53

Dual cpu workstation VS 2 node cluster single cpu workstation
 
I want to run OpenFOAM, ANSYS CFX and ANSYS Mechanical with Linux as the operating system. I have two options for the hardware setup. Which of the two will perform best?

For pre processing and post processing I have another workstation available. The main goal for the new machines is to provide some computational power.

The two options I have are:
-Option 1
A dual-CPU workstation with two Xeon X5690s.

-Option 2
Two workstations with an i7-990X or W3690 each, in a 2-node cluster.

For the price I think it does not make a big difference. But how about the performance?

Looking at the CPU benchmark (http://www.cpubenchmark.net/high_end_cpus.html), the W3690 seems to be the fastest CPU. But if I use two of these in a 2-node cluster, will that still be faster than dual X5690s in a single workstation? My feeling tells me that a dual-CPU board would have more efficient communication between the two CPUs. But will there be a significant difference in a real-life scenario?

JBeilke April 27, 2011 05:57

The i7-2600 is probably a lot faster than the Xeons. So get 4 of them and just use 2 cores per processor.

Verdi April 28, 2011 18:29

Quote:

Originally Posted by JBeilke (Post 305254)
The i7-2600 is probably a lot faster than the Xeons. So get 4 of them and just use 2 cores per processor.

According to the PassMark benchmark the i7-2600 is slower... but the price is also much lower. I think I can get a 6-node cluster with the 2600 for the same price as a dual Xeon X5690 workstation. This could be option 3?

But I would still like to know which of the two options from my first post is the better way to go. My feeling tells me that two CPUs on one motherboard, with a direct connection between them, are more efficient when the CPUs have to work together. Is my feeling correct? And if there is a difference, would you notice it in a real-world case running parallel CFD applications?

JBeilke April 29, 2011 03:18

Forget the usual benchmarks. CFD calculations require a lot of memory bandwidth. Even if you use only two cores of a modern Intel CPU, the speedup is not linear, and it gets worse when using more cores.

And when you use only 2 cores per processor, the i7-2600 is the fastest one:
http://www.xbitlabs.com/articles/cpu...600k-990x.html

I'm not sure about 2 CPUs on one board.
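The saturation described above can be sketched with a toy model: once the cores' combined memory traffic exceeds what the socket can deliver, extra cores stop helping. All numbers below (8 GB/s per-core demand, 25.6 GB/s socket bandwidth) are made-up assumptions for illustration, not measurements of any of the CPUs discussed in this thread.

```python
# Toy model of bandwidth-limited scaling. The per-core demand and
# socket bandwidth are assumed, illustrative numbers only.

def bandwidth_bound_speedup(cores, per_core_gbs, socket_gbs):
    """Effective speedup over 1 core for a purely bandwidth-bound code."""
    demand = cores * per_core_gbs        # bandwidth the cores ask for
    served = min(demand, socket_gbs)     # bandwidth the memory can deliver
    return served / per_core_gbs

for n in (1, 2, 4, 6):
    print(n, "cores ->", round(bandwidth_bound_speedup(n, 8.0, 25.6), 2))
```

With these assumptions the speedup saturates around 3.2x, which is why two fast cores per socket can end up nearly as good as six.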

abdul099 May 1, 2011 10:26

It depends a lot on the connection between the workstations when using the i7s. For communication, the main issue is not the link bandwidth but latency. With plain Ethernet, my feeling would tell me it's slower - but I don't have any measured values.

Martin Hegedus May 2, 2011 22:57

I agree with Joern Beilke, the main bottleneck is memory bandwidth, especially for unstructured solvers, where the memory access pattern is fairly random. Structured solvers have an advantage here. If possible, you should size the CPUs to your memory bandwidth, then decide how many machines you need.

abdul099 May 4, 2011 03:00

I agree only partially. You can't compare requirements for a stand-alone machine and a machine which will be part of a small "cluster".

Memory bandwidth is important when running the whole case on a single machine, because the full case needs data from the same memory, and processes might block each other while reading or writing.

But when running the case in parallel on more than one machine, every machine has to handle only a part of the full model and needs less memory. And much of the memory access can happen at the same time on different machines. Memory bandwidth therefore becomes less critical with an increasing number of machines, but the latency of the communication interface becomes very important. That's how every cluster is built: an InfiniBand connection for low latency, but no special memory.
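A quick way to compare the per-machine side of this is a crude triad-style memory test. The sketch below assumes NumPy is available and is far rougher than the real STREAM benchmark, but it gives a ballpark bandwidth number for comparing machines; the array size and repeat count are arbitrary choices.

```python
# Crude STREAM-triad-style bandwidth probe; only a ballpark figure.
import time
import numpy as np

N = 5_000_000                        # ~40 MB per array, larger than the caches
b = np.random.rand(N)
c = np.random.rand(N)

reps = 5
t0 = time.perf_counter()
for _ in range(reps):
    a = b + 2.0 * c                  # triad: two reads and one write per element
elapsed = time.perf_counter() - t0

gbs = reps * 3 * N * 8 / elapsed / 1e9   # three streams of 8-byte doubles
print(f"approx. memory bandwidth: {gbs:.1f} GB/s")
```

Running this on each candidate machine (and then with several copies at once) shows how quickly the per-socket bandwidth gets shared out.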

lalula2 May 4, 2011 21:58

I agree with what abdul099 said. You will need a very good communication interface between the nodes.
My dual Xeon E5620 machine runs faster than my small cluster (two PCs with an AMD X6 1090T each), even though my AMDs have a higher clock speed than the Xeons. I suspect this is because the communication between the two PCs is not good enough to handle such a large bandwidth.
I think a cluster is effective when you want to solve a case with a very large mesh, where a single workstation's memory is insufficient for the calculation. If the mesh for a case is small enough for a single workstation to handle, I still prefer to run it on a single machine.

kyle May 6, 2011 16:38

lalula2, the speed difference is not because of limited bandwidth between the two machines; it is because of limited bandwidth from the AMD CPUs to their memory. Your Xeon processors are much, much better for CFD than your AMD processors simply because they have faster access to system memory.

Unless your decomposition method is very poor... with just 2 nodes, your simulation is not going to be bottlenecked by gigabit network speeds.

Verdi May 11, 2011 06:24

Thank you all for your replies! Things are becoming a bit clearer to me.

So I have two different cases with different potential bottlenecks... For the dual CPU single workstation the memory bandwidth is the important factor.
For a cluster the memory bandwidth per machine is less important, but the network connection will determine the overall performance.

When I look at the two options from my first post, I think I will go for the dual-CPU workstation. It is easier to set up and to maintain.
Only when I want to scale up the number of CPUs could a cluster with cheap CPUs, cheap memory, and a good network connection become the cheaper option for the same performance.

abdul099 May 22, 2011 07:17

kyle, the amount of data to be exchanged is not very large, so you are right that network bandwidth is not the bottleneck. But don't forget the poor latency of Ethernet. There is a good reason why nearly all clusters have an InfiniBand connection between the nodes.
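The latency side can be probed with a simple ping-pong loop. The sketch below only round-trips one byte over a local socket pair, so it shows the measurement idea rather than real node-to-node numbers; on a real cluster you would run the two ends on different machines, or use a proper MPI benchmark such as osu_latency.

```python
# Ping-pong latency probe over a local socket pair (loopback only).
import socket
import time

a, b = socket.socketpair()
reps = 1000
t0 = time.perf_counter()
for _ in range(reps):
    a.sendall(b"x")     # send one byte...
    b.recv(1)
    b.sendall(b"x")     # ...and echo it back
    a.recv(1)
rtt_us = (time.perf_counter() - t0) / reps * 1e6
print(f"mean round trip: {rtt_us:.1f} us (loopback)")
a.close()
b.close()
```

Between real nodes, gigabit Ethernet round trips are typically tens of microseconds while InfiniBand is a few, which is exactly the gap abdul099 is pointing at.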

And to be honest, the Xeon E-series CPUs are just crap compared to the X-series. And that is because the E-series does NOT have fast memory access. I haven't tested it, but just from the CPU architecture, the Phenom X6 should have faster memory access than a Xeon E (and of course is much slower than a Xeon X).

kyle May 22, 2011 20:17

abdul,

This thread was strictly about filesystem bandwidth. Of course both memory bandwidth and network bandwidth are extremely important. The highest memory bandwidth per core, as well as the lowest memory latency, comes from a Sandy Bridge i5 or i7 with 2133 MHz memory. If you run a dual-socket Xeon X-series system, you can get much higher memory bandwidth per system, but that isn't really meaningful. Memory bandwidth per dollar is the much more important number.

Check out the CPU benchmarks on http://techreport.com. They run memory bandwidth and latency tests, as well as CFD benchmarks, for every CPU right after it comes out.

lalula2 May 31, 2011 02:41

abdul,

There isn't much difference between the X and E series Xeons (E5620 and above). They are both triple-channel, and in terms of price the X series is more expensive. What you get is a faster clock speed; otherwise they are pretty much the same. You call it crap just because it is 100-200 MHz slower in clock speed? But have you compared them in terms of MHz per dollar?

The Phenom X6 only runs at 21 GB/s with dual-channel memory, compared to the Xeon E series, which runs at 25.6 GB/s with triple-channel memory.
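The two figures quoted here follow from the usual peak-bandwidth arithmetic: channels times transfer rate times 8 bytes per transfer. The memory speeds below (DDR3-1333 for the Phenom, DDR3-1066 for the E5620) are the typical supported configurations, assumed here for the sketch.

```python
# Theoretical peak DRAM bandwidth: channels * MT/s * 8 bytes per transfer.

def peak_bw_gbs(channels, mt_per_s):
    return channels * mt_per_s * 8 / 1000     # GB/s

print(peak_bw_gbs(2, 1333))   # Phenom X6, dual-channel DDR3-1333    -> ~21.3
print(peak_bw_gbs(3, 1066))   # Xeon E5620, triple-channel DDR3-1066 -> ~25.6
```

These are theoretical peaks; a triad-style measurement on real hardware typically reaches only a fraction of them.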

abdul099 June 3, 2011 12:37

lalula,

you're right. I mixed it up because in our company, all the E-series Xeon processors are based on the Core architecture, while the X-series processors are the newer ones based on the Nehalem architecture. There is a huge difference there, due to the integrated memory controller of the Nehalem CPUs versus the memory access through a front-side bus on the Core CPUs.
As the Phenom is more similar to a Nehalem CPU (AMD had an integrated memory controller much earlier than Intel), the old Core-based Xeons should be easily beaten even by a Phenom X6, as I wrote before.

Anyway, MHz per dollar doesn't mean everything. If I pick 200 Pentium I 120 MHz chips out of the trash bin, I get a lot of "MHz / $" - but it's not fast and not a good choice, even though it's cheap.
Whether one system can beat another depends a lot on the specific case when comparing not "MHz / $" but "performance / $".

evcelica September 26, 2011 19:13

Somewhat off topic, but also interesting: I did some benchmarking with my 4.8 GHz overclocked i7-2600K system and a dual Xeon X5675 system, both running on two cores.
I was running a non-linear buckling analysis.
The i7 system ran nearly twice as fast as the dual Xeon system, so the per-core performance of the i7 turned out to be much better than the Xeons'.

aerogt3 March 9, 2012 07:27

This thread has been very helpful so far. Does anyone know the difference between Sandy Bridge and Sandy Bridge-E as far as CFD goes? For example, these two similarly priced CPUs:

http://www.newegg.com/Product/Produc...82E16819115082

http://www.newegg.com/Product/Produc...82E16819117270

Is the first one a 1P CPU and the second intended for 2P system use?

evcelica March 21, 2012 19:16

The digit at the pound sign in EX-#XXX states how many CPUs can be put on a single board. Sandy Bridge CPUs are all single-socket; Sandy Bridge-E has both single- and dual-socket processors, with quad-socket coming in the future.
Sandy Bridge has dual-channel memory; Sandy Bridge-E has quad-channel.
Between those two CPUs, the E3 would probably blow away the E5 in most everyday tasks, but the quad-channel memory might make it a closer race for CFD. I would still put my money on the E3, since it has a much higher clock speed.
If you're going dual-socket, then the E5 is the way to go.

There may be more differences but those are the ones I know of.

aerogt3 March 22, 2012 03:43

Quote:

Originally Posted by evcelica (Post 350771)
The digit at the pound sign in EX-#XXX states how many CPUs can be put on a single board. Sandy Bridge CPUs are all single-socket; Sandy Bridge-E has both single- and dual-socket processors, with quad-socket coming in the future.
Sandy Bridge has dual-channel memory; Sandy Bridge-E has quad-channel.
Between those two CPUs, the E3 would probably blow away the E5 in most everyday tasks, but the quad-channel memory might make it a closer race for CFD. I would still put my money on the E3, since it has a much higher clock speed.
If you're going dual-socket, then the E5 is the way to go.

There may be more differences but those are the ones I know of.

Great info! I need a dual-socket system, so you've settled it for me. Thanks a bunch! :D

Whitebear September 2, 2013 03:09

The Xeon X5690 has a problem.

ANSYS Workbench is very slow on a Xeon X5690 CPU.

