CFD Online Discussion Forums

johndeas November 28, 2008 13:07

Xeon and Opteron architecture differences

At my laboratory, we are trying to decide on the best configuration for a cluster that will be bought very soon (proposals have come in from system assemblers, and a choice has to be made). I have only a limited understanding of architectural issues, and the opinions and benchmarks I have seen contradict each other, so I would like some information to help clarify this, and ultimately to know what you would recommend for a cluster. The biggest constraint is that it has to run Linux on an x86 architecture (because of the codes we plan to deploy on it, namely STAR-CCM+ and OpenFOAM), which narrows the choice down to Xeon and Opteron.

Here is my current understanding of the underlying problems. It is likely to be wrong in places; tell me if so.

CPUs have become much faster relative to memory access, which puts a lot of pressure on the FSB (front-side bus) design, created long ago when memory access was comparatively faster. CPU caches were developed to limit accesses to memory, but because of the large amount of data needed to solve the Navier-Stokes equations on a large domain, cache performance becomes less important, as the cache has to be refilled frequently from RAM.

HyperTransport from AMD is one solution, as it removes the FSB and allows each Opteron CPU to be connected directly to memory. However, recent Xeons provide several FSBs (one per core), which also circumvents the problem of saturating a single FSB, and might explain why Xeons equip a large percentage of clusters despite using the "old" FSB technology. Xeons with QuickPath technology will only become available in 2009, so they are not an option for a cluster bought now.
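The memory-versus-compute argument above can be put into rough numbers with a roofline-style estimate. This is only a sketch: the function name and every figure in it (peak rate, bandwidths, flops per byte) are hypothetical placeholders, not measurements of any Xeon or Opteron system.

```python
def attainable_gflops(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Roofline estimate: a kernel is capped either by the CPU's peak
    rate or by how fast memory can feed it, whichever is lower."""
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


# Hypothetical node: 40 GFLOP/s peak, running a solver kernel that does
# 0.25 floating point operations per byte moved from RAM (assumed value).
peak = 40.0
intensity = 0.25

# Two assumed memory bandwidths (GB/s), purely for illustration.
for label, bandwidth in [("shared bus", 10.0), ("faster memory path", 20.0)]:
    estimate = attainable_gflops(peak, bandwidth, intensity)
    print(f"{label}: about {estimate} GFLOP/s attainable")
```

With a low flops-per-byte ratio, the estimate scales with memory bandwidth and stays far below the peak rate, which is why the memory architecture matters more than clock speed for this kind of code.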

What is your take on this?

Charles November 29, 2008 00:06

Re: Xeon and Opteron architecture differences
The best option would certainly be to run some benchmarks using your codes on the candidate systems, and not to rely entirely on published information. It is "common" knowledge that the Intel CPUs of the last two years have a big performance advantage over the equivalent AMD CPUs. However, when you start looking at benchmarks of parallel floating-point processing, the advantage is not necessarily so obvious, as per your description. There are big advantages to the AMD system architecture, and price certainly comes into it as well. The SPECfp_rate benchmarks indicate that Opteron becomes competitive when using quad-core CPUs on dual-socket systems, and dominant on four-socket systems.

So there are two options really - one is to generate your own relevant benchmark data that you can trust, and the other is to accept that you cannot go far wrong with either, and to rather focus on things like interconnect, delivery lead times, service from the supplier, etc.

agg December 9, 2008 20:21

Re: Xeon and Opteron architecture differences
I was at Supercomputing 2008 in Austin, TX. There, Intel was running STAR-CCM+ on Xeon as well as on Nehalem (the one to be released in 2009). The Nehalem run was 2.5x faster than the Xeon run. However, as you rightly said, Nehalems are currently not available as servers. Moreover, Nehalems require DDR3 memory, which is more expensive than DDR2.

The AMD booth had presentations comparing their Shanghai processor with the current Intel Xeons and they were up to 35% faster than the Xeon for CFD applications. So if you want to buy a cluster now, you should go for AMD Opteron.

vadim December 10, 2008 06:10

Re: Xeon and Opteron architecture differences
Just a comment:

Giving a sharp answer is difficult. Theoretically, cache is very important (see the paper published in IJHPCA on this topic), but in practice it depends largely on how your code uses the cache. A cache-aware algorithm will take full advantage of a large cache.
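As a rough illustration of what "cache-aware" means, here is a sketch of a blocked (tiled) matrix transpose. The function name and the block size are my own choices for the example. In pure Python the interpreter overhead hides any cache effect, so only the loop structure is the point; the same structure in a compiled language keeps each tile in cache while it is being read and written.

```python
def transpose_blocked(a, n, block=32):
    """Transpose an n x n row-major matrix stored as a flat list,
    working on one block x block tile at a time."""
    out = [0.0] * (n * n)
    for ii in range(0, n, block):              # tile row start
        for jj in range(0, n, block):          # tile column start
            # Inside a tile, reads and writes both stay within a small
            # region of memory instead of striding across the matrix.
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    out[j * n + i] = a[i * n + j]
    return out
```

The block size would normally be tuned so that one source tile and one destination tile fit in cache together.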

I would also like to draw your attention to GPU processing, that is, computing on the graphics card. Do not think of it as something used only for real-time rendering and computer graphics: there has recently been a push to use GPUs for high-performance computing. A good GPU can easily compete with a small shared-memory multiprocessor *if* your code is parallel in nature (that is, you can decouple the operations).

You want to use existing codes, and they may not support computation on GPUs, but you could ask the developers. Switching to GPUs is easy, particularly for parallel codes.

James December 10, 2008 09:05

Re: Xeon and Opteron architecture differences
Obviously it is not only about processor speed; network speed is also important. However, fast interconnects (e.g. InfiniBand) are expensive, so your workload should determine whether more nodes or a faster network is the more effective way of getting more work done for the money.

The models you run also make a lot of difference. A decent-sized cluster can eat steady-state models of 10-15 million cells, even with bonded GigE. Big unsteady jobs would gain more from more efficient parallelization.

One apparently little-publicized feature of CCM+ (and some 3.26/4.x versions of STAR) is that, because it is based on HP-MPI, you can lock the affinity of the threads to CPUs (the -cpu_bind option).

When you have 4 threads on, say, a dual-core, dual-processor node, this can have huge benefits. We doubled the throughput on a 16-way (4x4) job, even with a bonded GigE backbone, by locking affinity. You may not always do this well, but since it is free it is the best bang for the buck there is. I would imagine any other code that uses HP-MPI could be similarly helped.
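For codes that do not go through HP-MPI, the same idea can be sketched with the Linux scheduler interface that Python exposes. This is not how -cpu_bind is implemented; the `pin_to_core` helper and its `rank` argument are hypothetical, and it assumes one process per rank on a Linux node.

```python
import os


def pin_to_core(rank):
    """Pin the calling process to a single core chosen from its rank.

    Returns the core number, or None where the platform does not
    provide os.sched_setaffinity (it is Linux-only).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    # Only choose among cores this process is currently allowed to use.
    allowed = sorted(os.sched_getaffinity(0))
    core = allowed[rank % len(allowed)]
    # From here on the kernel will not migrate the process off this
    # core, so its cached data stays local to it.
    os.sched_setaffinity(0, {core})
    return core
```

Each process would call this once at start-up with its own rank. After the first call the allowed set contains a single core, so calling it again has no further effect.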
