CFD Online Discussion Forums


Marat Hoshim April 3, 2001 10:27

Maximum number of nodes in cluster

I'd like to know the maximum number of nodes you use in your clusters for your CFD jobs.

Has anyone run jobs on 32 nodes?



Jonas Larsson April 3, 2001 11:02

Re: Maximum number of nodes in cluster
We're just now installing a new cluster with 100 nodes. To get good scaling on all 100 nodes you need a very large case though.

Marat Hoshim April 3, 2001 11:24

Re: Maximum number of nodes in cluster
Hi Jonas,

What network technology is required for 100 nodes? I guess that 100 Mbit/s might be too slow if all 100 nodes communicate with each other?

Why is it that small jobs don't scale that well on many nodes? Is it because the ratio of volume to surface is smaller?



Jonas Larsson April 3, 2001 11:34

Re: Maximum number of nodes in cluster
We use standard 100 Mbit Fast Ethernet. We considered using a faster network with lower latency (Myrinet or equivalent), but the extra cost for this couldn't be justified. However, if you want to be able to use all 100 nodes for medium-size jobs (a million cells or so) then you will have to use something faster than Fast Ethernet. This isn't very critical for us though, since we will only need to use all nodes for extremely large cases. Most of the time we will use around 10 to 20 nodes for one case, I guess.

The poor scaling for small jobs is because the sub-domains placed on each node become too small (high surface/volume ratio), and this gives a lot of communication overhead. For some parallelisation models it might also make the numerics more problematic.
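The surface/volume argument can be put into rough numbers. The sketch below is only an illustration under strong assumptions that are not from the thread: the mesh is a cube of equally sized cells, it is split into equal cubic sub-domains, and every face of a sub-domain is an interface that has to be exchanged with a neighbour. The function name `halo_ratio` is made up for this example.

```python
def halo_ratio(total_cells: float, nodes: int) -> float:
    """Interface cells per interior cell for one cubic sub-domain.

    Assumes (hypothetically) a cubic mesh split into equal cubic
    sub-domains, with all six faces of each sub-domain exchanged.
    """
    cells_per_node = total_cells / nodes
    edge = cells_per_node ** (1.0 / 3.0)  # cells along one edge of a sub-domain
    surface = 6.0 * edge ** 2             # cells on the six faces
    return surface / cells_per_node       # algebraically equal to 6 / edge


if __name__ == "__main__":
    # Compare a medium-size case with a very large one on 10, 20 and 100 nodes.
    for total in (1_000_000, 100_000_000):
        for nodes in (10, 20, 100):
            ratio = halo_ratio(total, nodes)
            print(f"{total:>11,d} cells on {nodes:>3d} nodes: "
                  f"surface/volume = {ratio:.3f}")
```

Under these assumptions the ratio grows with the cube root of the node count, so a million-cell case on 100 nodes exchanges roughly twice as much data per unit of computation as the same case on 10 nodes, while a case 100 times larger keeps the ratio much lower at the same node count.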

Marat Hoshim April 4, 2001 03:46

Re: Maximum number of nodes in cluster
Hi Jonas,

Will you purchase AMD or Pentium processors? Have you compared the two processors with a real CFD job? I'm not quite sure the SPEC benchmarks are representative of real CFD jobs! (If they were, AMD should be much faster!)



Jonas Larsson April 4, 2001 09:00

Re: Maximum number of nodes in cluster
We bought PIII 1 GHz nodes this time. We got a very good deal on these though - if we had to pay "street prices" the AMD would probably be more attractive, especially if you can get one of the new boards with DDR memory. We haven't benchmarked any AMD yet.

Some of the SPEC tests are quite representative for CFD, I think. But as always you have to be very critical of these numbers - you will, for example, never come close to the performance that the SPEC numbers give for the P4, since these numbers were produced by Intel with special in-house compilers which optimize code for the P4.

There has been a lot of discussion about AMD/P3/P4 on these forums over the last year - if you use the main search tool (top right corner) and search for "AMD" you should get a lot of interesting opinions.

George Bergantz April 5, 2001 13:17

Re: Maximum number of nodes in cluster
The motherboards with the AMD 761 (or 760) chipset that support DDR RAM are not widely available (if at all), and my reading of the Linux forums is that there is no stable version yet for this hardware configuration. The speed-up with DDR RAM is, under *optimal* circumstances, maybe 10% (see discussions on Tom's Hardware page). While it is a terrific hardware package, it is rather new, and one could risk spending a lot of time screwing around trying to get the thing to be stable.

I suggest going with the mature PIII systems as Jonas describes above. He is right that the SPECmarks are only guidelines, but they are useful in showing relative performance between platforms, not absolute expected FPU throughput.

Heck, the real issues are the degree to which you can write parallel code, the compiler, and bandwidth/latency. Those are usually the big hang-ups.

Charles Crosby April 9, 2001 17:11

Re: Maximum number of nodes in cluster
Looking at some of the floating point benchmarks that have been run on DDR-equipped systems, it appears that the advantage to be gained from DDR is relatively small. However, the parameter that seems to be very significant to CFD work (or any floating point work on largish data sets) is the bus speed. I have experimented with the Linpack benchmark program, and the results give you more or less the same picture as real CFD analysis results (although we have encountered a Windows / Linux performance anomaly, most likely attributable to incorrect compiler settings when the Windows executable was compiled ...)

Real-world sized data sets are much larger than the AMD or Pentium cache memories, and as a result, performance is largely determined by how fast data can be fed to the CPU through the frontside bus. 133 MHz is much better than 100, which is much better than 66. CPU clock speed becomes less important, e.g. an 800 MHz CPU does CFD work only about 25% faster than a 400 MHz CPU, if both operate at the same FSB frequency. The AMDs have an advantage here, with their double-rate FSB (100 MHz effectively gives you almost 200 MHz), even though the memory bus may only be running at 100 MHz. The VIA KT133A chipset has proved to be astonishingly effective with the AMD processor, because it supports effectively 266 MHz data transfer to the CPU, even when using only ordinary PC133 SDRAM (see

FWIW, the much-maligned Pentium 4 is a very good option to consider, the (effectively) 400 MHz FSB giving it outstanding performance when doing floating point work with big data sets. It does come at a price though ... P4 aside, the other good option must be the AMD Athlon C processors, which use the (effectively) 266 MHz bus speed. (Actual FSB is still only 133 MHz, but data are transferred on the rise and fall of the clock.)
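The bus-speed argument above can be turned into back-of-the-envelope arithmetic. The sketch below makes two assumptions that are not stated in the post: the frontside bus is 64 bits (8 bytes) wide, and run time splits into one part that scales with CPU clock and one part that does not (memory-bound time). Both function names are invented for illustration, and the results are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_mb_s(effective_mhz: float, bus_bytes: int = 8) -> float:
    """Theoretical peak FSB transfer rate in MB/s.

    effective_mhz: effective transfer rate in millions of transfers/s
                   (e.g. 266 for a 133 MHz double-rate bus).
    bus_bytes:     bus width in bytes; 8 (64-bit) is an assumption here.
    """
    return effective_mhz * bus_bytes


def memory_bound_fraction(clock_ratio: float, speedup: float) -> float:
    """Fraction of baseline run time that does not scale with CPU clock.

    Simple two-part model (an assumption): T(r) = (1 - m) / r + m,
    where r is the clock ratio and m the memory-bound fraction,
    and speedup = T(1) / T(r) = 1 / T(r).
    """
    return (1.0 / speedup - 1.0 / clock_ratio) / (1.0 - 1.0 / clock_ratio)


if __name__ == "__main__":
    # Effective bus rates mentioned in the post.
    for mhz in (66, 100, 133, 200, 266, 400):
        print(f"{mhz:>3d} MHz effective FSB: "
              f"{peak_bandwidth_mb_s(mhz):.0f} MB/s peak")

    # 800 MHz vs 400 MHz CPU at the same FSB, about 25% faster.
    m = memory_bound_fraction(clock_ratio=2.0, speedup=1.25)
    print(f"memory-bound share of run time: {m:.0%}")
```

Read this way, the "only about 25% faster" figure would mean that a bit more than half of the run time on the slower CPU is spent waiting on memory, which is consistent with the point that bus speed matters more than clock speed for large data sets.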

All times are GMT -4.