CFD Online Discussion Forums


Joe September 17, 2006 14:49

CFD clusters + bonded GigE?
Does anyone here have any experience with improving CFD cluster performance by linking multiple multi-core boxes with dual bonded GigE links, as opposed to the traditional single-GigE-link approach?

Theoretically bonding two GigE links into a single logical link should ~double throughput if set up correctly ...

Myrinet would be nice but it's expensive :)
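For anyone unfamiliar with what "bonding" involves, here is a minimal sketch of the classic Linux channel-bonding setup of the era (interface names `eth0`/`eth1` and the IP address are placeholders for illustration):

```shell
# Load the bonding driver in balance-rr (round-robin) mode, the mode
# that stripes packets across both links and so can exceed a single
# link's throughput; miimon enables link-failure monitoring (ms)
modprobe bonding mode=balance-rr miimon=100

# Bring up the logical bonded interface (address is a placeholder)
ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up

# Enslave the two physical GigE ports to bond0
ifenslave bond0 eth0 eth1
```

Both ends of every link need the same setup, and the switch (if any) has to tolerate the round-robin striping.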

andy September 18, 2006 04:05

Re: CFD clusters + bonded GigE?
I briefly used channel bonding with fast ethernet and dual Pentium 2 or 3s a few years ago on a small cluster without a switch. It worked fine.

I have not considered using it with our gigabit ethernet cluster because, for our application and our hardware, the performance gain versus the cost of adding non-blocking switches, cables and ethernet cards is worse than adding more nodes. Your situation may be different.

> Theoretically bonding two GigE links into a single logical link should
> ~double throughput if set up correctly ...

My experience with PC hardware says to purchase only what you have seen demonstrated to work. Far too much that should work properly doesn't. For a cluster, I have found that getting access to a couple of nodes to check out is usually not a problem.

Joe September 18, 2006 05:47

Re: CFD clusters + bonded GigE?
Much obliged.

I should have been a bit more specific about the CPU hardware: linking multiple dual-core boxes with single GigE links seems to scale well to 8-16 cores. However, I have heard some anecdotal views that linking multiple quad-core or octa-core boxes (i.e. two dual-core or two quad-core chips) through a single GigE link is problematic. This seems logical given the theoretical doubling/quadrupling of required data throughput.

So I am interested in finding out whether anyone is running something akin to the last-mentioned setup, whether they have found a single GigE link a performance constraint, and whether they have investigated dual bonded GigE to relieve that constraint.
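The doubling/quadrupling argument can be put in rough numbers. This is a back-of-envelope sketch, not a measurement: the per-core off-box traffic figure is an assumption I have picked purely for illustration.

```python
# Back-of-envelope estimate of how loaded one (possibly bonded) GigE
# link gets as more cores sit behind it. Assumes, hypothetically, that
# each core generates a fixed volume of off-box halo-exchange traffic
# per second -- a worst-case simplification of a real partitioning.

GIGE_CAPACITY_MB_S = 125.0    # 1 Gbit/s ~= 125 MB/s, ignoring protocol overhead
TRAFFIC_PER_CORE_MB_S = 25.0  # assumed off-box traffic per core (illustrative)

def link_utilisation(cores_per_box, links=1):
    """Fraction of the (bonded) link capacity consumed; > 1.0 = saturated."""
    return (cores_per_box * TRAFFIC_PER_CORE_MB_S) / (links * GIGE_CAPACITY_MB_S)

for cores in (2, 4, 8):
    single = link_utilisation(cores, links=1)
    bonded = link_utilisation(cores, links=2)
    print(f"{cores} cores/box: single GigE {single:.0%}, dual bonded {bonded:.0%}")
```

With these made-up numbers an octa-core box saturates a single link (160%) but fits on a bonded pair (80%), which is exactly the regime where bonding would start to pay off.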

TG September 18, 2006 07:10

Re: CFD clusters + bonded GigE?
While it's true that you can create additional bandwidth by bonding, you aren't going to change the latency. Both Myrinet and InfiniBand offer better bandwidth AND much better latency than GigE networks. Most codes are influenced somewhat by both bandwidth and latency. The other problem you may face is a limit in the ability of your NIC to feed more bandwidth: just because you bond more GigE lines together does not mean you can actually feed them at their maximum capacity through a single NIC.
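Latency is easy to measure for yourself. A minimal TCP ping-pong sketch (run here against localhost for illustration; pointed at another node it measures the real interconnect round-trip that bonding leaves unchanged):

```python
# Minimal TCP ping-pong to measure average round-trip latency --
# the quantity that bonding does NOT improve.
import socket
import threading
import time

def echo_server(sock):
    """Accept one connection and echo everything back until it closes."""
    conn, _ = sock.accept()
    while True:
        data = conn.recv(64)
        if not data:
            break
        conn.sendall(data)
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))          # OS-assigned free port
server.listen(1)
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

client = socket.socket()
client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # no Nagle delay
client.connect(server.getsockname())
client.sendall(b"x"); client.recv(64)  # warm up the connection

rounds = 1000
start = time.perf_counter()
for _ in range(rounds):
    client.sendall(b"x")               # small message out ...
    client.recv(64)                    # ... wait for the echo back
rtt_us = (time.perf_counter() - start) / rounds * 1e6
print(f"avg round-trip: {rtt_us:.1f} microseconds")
client.close()
```

On a real GigE network the round trip is typically tens of microseconds, versus single-digit microseconds for Myrinet/InfiniBand, and adding a second bonded link does nothing to that number.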

Joe September 18, 2006 07:46

Re: CFD clusters + bonded GigE?
I use CFX, which is less latency sensitive.

The advent of PCIe-based NICs has also lessened the problem of actually attaining maximum throughput on a NIC, i.e. [RAM - PCIe - NIC] - GigE - [NIC - PCIe - RAM] is much better than the old shared-PCI-bus limitation.

PCIe gives a NIC dedicated bandwidth.

PS: The big picture behind my questions is that Intel is launching quad-core desktop (Kentsfield) and server (Clovertown) chips in the next two months, which will allow quad-core desktop and octa-core server boxes.

andy September 18, 2006 09:26

Re: CFD clusters + bonded GigE?
What matters most when putting together a cluster is the type of simulation being performed. An explicit time-stepping CFD code tends to want lots of cheap CPUs with cheap interconnects to be cost-effective; a heavily implicit, steady-state code tends to want a few fast CPUs with fast interconnects; and semi-implicit time-stepping codes sit somewhere in between.

The reasoning for the above follows from the relative performance/cost for CPUs and interconnects and what is limiting the performance of the simulation.

> I should have been a bit more specific about the CPU hardware: Linking
> multiple dual-core boxes with single GigE links seems to scale well to 8-16
> cores.

Performing what type of calculation?

For our purposes, a couple of years ago, cheap single-processor nodes with on-board gigabit ethernet and the fastest main memory buses were the most cost-effective, by a long way. That scales to about 64-128 nodes before the interconnect performance makes additional nodes not worthwhile. The only consideration here was performance for minimum cost, plus the acceptance that 64-128 nodes was an acceptable upper performance limit.

Solutions with 2/4/8 processors on a board tend to be expensive but have fast interprocessor communication, and are good for implicit calculations that do not need more processors than this. Running lots of crossed cables between a pair of similar boxes may also work quite well, but you usually cannot scale up such machines because the interconnect performance will cripple you for simulations that need significant amounts of communication.

Joe September 18, 2006 09:48

Re: CFD clusters + bonded GigE?
"Performing what type of calculation?"

8-16 cores on CFX gives pretty linear scale-up with GigE interconnects. This ties in with what you are saying, given that CFX is an implicit code.

I am looking at a relatively small total number of cores, e.g. 16-32. It's really down to whether connecting 3-4 octa-core server boxes or 6-8 quad-core desktop boxes is the way to go ...

Joe September 18, 2006 10:16

Re: CFD clusters + bonded GigE? - Correction
I should have been a bit more explicit ('scuse the pun). Basically I have three box choices:

16 dual-core boxes
8 quad-core boxes
4 octa-core boxes

And two interconnect choices:

Single GigE interconnects
Dual bonded GigE interconnects

The issue is trying to figure out which would be the best choice. Only one configuration's scaling performance can be directly extrapolated from existing common practice: 16 dual-core boxes + single GigE interconnects.

The probable scaling performance of the other configurations is a mystery (to me at least) ... I was hoping others with more experience could comment.
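One way to at least order the six configurations is a toy speedup model: assume perfect scaling inside a box, plus an off-box communication penalty that grows with the number of cores sharing a link and shrinks when the link is bonded. Every constant here is an illustrative guess, not a measurement, so only the ranking, not the numbers, means anything.

```python
# Toy ranking of the 3 box choices x 2 interconnect choices above.
# comm_cost is an assumed per-core fraction of runtime lost to off-box
# communication over a single GigE link; bonding two links halves it.

def est_speedup(boxes, cores_per_box, links=1, comm_cost=0.02):
    """Guessed speedup over one core for a cluster of boxes * cores_per_box
    cores -- purely illustrative, all constants are assumptions."""
    total_cores = boxes * cores_per_box
    overhead = comm_cost * cores_per_box / links  # more cores share each link
    return total_cores / (1.0 + overhead)

for boxes, cores in ((16, 2), (8, 4), (4, 8)):
    for links in (1, 2):
        s = est_speedup(boxes, cores, links)
        print(f"{boxes} x {cores}-core boxes, {links} GigE link(s): ~{s:.1f}x")
```

Under these assumptions the fewer-boxes configurations suffer most on a single link and gain most from bonding, which matches the intuition in the thread; only a real benchmark on the actual code can confirm it.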

andy September 19, 2006 07:58

Re: CFD clusters + bonded GigE? - Correction
Well, I will have to pass because my detailed hardware knowledge is about 2-3 years old. I will add that an octa-core box is likely to work well for jobs using 8 or fewer processors. If this is how the machine is mainly to be used, then it is worth considering.

A few years ago, when I last looked at and bought some cluster hardware, one benchmark involved a dual-processor machine from a possible supplier, and we only used one of the two processors on each dual node because using both was slower than using one. This was not an Opteron but some form of dual Xeon, if memory serves.

I can only repeat that benchmarking your code is a wise move. I presume you have talked to CFX, who will have loads of material on performance with different hardware.
