CFD Online Discussion Forums


peterhess April 28, 2011 08:42

Fluent parallel computing on multiple Workstations slower than one
I am not sure whether this question has already been asked here.
Anyway, I did not find it in my quick search.

Slower parallel computing on multiple workstations than one alone.

- 3 identical Workstations
- On every one: 2 x X5680 (2 x 6 cores)
- On every one: 24 GB RAM
- Windows XP 64bit

When I solve a problem on one workstation (WS), it runs faster than on two or three.

Case size: 2e7 elements
One WS (1 x 12 cores), 1000 iterations: about 4 hours. All cores at up to 100 %.
Two WS (2 x 12 cores), same case: about 20 hours. All cores at up to 100 %.
Three WS (11 + 11 + 10 = 32 cores): about 15 hours.
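
To put these timings side by side, here is a small sketch that converts them to seconds per iteration and to a slowdown factor relative to the single-workstation run. The hours are the approximate figures quoted above; the function names are made up for illustration only.

```python
# Approximate wall-clock times quoted above for 1000 iterations (in hours).
ITERATIONS = 1000
runs = {
    "1 WS, 12 cores": 4.0,
    "2 WS, 24 cores": 20.0,
    "3 WS, 32 cores": 15.0,
}


def seconds_per_iteration(hours, iterations=ITERATIONS):
    """Average wall-clock seconds spent on one iteration."""
    return hours * 3600.0 / iterations


def slowdown(hours, baseline_hours):
    """How many times longer a run takes than the baseline run."""
    return hours / baseline_hours


baseline = runs["1 WS, 12 cores"]
for label, hours in runs.items():
    print(
        f"{label}: {seconds_per_iteration(hours):.1f} s/iteration, "
        f"{slowdown(hours, baseline):.2f}x the single-workstation time"
    )
```

So adding a second workstation makes each iteration about five times slower instead of faster.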


All workstations are connected to the same router and sitting in the same room.

Network card 1 GBit/sec on all

Any suggestions?

Thanks a lot


posterdahl May 18, 2011 08:08

I'm not sure this will help you but I can throw some ideas in the air for you.
Normally you'd expect Fluent to scale well over a Gbit Ethernet interconnect down to about 250,000 cells/core for regular modelling like aerodynamics. You are well above this: 20 million cells on 24 cores is more than 800,000 cells/core.
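
The cells-per-core arithmetic can be checked with a few lines of Python. This is only a rough sketch: the 250,000 figure is the rule of thumb mentioned above, not a hard limit, and the cell count is taken from the original post (2e7 elements).

```python
TOTAL_CELLS = 20_000_000   # 2e7 elements, from the original post
RULE_OF_THUMB = 250_000    # approximate cells/core down to which scaling is still expected to be good


def cells_per_core(total_cells, cores):
    """Average number of cells each core has to handle."""
    return total_cells / cores


# Core counts of the three runs described in the original post.
for cores in (12, 24, 32):
    load = cells_per_core(TOTAL_CELLS, cores)
    relation = "above" if load > RULE_OF_THUMB else "at or below"
    print(f"{cores} cores: {load:,.0f} cells/core ({relation} the rule of thumb)")
```

All three runs stay well above the rule-of-thumb value, so the partitions being too small is unlikely to be the cause of the slowdown.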
Scaling/performance is always case dependent but I have some questions and comments for you:

1. Is the switch/router you're using Gbit Ethernet or 100 Mbit Ethernet?
- Many offices only have a 100 Mbit Ethernet connection, and this will kill the performance.
2. Are you using some special modelling like combustion or multiphase?
- Advanced modelling typically runs better in shared memory than across machines.
3. Are you doing a lot of file I/O (like saving transient timesteps)?
- Writing data over the network to a shared folder is slower than writing to a local disk in one machine.
4. Are the machines running other workloads while you are solving?
- The strange thing here is that you are slower on 2 machines and then faster again on 3 machines.

On Intel hex-core CPUs you typically don't have enough memory bandwidth on the chip to achieve good scaling when using all 6 cores per CPU for running Fluent.
The sweet spot for scaling is typically to use 4 cores out of 6 in each CPU. As an example: compared to a serial run, you could achieve a 7.5X speed-up on 8 cores in a machine, but using 12 cores in the same machine will give you 8.5X. So you don't gain much from using the last 4 cores in the system. Typically, comparing 24-core runs, you'd see that running 2 machines with 12 cores each will be slower than running 3 machines with 8 cores each.
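
To make the example concrete, here is a short sketch of the parallel efficiency implied by those speed-ups. The 7.5X and 8.5X values are the illustrative numbers from the example above, not measurements of this particular case, and the helper names are invented for this sketch.

```python
def parallel_efficiency(speedup, cores):
    """Speed-up divided by core count; 1.0 would be ideal linear scaling."""
    return speedup / cores


def marginal_gain_per_core(speedup_small, cores_small, speedup_large, cores_large):
    """Extra speed-up obtained per additional core between two runs."""
    return (speedup_large - speedup_small) / (cores_large - cores_small)


# Illustrative speed-ups relative to a serial run on one dual hex-core machine.
eff_8 = parallel_efficiency(7.5, 8)
eff_12 = parallel_efficiency(8.5, 12)
gain = marginal_gain_per_core(7.5, 8, 8.5, 12)

print(f"8 cores:  efficiency {eff_8:.0%}")
print(f"12 cores: efficiency {eff_12:.0%}")
print(f"Each of the last 4 cores adds about {gain:.2f}x speed-up")
```

With these numbers the efficiency drops from roughly 94 % on 8 cores to roughly 71 % on 12 cores, and each of the last 4 cores contributes only about a quarter of a core's worth of speed-up.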


All times are GMT -4.