Fluent parallel computing on multiple Workstations slower than one
I am not sure whether this question has already been asked here.
Anyway, I did not find it in a quick search.
Parallel computing is slower on multiple workstations than on one alone.
- 3 identical Workstations
- On every one: 2 x X5680 (2 x 6 cores)
- On every one: 24 GB RAM
- Windows XP 64bit
When I solve a problem on one WS, it runs faster than on two or three.
One WS (1 x 12 cores), 1000 iterations: about 4 hours. All cores at up to 100%.
Two WS (2 x 12 cores), same case: about 20 hours. All cores at up to 100%.
Three WS (11 + 11 + 10 = 32 cores), same case: about 15 hours.
All workstations are connected to the same router and sit in the same room.
Each has a 1 Gbit/s network card.
Thanks a lot
I'm not sure this will help you, but I can throw out some ideas.
Normally you'd expect Fluent to scale well over a Gbit Ethernet interconnect down to about 250,000 cells/core for regular modelling like aerodynamics. You are well above this: 24 cores for 20 million cells is 800,000+ cells/core, so the cell count per core alone should not be what limits your scaling.
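To make the cells/core arithmetic concrete, here is a small sketch. The 20 million cell mesh size and the 250,000 cells/core rule of thumb are the figures quoted above; the function and constant names are just made up for illustration.

```python
# Rough cells-per-core check against the rule of thumb quoted above.
# TOTAL_CELLS and RULE_OF_THUMB are the figures from this post, not
# values read from Fluent.

TOTAL_CELLS = 20_000_000      # mesh size mentioned in the reply
RULE_OF_THUMB = 250_000       # cells/core down to which Gbit Ethernet should still scale


def cells_per_core(total_cells: int, cores: int) -> float:
    """Average number of mesh cells each core has to handle."""
    return total_cells / cores


for cores in (12, 24, 32):
    load = cells_per_core(TOTAL_CELLS, cores)
    status = "above" if load > RULE_OF_THUMB else "below"
    print(f"{cores:2d} cores: {load:,.0f} cells/core ({status} the rule of thumb)")
```

All three core counts from the question stay well above 250,000 cells/core, which is why partition size by itself does not explain the slowdown.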
Scaling/performance is always case dependent but I have some questions and comments for you:
1. Is the switch/router you're using Gbit/E or 100Mbit/E?
- Many offices only have 100Mbit/E connection and this will kill the performance.
2. Are you using some special modelling like combustion or multiphase?
- Advanced modelling typically runs better in shared memory than across machines.
3. Are you making a lot of file I/O? (like saving transient timesteps?)
- Writing data over the network to a shared folder is slower than writing to local disk in one machine.
4. Are the machines doing other workloads while you are solving?
- The strange thing here is that you are slower on 2 machines and then faster again on 3 machines.
On Intel hex-core CPUs you typically don't have enough memory bandwidth on the chip to achieve good scaling when using all 6 cores per CPU for running Fluent.
The sweet spot for scaling is typically to use 4 cores out of 6 in each CPU. As an example: compared to a serial run you could achieve a 7.5X speed-up on 8 cores in a machine, but using 12 cores in the same machine will give you 8.5X. So you don't gain much from the last 4 cores in the system. Typically, comparing 24-core runs, you'd see that running 2 machines using 12 cores each will be slower than running 3 machines using 8 cores each.
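As a quick illustration of why the last 4 cores add so little, here is a sketch that turns the example speed-ups above (7.5X on 8 cores, 8.5X on 12 cores) into parallel efficiency. Those two numbers are only the illustrative figures from this post, not measurements of your case, and the function names are made up.

```python
# Parallel efficiency from the example speed-up figures in this post.
# The 7.5X / 8.5X values are illustrative, not benchmark results.


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of the ideal linear speed-up that is actually obtained."""
    return speedup / cores


def marginal_gain_per_core(s_low: float, c_low: int,
                           s_high: float, c_high: int) -> float:
    """Extra speed-up contributed by each additional core."""
    return (s_high - s_low) / (c_high - c_low)


eff_8 = parallel_efficiency(7.5, 8)
eff_12 = parallel_efficiency(8.5, 12)
gain = marginal_gain_per_core(7.5, 8, 8.5, 12)

print(f"8 cores:  efficiency {eff_8:.0%}")
print(f"12 cores: efficiency {eff_12:.0%}")
print(f"each of the last 4 cores adds about {gain:.2f}X")
```

With these figures the efficiency drops from about 94% on 8 cores to about 71% on 12 cores, and each of the last 4 cores contributes only about 0.25X.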