Fluent parallel computing on multiple Workstations slower than one
Hello,
I am not sure whether this question has already been asked here; I did not find it in a quick search.

Problem: Parallel computing on multiple workstations is slower than on one alone.

Hardware:
- 3 identical workstations
- On every one: 2 x X5680 (2 x 6 cores)
- On every one: 24 GB RAM
- Windows XP 64-bit

Description: When I solve a problem on one workstation, it is faster than on two or three.

Example: 2e7 elements, 1000 iterations
- One workstation (1 x 12 cores): about 4 hours. All cores up to 100%.
- Two workstations (2 x 12 cores): about 20 hours for the same case. All cores up to 100%.
- Three workstations (11 + 11 + 10 = 32 cores): about 15 hours.

MPI: HP-MPI. All workstations are connected to the same router and sit in the same room. The network card is 1 Gbit/s on all of them.

Any suggestions?

Thanks a lot,
Peter
Hi,
I'm not sure this will help you, but I can throw some ideas in the air. Normally you'd expect Fluent to scale well over a Gigabit Ethernet interconnect down to about 250,000 cells/core for regular modelling such as aerodynamics. You are well above this: 20 million cells on 24 cores is 800,000+ cells/core.

Scaling and performance are always case dependent, but I have some questions and comments for you:

1. Is the switch/router you're using Gigabit Ethernet or 100 Mbit Ethernet?
- Many offices only have a 100 Mbit connection, and this will kill the performance.
2. Are you using some special modelling like combustion or multiphase?
- Advanced models typically run better in shared memory than across machines.
3. Are you doing a lot of file I/O (like saving transient timesteps)?
- Writing data over the network to a shared folder is slower than writing to a local disk in one machine.
4. Are the machines doing other workloads while you are solving?
- The strange thing here is that you are slower on 2 machines and then faster again on 3 machines.

On Intel hex-core CPUs you typically don't have enough memory bandwidth on the chip to achieve good scaling with all 6 cores per CPU when running Fluent. The sweet spot for scaling is typically to use 4 of the 6 cores in each CPU. As an example: compared to a serial run you could achieve a 7.5X speed-up on 8 cores in a machine, but using all 12 cores in the same machine will give you 8.5X. So you don't gain much from the last 4 cores in the system. Typically, comparing 24-core runs, you'd see that 2 machines using 12 cores each will be slower than 3 machines using 8 cores each.

Cheers!
Per
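To make the arithmetic in the two posts above easier to check, here is a minimal Python sketch. It only reworks numbers quoted in the thread (2e7 cells, 12/24/32 cores, 4/20/15 hours for 1000 iterations, 7.5X on 8 cores and 8.5X on 12 cores). The function names are hypothetical helpers, not part of Fluent or HP-MPI, and the speed-up figures are assumed to be relative to a serial run, as the reply states.

```python
# Illustrative helpers only; none of these names come from Fluent or HP-MPI.

TOTAL_CELLS = 20_000_000  # "2e7 elements" from the original post
ITERATIONS = 1000         # iteration count quoted in the original post


def cells_per_core(total_cells: int, cores: int) -> float:
    """Average number of mesh cells each core has to handle."""
    return total_cells / cores


def seconds_per_iteration(hours: float, iterations: int) -> float:
    """Wall-clock seconds spent per solver iteration."""
    return hours * 3600.0 / iterations


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Speed-up relative to serial, divided by the number of cores used."""
    return speedup / cores


def marginal_speedup_per_core(s_low: float, c_low: int,
                              s_high: float, c_high: int) -> float:
    """Extra speed-up gained per additional core between two runs."""
    return (s_high - s_low) / (c_high - c_low)


if __name__ == "__main__":
    # (cores, wall-clock hours) as reported for 1, 2 and 3 workstations.
    runs = [(12, 4.0), (24, 20.0), (32, 15.0)]
    for cores, hours in runs:
        print(f"{cores} cores: "
              f"{cells_per_core(TOTAL_CELLS, cores):,.0f} cells/core, "
              f"{seconds_per_iteration(hours, ITERATIONS):.1f} s/iteration")

    # Single-machine scaling example from the reply.
    print(f"8 cores:  efficiency {parallel_efficiency(7.5, 8):.2f}")
    print(f"12 cores: efficiency {parallel_efficiency(8.5, 12):.2f}")
    print("gain per core for the last 4 cores: "
          f"{marginal_speedup_per_core(7.5, 8, 8.5, 12):.2f}X")
```

Even the 32-core run stays far above the roughly 250,000 cells/core mentioned in the reply, so the slowdown across machines is unlikely to come from having too few cells per core. The last line shows why the final 4 cores of a 12-core machine add little: each contributes about a quarter of a serial core's worth of speed-up.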