MPICH distributed optimization
Since I am considering expanding my computing capacity, I am studying various system configurations. To test whether separate PCs can be connected, I linked a quad-core Q6600 (Win XP x64) and an old single-core AMD (Win XP x32, about 10 times slower) through a 100MB switch. The CFX version was 10.0.
With the help of instructions from this forum I managed to run both PVM and MPICH:
1) PVM is terribly slow. With 5 cores (Q6600 + AMD) and various relative speed parameters, it was 3 times slower than the 4 cores of the Q6600 alone on a 500k-element model.
2) Using MPICH I got about a 20% speedup with 5 cores (Q6600 + AMD) vs. 4 cores (Q6600) on a 500k-element model, but only about a 4% speedup on models bigger than 3000k elements.
Because of these results I want to ask experienced cluster users:
- What result should one expect from connecting 2 computers with such a big difference in speed?
- Is it normal that the speedup appears mainly on small models, at least when connecting different computers?
- Are there any MPICH parameters that can be tuned to gain a higher speedup?
It is clear to me that there is generally no point in connecting computers of such different capabilities, and I am not going to do it in future. I just want to gain a better understanding of how multi-machine calculations behave.
It is very hard to get good speedups using clusters of machines with varying specifications. Even if you find a load balancing ratio which works, you will find it changes as the simulation changes, so it is just a headache. My recommendation is that the only reason to use heterogeneous clusters is to expand the available memory. You will not get much (if any) speedup, and this is what you found.
The machines in a cluster should be the same, preferably as identical as possible. Each node should also be loaded equally (i.e. 4 partitions on one machine and 2 on another is a bad idea; 3 and 3 would be much better).
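To illustrate why the load balancing ratio matters so much, here is a rough sketch in Python. It assumes a deliberately simple model that is not taken from CFX: each iteration takes as long as the slowest partition, communication cost is ignored, and the old AMD core runs at 0.1 of the speed of one Q6600 core (the "about 10 times slower" figure from the first post). The function names are made up for this example.

```python
def step_time(work_shares, speeds):
    """Time for one iteration: all partitions must finish,
    so the slowest partition sets the pace (no communication cost modelled)."""
    return max(w / s for w, s in zip(work_shares, speeds))


def equal_shares(n):
    """Split the mesh into n partitions of equal size."""
    return [1.0 / n] * n


def speed_weighted_shares(speeds):
    """Give each core a share of the mesh proportional to its speed."""
    total = sum(speeds)
    return [s / total for s in speeds]


# Assumed relative speeds: four Q6600 cores at 1.0, the old AMD core at 0.1.
quad = [1.0, 1.0, 1.0, 1.0]
mixed = quad + [0.1]

t_quad = step_time(equal_shares(4), quad)
t_mixed_equal = step_time(equal_shares(5), mixed)
t_mixed_weighted = step_time(speed_weighted_shares(mixed), mixed)

print("4 cores, equal split:      ", t_quad)
print("5 cores, equal split:      ", t_mixed_equal)
print("5 cores, speed-weighted:   ", t_mixed_weighted)
print("best-case gain from 5th core:", t_quad / t_mixed_weighted)
```

In this model an equal split across all five cores is far slower than the quad core alone, because everything waits for the slow core, and even a perfectly weighted split gains only a few percent. Real runs also depend on network, memory bandwidth and cache effects, so measured numbers can differ from this idealised picture.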
When you have a homogeneous cluster you should get speedups using MPICH (assuming you are using Windows) of 80-90% or so for small clusters for most jobs.
Your final line is completely correct. Forget about benchmarking the setup you have, as it is not going to tell you anything useful. There is an interesting post on the CFX-Community forum page (on the ANSYS web site) which discusses parallel issues, benchmarks and parallel speedup in some detail.
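If the 80-90% figure is read as parallel efficiency (that reading is an assumption, the post does not define it), it can be checked from two wall-clock timings of the same job. A minimal Python sketch, with made-up timings purely for illustration:

```python
def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Speedup divided by the number of processes; 1.0 would be ideal scaling."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical timings in seconds, not measurements from this thread.
t_1proc = 1000.0
t_8proc = 147.0

s = speedup(t_1proc, t_8proc)
e = parallel_efficiency(t_1proc, t_8proc, 8)
print(f"speedup = {s:.2f}, efficiency = {e:.0%}")
```

With these invented numbers the efficiency falls inside the 80-90% band; substitute your own solver timings to see where a real cluster lands.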
Thanks Glenn, you support my suspicions.
I must say that, generally speaking, my aim is not speedup (2 machines solving the same problem as a single one) but expanded capacity (2 machines solving a problem 2 times bigger). My only requirement is that the run time is preserved in the second case, assuming the machines are the same.
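This "bigger problem on more machines in the same time" goal is usually called weak scaling. A small Python sketch of how one might check it, assuming the mesh size is scaled in proportion to the number of machines; the function name and the timings are hypothetical:

```python
def weak_scaling_efficiency(t_one_machine, t_n_machines):
    """Ratio of the run time on 1 machine (base problem) to the run time on
    N machines (problem N times bigger). 1.0 means the time was fully preserved."""
    return t_one_machine / t_n_machines


# Hypothetical timings in seconds, not measurements from this thread:
# 1 machine on a 500k-element model, 2 machines on a 1000k-element model.
t_base = 600.0
t_doubled = 660.0

eff = weak_scaling_efficiency(t_base, t_doubled)
print(f"weak scaling efficiency = {eff:.2f}")
```

A value close to 1.0 would mean the second machine bought the extra capacity at almost no cost in run time.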