MPICH distributed optimization

#1  Lukasz (Luk_Fiz), Member  April 26, 2009, 16:05
Hi all,
Since I am considering expanding my computing capacity, I am studying various system configurations. To test whether separate PCs can be connected, I linked a quad-core Q6600 (Win XP x64) and an old single-core AMD (Win XP x32, about 10 times slower) through a 100 Mbit switch. The CFX version was 10.0.
With the help of instructions from this forum I managed to run both PVM and MPICH:
1) PVM is terribly slow. With 5 cores (Q6600 + AMD) and various relative speed settings, it was 3 times slower than 4 cores of the Q6600 alone on a 500k-element model.
2) With MPICH I got about a 20% speedup for 5 cores (Q6600 + AMD) vs. 4 cores (Q6600) on the 500k-element model, but only about 4% speedup on models bigger than 3000k elements.

Because of these results I want to ask experienced users of clusters:
- What is the expected result of connecting 2 computers with such a big difference in speed?
- Is it normal that the speedup shows up mainly on small models, at least when connecting different computers?
- Are there any MPICH parameters that can be tuned to gain a higher speedup?

It is clear to me that, in general, there is no point in connecting computers of such different capabilities, and I am not going to do it in the future. I just want to gain a better understanding of how multi-machine calculations behave.

Luk

#2  Glenn Horrocks (ghorrocks), Super Moderator, Sydney, Australia  April 26, 2009, 19:52
Hi,

It is very hard to get good speedups using clusters of machines with varying specifications. Even if you find a load-balancing ratio that works, it will change as the simulation changes, so it is just a headache. My recommendation: the only reason to use a heterogeneous cluster is to expand the available memory. You will not get much (if any) speedup, and this is what you found.

Cluster nodes should be the same type of machine, as close to identical as possible. Each node should also be loaded equally (e.g. 4 partitions on one machine and 2 on another is a bad idea; 3 and 3 would be much better).
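
To illustrate why mixed-speed nodes give so little, here is a minimal Python sketch of a toy model in which each iteration finishes only when the slowest partition finishes. It ignores communication cost entirely, and the relative speeds are assumptions: the "about 10 times slower" figure from post #1 is read here as per core, which the thread does not confirm. The function names are made up for this illustration and are not part of CFX or MPICH.

```python
def step_time(work_shares, speeds):
    """Time for one iteration: all partitions must finish, so the slowest one sets the pace."""
    return max(w / s for w, s in zip(work_shares, speeds))


def equal_split(n):
    """Give every partition the same fraction of the mesh."""
    return [1.0 / n] * n


def proportional_split(speeds):
    """Give every partition a fraction of the mesh proportional to its speed."""
    total = sum(speeds)
    return [s / total for s in speeds]


# Assumed relative speeds: four fast cores plus one core taken to be 10x slower.
speeds = [1.0, 1.0, 1.0, 1.0, 0.1]

baseline = step_time(equal_split(4), speeds[:4])            # fast cores only
equal = step_time(equal_split(5), speeds)                   # 5 equal partitions
balanced = step_time(proportional_split(speeds), speeds)    # speed-weighted partitions

print(f"4 fast cores:            {baseline:.4f}")
print(f"5 cores, equal split:    {equal:.4f}  (speedup {baseline / equal:.3f}x)")
print(f"5 cores, weighted split: {balanced:.4f}  (speedup {baseline / balanced:.3f}x)")
```

In this toy model an equal split makes the run much slower than the four fast cores alone, and even a perfectly weighted split gains only a few percent, before any network cost is counted.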

When you have a homogeneous cluster you should get speedups using MPICH (assuming you are using Windows) of 80-90% or so for small clusters for most jobs.

Your final line is completely correct. Forget about benchmarking the setup you have, as it is not going to tell you anything useful. There is an interesting post on the CFX-Community forum page (on the ANSYS web site) which discusses parallel issues, benchmarks and parallel speedup in some detail.

Glenn Horrocks

#3  Lukasz (Luk_Fiz), Member  April 27, 2009, 03:50
Thanks Glenn, you confirm my suspicions.
I should say that, generally speaking, I am not aiming for speedups (that is, 2 machines solving the same problem as a single one) but rather for more capacity (2 machines solving problems 2 times bigger). My only requirement is to preserve the speed in the second case, assuming the machines are the same.
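
The two goals described here are usually called strong scaling (same problem, more machines) and weak scaling (problem size grows with the number of machines). A minimal Python sketch of how each could be measured follows. The timings are hypothetical placeholders, not results from this thread, and the function names are invented for the illustration.

```python
def strong_scaling_efficiency(t_single, t_parallel, n_nodes):
    """Same problem on n_nodes machines: the ideal run time is t_single / n_nodes."""
    return t_single / (n_nodes * t_parallel)


def weak_scaling_efficiency(t_single, t_parallel_scaled):
    """Problem n times bigger on n machines: the ideal run time stays at t_single."""
    return t_single / t_parallel_scaled


# Hypothetical wall-clock times per iteration, in seconds.
t_one_node = 100.0           # base model on one machine
t_two_nodes_same = 58.0      # same model split across two identical machines
t_two_nodes_double = 110.0   # model twice as large on two identical machines

strong = strong_scaling_efficiency(t_one_node, t_two_nodes_same, 2)
weak = weak_scaling_efficiency(t_one_node, t_two_nodes_double)

print(f"strong scaling efficiency: {strong:.1%}")
print(f"weak scaling efficiency:   {weak:.1%}")
```

"Preserving speed" in the second case corresponds to a weak scaling efficiency close to 100%. If the 80-90% figure in post #2 is read as parallel efficiency (an interpretation, not something stated in the thread), it would correspond to the strong scaling number.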


Luk

All times are GMT -4.