www.cfd-online.com

distributed mpich problem

January 2, 2007, 18:23
distributed mpich problem
  #1
Trevor
Guest
 
Posts: n/a
I am having problems with distributed MPICH between an XP32 box and an XP64 box. This is what I have done so far. The same user, with admin rights, is logged onto both systems as a local user. The rsh and mpich services are configured to use that user. I am running CFX10 SP1 on both boxes. On the XP64 box I have installed CFX to c:\1\cfx so that I did not have to worry about spaces in the path, but the XP32 box has it installed in c:\program files\ansys inc\… etc. The correct short path names have been entered into the hosts.ccl files on both machines, and the hosts files are identical.

The system works fine when running distributed PVM, although I do get 2 warnings:

1. Warning! rsh connection to host X produces the following output before the output of the command: Terminal readThe handle is invalid. This may cause problems spawning parallel slaves, especially on Windows.

2. Warning! rsh connection to host X produces the following output after the output of the command: : It could indicate that an rshd service from a different vendor is running, which may not provide the necessary functionality. This may cause problems spawning parallel slaves.

These warnings also appear at the start of the MPICH run. I would like to fix them, as they may be having an effect on the MPICH run.

But when I run distributed MPICH I get the following error:

-------------- An error has occurred in cfx5solve: The ANSYS CFX solver has terminated without writing a results file. Command on host cluster exited with return code 0. --------------

That is it, no more explanation than that. The services (rsh and mpich) are running on both machines, and distributed PVM works OK apart from the warnings described above, which I would like to clear up. I don't know what the problem is.

Can anyone help with this?

Thanks Trevor P


January 2, 2007, 20:34
Re: distributed mpich problem
  #2
Johnny
Guest
 
Posts: n/a
I thought you could only run MPICH on homogeneous clusters. I'm not sure if XP32 and XP64 count as being the same.

Can you run MPICH local parallel, or MPICH distributed on XP32 or XP 64 (not both)?

January 2, 2007, 21:39
Re: distributed mpich problem
  #3
Trevor
Guest
 
Posts: n/a
MPICH local parallel runs fine. The documentation only says (from memory) that you cannot run a mix of Unix/Linux and Windows; it does not mention any combination of different variants of Windows (XP32, XP64, WIN2000 etc). If that is the problem, then it is easily fixed (oh, if things were that simple).

PVM runs fine between XP32 Master and XP64 slave, but not MPICH.

Thanks Trevor P

January 3, 2007, 01:20
Re: distributed mpich problem
  #4
HekLeR
Guest
 
Posts: n/a
The doc you read only applies to PVM.

MPICH must be homogeneous in hardware and OS, so no XP32+XP64 parallel runs with this.

January 3, 2007, 01:47
Re: distributed mpich problem
  #5
Trevor
Guest
 
Posts: n/a
On page 57 of the ANSYS CFX 10.0: Installation and Overview doc, under Setting up MPICH for Windows it states Important: Distributed parallel using MPICH cannot be set up using a mixture of UNIX and Windows machines.

Can you confirm that it also means no XP32 to XP64? Also, has anyone got XP64 to XP64 MPICH working with Ansys CFX 10.0 SP1 for win 32, or is it just XP32 to XP32?

Thanks for your help, I really appreciate it. I may be making headway.

Thanks Trevor

January 3, 2007, 08:05
Re: distributed mpich problem
  #6
Trevor
Guest
 
Posts: n/a
OK, I just tried distributed MPICH between 2 XP32 machines and got exactly the same errors. I even changed my main machine's installation directory to c:\1\cfx to remove all spaces etc.

Interestingly enough, I attempted a run with the mpich daemon service turned off and got exactly the same error, "Command on host XX exited with return code 0". I still get the 2 warnings as before, and the thing still crashes. Both machines are running XP Pro SP2; one is a Dell 9400, the other a Toshiba Tecra A1.

Do you have to go into MPIConfig.exe to configure anything? Is there any way to test that the mpich service is working, just like there is for rsh with the rsh <machine name> cmd /c set command?
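As a rough way to spot the stray output that the two rsh warnings complain about, the sketch below wraps the same `rsh <machine name> cmd /c set` check and reports any line that is not a plain NAME=value pair. It is only an illustration and says nothing about the mpich service itself: the helper names and the host name `box2` are made up, and it assumes an `rsh` client is on the PATH.

```python
import re
import subprocess

# A line printed by "set" normally looks like NAME=value.
ENV_LINE = re.compile(r"^[^=\s][^=]*=.*$")


def rsh_set_command(host):
    """Build the rsh test command quoted in the post for one host."""
    return ["rsh", host, "cmd", "/c", "set"]


def unexpected_lines(output):
    """Return non-empty lines that do not look like NAME=value."""
    return [
        line
        for line in output.splitlines()
        if line.strip() and not ENV_LINE.match(line)
    ]


def check_host(host, timeout=30):
    """Run the rsh test on a host and return any stray output lines.

    Assumption: an rsh client is installed and the remote rshd accepts
    the current user. Nothing here is specific to CFX.
    """
    result = subprocess.run(
        rsh_set_command(host),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return unexpected_lines(result.stdout + "\n" + result.stderr)


# Example (hypothetical host name):
#   stray = check_host("box2")
#   an empty list would mean only NAME=value lines came back
```

An empty list from `check_host` would suggest the rsh output is clean; anything else is the kind of extra text the warnings refer to.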

I have also just accepted the defaults in the distributed parallel setup; I have not touched the advanced options tabs.

Thanks Trevor

January 3, 2007, 11:05
Re: distributed mpich problem
  #7
Bian
Guest
 
Posts: n/a
In my experience, MPICH can be set up easily on several IDENTICAL machines (both OS and hardware), but easily runs into problems on others. Actually, distributed MPICH is unstable and has very few advantages over distributed PVM.

If you want faster speed, local MPICH is the best way. Distributed MPICH and PVM may have the same speed under Windows OS.

January 3, 2007, 20:20
Re: distributed mpich problem
  #8
Trevor
Guest
 
Posts: n/a
Bian, I think I read somewhere that you got XP64 to XP64 MPICH working. Were they identical machines, how did you set them up, and did you get past those 2 warnings?

Thanks Trev

January 4, 2007, 01:22
Re: distributed mpich problem
  #9
Trevor
Guest
 
Posts: n/a
Bian, could the problem be that I have a network of PCs that belong to a workgroup, all with local users, rather than a domain with a central repository of users? I have registered users with a local username and password (the same across all machines), but the MPI documentation says the users have to be specified in the form "Domain\User". I have only supplied the User part.

If so, do I have to register the same user multiple times in the form of Machine1\CFDUser, Machine2\CFDUser etc?

Thanks Trevor

January 4, 2007, 02:41
Re: distributed mpich problem
  #10
CFDworker
Guest
 
Posts: n/a
Hi Trevor,

I just want to share my experience. I am running on an MPICH Windows cluster. All my machines work within the same domain, and they are accessed through the same domain password during Windows logon.

When I set MPICH up, I use the Domain name as "user" and the domain logon password as "password". I supply this information to each of the computers in the system through the command: cfx5parallel -register -mpich -user. This worked for me.

Best regards CFDworker

January 4, 2007, 06:49
Re: distributed mpich problem
  #11
Trevor
Guest
 
Posts: n/a
CFDworker, thanks for the reply. I think I am fast coming to the realisation that I need a domain, not just a group of PCs running in a workgroup.

This may take some time to setup.

Thanks Trev

January 7, 2007, 06:48
Problem FIXED !
  #12
Trevor
Guest
 
Posts: n/a
OK, the PVM warnings were fixed by installing the patch described in Microsoft Knowledge Base article KB892099. I now have no warnings.

The MPICH problems were due to a long path to the .def file. I only noticed this when I clicked on the command prompt window that appears when you run cfx5solver. I changed the path and the file name to standard 8.3 format and it worked fine.
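For anyone wanting to check a definition file path before launching a run, here is a minimal sketch that lists the path components which are not in 8.3 form (at most 8 characters, an optional extension of at most 3, no spaces). The function name and the example paths are hypothetical, and the rule is a simplified reading of "standard 8.3 format", not something taken from the CFX documentation.

```python
import re
from pathlib import PureWindowsPath

# Simplified 8.3 rule: 1-8 characters, optional dot plus 1-3 characters,
# with no spaces and no extra dots in either part.
COMPONENT_8_3 = re.compile(r"^[^\s.]{1,8}(\.[^\s.]{1,3})?$")


def offending_components(path):
    """Return the directory and file names in path that break the rule."""
    parsed = PureWindowsPath(path)
    parts = parsed.parts
    if parsed.anchor:
        # Skip the drive/root entry such as "c:\\".
        parts = parts[1:]
    return [name for name in parts if not COMPONENT_8_3.match(name)]


# Hypothetical examples:
#   offending_components(r"c:\1\cfx\pump.def")
#   offending_components(r"c:\my cfd runs\pump model final.def")
```

`PureWindowsPath` is used so the check parses backslash paths the same way regardless of the machine it is run on.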

It runs on a domain as well as a workgroup: just log in as the same user and password on the 2 machines and presto. I was running over a wireless network with no problems.

To all of those who tried to help thanks heaps.

Thanks Trev.


January 9, 2007, 00:20
Re: Problem FIXED !
  #13
HekLeR
Guest
 
Posts: n/a
I was going to mention the invalid read handle fix for rsh, sorry.

Did the path contain spaces?

January 9, 2007, 00:22
Re: distributed mpich problem
  #14
HekLeR
Guest
 
Posts: n/a
The doc was written before XP64 existed. XP64 is a different OS.

CFX 10 was released before XP64, so it is not officially supported. However, it should probably work fine, and I think it does.

January 9, 2007, 10:57
Re: distributed mpich problem
  #15
Bian
Guest
 
Posts: n/a
Yes, I got CFX11P6 MPICH working on two identical XP64 machines. However, it has other problems (affecting speed) and I am waiting for the official release with everything working.

January 9, 2007, 18:08
Re: Problem FIXED !
  #16
Trevor
Guest
 
Posts: n/a
Yes, the path to the .def file had heaps of spaces. I thought the spaces problem only applied to the path to the CFX root dir, but that was not the case.
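One way to sidestep the problem without renaming the original is to copy the definition file into a short, space-free working directory first. The sketch below is only an illustration of that idea: the function names, the fallback name `run` and the fixed `.def` extension are assumptions, and the solver launch itself is not shown.

```python
import re
import shutil
from pathlib import Path


def short_name(filename, ext=".def"):
    """Derive a space-free name of at most 8 characters plus ext."""
    stem = re.sub(r"[^A-Za-z0-9]", "", Path(filename).stem)[:8] or "run"
    return stem + ext


def stage_def_file(src, work_dir):
    """Copy src into work_dir under a short name and return the new path.

    Assumption: work_dir is itself a short path without spaces,
    for example something like c:\\1\\runs.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    dest = work_dir / short_name(Path(src).name)
    shutil.copy2(src, dest)
    return dest
```

Two long names that share their first 8 letters would collide, so a real script would need a uniqueness check before copying.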

Trevor
All times are GMT -4.