Help! Running parallel mpich2
Hi all,
I'm trying to run a parallel job on a local network consisting of two quad-core machines linked by an ordinary switch. The network seems fine (both computers can see each other) and rsh runs normally (I can do the typical remote probe). I also installed the mpich2 service on both machines and registered the same user on both. I've also shut down the firewall to avoid problems. The problem is that everything works well until the solver starts: when "solver" appears in the output screen, the run stops and exits with code 0 in response to a command from the master node. The message is "Command on host returned with code 0". At first I also got code 255, but now I only get code 0. Can anyone help me? I have read the parallel documentation, but I don't know where the failure is. Thanks in advance for reading this; I hope someone can help me. Javier. |
Step 1 is to determine whether the problem is your simulation, the parallel setup, or the distributed parallel setup.
Does the simulation run OK in serial? Does it run OK in local parallel? |
Thanks Glenn,
The problem arises when I use the distributed setup. The model runs in serial and also in local parallel. I have the problem when I try to run on two separate machines linked by a switch. I did the following:
0. Made sure that the network is working and both computers can work.
1. Turned off the firewalls.
2. Installed the mpich2 services on both machines.
3. Activated the services with the same login and password.
4. Ran the simulation.
5. I get error code 0 when the solver starts.
Am I forgetting something? I would be grateful for any help. Thanks. |
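The network and firewall checks listed above can be confirmed from each machine by testing whether a TCP connection to the other machine's MPI service port actually opens. Below is a minimal Python sketch under stated assumptions: the function names `can_connect` and `check_hosts` are made up for illustration, and the host names and port number are placeholders, since the thread does not say which port the mpich2 service listens on. It only shows that a port is reachable, not that the MPI service is configured correctly.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout.

    host and port are supplied by the caller; nothing here is specific
    to mpich2 (assumption: the MPI service listens on a plain TCP port).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_hosts(hosts, port, timeout=3.0):
    """Map each host name to whether the given port is reachable from here."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Running `check_hosts(["node1", "node2"], port)` on both machines, with your own host names and service port in place of the hypothetical ones, shows whether each machine can reach the other. A `False` for either direction points at the network or firewall rather than at the solver.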
What OS are you using? Do the other parallel options work (eg HP MPI, PVM)?
|
Hi,
I'm using XP64. I tried with MPI and it doesn't work either. Thanks. |
Hi, this won't solve your problem, but try switching your master and slave nodes and see if you get the same error.
|
Try using a different partition mode, e.g. user-defined direction.
martin
|