CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   CFX (https://www.cfd-online.com/Forums/cfx/)
-   -   Help! Running parallel mpich2 (https://www.cfd-online.com/Forums/cfx/73226-help-running-parallel-mpich2.html)

jpcfd March 2, 2010 11:06

Help! Running parallel mpich2
 
Hi all,

I'm trying to run a parallel job on a local network consisting of two quad-core machines linked with an ordinary switch. The network seems to be fine (both computers see each other) and rsh runs normally (I can do the typical remote probe). I also installed the MPICH2 service on both machines and registered the same user on both computers. I've also shut down the firewall to avoid problems.

The problem is that everything appears to work, but as soon as the output screen shows the solver starting, it stops and exits with code 0 in response to a command from the master node:

"Command on host returned with code 0" is the message.
At first I also got code 255, but now I only get code 0.

Can anyone help me? I have read the parallel documentation, but I don't know where the failure is.

Thanks in advance for reading this; I hope someone can help me.

Javier.

ghorrocks March 2, 2010 17:00

Step 1 is to determine whether the problem is your simulation, the parallel setup or distributed parallel setup.

Does the simulation run OK serial? Does it run OK local parallel?
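The isolation order Glenn suggests (serial first, then local parallel, then distributed parallel) can be written as a tiny decision helper. This is only an illustrative sketch; the function name `locate_failure` and its string labels are hypothetical and not part of CFX.

```python
def locate_failure(serial_ok: bool, local_parallel_ok: bool, distributed_ok: bool) -> str:
    """Return the first layer to investigate, checking from the simplest
    configuration (serial) up to the most complex (distributed parallel)."""
    if not serial_ok:
        # The model itself fails: fix the simulation before touching parallel.
        return "simulation setup"
    if not local_parallel_ok:
        # Serial works but parallel on one machine fails: parallel setup.
        return "parallel setup"
    if not distributed_ok:
        # Works on one machine, fails across machines: network/MPI setup.
        return "distributed parallel setup"
    return "no failure"
```

For the situation described later in this thread (serial and local parallel run, distributed does not), `locate_failure(True, True, False)` points at the distributed parallel setup.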

jpcfd March 2, 2010 17:34

Thanks Glenn,

The problem arises when I use the distributed setup. The model runs in serial and also in local parallel. I have the problem when I try to run on two separate machines linked by a switch. I did the following:

1. Made sure the network is working and both computers can communicate.
2. Turned off the firewalls.
3. Installed the MPICH2 services on both machines.
4. Activated the services with the same login and password.
5. Ran the simulation.
6. I get error code 0 when the solver starts.
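Step 1 ("make sure the network is working") can be checked a little more strictly than "both computers see each other": each host name should resolve, and the port the MPI service listens on should accept a TCP connection from the other machine. The sketch below does that with Python's standard `socket` module. The host names in the usage note are hypothetical, and the port is an assumption: 8676 is commonly cited as the default for the MPICH2 `smpd` service on Windows, but check your own installation.

```python
import socket


def check_hosts(hosts, port, timeout=3.0):
    """For each host, report whether its name resolves and whether a TCP
    connection to `port` succeeds within `timeout` seconds."""
    results = {}
    for host in hosts:
        try:
            addr = socket.gethostbyname(host)
        except socket.gaierror:
            results[host] = "name does not resolve"
            continue
        try:
            # Opening and immediately closing a connection is enough to show
            # that something is listening and no firewall is blocking it.
            with socket.create_connection((addr, port), timeout=timeout):
                results[host] = f"reachable on {addr}:{port}"
        except OSError as exc:
            results[host] = f"resolved to {addr} but port {port} is unreachable: {exc}"
    return results


if __name__ == "__main__":
    # Hypothetical host names and assumed MPICH2 smpd port; adjust as needed.
    for host, status in check_hosts(["node1", "node2"], 8676).items():
        print(host, "->", status)
```

Running it from each machine against the other (not just from the master) helps, since a one-way block would not show up in a single-direction test.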

Am I forgetting something?

I would be grateful for any help.

Thanks.

ghorrocks March 3, 2010 05:52

What OS are you using? Do the other parallel options work (eg HP MPI, PVM)?

jpcfd March 4, 2010 07:49

Hi,

I'm using XP64. I tried with MPI and it doesn't work either.

Thanks.

sans March 5, 2010 07:31

Hi, this won't solve your problem, but try swapping your master and slave nodes and see if you get the same error.

tvt_mvt March 6, 2010 09:48

Try using a different partition mode, e.g. user defined direction.
Martin

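Martin's suggestion of a direction-based partition means the mesh is cut into slabs along a chosen direction instead of using the default partitioner. The sketch below shows only the general idea on a list of node coordinates: project each point onto the direction vector, sort, and split into equal-sized groups. It is a conceptual illustration with a hypothetical function name, not CFX's actual partitioning algorithm.

```python
def partition_by_direction(points, direction, n_parts):
    """Assign each (x, y, z) point to one of `n_parts` slabs of (nearly)
    equal size, ordered by the point's projection onto `direction`."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    dx, dy, dz = direction
    # Scalar projection (up to a constant factor) onto the chosen direction.
    proj = [x * dx + y * dy + z * dz for (x, y, z) in points]
    # Indices of the points sorted along the direction.
    order = sorted(range(len(points)), key=lambda i: proj[i])
    labels = [0] * len(points)
    for rank, i in enumerate(order):
        # Map rank 0..N-1 evenly onto partitions 0..n_parts-1.
        labels[i] = rank * n_parts // len(points)
    return labels
```

For four points spaced along x, `partition_by_direction([(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)], (1, 0, 0), 2)` gives `[0, 0, 1, 1]`: two slabs, one per machine. Slab-shaped partitions keep each interface to a single plane, which is the usual motivation for trying a directional split when the default one misbehaves.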



All times are GMT -4.