CFD Online Discussion Forums


Esti Armendariz January 24, 2006 04:28

Problems with 2 node cluster with Mandrake 10.0
 
Hi there all, we are trying to get a wee 2-node cluster up and running, using Mandrake 10.0 as the OS.

So far we are stuck with this error:

NP: Spawning STAR processes on multiple nodes (cluster).
bash: line 1: /home/starhome/test/entalpia/entalpia_0001/.starboot.run: No such file or directory
p0_5413: p4_error: Timeout in making connection to remote process on administracion2: 0
/usr/starcd/MPICH/1.2.4/linux_2.4-gcc_2.95.3-glibc_2.2.2-dso/ch_p4/bin/mpirun: line 1: 5413 Broken pipe /home/starhome/test/entalpia/entalpia_0001/.starboot.run -p4pg .starboot.mpi -p4wd /home/starhome/test/entalpia/entalpia_0001
PNP: Shutdown [2006-01-23-17:35:42]
Execution aborted by request (SIGABRT) after 309 seconds (TOTAL ELAPSED TIME).

What is that .starboot.run file that Star-CD is looking for?

Have any of you had the same problem? How did you solve it?

Any help and hints are welcome!!

Thanks a lot, Esti.

steve January 24, 2006 08:01

Re: Problems with 2 node cluster with Mandrake 10.0
 
Make sure that you have write permission in every directory you are using. Make sure that both nodes can see the master directory and that it has the same path name on both nodes. Sometimes an NFS-mounted partition has one name on its local machine and another on a remote one.
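The checks above could be scripted roughly as follows. This is only a sketch in Python, not anything shipped with Star-CD: the function names are made up for illustration, and it assumes that rsh works without a password between the nodes and that the case directory is the one shown in the error message.

```python
import shlex
import subprocess


def remote_dir_check_command(host, directory, remote_shell="rsh"):
    """Build the command that tests `directory` on `host` (hypothetical helper).

    The remote test prints OK only if the path exists as a directory
    and is writable by the remote user.
    """
    quoted = shlex.quote(directory)
    remote_test = f"test -d {quoted} && test -w {quoted} && echo OK"
    return [remote_shell, host, remote_test]


def check_nodes(hosts, directory, run=subprocess.run):
    """Return {host: True/False}: is `directory` visible and writable there?"""
    results = {}
    for host in hosts:
        cmd = remote_dir_check_command(host, directory)
        proc = run(cmd, capture_output=True, text=True, timeout=30)
        # Classic rsh does not reliably pass back the remote exit status,
        # so rely on the echoed marker instead of the return code.
        results[host] = proc.stdout.strip() == "OK"
    return results
```

Run from the master, a call such as check_nodes(["administracion2"], "/home/starhome/test/entalpia/entalpia_0001") should report True for the remote node. If it reports False, the remote node either cannot see the directory under that exact path or cannot write to it.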

Esti Armendariz January 24, 2006 10:02

Re: Problems with 2 node cluster with Mandrake 10.0
 
Hi there Steve, thanks very much for your quick reply.

Our cluster is made up of 2 CPUs, one called Master and the other Slave, with the same names locally and remotely.

Both machines can communicate perfectly: the rsh-server is installed and works fine, and I can send any command from one machine to the other.

We also have a precompiled MPICH, provided by CD-adapco. The machines work fine in sequential mode, but when it comes to parallel computations, the same error appears again and again.

Could it be because of the computer architecture or type or something like that? The same problem appears with these two nodes under Windows...

Any other suggestions?

Thanks a lot...

Mike January 28, 2006 09:09

Re: Problems with 2 node cluster with Mandrake 10.0
 
Esti,

I would echo Steve's comments. It appears to me that the directory /home/starhome/test/entalpia/ on the slave is not NFS-mounted so that it is the same directory as on the master.

Have you tried the -distribute flag? Using this flag with Star-CD v3.24 or later (where the NFS-mount requirement is not as stringent) may help.
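
To see whether the case directory really sits on an NFS mount on a given node, something like the following Python sketch may help. It is only an illustration: the function names are made up, it assumes a Linux node where /proc/mounts is readable, and it ignores mount points whose names contain escaped spaces.

```python
import os


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type, source) of the mount containing `path`.

    `mounts_text` uses the /proc/mounts layout:
    source mount_point fs_type options dump pass
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        source, mount_point, fs_type = fields[0], fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Keep the longest matching mount point (the most specific one).
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type, source)
    return best


def is_nfs(path, mounts_text):
    """True if `path` lives on a mount whose type starts with 'nfs'."""
    mount = find_mount(path, mounts_text)
    return mount is not None and mount[1].startswith("nfs")
```

On the slave, calling find_mount("/home/starhome/test/entalpia", open("/proc/mounts").read()) would be expected to return an nfs entry whose source points at the master. On the machine that exports the directory, the same call will normally show a local filesystem instead, which is fine as long as the path is identical on both nodes.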


Esti Armendariz February 1, 2006 11:48

Re: Problems with 2 node cluster with Mandrake 10.0
 
Hi Mike, thanks very much for your hint. This is the first time I have had to install Linux and all that stuff, so I think I should start from the basics and learn a bit more about it... I've no idea what NFS is, so imagine....

Thanks a lot, all of you. I'll let you know how it goes!!! Cheers, Esti


All times are GMT -4.