CFD Online Discussion Forums


NewFoamer November 2, 2010 09:15

Running in Parallel on cluster
 
Hello,

I'm trying to run a case on my school cluster. Currently I'm issuing the following command:

Code:

mpirun --hostfile system/machines -np 6 pimpleFoamScalar -parallel
With my hostfile:
Code:

XXX@top.nbi.dk slots=2 max-slots=2
XXX@charm.nbi.dk slots=2 max-slots=2
XXX@bottom.nbi.dk slots=2 max-slots=2
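As a rough sanity check on the hostfile above, this Python sketch (not part of the original post) parses an Open MPI-style hostfile and confirms that `-np 6` fits into the declared slots: three hosts with `slots=2` each give exactly 6. It assumes that `#` starts a comment, that an optional `user@` prefix can be dropped, and that a host listed without `slots=` counts as one slot. That last default is an assumption, so check the documentation for your Open MPI version. `max-slots` is ignored, and in practice the text would be read from `system/machines` rather than hard-coded.

```python
# Contents of system/machines from the post, hard-coded for illustration.
HOSTFILE = """\
XXX@top.nbi.dk slots=2 max-slots=2
XXX@charm.nbi.dk slots=2 max-slots=2
XXX@bottom.nbi.dk slots=2 max-slots=2
"""


def parse_hostfile(text):
    """Return a list of (host, slots) pairs from hostfile text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        host = fields[0].split("@")[-1]  # drop an optional "user@" prefix
        slots = 1  # assumption: no slots= means one slot
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "slots":
                slots = int(value)
        entries.append((host, slots))
    return entries


def total_slots(entries):
    """Sum the slots over all hosts."""
    return sum(slots for _, slots in entries)


entries = parse_hostfile(HOSTFILE)
np_requested = 6  # the -np value from the mpirun command
print(entries)
print(f"{total_slots(entries)} slots available for -np {np_requested}")
```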

Running this gives me the following error:
Code:

[top.nbi.dk:18796] btl: tcp: attempting to connect() to address 130.225.212.8 on port 260
[top.nbi.dk][[19986,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 130.225.212.8 failed: No route to host (113)

From the Google searching I've done, I believe I have to set the --mca option to something, but so far I've been unsuccessful in doing so. I'm new to OpenMPI and don't have much experience with network interfaces etc.

Any help or clues as how to debug the problem would be appreciated!
Thank you.
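On the `--mca` question: one setting often suggested for this kind of `connect()` failure is the TCP BTL parameter `btl_tcp_if_include`, which restricts Open MPI's TCP traffic to the named network interface(s). The companion parameter `btl_tcp_if_exclude` works the other way round. The Python sketch below only builds the command line. The helper `mpirun_command` and the interface name `eth0` are hypothetical. Whether this helps at all depends on the cause: it will not get past a firewall that blocks the connection.

```python
import shlex


def mpirun_command(hostfile, nprocs, solver, tcp_iface=None):
    """Build the mpirun argument list for a parallel OpenFOAM run.

    If tcp_iface is given, pass Open MPI's btl_tcp_if_include MCA parameter
    so the TCP BTL only uses that interface (assumption: the interface
    exists on every node and can reach the other nodes).
    """
    cmd = ["mpirun"]
    if tcp_iface is not None:
        cmd += ["--mca", "btl_tcp_if_include", tcp_iface]
    cmd += ["--hostfile", hostfile, "-np", str(nprocs), solver, "-parallel"]
    return cmd


# "eth0" is a hypothetical interface name; check the real one on each node.
cmd = mpirun_command("system/machines", 6, "pimpleFoamScalar", tcp_iface="eth0")
print(shlex.join(cmd))
# prints: mpirun --mca btl_tcp_if_include eth0 --hostfile system/machines -np 6 pimpleFoamScalar -parallel
```

With `tcp_iface=None`, the helper reproduces the original command unchanged.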

falcao November 2, 2010 10:57

There is an OpenFOAM cluster recipe on the second page of this same forum:

http://www.cfd-online.com/Forums/ope...am-solved.html

In that thread, the English translation is immediately below the recipe.

NewFoamer November 2, 2010 11:04

This does not solve my problem: the machines share the same network drive, and I've decomposed the case, written the hostfile, etc. correctly.

r08n November 3, 2010 16:20

[top.nbi.dk][[19986,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 130.225.212.8 failed: No route to host (113)

This looks like a network problem (misconfiguration?). First of all, check whether you can connect from one node of the cluster to any other node, e.g. like this:

Code:

telnet 130.225.212.8 260

If you get the message "No route to host", check the network configuration.
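The telnet test can also be scripted, which is handy when checking every pair of nodes. This sketch (hypothetical helper `check_tcp`, Python standard library only) attempts a plain TCP connection and classifies the common failures. Errno 113 in the Open MPI message is `EHOSTUNREACH` ("No route to host") on Linux. "Connection refused", by contrast, means the host was reached but nothing is listening on that port, so the route itself works. On Linux, a firewall that rejects packets with an ICMP host-prohibited reply can also produce "No route to host".

```python
import errno
import socket


def check_tcp(host, port, timeout=5.0):
    """Attempt a TCP connection to (host, port), like `telnet host port`.

    Returns a short description of the outcome instead of raising.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # No answer at all, e.g. packets silently dropped by a firewall.
        return "timed out"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:  # 113 on Linux: "No route to host"
            return "no route to host"
        if exc.errno == errno.ECONNREFUSED:
            # The host answered, but nothing is listening on this port.
            return "connection refused"
        return f"other error: {exc}"
```

For the address in the error message, run `check_tcp("130.225.212.8", 260)` on top.nbi.dk, the node that reported the failure. Outside an MPI run nothing may be listening on port 260, so "connection refused" is usually the outcome that indicates a working route.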


All times are GMT -4.