CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Installation (https://www.cfd-online.com/Forums/openfoam-installation/)
-   -   How to setup a simple OpenFOAM cluster? (https://www.cfd-online.com/Forums/openfoam-installation/125306-how-setup-simple-openfoam-cluster.html)

TommiPLaiho October 23, 2013 04:01

How to setup a simple OpenFOAM cluster?
 
Hi,

I have been trying to find a tutorial on how to set up a simple OpenFOAM cluster. I have two 8-core computers, and OpenMPI, OpenFOAM and ParaView are successfully installed on both of them. I am using Ubuntu Studio 12.04 and CAELinux (Ubuntu 10.04) as platforms. The computers are connected to a common router.

I have not found a simple, Linux-novice tutorial on how to connect these two computers and make a simple two-machine cluster out of them. The tutorial should really be a novice guide, a step-by-step tutorial.

A few questions:

1. Does the OpenFOAM installation need to be exactly the same version on both machines? I have OpenFOAM 2.2.2 on Ubuntu Studio 12.04 and OpenFOAM 2.1.1 on CAELinux (Ubuntu 10.04). Will I face problems with this combination? I mean, CAELinux would be the master machine and Ubuntu Studio 12.04 the slave machine. Or maybe a newer release can run files from an older release? The OpenMPI version is also lower, 1.4.1, on the CAELinux master, and 1.5 on the slave machine.

2. The router I use is not bridged. Every machine has the same IP address. Is this a problem?

Please help and thanks in advance.

alf12 October 25, 2013 08:41

Quote:

Every machine has the same IP address. Is this a problem?
Yes.

Once each machine has its own IP address, you can set up MPI and check that it is working properly.
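
Before involving OpenFOAM at all, it helps to confirm that Open MPI alone can start processes on both machines. A minimal sketch of such a check (the host names in the hostfile are placeholders, and passwordless SSH between the two machines is assumed):

Code:

# hostfile listing both machines, one per line, e.g. a file named "machinefile":
#   master-pc slots=8
#   slave-pc  slots=8

# launch a trivial command on both hosts; each requested process should
# print its host name, and no password prompt should appear
mpirun -hostfile machinefile -np 4 hostname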

TommiPLaiho October 27, 2013 07:53

Hi,


I have been trying to set up Open MPI with OpenFOAM 2.1.1. I compiled Open MPI myself, and it went very smoothly, so I assume the compilation was done correctly. However, I am not a true expert in the field.


I now have a bridged modem, so each machine has its own IP address. The first step is to connect the two computers with Open MPI so that I can run OpenFOAM 2.1.1 on 14 cores. OpenFOAM 2.1.1 is now the same version on both computers, which I will call master and slave, and Open MPI is also the same version on both. This time the master machine is CAELinux (Ubuntu 10.04 LTS) and the slave is Ubuntu Studio 12.04 LTS. I have read many articles and also the FAQ, but I can't progress any further by myself; I need some help. Thank you for understanding.


So when I run this command:


Code:

/opt/openmpi-1.6.5/bin/mpirun -hostfile /home/tommi2/Desktop/machinefile -np 14 /opt/openfoam211/platforms/linux*/bin/pisoFoam -parallel
and give the password for my slave machine, I get this long, very long error from OpenFOAM 2.1.1 and Open MPI. To be honest, I don't fully understand its whole meaning. Here it goes:



Code:

/opt/openfoam211/platforms/linux64GccDPOpt/bin/pisoFoam: error while loading shared libraries: libincompressibleTurbulenceModel.so: cannot open shared object file: No such file or directory
 /opt/openfoam211/platforms/linux64GccDPOpt/bin/pisoFoam: error while loading shared libraries: libincompressibleTurbulenceModel.so: cannot open shared object file: No such file or directory
 /opt/openfoam211/platforms/linux64GccDPOpt/bin/pisoFoam: error while loading shared libraries: libincompressibleTurbulenceModel.so: cannot open shared object file: No such file or directory
 /opt/openfoam211/platforms/linux64GccDPOpt/bin/pisoFoam: error while loading shared libraries: libincompressibleTurbulenceModel.so: cannot open shared object file: No such file or directory
 /opt/openfoam211/platforms/linux64GccDPOpt/bin/pisoFoam: error while loading shared libraries: libincompressibleTurbulenceModel.so: cannot open shared object file: No such file or directory
 /opt/openfoam211/platforms/linux64GccDPOpt/bin/pisoFoam: error while loading shared libraries: libincompressibleTurbulenceModel.so: cannot open shared object file: No such file or directory
 /opt/openfoam211/platforms/linux64GccDPOpt/bin/pisoFoam: error while loading shared libraries: libincompressibleTurbulenceModel.so: cannot open shared object file: No such file or directory
 [caelinux:05633] [[49395,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../orte/util/nidmap.c at line 371
 [caelinux:05633] [[49395,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../orte/mca/ess/base/ess_base_nidmap.c at line 62
 [caelinux:05633] [[49395,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../../orte/mca/ess/env/ess_env_module.c at line 173
 --------------------------------------------------------------------------
 It looks like orte_init failed for some reason; your parallel process is
 likely to abort.  There are many reasons that a parallel process can
 fail during orte_init; some of which are due to configuration or
 environment problems.  This failure appears to be an internal failure;
 here's some additional information (which may only be relevant to an
 Open MPI developer):
 
  orte_ess_base_build_nidmap failed
  --> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS
 --------------------------------------------------------------------------
 --------------------------------------------------------------------------
 It looks like orte_init failed for some reason; your parallel process is
 likely to abort.  There are many reasons that a parallel process can
 fail during orte_init; some of which are due to configuration or
 environment problems.  This failure appears to be an internal failure;
 here's some additional information (which may only be relevant to an
 Open MPI developer):
 
  orte_ess_set_name failed
  --> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS
 --------------------------------------------------------------------------
 [caelinux:05633] [[49395,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../orte/runtime/orte_init.c at line 132
 --------------------------------------------------------------------------
 It looks like MPI_INIT failed for some reason; your parallel process is
 likely to abort.  There are many reasons that a parallel process can
 fail during MPI_INIT; some of which are due to configuration or environment
 problems.  This failure appears to be an internal failure; here's some
 additional information (which may only be relevant to an Open MPI
 developer):
 
  ompi_mpi_init: orte_init failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)
 --------------------------------------------------------------------------
 *** An error occurred in MPI_Init
 *** before MPI was initialized
 *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
 [caelinux:5633] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
 *** An error occurred in MPI_Init
 *** before MPI was initialized
 *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
 [caelinux:5634] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
 [caelinux:05634] [[49395,1],3] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../orte/util/nidmap.c at line 371
 [caelinux:05634] [[49395,1],3] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../orte/mca/ess/base/ess_base_nidmap.c at line 62
 [caelinux:05634] [[49395,1],3] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../../orte/mca/ess/env/ess_env_module.c at line 173
 [caelinux:05634] [[49395,1],3] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../orte/runtime/orte_init.c at line 132
 --------------------------------------------------------------------------
 It looks like MPI_INIT failed for some reason; your parallel process is
 likely to abort.  There are many reasons that a parallel process can
 fail during MPI_INIT; some of which are due to configuration or environment
 problems.  This failure appears to be an internal failure; here's some
 additional information (which may only be relevant to an Open MPI
 developer):
 
  ompi_mpi_init: orte_init failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)
 --------------------------------------------------------------------------
 [caelinux:05635] [[49395,1],5] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../orte/util/nidmap.c at line 371
 [caelinux:05635] [[49395,1],5] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../orte/mca/ess/base/ess_base_nidmap.c at line 62
 [caelinux:05635] [[49395,1],5] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../../orte/mca/ess/env/ess_env_module.c at line 173
 [caelinux:05635] [[49395,1],5] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../orte/runtime/orte_init.c at line 132
 --------------------------------------------------------------------------
 It looks like MPI_INIT failed for some reason; your parallel process is
 likely to abort.  There are many reasons that a parallel process can
 fail during MPI_INIT; some of which are due to configuration or environment
 problems.  This failure appears to be an internal failure; here's some
 additional information (which may only be relevant to an Open MPI
 developer):
 
  ompi_mpi_init: orte_init failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)
 --------------------------------------------------------------------------
 *** An error occurred in MPI_Init
 *** before MPI was initialized
 *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
 [caelinux:5635] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
 [caelinux:05636] [[49395,1],7] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../orte/util/nidmap.c at line 371
 [caelinux:05636] [[49395,1],7] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../orte/mca/ess/base/ess_base_nidmap.c at line 62
 [caelinux:05636] [[49395,1],7] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../../orte/mca/ess/env/ess_env_module.c at line 173
 [caelinux:05636] [[49395,1],7] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../orte/runtime/orte_init.c at line 132
 --------------------------------------------------------------------------
 It looks like MPI_INIT failed for some reason; your parallel process is
 likely to abort.  There are many reasons that a parallel process can
 fail during MPI_INIT; some of which are due to configuration or environment
 problems.  This failure appears to be an internal failure; here's some
 additional information (which may only be relevant to an Open MPI
 developer):
 
  ompi_mpi_init: orte_init failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)
 --------------------------------------------------------------------------
 *** An error occurred in MPI_Init
 *** before MPI was initialized
 *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
 [caelinux:5636] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
 --------------------------------------------------------------------------
 mpirun has exited due to process rank 1 with PID 5633 on
 node caelinux exiting improperly. There are two reasons this could occur:
 
 1. this process did not call "init" before exiting, but others in
 the job did. This can cause a job to hang indefinitely while it waits
 for all processes to call "init". By rule, if one process calls "init",
 then ALL processes must call "init" prior to termination.
 
 2. this process called "init", but exited without calling "finalize".
 By rule, all processes that call "init" MUST call "finalize" prior to
 exiting or it will be considered an "abnormal termination"
 
 This may have caused other processes in the application to be
 terminated by signals sent by mpirun (as reported here).
 --------------------------------------------------------------------------
 [caelinux:05630] 3 more processes have sent help message help-orte-runtime.txt / orte_init:startup:internal-failure
 [caelinux:05630] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
 [caelinux:05630] 3 more processes have sent help message help-orte-runtime / orte_init:startup:internal-failure

So I will also give my OpenFOAM decomposeParDict setup for 14 cores:




Code:

/*--------------------------------*- C++ -*----------------------------------*\
 | =========                |                                                |
 | \\      /  F ield        | OpenFOAM Extend Project: Open Source CFD        |
 |  \\    /  O peration    | Version:  1.6-ext                              |
 |  \\  /    A nd          | Web:      www.extend-project.de                |
 |    \\/    M anipulation  |                                                |
 \*---------------------------------------------------------------------------*/
 FoamFile
 {
    version    2.0;
    format      ascii;
    class      dictionary;
    object      decomposeParDict;
 }
 // * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
 
 numberOfSubdomains 14;
 
 method          hierarchical;
 //method          metis;
 //method          parMetis;
 
 simpleCoeffs
 {
    n              (4 1 1);
    delta          0.001;
 }
 
 hierarchicalCoeffs
 {
    n              (14 1 1);
    delta          0.001;
    order          xyz;
 }
 
 manualCoeffs
 {
    dataFile        "cellDecomposition";
 }
 
 metisCoeffs
 {
    //n                  (5 1 1);
    //cellWeightsFile    "constant/cellWeightsFile";
 }
 
 // ************************************************************************* //

just in case there is a fault in it. Please help me, I am totally confused, and thanks in advance.
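
For reference, a decomposeParDict like the one above is consumed by the decomposePar utility before mpirun is ever called; a minimal sketch of that step, assuming it is run from inside the case directory on the master with the OpenFOAM environment sourced:

Code:

# split the mesh and fields into 14 pieces, as specified by decomposeParDict;
# this creates processor0 ... processor13 sub-directories in the case
decomposePar

# quick sanity check that all 14 processor directories exist
ls -d processor*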

wyldckat October 27, 2013 15:15

Greetings to all!

@Tommi:
  1. The two machines should not have the same IP address, but they should be on the same network.
  2. Both machines should use the same version and architecture of OpenFOAM and Open-MPI.
    • If different versions of Open-MPI are a must, then a system-based Open-MPI should be used, so that there aren't any problems with paths.
  3. Beyond version and architecture, the OpenFOAM installation path on each machine should be identical. Well, at least if you want to keep things simple (see the sketch below).
AFAIK, there isn't any perfect guide available online for setting up an OpenFOAM-based cluster. Nonetheless, in this blog post of mine, Notes about running OpenFOAM in parallel, you can find a collection of notes; among them is this thread: Cluster OpenFOAM [Solved] (... start reading from post #2).
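
Putting points 1 to 3 together, a two-node run typically ends up looking like the sketch below. This is only an outline under the assumptions above (identical OpenFOAM 2.1.1 and Open MPI installations in the same paths on both machines, distinct IP addresses in the hostfile, passwordless SSH, and a case directory visible at the same path on both nodes); the foamExec wrapper from OpenFOAM's bin directory is one common way to load the OpenFOAM environment on the remote node so that the solver can find its shared libraries:

Code:

# hostfile with the two distinct IP addresses (placeholders here), e.g.:
#   192.168.1.10 slots=8    # master, CAELinux
#   192.168.1.11 slots=8    # slave, Ubuntu Studio

# after decomposing the case (decomposePar, as sketched above), launch the
# solver on both machines; foamExec sources the OpenFOAM environment on each
# node before starting pisoFoam
/opt/openmpi-1.6.5/bin/mpirun -hostfile /home/tommi2/Desktop/machinefile \
    -np 14 /opt/openfoam211/bin/foamExec pisoFoam -parallel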

Good luck! Best regards,
Bruno

