CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (https://www.cfd-online.com/Forums/openfoam-solving/)
-   Running OpenFOAM on a Cluster (https://www.cfd-online.com/Forums/openfoam-solving/205593-running-openfoam-cluster.html)

Rishab August 21, 2018 12:25

Running OpenFOAM on a Cluster
 
Hi,

I'm new to OpenFOAM. We have a cluster with one master node and one client node, with passwordless SSH enabled and OpenFOAM v6 installed on both machines; the case directories are present on both nodes. When I try to run the case with foamJob I get the error below. I read another post which said this is an environment problem, so how do I set the environment properly? Your help will be highly appreciated. The error is:


mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 3; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).

Node: client1
Executable: /opt/openfoam6/bin/foamJob

regards,
Rishab

hokhay August 21, 2018 12:58

Have you tried to run the case separately on each machine? Can you specify the command you have typed?

Rishab August 22, 2018 08:12

Hi,

Yes, I've tried running the case individually on each machine and it runs perfectly. The command is:

mpirun --hostfile machines -np 6 foamJob simpleFoam -parallel

regards,
Rishab

hokhay August 22, 2018 11:34

You don't need foamJob in this command. Just type "mpirun --hostfile machines -np 6 simpleFoam -parallel".
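
For reference, a typical parallel run from inside the case directory looks roughly like this (a minimal sketch, assuming system/decomposeParDict already requests 6 subdomains; the commands are standard OpenFOAM utilities, not taken from this thread):

# split the mesh into processor0..processor5 per system/decomposeParDict
decomposePar
# launch 6 ranks across the nodes listed in the machines file
mpirun --hostfile machines -np 6 simpleFoam -parallel > log 2>&1
# after the run, merge the per-processor results back together
reconstructPar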

Rishab August 22, 2018 12:13

Yes, I've done that too... I get the same error!

hokhay August 22, 2018 12:18

Can you type "which simpleFoam" on both computers and show us the output?

Rishab August 22, 2018 12:22

This is what I get when I type "which simpleFoam" on both the master and the client:

/opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam

hokhay August 22, 2018 12:50

What error does it show when you type "mpirun --hostfile machines -np 6 simpleFoam -parallel"? Can you paste it here?

Rishab August 22, 2018 12:56

This is the error I get when I type "mpirun --hostfile machines -np 6 simpleFoam -parallel":

mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 3; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).

Node: client1
Executable: /opt/openfoam6/bin/simpleFoam

hokhay August 22, 2018 13:07

I can see that it is trying to look for simpleFoam at /opt/openfoam6/bin/simpleFoam, but the actual location on client1 is /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam, so it says it cannot find the executable file.

How did you install OpenFOAM? Did you use the same method to install it on both computers?
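
One hedged way to check that (using the node name client1 from the error message) is to compare what a non-interactive shell sees on each node, since mpirun starts remote processes through non-interactive SSH sessions:

# run from the master; both commands should print the same
# platforms/linux64GccDPInt32Opt/bin path if the environments match
which simpleFoam
ssh client1 'which simpleFoam'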

Rishab August 22, 2018 13:10

Yes, I used the same method to install OpenFOAM on both computers, as described on this website:
https://openfoam.org/download/6-ubuntu/

Rishab August 23, 2018 05:56

Hi,

When I type "mpirun --hostfile machines -np 6 simpleFoam -parallel" I get the following error:

[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0] usock_peer_send_blocking: send() to socket 39 failed: Broken pipe (32)
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 316
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0]-[[57312,1],0] usock_peer_accept: usock_peer_send_connect_ack failed
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 3; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).

Node: client1
Executable: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam

When I type "mpirun --hostfile machines -np 6 foamJob simpleFoam -parallel" I get the following error:

Application : simpleFoam
Executing: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam -parallel > log 2>&1 &
Application : simpleFoam
Executing: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam -parallel > log 2>&1 &
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 3; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).

Node: client1
Executable: /opt/openfoam6/bin/foamJob

feacluster August 23, 2018 11:54

Let's first make sure you can run on one node. What do you get when you run just this:

runParallel simpleFoam -parallel
mpirun -np 3 simpleFoam -parallel

And what do you get when you run:

mpirun --hostfile machines -np 6 hostname
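
(A side note on runParallel: it is not a standalone binary but a shell function from OpenFOAM's tutorial helper script, so it only exists after sourcing that script. A minimal sketch, assuming the standard openfoam.org layout:)

# defines runApplication, runParallel, getNumberOfProcessors, ...
. $WM_PROJECT_DIR/bin/tools/RunFunctions
runParallel simpleFoam    # picks up the rank count from system/decomposeParDict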

Rishab August 23, 2018 12:37

Hi,

When I run "runParallel simpleFoam -parallel" this is what I get:
runParallel: command not found

When I run "mpirun -np 3 simpleFoam -parallel" the solver starts running; this is what I get:
/*---------------------------------------------------------------------------*\
  =========                 |
  \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox
   \\    /   O peration     | Website:  https://openfoam.org
    \\  /    A nd           | Version:  6
     \\/     M anipulation  |
\*---------------------------------------------------------------------------*/
Build : 6-1a0c91b3baa8
Exec : simpleFoam -parallel
Date : Aug 23 2018
Time : 21:54:51
Host : "mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141"
PID : 9239
I/O : uncollated
Case : /home/mpiuser/OpenFOAM/mpiuser-6/run/tutorials/incompressible/simpleFoam/24-30-8.50
nProcs : 2
Slaves : 1("rishabghombal-HP-15-Notebook-PC.9240")
Pstream initialized with:
floatTransfer : 0
nProcsSimpleSum : 0
commsType : nonBlocking
polling iterations : 0
sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster (fileModificationSkew 10)
allowSystemOperations : Allowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Overriding OptimisationSwitches according to controlDict
maxThreadFileBufferSize 2e+09;

maxMasterFileBufferSize 2e+09;

Create mesh for time = 0


SIMPLE: Convergence criteria found
p: tolerance 1e-05
U: tolerance 1e-05
"(k|epsilon|)": tolerance 1e-05

Reading field p

Reading field U

Reading/calculating face flux field phi

Selecting incompressible transport model Newtonian
Selecting turbulence model type RAS
Selecting RAS turbulence model kEpsilon
RAS
{
RASModel kEpsilon;
turbulence on;
printCoeffs on;
Cmu 0.09;
C1 1.44;
C2 1.92;
C3 0;
sigmak 1;
sigmaEps 1.3;
}

No MRF models present

No finite volume options present

Starting time loop

Time = 1

smoothSolver: Solving for Ux, Initial residual = 1, Final residual = 0.0205988, No Iterations 4
smoothSolver: Solving for Uy, Initial residual = 1, Final residual = 0.0245561, No Iterations 4
smoothSolver: Solving for Uz, Initial residual = 1, Final residual = 0.0245738, No Iterations 4
GAMG: Solving for p, Initial residual = 1, Final residual = 0.0767341, No Iterations 5
time step continuity errors : sum local = 0.027811, global = 0.0131447, cumulative = 0.0131447
smoothSolver: Solving for epsilon, Initial residual = 0.0859358, Final residual = 0.00617931, No Iterations 3
bounding epsilon, min: -1031.63 max: 13226.2 average: 851.478
smoothSolver: Solving for k, Initial residual = 1, Final residual = 0.0955549, No Iterations 6
ExecutionTime = 60.07 s ClockTime = 60 s

Time = 2



When I run "mpirun --hostfile machines -np 6 hostname" this is what I get:
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141 (this is the master)
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141 (this is the client)

feacluster August 23, 2018 12:56

Quote:

Originally Posted by Rishab (Post 703690)

When I run "mpirun --hostfile machines -np 6 hostname" this is what I get:
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141 (this is the master)
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141 (this is the client)

This looks a bit strange. What are the contents of your machines file? And what do you get when you type "mpirun -V"? Is the firewall disabled on both machines?

Rishab August 23, 2018 13:21

These are the contents of my machines file:

master slots=3
client slots=3

When I type "mpirun -V" I get:
mpirun (Open MPI) 2.1.1

Report bugs to http://www.open-mpi.org/community/help/
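
One thing worth checking alongside this (an assumption on my part, not something verified in the thread): the names master and client in the hostfile must resolve to the right IP addresses on every node, typically via /etc/hosts, for example:

# /etc/hosts on both nodes -- the addresses here are placeholders
192.168.1.10    master
192.168.1.11    client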

feacluster August 23, 2018 13:39

Try this:

mpirun -np 6 -hostfile machines hostname

It should output:

master
master
master
client
client
client

Rishab August 23, 2018 13:51

When I type "mpirun -np 6 -hostfile machines hostname" this is what I get:
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:7604
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:7604
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:7604


just like

master
master
master
client
client
client

but instead of "master" it gives mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141, which is the master node, and instead of "client" it gives mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:7604, which is the client node.

feacluster August 23, 2018 13:56

OK, that looks correct. What you showed earlier was only:

master
master

Now try:

source /path/to/openfoam/installation/etc/bashrc
run
cd <your job folder>
mpirun -np 6 -hostfile machines simpleFoam -parallel
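
Spelled out for the openfoam.org v6 package path used in this thread (the case path below is taken from Rishab's earlier log and may differ), that sequence would look roughly like:

source /opt/openfoam6/etc/bashrc    # set up the OpenFOAM environment
run                                 # OpenFOAM alias for: cd $FOAM_RUN
cd tutorials/incompressible/simpleFoam/24-30-8.50
mpirun -np 6 -hostfile machines simpleFoam -parallel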

Rishab August 23, 2018 14:13

When I type "source /opt/openfoam6/etc/bashrc" nothing happens; I get no output.

When I type "mpirun -np 6 -hostfile machines simpleFoam -parallel" this is what I get:
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0] usock_peer_send_blocking: send() to socket 39 failed: Broken pipe (32)
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 316
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0]-[[57312,1],0] usock_peer_accept: usock_peer_send_connect_ack failed
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 3; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).

Node: client
Executable: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam

feacluster August 23, 2018 14:15

Quote:

Originally Posted by Rishab (Post 703712)
when I type "source /path/to/openfoam/installation/etc/bashrc" this is what I get:
bash: /path/to/openfoam/installation/etc/bashrc: No such file or directory

You have to edit that path to match your installation; it was not meant to be copy/pasted verbatim.

Rishab August 24, 2018 07:22

Hi,
When I type "mpirun -np 6 -hostfile machines simpleFoam -parallel" this is what I get:


[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:20848] [[33493,0],0] usock_peer_send_blocking: send() to socket 41 failed: Broken pipe (32)
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:20848] [[33493,0],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 316
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:20848] [[33493,0],0]-[[33493,1],0] usock_peer_accept: usock_peer_send_connect_ack failed
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 3; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).

Node: client1
Executable: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam
--------------------------------------------------------------------------
3 total processes failed to start



When I type "mpirun -np 6 -hostfile machines foamJob simpleFoam -parallel" this is what I get:


Application : simpleFoam
Executing: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam -parallel > log 2>&1 &
Application : simpleFoam
Application : simpleFoam
Executing: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam -parallel > log 2>&1 &
Executing: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam -parallel > log 2>&1 &
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 3; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).

Node: client1
Executable: /opt/openfoam6/bin/foamJob
--------------------------------------------------------------------------
3 total processes failed to start



Along with that, I get a log file which says:
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

ompi_mpi_init: ompi_rte_init failed
--> Returned "(null)" (-43) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:20554] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

Rishab August 24, 2018 08:40

Yes, I did customise it, and when I type "source /opt/openfoam6/etc/bashrc" nothing happens; I don't get any output.

Then when I type "mpirun -np 6 -hostfile machines simpleFoam -parallel" this is what I get:
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0] usock_peer_send_blocking: send() to socket 39 failed: Broken pipe (32)
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 316
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0]-[[57312,1],0] usock_peer_accept: usock_peer_send_connect_ack failed
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 3; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).

Node: client
Executable: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam

feacluster August 24, 2018 09:21

Quote:

Originally Posted by Rishab (Post 703800)
Yes, I did customise it, and when I type "source /opt/openfoam6/etc/bashrc" nothing happens; I don't get any output.

I don't believe that is supposed to produce any output; it just sets the environment variables. Put that source line in the bashrc of both your master and slave nodes.

For example, here's what the bashrc file looks like for me:

source /opt/apps/OpenFOAM/OpenFOAM-v1712/etc/bashrc
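
In this thread's case that means adding the v6 Ubuntu package line to ~/.bashrc on both nodes (a sketch; note hokhay's point below about where in the file the line has to go):

# add to ~/.bashrc on the master and on the client
source /opt/openfoam6/etc/bashrc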

Rishab August 24, 2018 10:07

Hi,
Yes, the line is there in the bashrc file on both nodes, i.e. master and client. This is the line in my case:

source /opt/openfoam6/etc/bashrc

feacluster August 24, 2018 10:47

Rank 3 is the process that starts on the slave, so probably some paths are different on the slave node. Is /opt an NFS share, or did you install OpenFOAM separately on both machines?

At this point I would remove the installation from the slave and just make /opt an NFS share.
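
A rough sketch of that NFS setup on Ubuntu (hostnames follow the machines file; packages and export options are stock nfs-kernel-server defaults, so treat this as a starting point rather than a recipe):

# on the master: export /opt read-only to the client
sudo apt install nfs-kernel-server
echo '/opt client(ro,sync,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra

# on the client: mount the master's /opt in place
# (this hides whatever was installed locally under /opt)
sudo apt install nfs-common
sudo mount master:/opt /opt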

hokhay August 25, 2018 00:37

Quote:

Originally Posted by Rishab (Post 703812)
Hi,
Yes, the line is there in the bashrc file on both nodes, i.e. master and client. This is the line in my case:
source /opt/openfoam6/etc/bashrc

Just to remind you that the source line needs to be placed on the first line of the bashrc file. Did you do that?
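
The likely reason this matters (an inference from Ubuntu's stock dotfiles, not something stated in the thread): the default ~/.bashrc bails out early for non-interactive shells, and mpirun reaches the remote node through exactly such a shell, so anything sourced below that guard never runs for the MPI ranks:

# near the top of a stock Ubuntu ~/.bashrc:
case $- in
    *i*) ;;        # interactive shell: keep going
    *) return;;    # non-interactive (e.g. ssh/mpirun): stop here
esac

# so the OpenFOAM line only reaches mpirun if it comes before the guard:
source /opt/openfoam6/etc/bashrc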

Rishab August 25, 2018 07:11

Hi feacluster,
/opt is not an NFS share... I have not installed NFS; I installed OpenFOAM separately under /opt on each machine.

Rishab August 25, 2018 07:14

Hi hokhay,
Yes, I placed the source line on the first line of the bashrc file, and also in /etc/profile and in ~/.profile. This did the trick! Thanks a lot for your support, feacluster and hokhay! You guys are my heroes!

Rishab August 25, 2018 07:21

Hi,

The solver is running, but I ran out of memory in just 10 iterations! Can anyone explain why this is happening?
My case has approximately 7 million cells, and my computer configuration is:
4 GB RAM
500 GB HDD
4-core i5 processor running at 3.8 GHz, i.e. four physical cores and two logical cores
I get a dialogue box after 10 iterations saying simpleFoam stopped unexpectedly because it ran out of memory!
If I have to increase memory, what should I upgrade in my computers?

My decomposeParDict file has the distributed option set to "no"; is this causing the problem?

hokhay August 25, 2018 14:41

I think it is simply too little RAM; 7 million cells is not a small number. My rough guess is that with 4 GB of RAM you may be able to run a 2-million-cell simulation.

To be honest, your PC is not up to the job for practical CFD. A 7-million-cell simulation would take at least 2 days to complete, depending on the convergence rate. You need a new PC with at least 16 GB of RAM.
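
As a rough sanity check on that guess (the ~1.5 GB per million cells figure is a commonly quoted rule of thumb for incompressible RAS cases, not a measurement from this thread):

7,000,000 cells x ~1.5 GB per million cells ≈ 10.5 GB needed  vs.  4 GB installed
4 GB / ~1.5 GB per million cells            ≈ 2-3 million cells at most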

Rishab August 25, 2018 17:54

Hi,
Can you please explain a way to estimate these things, maybe not exactly but roughly? This is only a coarse mesh; I will refine it significantly for grid independence, which will further increase the cell count, and I will also run some fluid-structure interaction simulations in the future.

So if I need to upgrade to a new computer I have to decide on the specs, and I'm open to buying a server or setting up another cluster, whichever gives better performance.

All I know is that if the count drops below roughly 50k cells per processor, OpenFOAM stops benefiting from running in parallel (see the quick arithmetic below). My target is to solve 30 million cells in less than 2 hours. The university I'm studying at is willing to fund the setup.

So please advise; your inputs and suggestions are highly appreciated.

regards,
Rishab
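
The 50k-cells-per-rank floor Rishab mentions gives a quick upper bound on how many cores can usefully be applied to his target (plain arithmetic, not a benchmark):

30,000,000 cells / 50,000 cells per rank ≈ 600 ranks before communication overhead dominates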

hokhay August 25, 2018 23:28

I don't think there is any way to calculate this exactly; it is just from my experience.

For your reference, I am running a steady-state car aerodynamics simulation with 35 million cells on 12 server computers with a total of 192 cores, and this configuration takes about 18 hours to run 10,000 iterations. They are 6-year-old servers with E5-2650 CPUs; the new AMD EPYC CPUs could easily double the performance.

To finish a 30-million-cell simulation in 2 hours, I guess you may need more powerful servers than what I have, and your simulation would need to converge in fewer iterations. It is really case dependent.

Rishab August 26, 2018 07:12

When you say server PC, what exactly do you mean? Do you literally mean a server, or is it a PC? Either way, can you please tell me the specs: how many memory slots for a 16-core CPU, and so on?

hokhay August 26, 2018 14:02

I mean a server. The one I am using is a Dell PowerEdge R620: a dual-CPU machine with a total of 16 RAM slots; you can find the full spec on Dell's website. Also, OpenFOAM is memory-bandwidth intensive, which means memory bandwidth has a larger impact on performance than raw CPU speed.

I suggest you read the following paper:
https://www.researchgate.net/publica...and_Don'ts

feacluster August 27, 2018 13:34

Quote:

Originally Posted by hokhay (Post 703878)
Just to remind you that the source line needs to be placed on the first line of the bashrc file. Did you do that?

Interesting; I don't have mine on the first line. Here's what my bashrc looks like:

source /opt/intel/compilers_and_libraries/linux/mpi/intel64/bin/mpivars.sh
source /opt/apps/OpenFOAM/OpenFOAM-v1712/etc/bashrc
export LD_LIBRARY_PATH=/opt/apps/intel:$LD_LIBRARY_PATH

export I_MPI_FABRICS=shm:dapl
export I_MPI_DAPL_PROVIDER=ofa-v2-ib0
export I_MPI_DYNAMIC_CONNECTION=0

