Run OpenFoam in 2 nodes of a cluster

Post #1, April 22, 2016, 11:22
WhiteW (Member)
Hi, I'm trying to set up and run an OpenFOAM simulation on a cluster (2 nodes, 16 cores per node).
I'm using qsub with a PBS script. However, if I don't specify a host file, all 32 processes run on the first node.
When I specify a host file, the job doesn't run.

These are the commands in my startjob.pbs:

Code:
#!/bin/bash -l
#PBS -N AOC_OF
#PBS -S /bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=999:00:00

module load openmpi-x86_64
source /home/whitew/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=ThirdParty WM_COMPILER=Gcc48 WM_MPLIB=SYSTEMOPENMPI
cd $PBS_O_WORKDIR


#mpirun --hostfile hosts.txt -np 32 simpleFoam -parallel > log.simpleFoam_1st
mpirun --host node6,node7 -np 32 simpleFoam -parallel > log.simpleFoam_1st

This is the hosts.txt file in the main folder (the same folder as startjob.pbs):

Code:
node6
node7
What am I doing wrong?
Thanks in advance,
WhiteW

Last edited by WhiteW; May 2, 2016 at 06:38.

Post #2, May 1, 2016, 05:35
WhiteW (Member)
Hi,
Has nobody tried to set OF to run on 2 or more nodes?
Thanks,
WhiteW

Post #3, May 1, 2016, 17:26
wyldckat (Bruno Santos, Super Moderator)
Quick answer: see "Notes about running OpenFOAM in parallel"; using foamExec might solve the issue. I know I have this written somewhere in a thread... here's an example: http://www.cfd-online.com/Forums/ope...tml#post504516 (post #7)

Found it with the following on Google:
Code:
site:cfd-online.com "wyldckat" "foamExec" parallel

Post #4, May 2, 2016, 06:37
WhiteW (Member)
Thanks for the help, Bruno. I have added the path to foamExec to the command, but it still doesn't work.

When I submit the job (qsub startjob.pbs) containing the command:

Code:
mpirun --hostfile hosts.txt /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/FoamExec simpleFoam -parallel > log.simpleFoam_1st
The job shows as running in qstat (state R), but according to "qstat -n" it is running on node1 and node2 (in machines.txt I specified node6 and node7). However, when I log in to node1 and node2, OpenFOAM is not running, and an empty 0 KB log file has been created.

When I run the mpirun command directly from the cluster frontend, without qsub, I get the following errors:

[node7:12516] Error: unknown option "--tree-spawn"
input in flex scanner failed
[node6:55424] Error: unknown option "--tree-spawn"
input in flex scanner failed


Do I have to install foam-extend? I only have OpenFOAM 2.3.0 installed on the cluster.
Thanks,
WhiteW

Post #5, May 3, 2016, 17:06
wyldckat (Bruno Santos, Super Moderator)
Quick answer: I suspect that the problem will reveal itself if you use this command:
Code:
mpirun --hostfile hosts.txt /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
Because:
  1. Your command had "FoamExec" instead of "foamExec".
  2. At the end of the line I added this:
    Code:
    2>&1
    which redirects the textual error output stream (2) into the standard text output stream (1). This way the actual error messages will be stored into the log file.
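
To see why the position of 2>&1 matters, here is a minimal sketch that can be run anywhere. The function noisy is a made-up stand-in for the solver, not part of OpenFOAM or Open MPI; it prints one line to each stream so the two redirections can be compared.

```shell
#!/bin/bash
# Hypothetical stand-in for a solver: one line on stdout, one on stderr.
noisy() {
    echo "normal output"
    echo "error output" >&2
}

log=$(mktemp)

# Only stdout goes to the log. stderr is discarded here just to keep the
# demo quiet; in a real job it would go to the terminal or the PBS error file.
noisy > "$log" 2>/dev/null
stdout_only=$(cat "$log")

# stdout goes to the log first, then stderr is pointed at the same place.
noisy > "$log" 2>&1
both=$(cat "$log")

rm -f "$log"
printf 'stdout only:\n%s\n\nwith 2>&1:\n%s\n' "$stdout_only" "$both"
```

Note that the order matters: 2>&1 has to come after the redirection to the log file, as in the command above.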

Post #6, May 4, 2016, 03:24
WhiteW (Member)
Thanks for the explanation!
I have run the correct command and now the error is written in the logfile; however it reports the same error:

Code:
[node7:21261] Error: unknown option "--tree-spawn"
input in flex scanner failed
[node6:67200] Error: unknown option "--tree-spawn"
input in flex scanner failed
Could it be an OF problem or is it a mpirun issue?
WhiteW

Post #7, May 6, 2016, 04:23
WhiteW (Member)
Hi banji. To write the output in a file you have to run:

Code:
mpirun -np 4 pisoFoamIPM -parallel >logfile.txt
[ Moderator note: this part of the text was copied to here: http://www.cfd-online.com/Forums/ope...tml#post599206 ]

--------------------------------------------------------------------------------------------------------------------------------------------------------

I'm still trying to run OF on the two nodes of the cluster, still no success.

Using:
Code:
mpirun -np 32 --hostfile hosts2.txt /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
Error:
Code:
/home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec: line 145: exec: simpleFoam: not found
Using the --prefix option:
Code:
mpirun --hostfile hosts2.txt --prefix /opt/ofed154/mpi/gcc/openmpi-1.4.3/bin/ /home/whitew/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
Error:

Code:
bash: /opt/ofed154/mpi/gcc/openmpi-1.4.3/bin/bin/orted: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 33250) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
    node6 - daemon did not report back when launched
    node7 - daemon did not report back when launched
bash: /opt/ofed154/mpi/gcc/openmpi-1.4.3/bin/bin/orted: No such file or directory
How do I set LD_LIBRARY_PATH correctly?

Someone solved this by adding the following as the first line of their bashrc:

Code:
source /whitew/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc
However, this has no effect.
Are there other ways to solve the problem?
Thanks,
WhiteW
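
A note on the error above: the path in the message, /opt/ofed154/mpi/gcc/openmpi-1.4.3/bin/bin/orted, contains "bin" twice. Open MPI's --prefix option expects the installation root and appends bin and lib itself, so passing the bin directory produces exactly this doubled path. Below is a small sketch for deriving the root from the location of mpirun. The helper name mpi_prefix_from is made up, and the example path assumes that mpirun sits next to orted in that installation.

```shell
#!/bin/bash
# Derive the Open MPI installation root from the full path of its mpirun.
# Example: /opt/openmpi/bin/mpirun -> /opt/openmpi
mpi_prefix_from() {
    local mpirun_path=$1
    dirname "$(dirname "$mpirun_path")"
}

# Path taken from the error message in the post, minus the doubled "bin".
# That mpirun lives at this location is an assumption.
mpi_prefix_from /opt/ofed154/mpi/gcc/openmpi-1.4.3/bin/mpirun
```

In a job script the result could be passed on as --prefix "$(mpi_prefix_from "$(command -v mpirun)")", so that the remote nodes use the same installation as the mpirun that was actually started.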

Last edited by wyldckat; May 8, 2016 at 15:23. Reason: see "Moderator note:"

Post #8, May 6, 2016, 04:34
banji (Olabanji, Member)
Quote:
Originally Posted by WhiteW
Code:
bash: /opt/ofed154/mpi/gcc/openmpi-1.4.3/bin/bin/orted: No such file or directory
I once had a similar problem and noticed it was the result of conflicting libraries (local vs. OpenFOAM's). I'd suggest recompiling the source pack (if this won't cause too much pain).

Last edited by wyldckat; May 8, 2016 at 15:19. Reason: removed excess part of the quote and left only the essential part

Post #9, May 8, 2016, 15:13
wyldckat (Bruno Santos, Super Moderator)
Quote:
Originally Posted by WhiteW View Post
Code:
source /whitew/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc
However, this has no effect..
Are there other solutions to solve the problem?
Thanks,
WhiteW
Quick answer: I'm so sorry, I completely forgot which installation instructions you were following.

Please run the following commands, which will populate the "prefs.sh" file in OpenFOAM's "etc" folder:
Code:
cd /home/whitew/OpenFOAM/OpenFOAM-2.3.0/etc/
echo export WM_NCOMPPROCS=4 > prefs.sh
echo export foamCompiler=ThirdParty >> prefs.sh
echo export WM_COMPILER=Gcc48 >> prefs.sh
echo export WM_MPLIB=SYSTEMOPENMPI >> prefs.sh
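This will fix the problem with mpirun being a bit clueless about which specific settings to use with OpenFOAM: the options given on the "source .../etc/bashrc" line of the job script only apply to that one shell, whereas "prefs.sh" is read every time "etc/bashrc" is sourced, including when foamExec sources it on the other nodes.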

As for foamExec, there is a small fix that might be useful to do as well:
  1. Edit the file "bin/foamExec" inside the "OpenFOAM-2.3.0" folder.
  2. In the very first line, change this:
    Code:
    #!/bin/sh
    to this:
    Code:
    #!/bin/bash
This will ensure that the correct shell is used for launching OpenFOAM's utilities.
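
The same four settings can also be written in one step with a here-document and then read back in a fresh shell to confirm they are picked up. This is only a sketch: the helper names write_prefs and show_prefs are made up, and the demo writes into a temporary directory instead of the real "etc" folder.

```shell
#!/bin/bash
# Write the four settings from the post into <etc_dir>/prefs.sh in one step.
write_prefs() {
    local etc_dir=$1
    cat > "$etc_dir/prefs.sh" <<'EOF'
export WM_NCOMPPROCS=4
export foamCompiler=ThirdParty
export WM_COMPILER=Gcc48
export WM_MPLIB=SYSTEMOPENMPI
EOF
}

# Source prefs.sh in a separate bash process and print two of its values.
show_prefs() {
    local etc_dir=$1
    bash -c '. "$1/prefs.sh"; echo "$WM_COMPILER $WM_MPLIB"' _ "$etc_dir"
}

demo_dir=$(mktemp -d)   # stand-in for OpenFOAM-2.3.0/etc, nothing real is touched
write_prefs "$demo_dir"
show_prefs "$demo_dir"
rm -rf "$demo_dir"
```

For the real installation the call would be write_prefs /home/whitew/OpenFOAM/OpenFOAM-2.3.0/etc, which overwrites any existing prefs.sh just like the first echo command above does.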

Post #10, May 10, 2016, 08:46
WhiteW (Member)
Hi, I have made the modifications you suggested.
I am now sourcing OpenFOAM in my bashrc (source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc) and have applied the changes to foamExec and prefs.sh.

Error:
Code:
/home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/config/settings.sh: line 384: mpicc: command not found
That line of settings.sh reads:
Code:
libDir=`mpicc --showme:link | sed -e 's/.*-L\([^ ]*\).*/\1/'`

Then I made prefs.sh executable with chmod +x and ran it. The error now is:

Code:
/home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/config/settings.sh: line 384: mpicc: command not found
/home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/config/settings.sh: line 384: mpicc: command not found
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_util_nidmap_init failed
  --> Returned value Data unpack had inadequate space (-25) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[node3:35896] [[21228,1],16] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35896] [[21228,1],16] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Data unpack had inadequate space (-25) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Data unpack had inadequate space" (-25) instead of "Success" (0)
--------------------------------------------------------------------------
[node3:35896] [[21228,1],16] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35896] *** An error occurred in MPI_Init
[node3:35896] *** on a NULL communicator
[node3:35896] *** Unknown error
[node3:35896] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly.  You should
double check that everything has shut down cleanly.

  Reason:     Before MPI_INIT completed
  Local host: node3
  PID:        35896

--------------------------------------------------------------------------
[node3:35897] [[21228,1],17] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35897] [[21228,1],17] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35899] [[21228,1],19] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35899] [[21228,1],19] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35897] [[21228,1],17] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35899] [[21228,1],19] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35898] [[21228,1],18] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35898] [[21228,1],18] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35900] [[21228,1],20] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35900] [[21228,1],20] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35900] [[21228,1],20] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35898] [[21228,1],18] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35901] [[21228,1],21] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35901] [[21228,1],21] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35901] [[21228,1],21] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35902] [[21228,1],22] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35902] [[21228,1],22] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35902] [[21228,1],22] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35905] [[21228,1],23] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35905] [[21228,1],23] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35905] [[21228,1],23] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35909] [[21228,1],26] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35909] [[21228,1],26] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35909] [[21228,1],26] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
[node3:35910] [[21228,1],27] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 117
[node3:35910] [[21228,1],27] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[node3:35910] [[21228,1],27] ORTE_ERROR_LOG: Data unpack had inadequate space in file runtime/orte_init.c at line 128
--------------------------------------------------------------------------
mpirun has exited due to process rank 19 with PID 35899 on
node node3 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[frontend:20585] 9 more processes have sent help message help-orte-runtime.txt / orte_init:startup:internal-failure
[frontend:20585] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[frontend:20585] 9 more processes have sent help message help-orte-runtime / orte_init:startup:internal-failure
[frontend:20585] 9 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
[frontend:20585] 9 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
[frontend:20585] 9 more processes have sent help message help-mpi-runtime.txt / ompi mpi abort:cannot guarantee all killed
Thanks for the help!
WhiteW

Post #11, May 10, 2016, 10:00
anishtain4 (Mahdi Hosseinali, Senior Member)
To run my simulations on Sun Grid Engine I use the following script:
Quote:
#$ -cwd
#$ -l h_rt=47:0:0
#$ -pe ompi* 20

module purge
module load gcc/4.6.4 openmpi/gcc openfoam/3.0
source $OPENFOAM/etc/bashrc

mpirun pimpleFoam -parallel
However, you are running on PBS, which may use a different set of directives, so the setup might differ (I guess only the top part should need to change).

Post #12, May 10, 2016, 10:11
WhiteW (Member)
Hi anishtain4, yes, with PBS it works only if I send the job to a single node (not more than one).
Here are the settings in the PBS files:

If I send the job from the frontend to 1 node it works:
Code:
#!/bin/bash -l
#PBS -N AOC_OF_14_M4
#PBS -S /bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=999:00:00

module load openmpi-x86_64
source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=ThirdParty WM_COMPILER=Gcc48 WM_MPLIB=SYSTEMOPENMPI
cd $PBS_O_WORKDIR

mpirun -np 32 simpleFoam -parallel > log.simpleFoam_1st
Sending the job to two nodes gives errors:
Code:
#!/bin/bash -l
#PBS -N AOC_OF_14_M4
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=16:NRM
#PBS -l walltime=999:00:00

module load openmpi-x86_64
source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=ThirdParty WM_COMPILER=Gcc48 WM_MPLIB=SYSTEMOPENMPI
cd $PBS_O_WORKDIR

mpirun -np 16--hostfile hosts2.txt /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1

Post #13, May 10, 2016, 18:35
wyldckat (Bruno Santos, Super Moderator)
Quick answer @WhiteW: Hopefully the following will do the trick:
  1. Edit the file "foamExec" once again.
  2. Near the end of the file you will find these two commands:
    Code:
    sourceRc
    exec "$@"
  3. Before the first of these two commands, add the line needed to load the respective module:
    Code:
    module load openmpi-x86_64
    sourceRc
    exec "$@"
  4. Save the file and close the editor.
As for the job script, here is what I suggest that you use for each scenario, at least based on what you wrote:
  • One node:
    Code:
    #!/bin/bash -l
    #PBS -N AOC_OF_14_M4
    #PBS -S /bin/bash
    #PBS -l nodes=2:ppn=16
    #PBS -l walltime=999:00:00
    
    module load openmpi-x86_64
    source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc
    cd $PBS_O_WORKDIR
    
    mpirun -np 32 simpleFoam -parallel > log.simpleFoam_1st 2>&1
  • Two nodes:
    Code:
    #!/bin/bash -l
    #PBS -N AOC_OF_14_M4
    #PBS -S /bin/bash
    #PBS -l nodes=1:ppn=16:NRM
    #PBS -l walltime=999:00:00
    
    module load openmpi-x86_64
    source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc
    cd $PBS_O_WORKDIR
    
    mpirun -np 16 --hostfile hosts2.txt $WM_PROJECT_DIR/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
    Notes:
    1. You had a small flaw, because you had this:
      Code:
      16--hostfile
      when it should be this:
      Code:
      16 --hostfile
      Namely, a space was missing.
    2. Using "$WM_PROJECT_DIR" is only to make it easier to read, because the path should expand automatically to the correct path. You can see this by running:
      Code:
      echo $WM_PROJECT_DIR
Nonetheless, I believe that there is something incorrectly configured in the line "#PBS -l nodes", at least based on what you provided.
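
Since the earlier log showed "mpicc: command not found", it may help to check which commands a non-interactive shell on each node can actually find. Below is a rough sketch. The helper name missing_commands is made up, and the list of commands is only a guess at what this setup needs.

```shell
#!/bin/bash
# Print every command from the argument list that is NOT found on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" > /dev/null 2>&1 || echo "$cmd"
    done
}

missing=$(missing_commands mpirun mpicc orted simpleFoam)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined onto one line.
    echo "$(uname -n): not on PATH:" $missing
else
    echo "$(uname -n): all commands found"
fi
```

Saved under a hypothetical name such as check_env.sh, it could be run on a compute node with ssh node6 'bash -s' < check_env.sh, assuming passwordless ssh works. If a command is reported missing there but not on the frontend, the environment on the nodes is the likely culprit.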

Post #14, May 11, 2016, 11:55
WhiteW (Member)
Thanks!
I have now added the line to foamExec.
However, when I submit the file startjob.pbs with qsub:

Code:
#!/bin/bash -l
#PBS -N AOC_OF_14_M4
#PBS -S /bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=999:00:00

module load openmpi-x86_64
source /home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc
cd $PBS_O_WORKDIR

mpirun -np 32 --hostfile hosts2.txt $WM_PROJECT_DIR/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
I get this error in the log file:
Code:
/home/whiteW/OpenFOAM/OpenFOAM-2.3.0/etc/config/settings.sh: line 384: mpicc: command not found
The strange thing is that in hosts2.txt I have specified:
Code:
node2 cpu=16
node3 cpu=16
qstat -n tells me the processes are running on node1 and node2. However, there are no running processes on these nodes.
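
The mismatch between hosts2.txt and what qstat -n reports is worth noting. Under Torque/PBS the scheduler decides which nodes the job gets and lists them in the file named by $PBS_NODEFILE, typically one line per allocated core. A hostfile built from that list always matches the allocation. Below is a minimal sketch, assuming that one-line-per-core layout (other PBS variants may differ). The helper name nodefile_to_hostfile is made up, and slots= is the keyword Open MPI documents for hostfiles.

```shell
#!/bin/bash
# Convert a PBS node file (one line per allocated core) into an Open MPI
# hostfile with one "host slots=N" line per node.
nodefile_to_hostfile() {
    local nodefile=$1
    sort "$nodefile" | uniq -c | awk '{ print $2 " slots=" $1 }'
}

# Demo with a made-up allocation of 2 cores on each of two nodes.
demo=$(mktemp)
printf 'node1\nnode1\nnode2\nnode2\n' > "$demo"
nodefile_to_hostfile "$demo"
rm -f "$demo"
```

Inside the job script this could be used as nodefile_to_hostfile "$PBS_NODEFILE" > hosts.job (the file name is arbitrary), followed by mpirun with --hostfile hosts.job and -np set to the number of lines in $PBS_NODEFILE.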

Post #15, May 13, 2016, 00:37
anishtain4 (Mahdi Hosseinali, Senior Member)
I'm not an expert in clusters, so these are just guesses:

1. What is the :NRM after your nodes specification when running on one node? I noticed it is missing when you run on two nodes. Could that be related?

2. I'm not sure your host file is defined correctly. I think it needs to be a list of the node names on your cluster; for example, mine looks something like this:
Quote:
"cl083.acenet.ca.7353"
"cl083.acenet.ca.7354"
"cl084.acenet.ca.26982"
"cl084.acenet.ca.26983"
...

Post #16, May 13, 2016, 03:01
WhiteW (Member)
Hi anishtain4, thanks for the reply.
NRM means that the nodes are assigned to the job automatically, so you don't have to name specific nodes.
The job starts correctly if I don't specify the option --hostfile hosts2.txt; however, it assigns all 32 processes to a single node instead of dividing them between two nodes (16 on node1 and 16 on node2).

Using either:
Code:
#PBS -l nodes=node1:ppn=16+node2:ppn=16
or
Code:
#PBS -l nodes=2:ppn=16
and

Code:
mpirun -np 32 $WM_PROJECT_DIR/bin/foamExec simpleFoam -parallel > log.simpleFoam_1st 2>&1
the job starts with 32 processes on the first node.
The error "mpicc: command not found" seems to appear when I specify the --hostfile option.

In the file hosts.txt I have written the node names as listed in the file /etc/hosts:

Code:
...
192.168.0.1     node1
192.168.0.2     node2
192.168.0.3     node3
192.168.0.4     node4
192.168.0.5     node5
192.168.0.6     node6
192.168.0.7     node7

WhiteW
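
When mpirun is started inside a PBS job without a hostfile and still places every process on the first node, one possible cause is that the Open MPI build has no Torque/PBS ("tm") support, in which case it does not learn the allocation from the scheduler. ompi_info lists the compiled-in components, so a check along these lines may help. This is a sketch: the helper name has_tm_support is made up, and the sample lines only imitate ompi_info output.

```shell
#!/bin/bash
# Succeeds if ompi_info-style text on stdin lists a "tm" component for the
# resource allocation (ras) or process launch (plm) framework.
has_tm_support() {
    grep -Eq 'MCA (ras|plm): tm( |$)'
}

# Illustrative sample text, not real output from this cluster.
sample='                 MCA ras: tm (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA plm: rsh (MCA v2.0, API v2.0, Component v1.6.5)'

if printf '%s\n' "$sample" | has_tm_support; then
    echo "tm support found"
else
    echo "no tm support"
fi
```

On the cluster the check would be ompi_info | has_tm_support, run after loading the same module as in the job script. If no tm component shows up, building the hostfile from $PBS_NODEFILE is the usual workaround.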
