CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   OpenFOAM Installation (http://www.cfd-online.com/Forums/openfoam-installation/)
-   -   MPI Environment Variables (http://www.cfd-online.com/Forums/openfoam-installation/57440-mpi-environment-variables.html)

braennstroem November 30, 2007 12:43

Hi,

the cluster installation is causing some minor trouble. A parallel run using Open MPI does not work as expected. foamInstallationTest gives me:

$JAVA_PATH ...pb367/OpenFOAM/linux64/j2sdk1.4.2_05 no no
$MICO_ARCH_PATH ...ico-2.3.12/platforms/linux64GccDPOpt yes yes yes
$LAM_ARCH_PATH --------- env variable not set --------- yes
$MPICH_ARCH_PATH --------- env variable not set --------- no
-------------------------------------------------------------------------------


Checking the FOAM env variables set on the LD_LIBRARY_PATH...
-------------------------------------------------------------------------------
Environment_variable Set_to_file_or_directory Valid Path Crit
-------------------------------------------------------------------------------
$FOAM_LIBBIN ...M/OpenFOAM-1.4.1/lib/linux64GccDPOpt yes yes yes
$FOAM_USER_LIBBIN ...OAM/ppb367-1.4.1/lib/linux64GccDPOpt yes yes no
$LAM_ARCH_PATH --------- env variable not set --------- yes

The environment variables did not get set somehow...
Otherwise the installation seems to be working. Has anyone experienced a similar problem and could help?

Greetings!
Fabian

mighelone November 30, 2007 13:29

Hi Fabian,

What kind of problem do you have with Open MPI?

I guess that the environment variables for MPICH and LAM are not necessary if you use Open MPI!?

braennstroem November 30, 2007 15:09

Hi Michele,

yes, I think you are right; I thought MPICH was Open MPI... kind of stupid. The problem is that I am not able to get the parallel run going using a PBS queueing system, and my first idea was those missing env. variables. I will try to capture more info...
Thanks!
Fabian

asaha December 2, 2007 12:48

Network communication can be a hurdle in getting a parallel run going. I recently resolved the issue by stopping the iptables service on our Fedora machines.
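For reference, on Fedora of that era the firewall could be disabled with the standard SysV service tools (a sketch; requires root, and opening only the ports MPI uses would be the safer fix than switching the firewall off entirely):

```shell
# Stop the firewall for the current session...
/sbin/service iptables stop
# ...and keep it from starting again at boot (Fedora/RHEL SysV tools).
/sbin/chkconfig iptables off
```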

braennstroem December 5, 2007 08:28

Hi,

it seems that the problem is related to my shell. I have 'sh' as my default shell and am somehow not able to distribute the needed environment variables. Sourcing OpenFOAM's variables in the calculation script only works on the first of the nodes selected by 'qsub'. One idea was to adjust the mpirun command with an additional argument that does the sourcing on all machines!? For LAM there is such an option, which adjusts the ssh call, but so far I could not find one for Open MPI... Does anyone have an idea?

Fabian

Btw. changing the shell seems to be too much work, and I am running OpenFOAM version 1.4.1.
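For what it's worth, Open MPI does have a `-x` flag that exports named environment variables from the launching shell to the remote processes, which may cover this use case (a sketch; the case name `myCase` and the exact list of variables to forward are placeholders, not from the thread):

```shell
# Forward the OpenFOAM environment to every rank via Open MPI's -x flag.
# Each -x NAME passes that variable's current value to the remote
# processes; repeat it for every variable the solver needs.
mpirun -np 12 \
    -x PATH -x LD_LIBRARY_PATH \
    -x WM_PROJECT_DIR -x FOAM_LIBBIN \
    simpleFoam . myCase -parallel > log
```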

mighelone December 5, 2007 08:44

Hi Fabian!

I have the same problem with Torque/PBS.
PBS only loads a predefined PATH, so you need to define the OpenFOAM paths some other way.

In my PBS script I add the following line to source the OpenFOAM bashrc script:

source ~/OpenFOAM/OpenFOAM-1.4.1/.OpenFOAM-1.4.1/bashrc

But sourcing a script from within a PBS script only works if Torque is compiled with the following options:

./configure --enable-shell-use-argv --disable-shell-pipe
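A minimal Torque/PBS job script along those lines might look like this (a sketch; the node counts and the case name `myCase` are placeholders, the bashrc path is the one quoted above):

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=4

# PBS starts the job with a bare predefined PATH, so source the
# OpenFOAM environment explicitly:
source ~/OpenFOAM/OpenFOAM-1.4.1/.OpenFOAM-1.4.1/bashrc

# Run from the directory the job was submitted from.
cd $PBS_O_WORKDIR

mpirun -np 8 simpleFoam . myCase -parallel > log
```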

braennstroem December 11, 2007 09:02

Hi,

thanks Michele, but it seems to be a more difficult task than switching the shell (and even that won't work...). It seems that Open MPI does not support such an environment-transferring option. So my last chance is LAM, but then I have to recompile everything!?

Fabian

braennstroem December 20, 2007 05:12

Hi,

it works now using:

#!/bin/bash

eval "module() { eval \`/opt/modules/3.1.6/bin/modulecmd bash \$*\`; }"


#module rm mpi
module load mpi/openmpi-1.2.3
...

but it is very, very slow!?
A simpleFoam case with about 1.5 million polyhedral cells and 8.5 million points on 12 CPUs takes for the first iteration:
ExecutionTime = 48.87 s ClockTime = 411 s
DILUPBiCG: Solving for Ux, Initial residual = 1, Final residual = 0.00421735, No Iterations 1
DILUPBiCG: Solving for Uy, Initial residual = 1, Final residual = 0.00440672, No Iterations 1
DILUPBiCG: Solving for Uz, Initial residual = 1, Final residual = 0.00413678, No Iterations 1
GAMG: Solving for p, Initial residual = 1, Final residual = 9.9972e-08, No Iterations 21
time step continuity errors : sum local = 8.07473e-08, global = -3.90179e-09, cumulative = -3.90179e-09
DILUPBiCG: Solving for epsilon, Initial residual = 0.0157856, Final residual = 0.000105516, No Iterations 1
bounding epsilon, min: -0.101362 max: 213.297 average: 0.0641029
DILUPBiCG: Solving for k, Initial residual = 1, Final residual = 0.00670869, No Iterations 1
ExecutionTime = 48.87 s ClockTime = 411 s

and the clocktime does not get smaller.

On one CPU with 1.4.1-dev AMG it takes:
ExecutionTime = 133.78 s ClockTime = 134 s

Maybe the case is too small for 12 CPUs, but I actually started from an 8.5 million tetra mesh and did not expect such a huge reduction in polyhedral cells.
Might there be an additional option I should enable for InfiniBand!?

Would be nice, if anyone has any suggestions!
Greetings!
Fabian

braennstroem February 8, 2008 10:19

Hi,
me again...
It seems that I used a wrong script to run the case;
the actual mpirun call is:
mpirun -np 12 simpleFoam . E1_SIMPLE_SEHR_POLY_GAMG_HWW_Vergleich -parallel > log

but I am missing the hostfile, which causes a problem, as you can see from the hosts used:


Exec : simpleFoam . E1_SIMPLE_SEHR_POLY_GAMG_HWW_Vergleich -parallel
[0] Date : Dec 17 2007
[0] Time : 06:41:34
[0] Host : noco001.nec
[0] PID : 28456
[1] Time : 06:41:34
[1] Host : noco001.nec
[1] PID : 28457
[3] Date : Dec 17 2007
[3] Time : 06:41:34
[3] Host : noco001.nec
[3] PID : 28459
[5] Date : Dec 17 2007
[5] Time : 06:41:34
[5] Host : noco001.nec
[5] PID : 28461
[6] Date : Dec 17 2007
[6] Time : 06:41:34
[6] Host : noco001.nec
[6] PID : 28462
[7] Date : Dec 17 2007
[7] Time : 06:41:34
[7] Host : noco001.nec
[7] PID : 28463
[8] Date : Dec 17 2007
[8] Time : 06:41:34
[8] Host : noco001.nec
[8] PID : 28464
[9] Date : Dec 17 2007
[9] Time : 06:41:34
[9] Host : noco001.nec
[9] PID : 28465
[10] Date : Dec 17 2007
[10] Time : 06:41:34
[10] Host : noco001.nec
[10] PID : 28466
[11] Date : Dec 17 2007
[11] Time : 06:41:34
[11] Host : noco001.nec
[11] PID : 28467

Obviously, running all ranks on one node does not make the calculation faster. But using '--hostfile $PBS_NODEFILE' is not working at all...!?

Fabian
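For reference, the intended invocation with the PBS node file would look like the following (a sketch; if it still fails, the exact error message from mpirun would help narrow down why):

```shell
# $PBS_NODEFILE lists one line per allocated CPU slot; handing it to
# mpirun lets Open MPI place one rank per slot across the nodes PBS
# assigned, instead of putting all 12 ranks on the first node.
mpirun -np 12 --hostfile $PBS_NODEFILE \
    simpleFoam . E1_SIMPLE_SEHR_POLY_GAMG_HWW_Vergleich -parallel > log
```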

