CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (https://www.cfd-online.com/Forums/openfoam-solving/)
-   -   Problems running OpenFOAM 2.3 in parallel (https://www.cfd-online.com/Forums/openfoam-solving/133913-problems-running-openfoam-2-3-parallel.html)

vinz April 22, 2014 11:42

Problems running OpenFOAM 2.3 in parallel
 
Dear all,

Since I installed OpenFOAM 2.3 I've not been able to use it in parallel.
I don't know why. It's been working perfectly for years with the previous versions and this one is giving me headache with two different machines.

I am using Ubuntu 12.04, and I get the following error as soon as I try to run in parallel (this example is with Allrun in the motorBike tutorial, but it's the same for every solver):

Quote:

--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: carbon
Framework: crs
Component: none
--------------------------------------------------------------------------
[carbon:17798] *** Process received signal ***
[carbon:17798] Signal: Segmentation fault (11)
[carbon:17798] Signal code: Address not mapped (1)
[carbon:17798] Failing at address: 0x28
[carbon:17798] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x2ada7058c4a0]
[carbon:17798] [ 1] /usr/lib/libopen-pal.so.0(mca_base_select+0x108) [0x2ada72d29518]
[carbon:17798] [ 2] /usr/lib/libopen-pal.so.0(opal_crs_base_select+0x7e) [0x2ada72d3b90e]
[carbon:17798] [ 3] /usr/lib/libopen-pal.so.0(opal_cr_init+0x31e) [0x2ada72d1a0ee]
[carbon:17798] [ 4] /usr/lib/libopen-pal.so.0(opal_init+0x159) [0x2ada72d19a59]
[carbon:17798] [ 5] /usr/lib/libopen-rte.so.0(orte_init+0x4d) [0x2ada72ac5a0d]
[carbon:17798] [ 6] /usr/lib/libmpi.so.0(+0x362e1) [0x2ada7221a2e1]
[carbon:17798] [ 7] /usr/lib/libmpi.so.0(MPI_Init+0x16b) [0x2ada7223b3fb]
[carbon:17798] [ 8] /home/vincent/OpenFOAM/OpenFOAM-2.3.0/platforms/linux64GccDPOpt/lib/openmpi-1.6.5/libPstream.so(_ZN4Foam8UPstream4initERiRPPc+0xd) [0x2ada7091b9bd]
[carbon:17798] [ 9] /home/vincent/OpenFOAM/OpenFOAM-2.3.0/platforms/linux64GccDPOpt/lib/libOpenFOAM.so(_ZN4Foam7argListC1ERiRPPcbbb+0xb32) [0x2ada6f458fc2]
[carbon:17798] [10] snappyHexMesh() [0x411d6a]
[carbon:17798] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x2ada7057776d]
[carbon:17798] [12] snappyHexMesh() [0x416c3d]
[carbon:17798] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 17795 on node carbon exited on signal 11 (Segmentation fault).

Does someone have an idea of what's going on?

Regarding the setup, I used the source files and compiled everything. After a few attempts I managed to get a build with no compilation errors, but I am still not able to run cases in parallel.

Thanks for your help

Vincent

wyldckat April 22, 2014 14:28

Greetings Vincent,

Which installation instructions did you follow?

Because according to the output you've provided, the problem is that the shell environment is configured to use the custom Open-MPI 1.6.5 that comes with OpenFOAM's ThirdParty package, but the binaries are instead loading the "libmpi.so" library present in your system, which is not compatible.
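If it helps to confirm this, a rough sketch like the following lists, in search order, the directories on LD_LIBRARY_PATH that actually contain a libmpi. The "find_libmpi" helper is just for illustration, and note that libraries resolved from the ldconfig cache (e.g. under /usr/lib) will not show up here:
Code:

```shell
# Print, in linker search order, every libmpi found via LD_LIBRARY_PATH;
# the first match is the one the linker should prefer from that path.
# (Illustrative helper only; libraries picked up by soname from the
# ldconfig cache, e.g. in /usr/lib, are not listed.)
find_libmpi() {
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r dir; do
    [ -n "$dir" ] && ls "$dir"/libmpi.so* 2>/dev/null
  done
  return 0
}

find_libmpi "$LD_LIBRARY_PATH"
```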

Best regards,
Bruno

chrisb2244 April 23, 2014 01:19

Possibly on the same topic: does OF-2.3.0 have a stricter requirement of some kind on the OpenMPI version?

Currently I have an installation of OF-2.3.0 on the cluster I work with, and for values of $NSLOTS less than or equal to 14, everything works perfectly.
When I try to run with more than 14 processors, I get errors like:

Code:

qrsh_starter: executing child process (null) failed: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 13339) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.

The version of OpenMPI on the cluster is 1.6.3. OF is configured with
FOAM_MPI = openmpi-system
and
WM_MPLIB = SYSTEMOPENMPI

With both 14 and 80 processors, the mpirun command is invoked via a qsub'd script (Sun Grid Engine).

I'm further confused about the number 14. The cluster contains a collection of nodes, each with two 8-core processors, i.e. 16 processing cores per node. A limit of 16 would make me think I have problems communicating between nodes (although I have password-less ssh connections), but 14 seems a little peculiar.

Edit: I'm pretty sure this is actually due to memory limits. The amount of memory I requested per process was slightly higher than the memory-per-core available, so only 14 of the 16 cores could be used, since 14 * mem/proc consumed all of the memory on the node. So this isn't curious at all: as soon as I ask for a 15th processor, a second node is required.

It's been a little while since I tried, but I'm pretty sure under OF-2.2.2 I had 32 cores working without issue.

Best,
Christian

vinz April 23, 2014 02:42

Quote:

Originally Posted by wyldckat (Post 487560)
Greetings Vincent,

Which installation instructions did you follow?

Because according to the output you've provided, the problem is that the shell environment is configured to use the custom Open-MPI 1.6.5 that comes with OpenFOAM's ThirdParty package, but the binaries are instead loading the "libmpi.so" library present in your system, which is not compatible.

Best regards,
Bruno

Dear Bruno,

Thanks for your reply.
Actually, I would like to run my system mpirun, which is the one I always used with the previous versions of OpenFOAM.
But even when explicitly calling the system mpirun (/usr/bin/mpirun -np 6 snappyHexMesh -parallel) I get a similar error:
Code:


--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      carbon
Framework: crs
Component: none
--------------------------------------------------------------------------
[carbon:22893] *** Process received signal ***
[carbon:22893] Signal: Segmentation fault (11)
[carbon:22893] Signal code: Address not mapped (1)
[carbon:22893] Failing at address: 0x28
[carbon:22893] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x2aff99ca4cb0]
[carbon:22893] [ 1] /usr/lib/libopen-pal.so.0(mca_base_select+0x108) [0x2aff99a41518]
[carbon:22893] [ 2] /usr/lib/libopen-pal.so.0(opal_crs_base_select+0x7e) [0x2aff99a5390e]
[carbon:22893] [ 3] /usr/lib/libopen-pal.so.0(opal_cr_init+0x31e) [0x2aff99a320ee]
[carbon:22893] [ 4] /usr/lib/libopen-pal.so.0(opal_init+0x159) [0x2aff99a31a59]
[carbon:22893] [ 5] /usr/lib/libopen-rte.so.0(orte_init+0x4d) [0x2aff997dea0d]
[carbon:22893] [ 6] /usr/bin/mpirun() [0x402fe5]
[carbon:22893] [ 7] /usr/bin/mpirun() [0x402b34]
[carbon:22893] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x2aff99ed376d]
[carbon:22893] [ 9] /usr/bin/mpirun() [0x402a59]
[carbon:22893] *** End of error message ***
Segmentation fault (core dumped)

This is the reason why I tried to add the thirdParty libs to my path.
Regarding the instructions, I followed the ones I found on openfoam.org:
http://www.openfoam.org/download/source.php

What do you suggest to fix this setup?

Some more information, this is my LD_LIBRARY_PATH:
Code:

echo $LD_LIBRARY_PATH
/home/vincent/OpenFOAM/ThirdParty-2.3.0/platforms/linux64Gcc/CGAL-4.3/lib:/home/vincent/OpenFOAM/ThirdParty-2.3.0/platforms/linux64Gcc/ParaView-4.1.0/lib/paraview-4.1:/home/vincent/OpenFOAM/OpenFOAM-2.3.0/platforms/linux64GccDPOpt/lib/openmpi-1.6.5:/home/vincent/OpenFOAM/ThirdParty-2.3.0/platforms/linux64GccDPOpt/lib/openmpi-1.6.5:/home/vincent/OpenFOAM/ThirdParty-2.3.0/platforms/linux64Gcc/openmpi-1.6.5/lib:/home/vincent/OpenFOAM/ThirdParty-2.3.0/platforms/linux64Gcc/openmpi-1.6.5/lib64:/home/vincent/OpenFOAM/vincent-2.3.0/platforms/linux64GccDPOpt/lib:/home/vincent/OpenFOAM/site/2.3.0/platforms/linux64GccDPOpt/lib:/home/vincent/OpenFOAM/OpenFOAM-2.3.0/platforms/linux64GccDPOpt/lib:/home/vincent/OpenFOAM/ThirdParty-2.3.0/platforms/linux64GccDPOpt/lib:/home/vincent/OpenFOAM/OpenFOAM-2.3.0/platforms/linux64GccDPOpt/lib/dummy
(carbon) ~/OpenFOAM/vincent-2.3.0/run/tutorials/incompressible/simpleFoam/motorBike > ls -latr /home/vincent/OpenFOAM/ThirdParty-2.3.0/platforms/linux64GccDPOpt/lib/openmpi-1.6.5

I don't see why it is looking into /usr/lib
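
For what it's worth, asking the dynamic linker directly shows where each MPI library will really be loaded from. This is only a diagnostic sketch; any OpenFOAM solver binary can stand in for snappyHexMesh:
Code:

```shell
# Ask the dynamic linker where the MPI dependencies of a binary will
# actually be loaded from; lines pointing at /usr/lib mean the system
# Open MPI is winning over the ThirdParty one, regardless of what
# LD_LIBRARY_PATH says.
check_mpi_libs() {
  ldd "$1" 2>/dev/null | grep -iE 'libmpi|open-pal|open-rte' || true
}

check_mpi_libs "$(command -v snappyHexMesh || true)"
```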

wyldckat April 25, 2014 15:04

Greetings to all!

@Christian: If I read your post correctly, you figured out that the problem was that more memory was needed than was available on the first node. Therefore, mystery solved :)


@Vincent: If you followed the instructions from http://www.openfoam.org/download/source.php - and did not change the variable "WM_MPLIB" to "SYSTEMOPENMPI" in the file "$HOME/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc" - then you have a conflict of settings: you've built OpenFOAM with the custom Open-MPI, but you're trying to run with the system's Open-MPI, which is likely incompatible. To find out which mpirun is being used, run:
Code:

which mpirun
Now, if you did properly modify OpenFOAM's "bashrc" file, then it might be something else. How have you set up the OpenFOAM shell environment to be activated? Namely, did you add this line to the end of your personal "~/.bashrc" file?
Code:

source $HOME/OpenFOAM/OpenFOAM-2.3.0/etc/bashrc
Best regards,
Bruno

tre95 June 16, 2019 10:32

Hello wyldckat, I have a similar issue, thread:

https://www.cfd-online.com/Forums/op...imulation.html
It would be very nice if you could check it out and see whether you can help me to get rid of the bug. Thanks in advance!

mm66 December 2, 2019 16:49

Problems running OpenFOAM 2.3 in parallel
 
I am trying to run OpenFOAM while sharing the resources between two computers. I included the hostfile but am getting the following error:

Code:

[vm2:26669] *** Process received signal ***
[vm2:26669] Signal: Segmentation fault (11)
[vm2:26669] Signal code: Address not mapped (1)
[vm2:26669] Failing at address: 0x5634a8006d6e
[vm2:26669] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f51f2147890]
[vm2:26669] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x3d)[0x7f51f1ddb98d]
[vm2:26669] [ 2] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_argv_free+0x29)[0x7f51f23a2519]
[vm2:26669] [ 3] /usr/lib/x86_64-linux-gnu/libopen-rte.so.20(+0x283cb)[0x7f51f262e3cb]
[vm2:26669] [ 4] /usr/lib/x86_64-linux-gnu/libopen-rte.so.20(orte_util_add_hostfile_nodes+0xc1)[0x7f51f262f3f1]
[vm2:26669] [ 5] /usr/lib/x86_64-linux-gnu/libopen-rte.so.20(orte_ras_base_allocate+0xd3d)[0x7f51f26607fd]
[vm2:26669] [ 6] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_libevent2022_event_base_loop+0xdc9)[0x7f51f23ba209]
[vm2:26669] [ 7] mpirun(+0x74a3)[0x5634a6d7e4a3]
[vm2:26669] [ 8] mpirun(+0x5aea)[0x5634a6d7caea]
[vm2:26669] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f51f1d65b97]
[vm2:26669] [10] mpirun(+0x59ea)[0x5634a6d7c9ea]
[vm2:26669] *** End of error message ***

How can this be resolved?

PS: I am using OpenFOAM v1812:
Code:

$echo $WM_MPLIB
SYSTEMOPENMPI

$echo $FOAM_MPI
openmpi-system


mm66 December 3, 2019 11:03

I figured out what was wrong. In the host file I was using this format:

Code:

user@ip cpu=N
whereas it should have been:

Code:

ip cpu=N
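
For anyone who hits this later: the keyword documented in Open MPI's hostfile format is "slots=", which may also be worth trying. A minimal sketch, with placeholder IPs and slot counts, would be:
Code:

```shell
# Minimal Open MPI hostfile sketch; the IPs and slot counts are
# placeholders, and 'slots=' is the keyword Open MPI documents.
cat > hostfile <<'EOF'
192.168.0.1 slots=4
192.168.0.2 slots=4
EOF
# then, for example:
#   mpirun --hostfile hostfile -np 8 simpleFoam -parallel
```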

