CFD Online (www.cfd-online.com) > Forums > OpenFOAM > OpenFOAM Programming & Development

OpenMPI error at the beginning of a parallel OpenFOAM simulation

#1 | June 16, 2019, 07:36 | Elias Trautner (tre95), New Member
Hello everyone,


I am currently using a deep learning tool (TensorFlow) to query an artificial neural network (ANN) during my OpenFOAM simulation. To do so, I used the TensorFlow C API and wrote my own code. I had to include some headers and link against some shared libraries, but everything worked, including parallel runs with OpenMPI.


However, I then wanted to speed up the TensorFlow calls, so I compiled TensorFlow from source with AVX support enabled (my CPU supports AVX). This produced new headers and .so files, and now the following happens:


- Before the AVX upgrade: both single-core runs and parallel simulations using mpirun worked without problems.
- After the AVX upgrade: single-core runs work perfectly and are 60 % faster during the ANN evaluation, but if I use mpirun on several cores I get this error (it is repeated once for each core I run on):
Code:
[node134:10568] *** Process received signal ***
[node134:10568] Signal: Segmentation fault (11)
[node134:10568] Signal code: Address not mapped (1)
[node134:10568] Failing at address: (nil)
[node134:10568] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7fac03c53f20]
[node134:10568] [ 1] /home/elias/OpenFOAM/elias-4.1/platforms/linux64GccDPInt32Opt/lib/libtensorflow_framework.so.1(hwloc_bitmap_and+0x14)[0x7fabe8f05534]
[node134:10568] [ 2] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_hwloc_base_filter_cpus+0x380)[0x7fabcccbab80]
[node134:10568] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_ess_pmi.so(+0x2b4e)[0x7fabcbbe6b4e]
[node134:10568] [ 4] /usr/lib/x86_64-linux-gnu/libopen-rte.so.20(orte_init+0x22e)[0x7fabccf0e1de]
[node134:10568] [ 5] /usr/lib/x86_64-linux-gnu/libmpi.so.20(ompi_mpi_init+0x30e)[0x7fabe70a027e]
[node134:10568] [ 6] /usr/lib/x86_64-linux-gnu/libmpi.so.20(MPI_Init+0x6b)[0x7fabe70c12ab]
[node134:10568] [ 7] /opt/OpenFOAM/OpenFOAM-4.1/platforms/linux64GccDPInt32Opt/lib/openmpi-system/libPstream.so(_ZN4Foam8UPstream4initERiRPPc+0x1f)[0x7fac03a0c43f]
[node134:10568] [ 8] /opt/OpenFOAM/OpenFOAM-4.1/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4Foam7argListC1ERiRPPcbbb+0x719)[0x7fac04e1aed9]
[node134:10568] [ 9] tabulatedCombustionFoam(+0x279b8)[0x559e1bd079b8]
[node134:10568] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fac03c36b97]
[node134:10568] [11] tabulatedCombustionFoam(+0x30a0a)[0x559e1bd10a0a]
[node134:10568] *** End of error message ***

- Strangely, if I decompose my domain into 1 subdomain and run mpirun without the -parallel flag, it works again.
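Reading the backtrace bottom-up: the solver's argList constructor (frame [8]) calls Foam::UPstream::init (frame [7]), which calls MPI_Init (frame [6]); inside Open MPI's start-up, opal_hwloc_base_filter_cpus in libopen-pal (frame [2]) calls hwloc_bitmap_and, and that call is resolved from libtensorflow_framework.so.1 (frame [1]) rather than from an hwloc library. This is consistent with the run succeeding without -parallel, since (as far as I know) OpenFOAM only initialises MPI when -parallel is given. The sketch below is a hypothetical helper, not part of OpenFOAM or Open MPI, that parses such a backtrace and flags hwloc_* frames coming from an unexpected library; the rule "hwloc_* should come from libhwloc or libopen-pal" is an assumption.

```python
import os
import re
from typing import List, NamedTuple, Optional

# One frame of an Open MPI backtrace, e.g.
# "[node134:10568] [ 1] /path/libfoo.so(hwloc_bitmap_and+0x14)[0x7fabe8f05534]"
FRAME_RE = re.compile(
    r"\[\s*(?P<index>\d+)\]\s+"             # frame index, e.g. "[ 1]"
    r"(?P<library>[^\s(\[]+)"               # shared object or executable path
    r"\((?P<symbol>[^+)]*)"                 # symbol name (empty for "(+0x3ef20)")
    r"(?:\+(?P<offset>0x[0-9a-fA-F]+))?\)"  # offset relative to the symbol
)


class Frame(NamedTuple):
    index: int
    library: str
    symbol: str
    offset: Optional[str]


def parse_backtrace(text: str) -> List[Frame]:
    """Extract the numbered frames from a pasted backtrace, ignoring other lines."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(int(m["index"]), m["library"], m["symbol"], m["offset"]))
    return frames


def foreign_hwloc_frames(frames: List[Frame]) -> List[Frame]:
    """Heuristic (assumption): hwloc_* functions are expected to come from
    libhwloc or from Open MPI's libopen-pal; any other library is suspect."""
    expected = ("libhwloc", "libopen-pal")
    return [
        f for f in frames
        if f.symbol.startswith("hwloc_")
        and not os.path.basename(f.library).startswith(expected)
    ]


# Four frames copied from the backtrace in the first post.
TRACE = """\
[node134:10568] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7fac03c53f20]
[node134:10568] [ 1] /home/elias/OpenFOAM/elias-4.1/platforms/linux64GccDPInt32Opt/lib/libtensorflow_framework.so.1(hwloc_bitmap_and+0x14)[0x7fabe8f05534]
[node134:10568] [ 2] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_hwloc_base_filter_cpus+0x380)[0x7fabcccbab80]
[node134:10568] [ 6] /usr/lib/x86_64-linux-gnu/libmpi.so.20(MPI_Init+0x6b)[0x7fabe70c12ab]
"""

if __name__ == "__main__":
    for f in foreign_hwloc_frames(parse_backtrace(TRACE)):
        print(f"frame {f.index}: {f.symbol} resolved from {f.library}")
```

On this excerpt only frame [1] is flagged, pointing at the rebuilt TensorFlow library rather than at mpirun itself.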


Obviously this is an issue related to mpirun. When compiling TensorFlow with AVX from source (using Google's Bazel tool), I had to choose whether I wanted MPI support. Of course I said yes, and I entered the default MPI toolkit folder: /usr


I have read in this post (Problems running OpenFOAM 2.3 in parallel) that there might be a conflict between OpenFOAM and TensorFlow trying to use different OpenMPI versions. Can you help me fix it? I am asking here because people who do not use OpenFOAM have not been able to help me with this issue.


Edit: I just noticed that if I run ./Allwmake in opt/OpenFOAM/ThirdParty, I get:
Code:
Build MPI libraries if required

+ cd openmpi
./Allwmake: 78: cd: can't cd to openmpi
+ exit 1

Last edited by tre95; June 16, 2019 at 11:40.

#2 | June 16, 2019, 14:27 | Bruno Santos (wyldckat), Retired Super Moderator, Lisbon, Portugal
Quick answer: In principle, OpenFOAM and TensorFlow are both using the same system Open-MPI... I'm assuming that MPICH2 is not installed at "/usr", given that mpirun gives you Open-MPI by default.

Building with another/custom Open-MPI version will not solve the issue.

Since you are using OpenFOAM 4.1, it looks like you tripped over this bug: https://bugs.openfoam.org/view.php?id=2815

The fix for this bug was included in OpenFOAM 5, but not in 4.x. You have two choices:
  1. Upgrade to OpenFOAM 5 or 6.
  2. Or manually apply these changes that fix the bug: https://github.com/OpenFOAM/OpenFOAM...e1546a51ecc090

#3 | June 17, 2019, 06:36 | Elias Trautner (tre95), New Member
Hello, thank you very much for your support!


Fortunately, I did not have to change OpenFOAM (upgrading would not have been possible anyway, since 4.1 is the version used at the institute where I work), because the error was in TensorFlow. The issue is solved here:


https://github.com/tensorflow/tensorflow/issues/29838


The issue should no longer occur, since it has been fixed in TensorFlow and the changes have already been merged into TensorFlow's master branch.
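After rebuilding TensorFlow, one quick check before linking it into an OpenFOAM solver is whether the new shared library exposes hwloc symbols such as hwloc_bitmap_and, the function that the backtrace in the first post resolved from libtensorflow_framework.so.1. The sketch below is a hypothetical helper (the script name check_hwloc.py is illustrative) built on Python's ctypes; it does not reproduce the TensorFlow fix described in the linked issue. With GNU binutils, `nm -D --defined-only` on the library gives a stricter answer, since it lists only symbols defined in that file.

```python
import ctypes
import sys


def exports_symbol(library_path: str, symbol: str) -> bool:
    """Return True if dlsym() finds `symbol` through a handle obtained by
    dlopen()-ing `library_path`.

    Caveat: lookup through a dlopen() handle also searches the libraries
    that `library_path` itself depends on, so a hit means the symbol is
    reachable through this library, not necessarily defined in it.
    """
    lib = ctypes.CDLL(library_path)
    # CDLL.__getattr__ raises AttributeError when dlsym() fails.
    return hasattr(lib, symbol)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_hwloc.py /path/to/libtensorflow_framework.so.1
    print(exports_symbol(sys.argv[1], "hwloc_bitmap_and"))
```

A `True` result for the TensorFlow library suggests its hwloc symbols could take precedence over the ones Open MPI expects, which matches frame [1] of the backtrace.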
All times are GMT -4.