OpenMPI error at the beginning of a parallel OpenFOAM simulation
Hello everyone,
I am currently using a deep learning tool (TensorFlow) to evaluate an artificial neural network during my OpenFOAM simulation. To do so, I used the C API of TensorFlow and wrote my own code. I had to include some headers and link against some shared libraries, but everything went fine, including parallel runs with OpenMPI.

However, I then wanted to speed up the TensorFlow calls, so I compiled TensorFlow from source and activated AVX support (which my CPU supports). This produced new headers and .so files. Now the following situation has occurred:
- Before the upgrade to AVX: both single-core runs and parallel simulations using mpirun worked without problems.
- After the upgrade to AVX: single-core runs work perfectly and are 60 % faster during the ANN usage, but if I use mpirun on several cores I get this error (it repeats once for each core I want to use in parallel):
Code:
[node134:10568] *** Process received signal ***
- Strangely, if I decompose my domain into 1 subdomain and run mpirun without the -parallel flag, it works again.

Obviously this is an issue concerning mpirun. During the compilation of TensorFlow with AVX from source (using Google's bazel tool) I had to choose whether I wanted MPI support. Of course I said yes, and I entered the default MPI toolkit folder: /usr

Now I read in this post (https://www.cfd-online.com/Forums/op...-parallel.html) that there might be a conflict between OpenFOAM and TensorFlow trying to use different OpenMPI versions. Can you help me fix it? I have to ask here because people not using OpenFOAM seem to be unable to help me with this issue.

Edit: I just noticed that if I run ./Allwmake in opt/OpenFOAM/ThirdParty, I get:
Build MPI libraries if required
+ cd openmpi
./Allwmake: 78: cd: can't cd to openmpi
+ exit 1
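One way to check the suspicion that two libraries pull in different MPI builds is to compare what `ldd` resolves for each of them. The following is only a minimal Python sketch, not something taken from this thread: the function names (`parse_ldd`, `mpi_libs`, `ldd`) are made up for illustration, and the prefix filter (`libmpi`, `libopen-rte`, `libopen-pal`) is an assumption about how the MPI libraries are named on a typical Open-MPI install.

```python
import subprocess
import sys

# Assumed name prefixes of MPI-related libraries (Open-MPI and MPICH style).
MPI_PREFIXES = ("libmpi", "libopen-rte", "libopen-pal")


def parse_ldd(ldd_output):
    """Map each shared-library name in `ldd` output to its resolved path."""
    libs = {}
    for line in ldd_output.splitlines():
        # Typical line: "libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007f...)"
        if "=>" not in line:
            continue
        name, _, rest = line.partition("=>")
        parts = rest.split()
        # "not found" entries (or entries without a path) are stored as None
        path = parts[0] if parts and parts[0].startswith("/") else None
        libs[name.strip()] = path
    return libs


def mpi_libs(ldd_output):
    """Keep only the entries whose name looks MPI-related."""
    return {name: path for name, path in parse_ldd(ldd_output).items()
            if name.startswith(MPI_PREFIXES)}


def ldd(path):
    """Run `ldd` on a binary or shared library and return its text output."""
    result = subprocess.run(["ldd", path], capture_output=True,
                            text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Usage: python check_mpi.py <first .so or binary> <second .so or binary>
    for target in sys.argv[1:]:
        print(target)
        for name, path in sorted(mpi_libs(ldd(target)).items()):
            print(f"  {name} -> {path}")
```

Running it on the self-built TensorFlow .so file and on the OpenFOAM solver or library that does the MPI communication shows whether both resolve to the same MPI files. Identical paths would suggest that a version conflict is not the cause.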
Quick answer: In principle, you're using the same Open-MPI in the system... I'm assuming that MPICH2 is not installed at "/usr", given that mpirun gives you Open-MPI by default.
Building with another or custom Open-MPI version will not solve the issue. Since you are using OpenFOAM 4.1, it looks like you tripped over this bug: https://bugs.openfoam.org/view.php?id=2815 The fix for this bug is available in OpenFOAM 5, but not in 4.x. You have two choices:
Hello, thank you very much for your support!
Fortunately I did not have to make any changes (an upgrade would not have been possible, as 4.1 is the version used at the institute I work at), because the error was in TensorFlow. The issue is solved here: https://github.com/tensorflow/tensorflow/issues/29838 Normally the issue should not occur any more, as the TensorFlow issue has already been solved and the changes were merged into TensorFlow's master.