MPI issues on Debian 11 (bullseye)
Hi guys
I have compiled SU2 on a new machine running Debian GNU/Linux 11 (bullseye). I installed Open MPI 4.1.0 with apt-get and compiled everything. Whenever I try to run the program I get an error concerning MPI_Win_create. I tried both SU2 versions 7.3 and 7.4, and I tried both Open MPI and MPICH, getting the same problem. The library versions installed on the machine are libmpi.so.40.30.0 and libmpich.so.12.1.10.

This is weird, because I have another machine running Linux Mint 19 (Tessa) on which I compiled SU2 7.3 without problems. The only difference I can see is the MPI version, which on the old machine is Open MPI 3. Has anyone seen similar behaviour? Any hints on how to solve the problem?

Thanks in advance for any help you can give me,
Flavio

Here is the message I get:

Code:
flavio@cfd1 ~/prova $ mpirun -n 2 SU2_CFD inv_ONERAM6.cfg
[cfd1:151413] *** An error occurred in MPI_Win_create
[cfd1:151413] *** reported by process [1424949249,0]
[cfd1:151413] *** on communicator MPI_COMM_WORLD
[cfd1:151413] *** MPI_ERR_WIN: invalid window
[cfd1:151413] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[cfd1:151413] *** and potentially your MPI job)
[cfd1:151409] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[cfd1:151409] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

PS: I also tried the pre-compiled version of SU2, which uses MPICH. The program starts but then it always crashes! |
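When two MPI implementations are installed side by side, a first diagnostic step is to confirm which runtime is on the PATH and which MPI library the binary is actually linked against. A minimal sketch, assuming `SU2_CFD` is on the PATH (adjust the path otherwise):

```shell
# Which mpirun launches, and which implementation/version it is
which mpirun
mpirun --version

# Which MPI shared library SU2_CFD was linked against
# (a libmpi.so hit means Open MPI, libmpich.so means MPICH)
ldd "$(which SU2_CFD)" | grep -i 'mpi'
```

If `mpirun` comes from one implementation and `ldd` shows the other's library, errors like the MPI_Win_create failure above are a plausible symptom.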
Hello,
I've had some problems on Ubuntu 22 with Open MPI 4, related to hwloc and something about 32-bit PCI devices *shrug*. Maybe it's the same for you but with the warnings silenced; see here: https://github.com/open-mpi/hwloc/issues/354 With MPICH 4 I get the warnings, but the code runs fine.

How did you build SU2 with MPICH? Be careful if you have Open MPI installed alongside MPICH. This is my build command for MPICH:

Code:
export CC=mpicc.mpich
export CXX=mpicxx.mpich
export CXXFLAGS="-march=native -funroll-loops -ffast-math -fno-finite-math-only"
./meson.py build --optimization=2 --warnlevel=3 --prefix=$PWD/build -Dcustom-mpi=true

If you find out the issue with Open MPI, please update this thread. |
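On Debian-based systems the generic `mpicc`/`mpirun` names are typically managed by the alternatives system, so with both implementations installed it is easy to compile against one library and launch with the other. A hedged sketch for checking this (the `mpi` and `mpirun` alternatives group names are assumed from Debian's packaging):

```shell
# See which implementation the generic wrapper names currently point to
update-alternatives --display mpi
update-alternatives --display mpirun

# Show the real compiler and flags hidden behind the MPICH wrapper,
# to confirm the build above really used MPICH
mpicc.mpich -show
```

Using the suffixed wrappers (`mpicc.mpich`, `mpicxx.mpich`) as in the build command above sidesteps the alternatives ambiguity entirely.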
mpich ok !
Hi Pedro thanks a lot for your support.
I tried again, implementing your hints. I had no success with Open MPI; I always get the same output with no additional hints. However, I used your example to recompile SU2 with MPICH and it's now working!

I have one last question concerning the -Dwith-omp option: can I recompile the code with MPICH and -Dwith-omp=true, or is that option only for Open MPI?

Thanks a lot for your help,
Flavio
|
Hi Flavio,
Glad it works. Yes, you can use MPICH together with OpenMP. |
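As a sketch of what the hybrid MPICH + OpenMP build and launch might look like (the meson options come from earlier in this thread; the rank/thread counts are placeholders, and the `-t` thread flag is assumed from SU2's hybrid-parallel usage):

```shell
# Rebuild with the MPICH wrappers and OpenMP support enabled
export CC=mpicc.mpich CXX=mpicxx.mpich
./meson.py build -Dcustom-mpi=true -Dwith-omp=true --prefix=$PWD/build

# Launch 2 MPI ranks, each running 4 OpenMP threads
mpirun -n 2 ./build/bin/SU2_CFD -t 4 inv_ONERAM6.cfg
```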
Just tagging onto this thread; I observe the same fatal error when running the Quickstart in serial mode, but it runs just fine in parallel.
|
Quote:
Code:
ERROR : You are trying to launch a computation without initializing MPI but the wrapper has been built in parallel. Please add the --parallel option in order to initialize MPI for the wrapper.

This seems awkward, since the last time I used SU2 (somewhere between 2018 and 2020) I could compile it with parallel support and then run it in serial by simply typing SU2_CFD, without any verbose mpirun command. |
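If a truly serial binary is needed, one workaround is to build SU2 without MPI at all. A sketch, assuming the `-Dwith-mpi` feature option and the in-tree `ninja` wrapper from recent SU2 versions:

```shell
# Configure a separate build directory with MPI disabled
./meson.py build_serial -Dwith-mpi=disabled --prefix=$PWD/build_serial

# Compile and install the serial binaries
./ninja -C build_serial install
```

The resulting SU2_CFD can then be run directly, without mpirun.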
I encountered the same fatal error following the quick compilation guide.
Code:
./meson.py build -Dcustom-mpi=true -Dextra-deps=mpich

After installing pkg-config and resolving the libfabric linking issue, the build was successful and the error was gone. |
@DrRedskull, concerning:
Quote:
I always just use `--prefix=$(pwd)`, which expands to your SU2 code repository. Your code directory will then contain a `bin` folder with the binaries, and you should not have to deal with sudo.

For what it is worth, I am dealing with the same issue as the OP. Forcing `--mca osc ucx` fixes the problem, but it is not really satisfying. I am on Open MPI 4.1.2 on WSL (Ubuntu). |
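For reference, the workaround above can be applied per launch or via Open MPI's MCA environment variables (the config file name is a placeholder from earlier in the thread):

```shell
# Force Open MPI's one-sided communication (osc) onto the UCX component
mpirun --mca osc ucx -n 2 SU2_CFD inv_ONERAM6.cfg

# Equivalent, set once for the whole session
export OMPI_MCA_osc=ucx
mpirun -n 2 SU2_CFD inv_ONERAM6.cfg
```

This targets exactly the MPI_Win_create / MPI_ERR_WIN failure reported at the top of the thread, since MPI windows are implemented by the osc framework.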