CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   SU2 (https://www.cfd-online.com/Forums/su2/)
    -   MPI issues on Debian 11 (bullseye) (https://www.cfd-online.com/Forums/su2/244755-mpi-issues-debian-debian-11-bullseye.html)

flavio73 August 26, 2022 05:48

MPI issues on Debian 11 (bullseye)
 
Hi guys,
I have compiled SU2 on a new machine running Debian GNU/Linux 11 (bullseye). I installed Open MPI 4.1.0 with apt-get and compiled everything. Whenever I try to run the program I get an error concerning MPI_Win_create.
I tried both SU2 versions 7.3 and 7.4. I also tried both Open MPI and MPICH and got the same problem.
The versions of the libraries installed on the machine are

libmpi.so.40.30.0
libmpich.so.12.1.10
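
(To double-check which MPI library the binary actually linked against, something like this should work, assuming SU2_CFD is on the PATH:)

Code:

ldd $(which SU2_CFD) | grep -i mpi
mpirun --version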

This is weird! I have another machine running Linux Mint 19 (Tessa) on which I compiled SU2 version 7.3 without problems. The only difference I can see is the MPI version, which on the old machine is Open MPI 3.

Has anyone seen similar behaviour? Any hints on how to solve the problem? Thanks in advance for any help you can give me.
Flavio

Here is the message I get


flavio@cfd1 ~/prova $ mpirun -n 2 SU2_CFD inv_ONERAM6.cfg
[cfd1:151413] *** An error occurred in MPI_Win_create
[cfd1:151413] *** reported by process [1424949249,0]
[cfd1:151413] *** on communicator MPI_COMM_WORLD
[cfd1:151413] *** MPI_ERR_WIN: invalid window
[cfd1:151413] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[cfd1:151413] *** and potentially your MPI job)
[cfd1:151409] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[cfd1:151409] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
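
(As the last line of the log suggests, the aggregated help messages can be expanded; setting that MCA parameter on the command line would look something like this:)

Code:

mpirun --mca orte_base_help_aggregate 0 -n 2 SU2_CFD inv_ONERAM6.cfg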



PS
I also tried the pre-compiled version of SU2, which uses MPICH. The program starts but then it always crashes!

pcg August 27, 2022 13:08

Hello,

I've had some problems on Ubuntu 22 with Open MPI 4, related to hwloc and something about 32-bit PCI devices *shrug*.
Maybe it's the same for you but you have the warnings silenced; see here: https://github.com/open-mpi/hwloc/issues/354

With MPICH 4 I get the warnings but the code runs fine. How did you build SU2 with MPICH? Be careful if you have Open MPI installed alongside MPICH.
This is my build command for mpich:
export CC=mpicc.mpich
export CXX=mpicxx.mpich

export CXXFLAGS="-march=native -funroll-loops -ffast-math -fno-finite-math-only"

./meson.py build --optimization=2 --warnlevel=3 --prefix=$PWD/build -Dcustom-mpi=true
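
After configuring, the compile-and-install step is the usual ninja call (assuming the meson setup above succeeded):

Code:

./ninja -C build install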

If you find out the issue with Open MPI please update this thread.

flavio73 August 28, 2022 06:26

MPICH OK!
 
Hi Pedro, thanks a lot for your support.

I tried again following your hints. I had no success with Open MPI; I always get the same output without additional hints. However, I used your example to recompile SU2 with MPICH and it's now working!!!

I have a last question concerning the -Dwith-omp option. Can I recompile the code with MPICH and -Dwith-omp=true, or is the option just for Open MPI?

Thanks a lot for your help

Flavio

Quote:

Originally Posted by pcg (Post 834638)
Hello,

I've had some problems on Ubuntu 22 with Open MPI 4, related to hwloc and something about 32-bit PCI devices *shrug*.
Maybe it's the same for you but you have the warnings silenced; see here: https://github.com/open-mpi/hwloc/issues/354

With MPICH 4 I get the warnings but the code runs fine. How did you build SU2 with MPICH? Be careful if you have Open MPI installed alongside MPICH.
This is my build command for mpich:
export CC=mpicc.mpich
export CXX=mpicxx.mpich

export CXXFLAGS="-march=native -funroll-loops -ffast-math -fno-finite-math-only"

./meson.py build --optimization=2 --warnlevel=3 --prefix=$PWD/build -Dcustom-mpi=true

If you find out the issue with Open MPI please update this thread.


pcg August 29, 2022 19:10

Hi Flavio,
Glad it works. Yes, you can use MPICH and OpenMP together.
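
A minimal build-and-run sketch for the hybrid case, assuming the same MPICH wrappers as in my earlier post (the -t flag for the thread count follows the SU2 hybrid-parallel docs, so double-check it against your version):

Code:

./meson.py build --optimization=2 --prefix=$PWD/build -Dcustom-mpi=true -Dwith-omp=true
./ninja -C build install
mpirun -n 2 SU2_CFD -t 4 config.cfg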

CSMDakota November 29, 2022 20:04

Just tagging onto this thread; I observe the same fatal error when running the Quickstart in serial mode, but it runs just fine in parallel.

  • SU2 v7.4.0
  • Ubuntu 22.04
  • Open MPI 4.1.2
--Brandon--

STK July 12, 2023 04:54

Quote:

Originally Posted by CSMDakota (Post 840323)
Just tagging onto this thread; I observe the same fatal error when running the Quickstart in serial mode, but it runs just fine in parallel.

  • SU2 v7.4.0
  • Ubuntu 22.04
  • Open MPI 4.1.2
--Brandon--

Same here, I used
  • SU2 v7.5.1
  • Ubuntu 22.04
  • Open MPI 4.1.2
I noticed that if I run SU2_CFD.py without the --parallel option, I get:
Code:

ERROR : You are trying to launch a computation without initializing MPI but the wrapper has been built in parallel. Please add the --parallel option in order to initialize MPI for the wrapper.
Therefore I guess there is no way to compile one SU2 build that works both in serial and in parallel.
It seems awkward, since the last time I used SU2 (somewhere between 2018 and 2020) I could compile it with parallel support and then run it in serial by simply typing SU2_CFD, without any verbose mpirun command.
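
For what it's worth, a single-rank mpirun launch still behaves like a serial run; with the Python wrapper, something along these lines should work (the -f flag for the config file is my assumption based on the SU2 pywrapper scripts):

Code:

mpirun -n 1 SU2_CFD.py --parallel -f inv_ONERAM6.cfg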

hconel December 25, 2023 10:03

Quote:

Originally Posted by flavio73 (Post 834662)
Hi Pedro, thanks a lot for your support.

I tried again following your hints. I had no success with Open MPI; I always get the same output without additional hints. However, I used your example to recompile SU2 with MPICH and it's now working!!!

I have a last question concerning the -Dwith-omp option. Can I recompile the code with MPICH and -Dwith-omp=true, or is the option just for Open MPI?

Thanks a lot for your help

Flavio

I was having the same problem and this solved it. Thanks so much.

DrRedskull March 2, 2024 02:18

I encountered the same fatal error following the quick compilation guide.
  • SU2 v8.0.1
  • Ubuntu 22.04
  • MPICH 4.0
In my case the issue was not having `pkg-config` and `libfabric` installed. So I deleted my entire earlier build, changed the meson configuration, and built it again as
Code:

./meson.py build -Dcustom-mpi=true -Dextra-deps=mpich
sudo ./ninja -C build install

Note: I used sudo as there were some issues with polkit privileges when building in my case.
After installing pkg-config and resolving the libfabric linking issue, the build succeeded and the error was resolved.
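
(On Ubuntu the missing dependencies can presumably be pulled in with apt; the package names below are my assumption for 22.04:)

Code:

sudo apt-get install pkg-config libfabric-dev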

TKatt April 3, 2024 10:02

@DrRedskull, concerning:

Quote:

I used sudo as there were some issues with polkit privileges when building in my case.
I guess you did not use the `--prefix=/some/install/dir` option for the `./meson.py ...` command. The default on Ubuntu, e.g., is `/usr/local/bin`, and that folder requires root privileges for any changes. I guess you could change that as the root user, but I would recommend just installing into a non-root folder so you do not have that issue. And I think it is better/safer not to use sudo when you really don't need to on a regular basis.

I always just use `--prefix=$(pwd)`, which expands to your SU2 code repository. So in your code directory there will be a `bin` folder with the binaries, and you should not have to deal with sudo.
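
So a full non-root sequence would look something like this (the other flags are taken from the build commands earlier in the thread):

Code:

./meson.py build --prefix=$(pwd) -Dcustom-mpi=true
./ninja -C build install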


For what it is worth... I am dealing with the same issue as the OP. Forcing `--mca osc ucx` fixes the problem but is not really satisfying. I am on Open MPI 4.1.2 on WSL (Ubuntu).
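
Spelled out, the workaround is (assuming the same test case as the OP):

Code:

mpirun --mca osc ucx -n 2 SU2_CFD inv_ONERAM6.cfg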

