foam-extend-3.2 Pstream: "MPI_ABORT was invoked"
[ Moderator note: moved from http://www.cfd-online.com/Forums/ope...end-3-2-a.html ]

I am having a similar issue with foam-extend 3.2. It installed with no problems and runs in parallel using the system Open MPI on a single node (up to 12 cores). But when I try using more than one node, I get the following MPI_ABORT:

Code:
--------------------------------------------------------------------------

Code:
OptimisationSwitches
Hi All,
I am having major problems getting foam-extend-3.2 running across multiple nodes on a cluster (actually, I have tried two different clusters with the same result). The code installed just fine and runs in serial and in parallel on a single node with decent scaling (so MPI seems to be working fine on a single node). However, as soon as I try to bridge multiple nodes, I get the following MPI_ABORT error as soon as simpleFoam (or any other solver that I have tested) enters the time loop:

Code:
Starting time loop

I noticed that the Pstream library changed locations from foam-extend-3.1 to foam-extend-3.2 and seems to have changed quite a bit. I wonder if that is part of the issue?
Quick answer: Please try using the parallel testing utility that exists for OpenFOAM and foam-extend. Instructions for foam-extend are provided here: http://www.cfd-online.com/Forums/ope...tml#post560394 - post #12
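Roughly, the idea is to build the small test utility and then run it through mpirun across the same nodes as the failing run. A sketch (the source path and the mpirun arguments below are only placeholders; adjust them to your installation and job setup):

Code:
# placeholder path: locate the parallel test source in your installation first,
# e.g. with: find $WM_PROJECT_DIR -iname "*parallel*" -name "*.C"
cd $WM_PROJECT_DIR/applications/test/parallel
wmake

# run it on a case that has already been decomposed with decomposePar,
# using the same hostfile and core count as the failing simpleFoam run
mpirun -np 24 -hostfile machines parallelTest -parallel > log.parallelTest 2>&1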
The other possibility that comes to mind is that a shell start-up setting is loading only part of the foam-extend environment on the compute nodes, resulting in an incompatible version of simpleFoam being picked up there. One test I usually do for this is to launch mpirun with a shell script that simply writes the current shell environment to a log file, so that I can examine what the environment looks like in each launched process. For example, a script containing this:

Code:
#!/bin/sh
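To spell out the idea, here is a minimal sketch of such a script (the log-file naming is only an example, adjust as you like):

Code:
#!/bin/sh
# write the environment of this launched process to its own file,
# named after the host and the process ID so that ranks on different
# nodes do not overwrite each other
env > "env_$(hostname)_$$.log"

Launch it with exactly the same mpirun command line that you use for simpleFoam; you then get one file per process and can compare the PATH and foam-extend variables between the two nodes.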
Hi Bruno,
Thanks for your recommendation. I used your script and looked at my shell environment, which looks fine to me (it shows the correct $PATH, including foam-extend-3.2, for all processes). So I don't think that's the problem. I had to slightly modify the parallelTest utility to get it to compile in foam-extend-3.2, since it appears that Time.H no longer exists and I was getting "Time.H: No such file or directory." The modified source code is attached below. I ran parallelTest using MPI across multiple nodes (2 nodes with 12 cores each) and here is the resulting stderr:

Code:
[13] slave sending to master 0

There are no error messages. But since the output is not synchronized, it's difficult to tell whether there is a problem or not. Does anything pop out at you? Thanks for your help. I really appreciate it.
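P.S. In case it helps anyone else hitting the same compile error: as far as I can tell, foam-extend renamed the Time class header to foamTime.H, so the modification is essentially just a matter of pointing the include at the new name, something like this (sketch assuming GNU sed; check the attached source for the exact edit):

Code:
# foam-extend ships the Time class header as foamTime.H, so swap the
# OpenFOAM-style include in parallelTest.C for the foam-extend one
sed -i 's/#include "Time.H"/#include "foamTime.H"/' parallelTest.C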
Hi Brent,
The output from parallelTest seems OK. Since it didn't crash, this means that at least the basic communication is working as intended with foam-extend's own Pstream mechanism. I went back to see how you had tried to define the optimization flag and I then remembered that foam-extend does things a bit differently from OpenFOAM. Please check this post: http://www.cfd-online.com/Forums/ope...tml#post491522 - post #7

Oh, this is interesting... check this commit message as well: http://sourceforge.net/p/foam-extend...a0ca1f8ec3230/

If I understood it correctly, you can do the following:

Code:
mpirun ... simpleFoam -parallel -OptimisationSwitches commsType=nonBlocking

Bruno
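P.S. For what it's worth, the same switch also exists in the global controlDict of the installation (if I remember correctly, $WM_PROJECT_DIR/etc/controlDict in foam-extend, not the case-level controlDict), so as an alternative sketch:

Code:
OptimisationSwitches
{
    commsType       nonBlocking;   // instead of e.g. blocking or scheduled
}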
Hi Bruno,
Making sure commsType was set to 'nonBlocking' in this way seems to have solved my issue. Unfortunately, I wiped the previous test case where I had tried to set it in the case controlDict, so I can't check why that didn't work. But regardless, it is now working and I am happy! Thanks for your help with this! I really appreciate it.

Thanks,
Brent