mpirun detected that one or more processes exited... on remotely accessed

#1 Geb1313 (New Member), March 18, 2020, 11:08
Hi,

I have been working with OpenFOAM for the last few months. I now want to continue working from home (home quarantine), so I am accessing a workstation from my laptop via Remote Desktop Connection.

Running the damBreak case in parallel on 16 processors gives the following error:
"
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[27991,1],0]
Exit code: 145 "

Any help please!

Note that:
a) Similar cases worked fine when I was working in my office.
b) The same case works fine on my laptop.
c) The same case runs fine on a single processor on the workstation.
d) I am connected to the company network via VPN to access the workstation.

#2 mahsankhan (Ahsan, New Member), April 10, 2020, 15:05
I got the same error
I ran:
decomposePar

then:
mpirun -np 3 interMixingFoam -parallel


Then I got the same error:




Build : v1906 OPENFOAM=1906
Arch : "LSB;label=32;scalar=64"
Exec : interMixingFoam -parallel
Date : Apr 10 2020
Time : 20:37:29
Host : DESKTOP-KVKH8JA
PID : 2839
fileName::stripInvalid() called for invalid fileName /mnt/c/Users/AhsanKhan/Documents/Docs/theCase/5thCaseRun/mixingTank3DTurbulentDecomposed
For debug level (= 2) > 1 this is considered fatal
fileName::stripInvalid() called for invalid fileName /mnt/c/Users/AhsanKhan/Documents/Docs/theCase/5thCaseRun/mixingTank3DTurbulentDecomposed
For debug level (= 2) > 1 this is considered fatal
fileName::stripInvalid() called for invalid fileName /mnt/c/Users/AhsanKhan/Documents/Docs/theCase/5thCaseRun/mixingTank3DTurbulentDecomposed
For debug level (= 2) > 1 this is considered fatal
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[5421,1],1]
Exit code: 1




Could someone please help me with this issue?

#3 mahsankhan (Ahsan, New Member), September 7, 2020, 08:45
I found the cause of the error: my computer's username has a space in it.
Unfortunately, the username cannot be changed...
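
A quick way to see whether a case directory is affected is to test its absolute path for whitespace before launching the solver. The sketch below is only an illustration and assumes a POSIX shell; `path_has_space` is a hypothetical helper name, not an OpenFOAM utility.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the given path
# contains whitespace, fails (exit status 1) otherwise.
path_has_space() {
    case "$1" in
        *[[:space:]]*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Example: check the current directory before running decomposePar/mpirun.
if path_has_space "$PWD"; then
    echo "Warning: case path contains whitespace: $PWD"
fi
```

If the check fires, moving the case to a directory without spaces in its path is one way to sidestep the problem without renaming the user account.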

#4 olesen (Mark Olesen, Senior Member), September 7, 2020, 12:10
Quote (originally posted by mahsankhan):
    I found the error. My computer's username has a space in it so that was the problem.
    Unfortunately, the username can not be changed...

Looks like a Windows mapping issue. The easy solution, which normally works fine, is to allow spaces in file names.

In etc/controlDict:
Code:
InfoSwitches
{
    // The default ASCII write precision
    writePrecision  6;

    ...

    // Allow space character in fileName (use with caution)
    // Ignored (always 1) for Windows.
    // Default:    allowSpaceInFileName    0;
    allowSpaceInFileName    1;
}

This will likely get it working for you.

/mark
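
If the installation's etc/controlDict cannot be edited (for example on a shared machine), a user-level override may be an option. The sketch below assumes that your OpenFOAM version also searches `~/.OpenFOAM` for etc files and merges a `controlDict` found there; that is an assumption to verify for your version, not something stated in this thread. `write_user_controldict` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal user-level controlDict into the
# directory given as $1. Using ~/.OpenFOAM as that directory is an
# assumption about the etc-file search path of your OpenFOAM version.
write_user_controldict() {
    target_dir=$1
    target_file="$target_dir/controlDict"

    # Refuse to overwrite an existing file.
    if [ -e "$target_file" ]; then
        echo "Not overwriting existing $target_file" >&2
        return 1
    fi

    mkdir -p "$target_dir" || return 1
    cat > "$target_file" <<'EOF'
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      controlDict;
}

InfoSwitches
{
    // Allow space character in fileName (use with caution)
    allowSpaceInFileName    1;
}
EOF
}

# Usage (not run automatically):
#   write_user_controldict "$HOME/.OpenFOAM"
```

The helper only creates the file when none exists, so an existing user configuration is never clobbered.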

#5 saavedra00 (New Member), February 8, 2021, 13:43
Hello Geb1313,

Were you able to find a solution? My case is very similar. I am running the program in parallel and remotely with Chrome Remote Desktop. The error appears after several iterations.

Quote:
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[51775,1],0]
Exit code: 144

#6 sharkilwo5 (cem, New Member), February 8, 2021, 16:32
Try adding a hostfile to the case folder and running the case as shown below:

mpirun -np 4 --hostfile hostfile interMixingFoam -parallel
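
The post does not show what the hostfile should contain. As a minimal sketch, assuming Open MPI (whose hostfiles use `hostname slots=N` lines) and a single local machine, it could be generated like this; `write_local_hostfile` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: write a single-machine hostfile in the Open MPI
# "hostname slots=N" format. $1 = output file, $2 = number of slots.
write_local_hostfile() {
    printf 'localhost slots=%d\n' "$2" > "$1"
}

# Usage (not run automatically; requires OpenFOAM and Open MPI):
#   write_local_hostfile hostfile 4
#   mpirun -np 4 --hostfile hostfile interMixingFoam -parallel
```

The slot count should match the number of subdomains the case was decomposed into.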

#7 saavedra00 (New Member), February 8, 2021, 22:19
Thanks for your suggestion, sharkilwo5. I am running in parallel, but locally on a multiprocessor machine.

The problem was related to stability. The case is constrained by the time step.
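
The post does not say how the time step constrained the case. One common reading, offered here as an assumption, is the Courant number limit Co = |u| Δt / Δx: for a target Co, the largest admissible time step is Δt = Co Δx / |u|. The sketch below works through that arithmetic; `max_time_step` is a hypothetical helper and the numbers are purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper: largest time step that keeps the Courant number
# Co = |u| * dt / dx at or below a target value.
# $1 = target Co, $2 = velocity magnitude [m/s], $3 = cell size [m]
max_time_step() {
    awk -v co="$1" -v u="$2" -v dx="$3" 'BEGIN { printf "%g\n", co * dx / u }'
}

# Illustrative numbers only: Co = 0.5, |u| = 2 m/s, dx = 0.01 m
max_time_step 0.5 2 0.01
```

Many transient OpenFOAM solvers can apply such a limit automatically through the `adjustTimeStep` and `maxCo` entries in system/controlDict, so reducing `deltaT` or `maxCo` is a typical first thing to try when a run diverges after several iterations.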

#8 Mahmoud Abbaszadeh (Member), February 21, 2021, 03:38
Quote (originally posted by saavedra00):
    Thanks for your suggestion, sharkilwo5. I am running in parallel, but locally on a multiprocessor machine.

    The problem was related to stability. The case is constrained by the time step.

Hi,

I have the same problem now. Could you please explain what you mean by the case being constrained by the time step?

Cheers

#9 basma (Basma Maged, New Member), July 19, 2022, 20:12
Quote (originally posted by saavedra00):
    Thanks for your suggestion, sharkilwo5. I am running in parallel, but locally on a multiprocessor machine.

    The problem was related to stability. The case is constrained by the time step.

I have the same problem. Could you explain the solution, please?

#10 sharkilwo5 (cem, New Member), July 21, 2022, 04:19
Hi Basma,
Can you explain what error exactly you are getting? I will try to help you.

#11 Burak_1984 (Burak, New Member), April 29, 2023, 15:14
Hi there,

I am running a snappyHexMesh job on remote compute nodes and a similar error shows up. The machines are university servers, so I cannot touch the installation.

For testing purposes, I am using the STL files from the "turbine siting" tutorial and increasing the refinement level. Since I use the original tutorial files, I know there is no problem with them. Beyond a certain refinement level (5 in my case), I get an error similar to the ones posted above.

My decomposition is hierarchical (4 7 1)

(Previous Stuff).....
Shell refinement iteration 2
----------------------------

Marked for refinement due to distance to explicit features : 0 cells.
Marked for refinement due to refinement shells : 2372394 cells.
Marked for refinement due to refinement shells : 2372394 cells.
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------

--------------------------------------------------------------------------
mpirun noticed that process rank 25 with PID 7837 on node a013 exited on signal 9 (Killed).

I am using 28 cores, but the same thing happens even with 168 cores. I don't think computational power is the issue (my mesh size is barely 2 million cells or so). The command I use is:

mpirun -np 28 snappyHexMesh -overwrite -case "......./.../TurbineSitingDryRun_19042023/"

I am using openfoam-v2212 but have experienced the same problem with openfoam-v1812, so I don't think it's a library or installation issue.

I believe the system is somehow unintentionally capping some computational power, which crashes one node and makes the job exit.

Can anyone offer any suggestions at this point? I appreciate all the help. Remember that I cannot change much on the installation, so I have to do something on my end.

Regards
Burak K.
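
Two things in the post above may be worth checking, both offered as guesses rather than a diagnosis. First, the command as posted has no `-parallel` flag; without it, OpenFOAM applications run in serial, so each MPI rank would mesh the whole case independently. Second, signal 9 on a cluster often means a process was killed for exceeding a memory limit, and the log shows 2372394 cells marked for refinement in a single iteration. Assuming each marked hexahedral cell is split into 8 children, the sketch below estimates how many cells that one iteration adds; `cells_added` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: cells added when every marked hex cell is split
# into 8 children (each split replaces 1 cell with 8, adding 7).
# $1 = number of cells marked for refinement
cells_added() {
    echo $(( $1 * 7 ))
}

# Count taken from the log above: 2372394 cells marked in one iteration.
cells_added 2372394
```

Under that assumption the iteration adds roughly 16.6 million cells, far more than the 2 million mentioned, so comparing the job's memory request with its actual usage in the scheduler's accounting may be a useful next step.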
Attached file: slurm-36103out.zip (12.4 KB)

All times are GMT -4.