mpirun problems

Old   May 4, 2010, 08:01
Default mpirun problems
  #1
Senior Member
 
Join Date: Feb 2010
Posts: 175
Rep Power: 6
vaina74 is on a distinguished road
I installed OpenFOAM-1.6.x and something strange happened. When I launch a parallel run:
Code:
foamJob -p -s simpleFoam
I obtain
Code:
mpirun noticed that process rank 1 with PID [4 digits] on node xxx-laptop
exited on signal 11 (segmentation fault)
and Ubuntu freezes!
Then I followed a test procedure (see here, post 19-20) and the output seemed correct. I ran the case in parallel mode again and everything was OK. A heisenbug, it was suggested.
Now the problem has come back. The parallel test output is:
Code:
Parallel processing using OPENMPI with 2 processors
Executing: mpirun -np 2 /home/giulia/OpenFOAM/OpenFOAM-1.6.x/bin/foamExec parallelTest -parallel | tee log
Building on  2  cores
Building on  2  cores
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 1.6.x-069803848c44
Exec   : parallelTest -parallel
Date   : May 04 2010
Time   : 13:44:38
Host   : giulia-laptop
PID    : 2150
Case   : /home/giulia/OpenFOAM/giulia-1.6.x/run/hydrofoil_0
nProcs : 2
Slaves : 
1
(
giulia-laptop.2151
)

Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

[1] [0] 
Starting transfers
[1] 
[1] slave sending to master 0
[1] slave receiving from master 0

Starting transfers
[0] 
[0] master receiving from slave 1
[0] (0 1 2)
[0] master sending to slave 1
[1] (0 1 2)
End

Finalising parallel run
but when I run my case I always obtain
Code:
mpirun noticed that process rank 1 with PID [4 digits] on node  xxx-laptop
exited on signal 11 (segmentation fault)
Please, help me!

Old   May 4, 2010, 08:19
Default
  #2
Senior Member
 
Join Date: Feb 2010
Posts: 175
Rep Power: 6
vaina74 is on a distinguished road
Hmm. Maybe it's a question of the amount of memory, but I can't understand why I had no problems before. I'm not an Ubuntu expert; can anyone help me?

Old   May 4, 2010, 09:41
Default
  #3
Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 6,997
Blog Entries: 32
Rep Power: 69
wyldckat is a jewel in the rough
Hello Maurizio, it's me again

Uhm, you didn't elaborate on what happened last time. Possibly it's a swap problem; read the Swap FAQ at help.ubuntu and increase your Ubuntu's swap size.
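
For reference, adding a swap file on Ubuntu usually comes down to four commands. The sketch below only prints them instead of running them, since they need root. The path /swapfile and the 512 MB size are illustrative assumptions, and swap_commands is a hypothetical helper name.

```shell
#!/bin/sh
# Print (do not run) the commands that would create and enable a swap file.
# swap_commands is a hypothetical helper; path and size are example values.
swap_commands() {
    swapfile=${1:-/swapfile}
    size_mb=${2:-512}
    echo "sudo dd if=/dev/zero of=$swapfile bs=1M count=$size_mb"
    echo "sudo chmod 600 $swapfile"
    echo "sudo mkswap $swapfile"
    echo "sudo swapon $swapfile"
}

swap_commands /swapfile 512
```

Run the printed commands by hand once the path and size look right; `free -m` then shows the new swap total. A swap file enabled this way is not re-enabled after a reboot unless it is also listed in /etc/fstab.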

Then try again to crash your Ubuntu

Best regards,
Bruno

PS: later in the day I'll review the post you made on how to have a side-by-side OpenFOAM 1.6 + 1.6.x installation

Last edited by wyldckat; May 4, 2010 at 09:42. Reason: serious typo... typed Ubuntu instead of OpenFOAM :P

Old   May 4, 2010, 10:48
Default
  #4
Senior Member
 
Join Date: Feb 2010
Posts: 175
Rep Power: 6
vaina74 is on a distinguished road
You are my angel, do you know it?
I expanded the notebook's memory by adding a 512 MB swap file, and now mpirun works! Well, I was afraid of having to install my (few and not so smart) neurons on my notebook
Thank you very much, Bruno.

Old   May 4, 2010, 17:19
Default
  #5
Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 6,997
Blog Entries: 32
Rep Power: 69
wyldckat is a jewel in the rough
You're welcome, and I'm glad it actually wasn't a heisenbug

By the way, I don't remember seeing this written in OpenFOAM's forum, nor on the unofficial openfoamwiki.net, but in my experience there is a minimum amount of RAM required for doing a full build of OpenFOAM. The magic number is somewhere between 1.3 GiB and 1.5 GiB of RAM, and swap won't cover that need!
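
A quick way to compare a Linux machine against that figure is to read MemTotal from /proc/meminfo. This is only a sketch: the 1.5 GiB threshold is the upper end of the range quoted above, taken as a conservative assumption, and mem_total_kib and build_ram_ok are hypothetical helper names.

```shell
#!/bin/sh
# mem_total_kib prints the MemTotal value (in KiB) from a meminfo-style file.
# The optional file argument exists only so the function can be tried on a sample.
mem_total_kib() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# build_ram_ok succeeds when the given KiB value is at least 1.5 GiB (1572864 KiB).
build_ram_ok() {
    [ "$1" -ge 1572864 ]
}

if [ -r /proc/meminfo ]; then
    total=$(mem_total_kib)
    if build_ram_ok "$total"; then
        echo "RAM: $total KiB, at or above the 1.5 GiB mark"
    else
        echo "RAM: $total KiB, below the 1.5 GiB mark; a full build may fail"
    fi
fi
```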

Best regards,
Bruno

Last edited by wyldckat; May 4, 2010 at 17:19. Reason: typo...

Old   May 31, 2011, 06:20
Default
  #6
New Member
 
Join Date: May 2011
Posts: 8
Rep Power: 5
rgarcia is on a distinguished road
Hey guys!

I want to run a simulation through a bash script. The aim is to simulate wind coming from 16 different directions. When I run the simulation (16 cases one after another) with a first-order scheme for divergence, there is no problem. However, when I run the same simulation with a second-order scheme, my computer stops responding and I have to reboot it. The error I obtain is:

-----------------------------------------------------------------------------------------------------------------------------------------------
mpirun noticed that process rank 5 with PID 1890 on node cener-desktop exited on signal 11 (segmentation fault)
-----------------------------------------------------------------------------------------------------------------------------------------------

As I understand from your messages, it could be a RAM or swap problem, but I have 15 GB of RAM and 12 GB of swap space, so I don't think memory should be the problem!

Do you have any idea???

Thanks a lot!

PS: I don't know if it matters, but I use "mpirun -np 8 simpleFoam -parallel" to run the simulations

Old   May 31, 2011, 06:33
Default
  #7
Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 6,997
Blog Entries: 32
Rep Power: 69
wyldckat is a jewel in the rough
Greetings rgarcia and welcome to the forum!

Mmm, have you tried running in serial to see if that order works at all?
Try monitoring how much RAM the simpleFoam processes are using and see whether it crashes while their RAM usage is increasing. Another problem could be insufficient contiguous memory, i.e., allocating 3 GB for a single matrix in RAM when the RAM is already loaded with various processes that occupy various locations... although I haven't seen many problems like that lately...
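
One possible way to do that monitoring from a second terminal is to sum the resident memory reported by ps for all processes of a given name. This is a minimal sketch, not an OpenFOAM tool; rss_total_kib is a hypothetical helper name.

```shell
#!/bin/sh
# Sum the resident set size (RSS, in KiB) of every process whose command
# name matches the argument exactly. Prints 0 when nothing matches.
rss_total_kib() {
    ps -e -o rss= -o comm= 2>/dev/null |
        awk -v name="$1" '$2 == name { sum += $1 } END { print sum + 0 }'
}

# Take one sample for the solver used in this thread.
echo "simpleFoam RSS total: $(rss_total_kib simpleFoam) KiB"
```

To watch the value while the case runs, wrap the call in a loop such as `while sleep 5; do rss_total_kib simpleFoam; done` and check whether the total climbs just before the segmentation fault.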

Yet another possibility is that there isn't enough MPI buffer length for communication. That's definable... in "OpenFOAM-*/etc/settings.sh" if I'm not mistaken. I would have to verify the variable name, but right now I can't.

Good luck!
Bruno

Old   May 31, 2011, 08:25
Default
  #8
New Member
 
Join Date: May 2011
Posts: 8
Rep Power: 5
rgarcia is on a distinguished road
Hey Bruno!

Thanks for your quick reply!

In serial it works well... but it takes very long! What I don't understand is that in parallel it works for some directions but sometimes it doesn't...

There has to be a reason, but it seems random! I'm going crazy!
Apparently, the problem is the combination of second order and parallel running... (any second order schemes work well for the 16 directions)

If you have any more suggestions I'll be glad to receive them! In any case, thank you very much!

Old   June 1, 2011, 04:03
Default
  #9
Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 6,997
Blog Entries: 32
Rep Power: 69
wyldckat is a jewel in the rough
Hi rgarcia,

  • Edit the file OpenFOAM*/etc/settings.sh;
  • Find the lines that have this:
    Code:
    # Set the minimum MPI buffer size (used by all platforms except SGI MPI)
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    : ${minBufferSize:=20000000}
    
    
    if [ "${MPI_BUFFER_SIZE:=$minBufferSize}" -lt $minBufferSize ]
    then
        MPI_BUFFER_SIZE=$minBufferSize
    fi
    export MPI_BUFFER_SIZE
  • Change 20000000 to 200000000.
  • Save the file.
  • Start a new terminal and try running it in parallel again.
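
Reading the quoted fragment, a value of MPI_BUFFER_SIZE that is already set and is at least as large as minBufferSize is left alone, so exporting a larger value before settings.sh is sourced should have the same effect as editing the file. The sketch below copies that logic into a throwaway function to check the three cases. effective_buffer_size is a hypothetical name, and whether a given installation sources settings.sh after your export is an assumption to verify.

```shell
#!/bin/sh
# Re-run the logic of the quoted settings.sh fragment in a subshell, starting
# from a chosen MPI_BUFFER_SIZE (possibly empty), and print the final value.
effective_buffer_size() {
    (
        unset minBufferSize
        MPI_BUFFER_SIZE=$1
        : ${minBufferSize:=20000000}
        if [ "${MPI_BUFFER_SIZE:=$minBufferSize}" -lt $minBufferSize ]
        then
            MPI_BUFFER_SIZE=$minBufferSize
        fi
        echo "$MPI_BUFFER_SIZE"
    )
}

effective_buffer_size ""          # empty: falls back to 20000000
effective_buffer_size 1000        # too small: raised to 20000000
effective_buffer_size 200000000   # large enough: kept at 200000000
```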
Another possibility is to try dividing the mesh into fewer or more sub-domains.
And have you checked the sanity of the mesh, by running checkMesh?

Other than these, it could have to do with boundary conditions or some configuration you're overlooking, something like maxCo or some other thing like that

Best regards,
Bruno

Old   June 2, 2011, 04:59
Default
  #10
New Member
 
Join Date: May 2011
Posts: 8
Rep Power: 5
rgarcia is on a distinguished road
Hi Bruno,

As you recommended, I'm trying to change the settings.sh file, but I can't:

rgarcia@cener-desktop:/opt/openfoam171/etc$ chmod +x settings.sh
chmod: changing permissions of 'settings.sh': Operation not permitted
rgarcia@cener-desktop:/opt/openfoam171/etc$ chmod +w settings.sh
chmod: changing permissions of 'settings.sh': Operation not permitted

I can copy the file to another folder and change it, but then I'm not able to paste it back...

I had already tried the other suggestions you made, and it doesn't seem to have anything to do with those!

Thanks again Bruno!

Old   June 3, 2011, 09:31
Default
  #11
Member
 
ubald's Avatar
 
Nicolas Lussier Clément
Join Date: Apr 2009
Location: Montréal, Qc, Canada
Posts: 43
Rep Power: 7
ubald is on a distinguished road
Hi rgarcia

How many cells do you have?
It works with first order in parallel but not with second order in parallel, is that what you are saying?
Are you using some custom code? Did you try without it?

Regards

Bruno, I'd like to ask you a question:

What is the meaning of minBufferSize:=20000000?
What does it limit?

Regards

Nicolas Lussier

Old   June 4, 2011, 08:21
Default
  #12
Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 6,997
Blog Entries: 32
Rep Power: 69
wyldckat is a jewel in the rough
Greetings to all!

@rgarcia: you should run like this:
Code:
sudo chmod o+w settings.sh
The sudo command will request your password to run the application as superuser, namely as root. This is necessary because the /opt folder is a system folder, from where everyone can read and execute, but only the root user can make changes to the files.
As for "o+w", this will give the proper permission for you to edit the file directly without sudo. After changing it, you can use the option "o-w" to revert the change.
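
The effect of "o+w" and "o-w" can be tried safely on a scratch file before touching the real settings.sh. This sketch uses a temporary file, so no sudo is needed; other_write_bit is a hypothetical helper name.

```shell
#!/bin/sh
# Print the "others may write" character of a file's mode string:
# the 9th character of `ls -l` output, "w" if set and "-" otherwise.
other_write_bit() {
    ls -l "$1" | cut -c9
}

f=$(mktemp)
chmod 644 "$f"
other_write_bit "$f"   # -
chmod o+w "$f"
other_write_bit "$f"   # w
chmod o-w "$f"
other_write_bit "$f"   # -
rm -f "$f"
```

Note that "o+w" makes the file writable by every user on the machine for as long as it is set, which is why reverting with "o-w" afterwards matters.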


@Nicolas: MPI_BUFFER_SIZE indicates the minimum message size in bytes required for communications between MPI processes.
I'm suggesting this solution in an attempt to check if it's an MPI related problem or an OpenFOAM problem.


@rgarcia: On a related note, you might also want to create a small case in the meantime that reproduces this same problem, because it might be necessary to report this as a bug after we've tried to isolate the problem.
But still, after increasing the message size, trying with fewer cores is also a good idea, in an attempt to isolate the problem.
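
Trying several core counts can be scripted. The following is a rough sketch, assuming it is run from the case directory, that system/decomposeParDict holds a plain `numberOfSubdomains N;` entry, and that the decomposition method needs only the total count (methods with per-direction counts would need their coefficients changed too). set_subdomains is a hypothetical helper name.

```shell
#!/bin/sh
# Rewrite the numberOfSubdomains entry of a decomposeParDict-style file.
set_subdomains() {
    n=$1
    dict=$2
    sed "s/^numberOfSubdomains[[:space:]][[:space:]]*[0-9][0-9]*;/numberOfSubdomains $n;/" \
        "$dict" > "$dict.tmp" && mv "$dict.tmp" "$dict"
}

# Only attempt the runs when OpenFOAM is on the PATH and this looks like a case.
if command -v decomposePar >/dev/null 2>&1 && [ -f system/decomposeParDict ]; then
    for n in 2 4 6 8; do
        set_subdomains "$n" system/decomposeParDict
        rm -rf processor*          # discard the previous decomposition
        decomposePar > "log.decomposePar.$n" 2>&1
        mpirun -np "$n" simpleFoam -parallel > "log.simpleFoam.$n" 2>&1
        echo "np=$n exit status: $?"
    done
fi
```

Comparing the per-count logs shows at which number of sub-domains the segmentation fault first appears.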

Best regards,
Bruno

Old   June 6, 2011, 03:40
Default
  #13
New Member
 
Join Date: May 2011
Posts: 8
Rep Power: 5
rgarcia is on a distinguished road
Greetings Nicolas and Bruno!

@Bruno: I was finally able to change the MPI_BUFFER_SIZE, but apparently it's not a message size problem. Where should I write a report for my bug?

@Nicolas: I tried two cases, one very simple (50000 cells) and the other with 500000 cells. I wrote a bash script that allows me to do a wind rose study. The study begins at direction 0 (setting the velocity components in /0/U) and after 1000 iterations it changes to direction 22.5, and so on (16 directions in total). I'm using a custom turbulence model.
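
The direction loop of such a script might look roughly like the sketch below, which only prints the velocity vector for each of the 16 directions. It is not the poster's script: the speed of 10 m/s, the helper name wind_components, and the angle convention (counter-clockwise from the x axis, so ux = U cos θ and uy = U sin θ) are all assumptions made for illustration.

```shell
#!/bin/sh
# wind_components SPEED DIRECTION_DEG prints "ux uy" with three decimals.
# Assumed convention: angle measured counter-clockwise from the x axis.
wind_components() {
    LC_ALL=C awk -v u="$1" -v deg="$2" 'BEGIN {
        pi = atan2(0, -1)
        rad = deg * pi / 180
        printf "%.3f %.3f\n", u * cos(rad), u * sin(rad)
    }'
}

speed=10   # illustrative inlet speed in m/s
k=0
while [ "$k" -lt 16 ]; do
    # Directions advance in steps of 22.5 degrees: 0, 22.5, ..., 337.5.
    dir=$(LC_ALL=C awk -v k="$k" 'BEGIN { printf "%.1f", k * 22.5 }')
    echo "direction $dir deg: U = ($(wind_components "$speed" "$dir") 0)"
    k=$((k + 1))
done
```

In a real study each printed vector would be written into the 0/U file before the next block of iterations is started.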

The case works with second order on both the coarser and the finer grid... but only with up to 4 processors! If I run the case with 5, 6, 7 or 8 processors it doesn't work, and the message that always appears is "mpirun... signal 11 (Segmentation fault)".

Old   June 6, 2011, 16:09
Default
  #14
Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 6,997
Blog Entries: 32
Rep Power: 69
wyldckat is a jewel in the rough
Hi rgarcia,

OK, you can report the possible bug here: http://www.openfoam.com/bugs/
Giving a small test case and making a full description of the problem is the best thing to do.

On a side note, OpenFOAM has some issues with patches that are divided between sub-domains. I suspect that this may be the problem that is occurring here.

I vaguely remember that there is an option for enforcing patches to not be split apart... you can start reading here: snappyHexMesh with cyclic boundary conditions

Best regards,
Bruno
