Killing a parallel computation run with mpirun

April 2, 2013, 03:59 | #1
Malik (malaboss), Member | Join Date: Dec 2012 | Location: Austin, USA | Posts: 52
Hi FOAMers,
I am working on 2D cases in parallel.
The machine I use is shared among several colleagues, so I need to stop my simulations overnight (at 11 PM in my case) and restart them at 7 AM.

I wrote a script that stops my computations with the command:

Code:
kill $MpirunPID
Everything works fine there.

At 7 AM, the script restarts from the latest time (I just call mpirun -np $numberofProcessors $solver).
But sometimes it tells me that the folder associated with the latest time is incomplete, for example:

Code:
--> FOAM FATAL IO ERROR: 
cannot find file

file: /home/OpenFOAM/host-2.1.1/run/cylindre/turbulence/Spalart/pimple/domaine_elargi/cylindreRE1000000_79/68.0095/p at line 0.

    From function regIOobject::readStream()
    in file db/regIOobject/regIOobjectRead.C at line 73.

FOAM exiting
What is really weird is that sometimes everything is fine and sometimes I get this message.
When I receive this error, I have to delete the latest time folder and rerun the case.
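The manual fix above (delete the incomplete latest time, then restart) could be automated: before calling mpirun, a restart script can check that the latest time directory in every processor* folder of the decomposed case is complete. This is only a sketch; the function name and the choice of checking just for a p file are my assumptions, not from this thread:

```shell
#!/bin/sh
# Sketch: verify that the latest time directory in each processor*
# folder of a decomposed case contains a p field before restarting.
# check_latest_complete and the p-only check are illustrative assumptions.
check_latest_complete() {
    case_dir=$1
    for proc in "$case_dir"/processor*; do
        [ -d "$proc" ] || continue
        # latest time = numerically largest directory name (e.g. 68.0095)
        latest=$(ls "$proc" | grep '^[0-9]' | sort -n | tail -n 1)
        if [ ! -f "$proc/$latest/p" ]; then
            echo "incomplete time directory: $proc/$latest" >&2
            return 1
        fi
    done
    return 0
}
```

If the check fails, the script can delete the offending directory and restart from the previous write instead of crashing with the FATAL IO ERROR.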

So I have two questions, and I hope you can help me:
1) How can I tell OpenFOAM that, when I stop the computation, it should finish the current iteration first?
2) Why does "kill" sometimes work and sometimes stop the computation in the middle of an iteration? Am I just lucky?

Thank you all for your help!

April 2, 2013, 04:43 | #2
Håkon Strandenes (haakon), Senior Member | Join Date: Dec 2011 | Location: Norway | Posts: 111
The concept here is that, by default, killing a process stops it immediately, regardless of what it is doing. That means you can be lucky: if you kill it after one write has finished and before the next one starts, you can restart from the last write. But you can also be unlucky and kill it in the middle of a write operation, in which case you have an incomplete timestep from which you cannot restart.

The solution is to use OpenFOAM's run-time controls. I suggest you look at these pages:

http://www.openfoam.org/version2.1.0...me-control.php
http://www.openfoam.org/version2.2.0...me-control.php

You will need to set stopAtWriteNowSignal to a positive integer and send that same signal to your process when you want it to stop. It will then nicely write the latest timestep to disk and stop.

Disclaimer: I have never tried this personally and do not know the details of how it works, but it looks promising.
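The nightly stop could then be a small wrapper that sends the configured signal and waits until the solver has actually finished writing and exited. A sketch; the default signal number 10 must match stopAtWriteNowSignal in controlDict, and the function name is mine:

```shell
#!/bin/sh
# Sketch: ask the solver to write and stop, then wait for it to exit.
# The signal number must match stopAtWriteNowSignal in controlDict
# (10 here is an assumption, as is the stop_and_wait name).
stop_and_wait() {
    pid=$1
    sig=${2:-10}
    kill -s "$sig" "$pid"
    # kill -0 sends no signal; it only tests whether the process still exists
    while kill -0 "$pid" 2>/dev/null; do
        sleep 1
    done
}
# hypothetical usage, with $MPIRUN_PID saved at launch time:
# stop_and_wait "$MPIRUN_PID" 10
```

Waiting for the process to disappear matters here: if the 7 AM restart fires while the 11 PM write is still in progress, you are back to the incomplete-timestep problem.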

April 2, 2013, 10:24 | #3
Malik (malaboss), Member | Join Date: Dec 2012 | Location: Austin, USA | Posts: 52
Hi,
Thank you so much for the links.

I added this code to my controlDict file:

Code:
OptimisationSwitches
{
    // Force dumping (at next timestep) upon signal
    writeNowSignal              10;
    // Force dumping (at next timestep) and clean exit upon signal
    stopAtWriteNowSignal        20; //-1;
}
When I kill the mpirun process with kill -10, everything goes fine and the iteration is completed.
However, when I tried kill -20, it did not kill anything: nothing happened and the process kept running. Maybe I have not completely understood what stopAtWriteNowSignal does.

If you have the answer, I would be delighted.
Thank you anyway!

April 2, 2013, 10:42 | #4
Håkon Strandenes (haakon), Senior Member | Join Date: Dec 2011 | Location: Norway | Posts: 111
I did a quick test and found that if I only set
Code:
OptimisationSwitches
{
    // Force dumping (at next timestep) and clean exit upon signal
    stopAtWriteNowSignal        10;
}
I could send signal 10 to the mpirun process with
Code:
kill -s 10 $MPIRUN_PID
and OpenFOAM nicely writes the solution fields and stops. The terminal log from OpenFOAM is:
Code:
Courant Number mean: 0.0028915643 max: 0.15907544
DILUPBiCG:  Solving for Ux, Initial residual = 6.2541889e-07, Final residual = 6.2541889e-07, No Iterations 0
DILUPBiCG:  Solving for Uz, Initial residual = 9.1704148e-05, Final residual = 6.8430736e-08, No Iterations 1
GAMG:  Solving for p, Initial residual = 0.036746401, Final residual = 0.0012739916, No Iterations 3
GAMG:  Solving for p, Initial residual = 0.0030208887, Final residual = 0.00011714219, No Iterations 7
time step continuity errors : sum local = 5.6965912e-13, global = -7.8049053e-17, cumulative = -2.6564765e-16
GAMG:  Solving for p, Initial residual = 0.0022172423, Final residual = 0.00010582563, No Iterations 10
mpirun: Forwarding signal 10 to job
sigStopAtWriteNow : setting up write and stop at end of the next iteration

GAMG:  Solving for p, Initial residual = 0.00024891902, Final residual = 8.8121468e-08, No Iterations 22
time step continuity errors : sum local = 4.2722368e-16, global = -1.0474626e-18, cumulative = -2.6669511e-16
ExecutionTime = 6.86 s  ClockTime = 9 s

Time = 3.285

Courant Number mean: 0.0028915635 max: 0.15907251
DILUPBiCG:  Solving for Ux, Initial residual = 6.2474242e-07, Final residual = 6.2474242e-07, No Iterations 0
DILUPBiCG:  Solving for Uz, Initial residual = 9.1592459e-05, Final residual = 6.7962237e-08, No Iterations 1
GAMG:  Solving for p, Initial residual = 0.036745335, Final residual = 0.001272349, No Iterations 3
GAMG:  Solving for p, Initial residual = 0.0030201551, Final residual = 0.00011688914, No Iterations 7
time step continuity errors : sum local = 5.6825884e-13, global = -5.5901766e-17, cumulative = -3.2259688e-16
GAMG:  Solving for p, Initial residual = 0.0022140212, Final residual = 0.00010510169, No Iterations 10
GAMG:  Solving for p, Initial residual = 0.00024748843, Final residual = 7.5154998e-08, No Iterations 23
time step continuity errors : sum local = 3.6429105e-16, global = -9.8798764e-19, cumulative = -3.2358487e-16
ExecutionTime = 8.48 s  ClockTime = 11 s

End

Finalising parallel run
I don't know if signal 20 is reserved for something, overridden by the operating system, or something else. Anyway, you only need one signal, right?
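A likely explanation, not confirmed in the thread: on x86/x86-64 Linux, signal 10 is SIGUSR1, which I believe Open MPI's mpirun forwards to the MPI ranks (hence the "mpirun: Forwarding signal 10 to job" line in the log above), whereas signal 20 is SIGTSTP, the terminal stop signal, which mpirun handles itself rather than forwarding, so the solver never sees it. The number-to-name mapping can be checked from the shell:

```shell
# Map signal numbers to names. Numbers are architecture-specific;
# 10 = USR1 and 20 = TSTP is the x86/x86-64 Linux numbering assumed here.
kill -l 10
kill -l 20
```

This would also explain why swapping the two OptimisationSwitches values makes no difference: whatever handler OpenFOAM installs for signal 20, the signal never reaches the solver processes.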

April 2, 2013, 11:04 | #5
Malik (malaboss), Member | Join Date: Dec 2012 | Location: Austin, USA | Posts: 52
Yes, my problem is already solved, but I found the behaviour with -20 strange.
I swapped the values of the signals so that I have:
Code:
OptimisationSwitches
{
    // Force dumping (at next timestep) upon signal
    writeNowSignal              20;
    // Force dumping (at next timestep) and clean exit upon signal
    stopAtWriteNowSignal        10; //-1;
}
kill -10 still works and kill -20 doesn't...
By "it doesn't work" I mean: with -20, the simulation doesn't stop, but it doesn't write any new time folders either.

Thank you for your help. At least my first problem is now solved!

April 3, 2013, 03:37 | #6
Malik (malaboss), Member | Join Date: Dec 2012 | Location: Austin, USA | Posts: 52
The script ran fine tonight, with kill -10 triggering stopAtWriteNowSignal.
Thanks again!
