April 2, 2013, 03:59 |
Killing a parallel computation with mpirun
|
#1 |
Member
Malik
Join Date: Dec 2012
Location: Austin, USA
Posts: 53
Rep Power: 13 |
Hi FOAMers,
I am working on 2D cases in parallel. The machine I use is shared between several colleagues, so I need to stop my simulations during the night (in my case, at 11 PM). I can then restart my computations at 7 AM. I wrote a script which stops my computations with the command:
Code:
kill $MpirunPID

At 7 AM, the script asks for a restart from the latest time (I just call mpirun -np $numberofProcessors $solver). Sometimes it tells me that the folder associated with the latest time is not complete. For example, it says:
Code:
--> FOAM FATAL IO ERROR:
cannot find file

file: /home/OpenFOAM/host-2.1.1/run/cylindre/turbulence/Spalart/pimple/domaine_elargi/cylindreRE1000000_79/68.0095/p at line 0.

    From function regIOobject::readStream()
    in file db/regIOobject/regIOobjectRead.C at line 73.

FOAM exiting

When I receive this error message, I must erase the latest time and rerun the case. So I have two questions, and I hope you can help me:
1) How can I tell OpenFOAM to finish the current iteration when I stop the computation?
2) Why does calling "kill" sometimes work and sometimes stop the computation in the middle of an iteration? Am I just lucky?
Thank you all for your help!
|
April 2, 2013, 04:43 |
|
#2 |
Senior Member
Join Date: Dec 2011
Posts: 111
Rep Power: 19 |
The concept here is that, by default, killing a process stops it immediately, regardless of what it is doing. That means you can be lucky: if you kill it after it has finished a write but before a new write has started, you can restart from the last write. However, you can also be unlucky and kill it in the middle of a write operation, in which case you are left with an incomplete time step from which you cannot restart.
The solution here is to use OpenFOAM's run-time controls. I suggest you look at these pages:
http://www.openfoam.org/version2.1.0...me-control.php
http://www.openfoam.org/version2.2.0...me-control.php
You will need to set stopAtWriteNowSignal to a positive integer and send that same signal to your process when you want it to stop. It will then write the latest time step to disk and stop cleanly. Disclaimer: I have never tried this personally and do not know the details of how it works, but it looks promising.
|
April 2, 2013, 10:24 |
|
#3 |
Member
Malik
Join Date: Dec 2012
Location: Austin, USA
Posts: 53
Rep Power: 13 |
Hi,
Thank you so much for the links. I added this code to my controlDict file:
Code:
OptimisationSwitches
{
    // Force dumping (at next timestep) upon signal
    writeNowSignal 10;
    // Force dumping (at next timestep) and clean exit upon signal
    stopAtWriteNowSignal 20; //-1;
}

However, when I tried to use kill -20, it did not kill anything. Actually, nothing happened and the process was still running. I may not have completely understood what stopAtWriteNowSignal does. If you have the answer, I would be delighted. Thank you anyway!
|
April 2, 2013, 10:42 |
|
#4 |
Senior Member
Join Date: Dec 2011
Posts: 111
Rep Power: 19 |
I did a quick test and found that if I only set
Code:
OptimisationSwitches
{
    // Force dumping (at next timestep) and clean exit upon signal
    stopAtWriteNowSignal 10;
}

and then send the signal with
Code:
kill -s 10 $MPIRUN_PID

the solver writes the time step and exits cleanly:
Code:
Courant Number mean: 0.0028915643 max: 0.15907544
DILUPBiCG: Solving for Ux, Initial residual = 6.2541889e-07, Final residual = 6.2541889e-07, No Iterations 0
DILUPBiCG: Solving for Uz, Initial residual = 9.1704148e-05, Final residual = 6.8430736e-08, No Iterations 1
GAMG: Solving for p, Initial residual = 0.036746401, Final residual = 0.0012739916, No Iterations 3
GAMG: Solving for p, Initial residual = 0.0030208887, Final residual = 0.00011714219, No Iterations 7
time step continuity errors : sum local = 5.6965912e-13, global = -7.8049053e-17, cumulative = -2.6564765e-16
GAMG: Solving for p, Initial residual = 0.0022172423, Final residual = 0.00010582563, No Iterations 10
mpirun: Forwarding signal 10 to job
sigStopAtWriteNow : setting up write and stop at end of the next iteration
GAMG: Solving for p, Initial residual = 0.00024891902, Final residual = 8.8121468e-08, No Iterations 22
time step continuity errors : sum local = 4.2722368e-16, global = -1.0474626e-18, cumulative = -2.6669511e-16
ExecutionTime = 6.86 s ClockTime = 9 s

Time = 3.285

Courant Number mean: 0.0028915635 max: 0.15907251
DILUPBiCG: Solving for Ux, Initial residual = 6.2474242e-07, Final residual = 6.2474242e-07, No Iterations 0
DILUPBiCG: Solving for Uz, Initial residual = 9.1592459e-05, Final residual = 6.7962237e-08, No Iterations 1
GAMG: Solving for p, Initial residual = 0.036745335, Final residual = 0.001272349, No Iterations 3
GAMG: Solving for p, Initial residual = 0.0030201551, Final residual = 0.00011688914, No Iterations 7
time step continuity errors : sum local = 5.6825884e-13, global = -5.5901766e-17, cumulative = -3.2259688e-16
GAMG: Solving for p, Initial residual = 0.0022140212, Final residual = 0.00010510169, No Iterations 10
GAMG: Solving for p, Initial residual = 0.00024748843, Final residual = 7.5154998e-08, No Iterations 23
time step continuity errors : sum local = 3.6429105e-16, global = -9.8798764e-19, cumulative = -3.2358487e-16
ExecutionTime = 8.48 s ClockTime = 11 s

End

Finalising parallel run
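For a nightly stop script, sending the signal and then waiting for mpirun to exit could be wrapped in a small helper. This is only a sketch: the function name stop_gracefully and its three arguments are made up for illustration, and it assumes stopAtWriteNowSignal in controlDict is set to the same signal you pass in (10 in the test above).

```shell
# Hypothetical helper (not part of OpenFOAM): ask a running job to write
# its current time step and stop, then wait until the process has gone.
#   $1 = PID of mpirun
#   $2 = signal configured as stopAtWriteNowSignal (assumed, e.g. 10)
#   $3 = maximum number of seconds to wait
# Returns 0 if the process exited, 1 on timeout, 2 if the signal
# could not be delivered (for example, no such process).
stop_gracefully() {
    pid=$1
    sig=$2
    timeout=$3

    kill -s "$sig" "$pid" 2>/dev/null || return 2

    waited=0
    # kill -0 delivers no signal; it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A call such as stop_gracefully "$MpirunPID" 10 600 would give the solver up to ten minutes to finish its time step and write. Checking the return value lets the script decide what to do if the job is still running afterwards, rather than falling back to a plain kill that may interrupt a write.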
|
April 2, 2013, 11:04 |
|
#5 |
Member
Malik
Join Date: Dec 2012
Location: Austin, USA
Posts: 53
Rep Power: 13 |
Yup, my problem is already resolved, but I find the problem with -20 strange.
I switched the values for the signals so that I have:
Code:
OptimisationSwitches
{
    // Force dumping (at next timestep) upon signal
    writeNowSignal 20;
    // Force dumping (at next timestep) and clean exit upon signal
    stopAtWriteNowSignal 10; //-1;
}

By "it doesn't work" I mean: when I use -20 in this case, the simulation doesn't stop, but it doesn't write any new time folder either. Thank you for your help. At least my first problem is now solved!
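One possible explanation for the behaviour of -20, which nobody confirmed in this thread, is that signal numbers are not interchangeable: each number maps to a named signal with its own default meaning. The mapping on a given machine can be checked with kill -l. In the sketch below, signal_name is a made-up wrapper, and the names in the comments assume x86 Linux (other platforms number signals differently).

```shell
# Hypothetical helper: print the name of a signal number on this machine.
signal_name() {
    kill -l "$1"
}

signal_name 10   # USR1 on x86 Linux: a user-defined signal
signal_name 20   # TSTP on x86 Linux: the terminal stop signal (Ctrl-Z)
```

The log in post #4 shows mpirun forwarding signal 10 to the job. Whether mpirun forwards signal 20 in the same way depends on the MPI implementation, so the signal may never reach the solver. If a second signal is needed for writeNowSignal, the other user-defined signal (USR2, number 12 on x86 Linux) might be worth trying, though that is untested here.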
|
April 3, 2013, 03:37 |
|
#6 |
Member
Malik
Join Date: Dec 2012
Location: Austin, USA
Posts: 53
Rep Power: 13 |
The script worked fine tonight, with kill -10 triggering stopAtWriteNowSignal.
Thanks again!