Killing a parallel computation with mpirun
Hi FOAMers,
I am working on 2D cases in parallel. The machine I use is shared among several colleagues, so I need to stop my simulations during the night (in my case, at 11 PM). I can then restart my computations at 7 AM. I wrote a script which stops my computations with the command:
Code:
kill $MpirunPID
At 7 AM, the script restarts from the latest time (I just call mpirun -np $numberofProcessors $solver). Sometimes it tells me that the folder associated with the latest time is incomplete. For example, it says:
Code:
--> FOAM FATAL IO ERROR:
When I receive this error message, I must erase the latest time and rerun the case. Thus I have two questions, and I hope you can help me:
1) How can I tell OpenFOAM to finish the current iteration when I stop the computations?
2) Why does calling "kill" sometimes work and sometimes stop the computations in the middle of an iteration? Am I just lucky?
Thank you all for your help! |
The concept here is that, by default, killing a process stops it immediately, regardless of what it is doing. That means you can be lucky: if you kill it after it has finished a write but before a new write has started, you can restart from the last write. However, you can also be unlucky and kill it in the middle of a write operation, in which case you are left with an incomplete time step from which you cannot restart.
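To make the lucky/unlucky distinction concrete, here is a minimal toy sketch in shell. It is not how OpenFOAM is implemented; the function name `run_toy_solver`, the file names, and the use of USR1 are all assumptions made for illustration. The signal handler only records the stop request, so the loop finishes the write in progress and exits at a safe point instead of dying mid-write.

```shell
#!/bin/sh
# Toy model only: this is NOT how OpenFOAM is implemented.
# Each "time step" writes two files (U and p); a time step is
# restartable only if both files are present.
run_toy_solver() {
    out_dir=$1      # directory where the toy time steps are written
    signal_at=$2    # step during which the stop request arrives
    stop_requested=0
    # The handler only records the request; it does not exit.
    trap 'stop_requested=1' USR1
    step=0
    while [ "$stop_requested" -eq 0 ] && [ "$step" -lt 10 ]; do
        step=$((step + 1))
        : > "$out_dir/$step.U"
        if [ "$step" -eq "$signal_at" ]; then
            # Stand-in for an external kill arriving mid-write:
            # a child process sends USR1 to this shell.
            sh -c 'kill -s USR1 "$PPID"'
        fi
        : > "$out_dir/$step.p"   # the write in progress still completes
    done
    trap - USR1
    last_step=$step
}
```

Calling `run_toy_solver "$some_dir" 3` should leave steps 1 to 3 complete and never start step 4. With a plain `kill` and no handler, the process would instead end wherever it happened to be, possibly between the two writes.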
The solution here is to use OpenFOAM's run-time controls. I suggest you look at these pages:
http://www.openfoam.org/version2.1.0...me-control.php
http://www.openfoam.org/version2.2.0...me-control.php
You will need to set stopAtWriteNowSignal to a positive integer and send that same signal to your process when you want it to stop. It will then write the latest time step to disk and stop cleanly. Disclaimer: I have never tried this personally and do not know the details of how it works, but it looks promising. |
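As a rough sketch of that setting, the hypothetical helper below appends an OptimisationSwitches entry to a case's system/controlDict. The helper name `enable_stop_signal` and the default value 10 are assumptions for illustration, and the dictionary layout follows the usual OpenFOAM dictionary syntax rather than anything shown in this thread, so check it against the linked release notes for your version.

```shell
#!/bin/sh
# Hypothetical helper: append the stop-signal switch to a case's controlDict.
# $1 = case directory, $2 = signal number (assumed default: 10).
# It does not check for an existing OptimisationSwitches entry,
# so run it only once per case.
enable_stop_signal() {
    case_dir=$1
    signal_number=${2:-10}
    cat >> "$case_dir/system/controlDict" <<EOF

OptimisationSwitches
{
    // Write the current time step and exit when this signal arrives
    stopAtWriteNowSignal $signal_number;
}
EOF
}
```

After `enable_stop_signal /path/to/case 10`, sending signal 10 to the running job should trigger the write-and-stop behaviour described above. The case path here is only a placeholder.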
Hi,
Thank you so much for the links. I added this code to my controlDict file:
Code:
OptimisationSwitches
However, when I tried to use kill -20, it did not kill anything. Nothing happened at all, and the process was still running. I may not have completely understood what stopAtWriteNowSignal does. If you have the answer, I would be delighted. Thank you anyway! |
I did a quick test and found that if I only set
Code:
OptimisationSwitches
Code:
kill -s 10 $MPIRUN_PID
Code:
Courant Number mean: 0.0028915643 max: 0.15907544 |
Yup, my problem is already resolved, but I found the problem with -20 strange.
I switched the values of the signals so that I have:
Code:
OptimisationSwitches
By "it doesn't work" I mean: when I use -20 in this case, the simulation doesn't stop, but it doesn't write any folders for new times either. Thank you for your help. At least my first problem is now solved! :) |
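One possible explanation for the -20 behaviour, offered as an assumption rather than a confirmed diagnosis: on typical x86 Linux systems signal 20 is SIGTSTP, whose default action suspends a process instead of terminating it. That would fit a job that neither stops nor writes new time folders. The hypothetical helper below simply asks the shell which signal a number refers to, so a number can be checked before it goes into controlDict.

```shell
#!/bin/sh
# Hypothetical helper: print the name of a signal number (without "SIG").
signal_name() {
    kill -l "$1"
}

# Look up the numbers discussed in this thread.
# The names printed depend on the platform.
for number in 10 20; do
    printf '%s -> %s\n' "$number" "$(signal_name "$number")"
done
```

User-defined signals such as USR1 and USR2 are usually the safer choice for this kind of switch, because they carry no job-control meaning of their own.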
The script ran fine tonight, with kill -10 triggering stopAtWriteNowSignal.
Thanks again! |
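For anyone writing a similar nightly script, here is a minimal sketch of the stop step. The function name `request_stop` and its arguments are hypothetical, and it assumes the signal passed in matches the value set for stopAtWriteNowSignal. It sends the signal, then polls until the process has gone, so a later restart does not begin while the last time step is still being written.

```shell
#!/bin/sh
# Hypothetical helper: ask a running job to stop, then wait until it is gone.
# $1 = PID of mpirun
# $2 = signal configured as stopAtWriteNowSignal (e.g. 10)
# $3 = maximum number of seconds to wait
# Returns 0 once the process has exited, 1 if the signal could not be
# sent, and 2 if the process is still alive after the timeout.
request_stop() {
    pid=$1
    stop_signal=$2
    max_wait=$3
    kill -s "$stop_signal" "$pid" || return 1
    waited=0
    # kill -0 sends nothing; it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$max_wait" ]; then
            return 2
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A call such as `request_stop "$MpirunPID" 10 600` would give the solver up to ten minutes to finish its write. The timeout value is only an example and should be chosen to cover the longest write you expect.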