Stevie Wonder August 12, 2004 14:29

Linux: when the job suddenly stops working.

Occasionally I notice that my solver stops working on my Linux machine. I am using CFX-5.7 and I usually submit jobs in the background, either from the command line (cfx5solve -def foo.def &) or through the Solver Manager. Sometimes the solver stops without crashing and without writing anything to the .out file; it just freezes.
If I open a terminal and run top, I can see that the solver process is still there, but it is using no CPU at all. Has anyone seen this behaviour with background jobs on Linux? Note that when I suspend a job with Ctrl-Z (I use bash), I immediately type bg so that the job keeps running in the background and I can continue using my X terminal.
I know there are workarounds, such as writing periodic backup files, but I am curious about what causes this strange behaviour on Linux machines.
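A process at 0% CPU can be in quite different states, and the state letter in /proc/<pid>/stat (the same letter top shows in its S column) tells them apart: stopped (T), sleeping on an event (S), or stuck waiting on disk (D). Below is a minimal Python sketch for reading it, assuming a Linux /proc filesystem; the helper names are illustrative only and not part of CFX.

```python
# Meaning of the state letter (field 3 of /proc/<pid>/stat) on Linux.
STATE_NAMES = {
    "R": "running",
    "S": "sleeping (waiting for an event)",
    "D": "uninterruptible sleep (usually disk I/O)",
    "T": "stopped (e.g. suspended with Ctrl-Z and not yet resumed)",
    "Z": "zombie (exited, not yet reaped by its parent)",
}


def parse_state(stat_line: str) -> str:
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces or
    # ')', so split on the LAST ')' instead of on whitespace.
    rest = stat_line.rpartition(")")[2]
    return rest.split()[0]


def process_state(pid: int) -> str:
    """Describe the current state of process `pid`, e.g. 'S: sleeping ...'."""
    with open(f"/proc/{pid}/stat") as f:
        letter = parse_state(f.read())
    return f"{letter}: {STATE_NAMES.get(letter, 'other')}"
```

For example, call process_state() with the PID that top reports for the solver. A job left in T has been stopped and can be resumed with bg, fg or kill -CONT; a job in S is alive but waiting for something.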
Thanks in advance, Stevie

Glenn Horrocks August 12, 2004 18:39

Re: Linux: when the job suddenly stops working.

Is it a big job? If the job is bigger than available memory it will use swap space and the CPU load will go down, sometimes to almost zero.
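One quick way to check whether swap is in use is to compare SwapTotal and SwapFree in /proc/meminfo. The following is a minimal Python sketch assuming a Linux /proc filesystem; the function names are hypothetical helpers, not part of any CFX tool.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo field name to its numeric value (kB for sizes)."""
    info = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields = value.split()
        if fields:  # skip blank or malformed lines
            info[key.strip()] = int(fields[0])
    return info


def swap_used_kb(info: dict[str, int]) -> int:
    """Swap currently in use, in kB: SwapTotal - SwapFree."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def read_meminfo() -> dict[str, int]:
    """Read and parse /proc/meminfo on the local machine."""
    with open("/proc/meminfo") as f:
        return parse_meminfo(f.read())
```

Note that a nonzero swap_used_kb() only shows that some pages were swapped out at some point; it does not prove that the machine is swapping right now.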


Stevie Wonder August 13, 2004 08:05

Re: Linux: when the job suddenly stops working.
Thanks for the answer, Glenn.

I do not usually run big problems with the serial solver, and the memory requirements are always below the amount of RAM I have. Swap does not seem to be involved, since disk activity is quite normal; only the solver's CPU usage drops to 0.0% according to top.
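Judging swapping from disk activity works, but a more direct check on 2.6 and later kernels is to sample the pswpin/pswpout counters (pages swapped in and out since boot) in /proc/vmstat twice and see whether they moved. A minimal Python sketch, assuming a Linux /proc filesystem; the function names and the 5-second default interval are illustrative choices.

```python
import time


def parse_vmstat(text: str) -> dict[str, int]:
    """Map each 'name value' line of /proc/vmstat to an int."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_pages_moved(before: dict[str, int], after: dict[str, int]) -> int:
    """Pages swapped in plus pages swapped out between two snapshots."""
    return sum(after.get(k, 0) - before.get(k, 0) for k in ("pswpin", "pswpout"))


def read_vmstat() -> dict[str, int]:
    with open("/proc/vmstat") as f:
        return parse_vmstat(f.read())


def is_swapping(interval_s: float = 5.0) -> bool:
    """True if any page was swapped in or out during the sampling interval."""
    before = read_vmstat()
    time.sleep(interval_s)
    return swap_pages_moved(before, read_vmstat()) > 0
```

If is_swapping() returns False while the solver sits at 0% CPU, swapping can reasonably be ruled out as the cause.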

So far I have not seen this issue with the serial or parallel solver on Windows, only on my Linux box.


Glenn Horrocks August 15, 2004 18:36

Re: Linux: when the job suddenly stops working.

Sometimes when the solver crashes in multi-processor mode, it does not clean up the slave jobs, and they remain on the machine at 0% CPU. This has happened to me occasionally with the MPICH parallel algorithm (never with PVM). You can safely kill these jobs manually.
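To locate such leftovers, one approach is to sample the CPU time (utime + stime) of matching processes twice and keep the ones that did not advance. Below is a minimal Python sketch assuming a Linux /proc filesystem. The name fragment you pass is an assumption: check with ps what the slave processes are actually called on your system. A slave that is merely blocked on a still-running master also uses no CPU, so confirm the run has really ended before calling terminate().

```python
import os
import signal
import time


def parse_stat(stat_line: str) -> tuple[str, int]:
    """Return (command name, utime + stime in clock ticks) from /proc/<pid>/stat."""
    # The command name sits between the first '(' and the last ')';
    # the kernel truncates it to 15 characters.
    name = stat_line[stat_line.index("(") + 1 : stat_line.rindex(")")]
    fields = stat_line[stat_line.rindex(")") + 1 :].split()
    # fields[0] is field 3 (state), so utime (field 14) is fields[11]
    # and stime (field 15) is fields[12].
    return name, int(fields[11]) + int(fields[12])


def snapshot(name_fragment: str) -> dict[int, int]:
    """CPU ticks used so far by every process whose name contains name_fragment."""
    ticks = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                name, used = parse_stat(f.read())
        except OSError:  # the process exited while we were scanning
            continue
        if name_fragment in name:
            ticks[int(entry)] = used
    return ticks


def idle_pids(before: dict[int, int], after: dict[int, int]) -> list[int]:
    """PIDs present in both snapshots that consumed no CPU time in between."""
    return sorted(pid for pid, used in after.items() if before.get(pid) == used)


def find_idle(name_fragment: str, interval_s: float = 10.0) -> list[int]:
    """Sample twice, interval_s apart, and return the matching idle PIDs."""
    before = snapshot(name_fragment)
    time.sleep(interval_s)
    return idle_pids(before, snapshot(name_fragment))


def terminate(pids: list[int]) -> None:
    """Send SIGTERM to each PID; check the list with ps first."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
```

Listing first and killing as a separate step keeps the manual check Glenn describes in the loop, rather than killing anything that happens to be idle.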

Glenn Horrocks

