Re: [PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation
From: Cedric Le Goater <hidden>
Date: 2009-03-17 06:55:48

> Again, how would 'cr' obtain exit status for these tasks, and how would
> it distinguish failure from normal operation?

Here's our solution to this issue. mcr maintains, in its kernel container object, an exitcode attribute for the mcr-restart process. This process is detached from the fork tree of the restarted application. When the restart is finished, an mcr-wait command can be called to reap this exitcode. This makes it possible to distinguish an exit of the application process from an exit of the mcr-restart process. This is a must-have for batch managers in an HPC environment.

Cheers,

C.
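
To make the idea concrete, here is a rough userspace sketch of the scheme described above. It is not mcr code: struct container, container_run_restart(), container_wait_restart() and failing_restart() are all hypothetical names invented for illustration, and a plain fork()/waitpid() pair stands in for what mcr does inside the kernel. The only point it tries to show is that the restart helper's exit code is stored on the container and reaped separately, so it is never confused with an exit of the application itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the kernel container object. */
struct container {
    int restart_done;     /* nonzero once the restart helper has exited */
    int restart_exitcode; /* exit code of the restart helper only */
};

/* Example restart routine that fails with exit code 3 (illustrative). */
int failing_restart(void)
{
    return 3;
}

/*
 * Run restart_fn in a separate child and record its exit code on the
 * container rather than mixing it with the application's status.
 * Returns 0 on success, -1 if fork() or waitpid() failed.
 */
int container_run_restart(struct container *c, int (*restart_fn)(void))
{
    int status;
    pid_t pid;

    assert(c != NULL && restart_fn != NULL);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(restart_fn());  /* child: act as the restart helper */

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    if (WIFEXITED(status))
        c->restart_exitcode = WEXITSTATUS(status);
    else if (WIFSIGNALED(status))
        c->restart_exitcode = 128 + WTERMSIG(status); /* shell-style */
    else
        return -1;

    c->restart_done = 1;
    return 0;
}

/*
 * Rough analogue of an mcr-wait call: reap the stored exit code once.
 * Returns 0 and fills *exitcode, or -1 if there is nothing to reap.
 */
int container_wait_restart(struct container *c, int *exitcode)
{
    assert(c != NULL && exitcode != NULL);

    if (!c->restart_done)
        return -1;

    *exitcode = c->restart_exitcode;
    c->restart_done = 0;
    return 0;
}
```

A batch manager would call container_run_restart() and then container_wait_restart(): a nonzero code from the latter means the restart itself failed, whatever the application later exits with.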