[slurm-users] Suspended and released job continues running in a "down" partition
    Brian Andrus 
    toomuchit at gmail.com
       
    Wed Mar 24 15:13:12 UTC 2021
    
    
  
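Suspend is really nothing more than stopping the job's processes with
SIGSTOP (much like hitting ^Z on the job), so there is no interaction
between it and the partition once the job is running. A partition in
the DOWN state only keeps queued jobs from starting; resume just sends
SIGCONT to a job that already holds its allocation.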
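As a rough illustration of that point (plain processes, not Slurm itself), the sketch below stops and continues an ordinary background process with the same two signals that the Slurm documentation names for suspend and resume. The `sleep` process is only a stand-in for a job step, and the `state` helper is a hypothetical name; it assumes a Linux-style `ps` that accepts `-o stat=`.

```shell
#!/bin/sh
# Stand-in for a job's process; this is NOT a Slurm job, just an illustration.
sleep 30 &
pid=$!

# Hypothetical helper: first letter of the process state (T = stopped).
state() { ps -o stat= -p "$1" | tr -d ' ' | cut -c1; }

kill -STOP "$pid"            # roughly what a suspend does to the processes
sleep 1
stopped_state=$(state "$pid")
echo "after SIGSTOP: $stopped_state"

kill -CONT "$pid"            # roughly what a resume does
sleep 1
resumed_state=$(state "$pid")
echo "after SIGCONT: $resumed_state"

kill "$pid"                  # clean up the stand-in process
```

Nothing in this stop/continue cycle consults a scheduler or a queue, which is why the partition state plays no part in it.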
What behavior would you expect? Suspend is not cancel, and cancelling
is what would be needed to get the job out of that partition (even if
you checkpoint it first and then cancel it, so that it can be resumed
on another node).
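The sequence discussed in this thread could be sketched as follows. The job ID `12345`, the partition name `debug`, the `run` wrapper and the `evict_job` function are all hypothetical placeholders. By default the script only prints the commands, so it can be read or tried without a cluster; set `DRY_RUN=0` on a real system to actually run them.

```shell
#!/bin/sh
# Hypothetical values: replace with your own job ID and partition name.
JOBID=${JOBID:-12345}
PARTITION=${PARTITION:-debug}

# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

evict_job() {
    # DOWN stops queued jobs from starting; running jobs are left alone.
    run scontrol update PartitionName="$PARTITION" State=DOWN
    # Suspend stops the job's processes; the job keeps its allocation.
    run scontrol suspend "$JOBID"
    # Resume continues the same processes on the same nodes.
    run scontrol resume "$JOBID"
    # Only cancelling actually takes the job out of the partition.
    run scancel "$JOBID"
}

evict_job
```

If the goal is to have the job wait until the partition is back up rather than lose it, `scontrol requeue` in place of `scancel` may be an option for a requeueable batch job, on the assumption that the job can safely restart from the beginning.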
Brian Andrus
On 3/24/2021 7:31 AM, Gestió Servidors wrote:
>
> Hi,
>
> I have got this new question for you:
>
> In my cluster there is a running job. Then I change a partition's
> state from “up” to “down”. The job keeps “running” because it was
> already running before the state changed. Now I explicitly run
> “scontrol suspend my_job”. After that, my job remains in the queue
> because it is suspended and, also, I have changed the partition state
> to “down”. After 1 hour (for example), I run “scontrol resume my_job”
> and, I don’t know why, the job continues “running”… in a partition
> that is still “down”. Why?
>
> Thanks
>