[slurm-users] Slurm preemption grace time

Jeffrey Frey frey at udel.edu
Mon Nov 20 08:33:21 MST 2017


• GraceTime: Specifies a time period for a job to execute after it is selected to be preempted. This option can be specified by partition or QOS using the slurm.conf file or database respectively. This option is only honored if PreemptMode=CANCEL. The GraceTime is specified in seconds and the default value is zero, which results in no preemption delay. Once a job has been selected for preemption, its end time is set to the current time plus GraceTime. The job is immediately sent SIGCONT and SIGTERM signals in order to provide notification of its imminent termination. This is followed by the SIGCONT, SIGTERM and SIGKILL signal sequence upon reaching its new end time.
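To make the two separate signal deliveries in that description explicit, here is a small Python model of the sequence. It is only an illustration: `SignalEvent` and `preemption_timeline` are hypothetical names, not part of Slurm. The model also collapses the final SIGCONT/SIGTERM/SIGKILL sequence onto a single instant and ignores any delay between the last SIGTERM and the SIGKILL.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SignalEvent:
    offset: int               # seconds after the job is selected for preemption
    signals: Tuple[str, ...]  # signal names delivered at that moment, in order


def preemption_timeline(grace_time: int) -> List[SignalEvent]:
    """Toy model of the GraceTime sequence quoted above (PreemptMode=CANCEL).

    Simplification: the final signal sequence is placed at one offset;
    any wait between the final SIGTERM and SIGKILL is not modeled.
    """
    if grace_time < 0:
        raise ValueError("GraceTime is a non-negative number of seconds")
    return [
        # Immediate notice: this SIGTERM ends the job unless it is handled.
        SignalEvent(0, ("SIGCONT", "SIGTERM")),
        # New end time = selection time + GraceTime.
        SignalEvent(grace_time, ("SIGCONT", "SIGTERM", "SIGKILL")),
    ]


for event in preemption_timeline(600):
    print(f"t+{event.offset}s: {', '.join(event.signals)}")
```

With GraceTime=600 this prints `t+0s: SIGCONT, SIGTERM` followed by `t+600s: SIGCONT, SIGTERM, SIGKILL`. The first line is the one that matters here, because a SIGTERM arrives right away, not after 600 seconds.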


"The job is immediately sent SIGCONT and SIGTERM signals in order to provide notification of its imminent termination."


The default action for SIGTERM is to terminate the process, so your program is most likely exiting as soon as it receives that initial SIGTERM rather than running out the 600-second grace period.

> On Nov 20, 2017, at 10:21 AM, Ailing Zhang <zhangal1992 at gmail.com> wrote:
> 
> 
> Hi slurm community,
> 
> I'm testing partition-based preemption. Partitions test-high and test-low share the same nodes. I set GraceTime=600 and PreemptMode=CANCEL in test-low, but as soon as I submit a job to test-high, the job in test-low is killed immediately, without any grace time.
> Here are my configs:
> PartitionName=test-low
>    AllowGroups=admins AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=NO QoS=N/A
>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=600 Hidden=NO
>    MaxNodes=UNLIMITED MaxTime=02:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
>    Nodes=node[100-102]
>    PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
>    OverTimeLimit=NONE PreemptMode=CANCEL
>    State=UP TotalCPUs=100 TotalNodes=3 SelectTypeParameters=NONE
>    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
> 
> PartitionName=test-high
>    AllowGroups=admins AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=NO QoS=N/A
>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>    MaxNodes=UNLIMITED MaxTime=02:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
>    Nodes=node[100-102]
>    PriorityJobFactor=30 PriorityTier=30 RootOnly=NO ReqResv=NO OverSubscribe=NO
>    OverTimeLimit=NONE PreemptMode=OFF
>    State=UP TotalCPUs=100 TotalNodes=3 SelectTypeParameters=NONE
>    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
> 
> Any help will be much appreciated.
> 
> Thanks!
> Ailing


::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE  19716
Office: (302) 831-6034  Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::
