[slurm-users] Slurm preemption grace time

Ailing Zhang zhangal1992 at gmail.com
Mon Nov 20 19:24:45 MST 2017


Thanks Jeffrey, trapping SIGTERM solved my problem! :D

Best,
Ailing
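
A minimal sketch of that fix, assuming a bash batch script: trap SIGTERM so the initial TERM sent at preemption time does not end the job, leaving the full GraceTime for cleanup before SIGKILL arrives (the checkpoint step here is just a placeholder echo):

```shell
#!/bin/bash
# Sketch only: trap SIGTERM so the job survives the notification signal
# Slurm sends when it is selected for preemption. Without the trap, the
# default action on SIGTERM is to exit, which looks like "no grace time".
got_term=0
trap 'echo "SIGTERM received, checkpointing..."; got_term=1' TERM

# Demonstration: deliver SIGTERM to this shell; the trap handler runs
# and the script keeps executing instead of exiting.
kill -TERM $$
echo "still running after SIGTERM (got_term=$got_term)"
```

Note that SIGKILL at the end of the grace period cannot be trapped, so any checkpointing must finish within GraceTime.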

On Mon, Nov 20, 2017 at 8:33 AM, Jeffrey Frey <frey at udel.edu> wrote:

> • *GraceTime*: Specifies a time period for a job to execute after it is
> selected to be preempted. This option can be specified by partition or QOS
> using the slurm.conf file or database respectively. This option is only
> honored if PreemptMode=CANCEL. The GraceTime is specified in seconds and
> the default value is zero, which results in no preemption delay. Once a job
> has been selected for preemption, its end time is set to the current time
> plus GraceTime. The job is immediately sent SIGCONT and SIGTERM signals in
> order to provide notification of its imminent termination. This is followed
> by the SIGCONT, SIGTERM and SIGKILL signal sequence upon reaching its new
> end time.
>
>
> "The job is immediately sent SIGCONT and SIGTERM signals in order to
> provide notification of its imminent termination."
>
>
> Default behavior on SIGTERM is for a program to exit; your program is
> probably ending when it receives that initial SIGTERM.
>
>
> On Nov 20, 2017, at 10:21 AM, Ailing Zhang <zhangal1992 at gmail.com> wrote:
>
>
> Hi slurm community,
>
> I'm testing partition-based preemption. The partitions test-high and
> test-low share the same nodes. I set GraceTime=600 and PreemptMode=CANCEL
> on test-low, but as soon as I submit a job to test-high, the job in
> test-low is killed immediately, without any grace period.
> Here are my configs:
> PartitionName=test-low
>    AllowGroups=admins AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=NO QoS=N/A
>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=600 Hidden=NO
>    MaxNodes=UNLIMITED MaxTime=02:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
>    Nodes=node[100-102]
>    PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
>    OverTimeLimit=NONE PreemptMode=CANCEL
>    State=UP TotalCPUs=100 TotalNodes=3 SelectTypeParameters=NONE
>    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
> PartitionName=test-high
>    AllowGroups=admins AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=NO QoS=N/A
>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>    MaxNodes=UNLIMITED MaxTime=02:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
>    Nodes=node[100-102]
>    PriorityJobFactor=30 PriorityTier=30 RootOnly=NO ReqResv=NO OverSubscribe=NO
>    OverTimeLimit=NONE PreemptMode=OFF
>    State=UP TotalCPUs=100 TotalNodes=3 SelectTypeParameters=NONE
>    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
> Any help will be much appreciated.
>
> Thanks!
> Ailing
>
>
>
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::
> Jeffrey T. Frey, Ph.D.
> Systems Programmer V / HPC Management
> Network & Systems Services / College of Engineering
> University of Delaware, Newark DE  19716
> Office: (302) 831-6034  Mobile: (302) 419-4976
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::
>