[slurm-users] Slurm preemption grace time
Ailing Zhang
zhangal1992 at gmail.com
Mon Nov 20 19:24:45 MST 2017
Thanks Jeffrey, trapping SIGTERM solved my problem! :D
Best,
Ailing
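
For readers hitting the same issue: the fix is to catch the initial SIGTERM in the batch script so the job keeps running through the GraceTime window instead of exiting on the first signal. A minimal sketch, assuming bash (the `cleanup` function name and echo messages are illustrative, not from the thread):

```shell
#!/bin/bash
# Sketch: survive Slurm's initial SIGTERM at the start of GraceTime.
# Slurm sends SIGCONT+SIGTERM when the job is selected for preemption;
# an untrapped SIGTERM terminates the shell immediately, so the job
# never sees its grace period.

cleanup() {
  echo "caught SIGTERM, starting checkpoint"
  # ... checkpoint / cleanup work goes here ...
  exit 0
}
trap cleanup TERM

# For demonstration only: send ourselves SIGTERM, as Slurm would.
kill -TERM $$

echo "not reached"   # the trap handler exits before this line runs
```

In a real job, the body between `trap` and the end of the script would be the workload itself; the handler then has up to GraceTime seconds to checkpoint before Slurm follows up with SIGKILL at the new end time.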
On Mon, Nov 20, 2017 at 8:33 AM, Jeffrey Frey <frey at udel.edu> wrote:
> • *GraceTime*: Specifies a time period for a job to execute after it is
> selected to be preempted. This option can be specified by partition or QOS
> using the slurm.conf file or database respectively. This option is only
> honored if PreemptMode=CANCEL. The GraceTime is specified in seconds and
> the default value is zero, which results in no preemption delay. Once a job
> has been selected for preemption, its end time is set to the current time
> plus GraceTime. The job is immediately sent SIGCONT and SIGTERM signals in
> order to provide notification of its imminent termination. This is followed
> by the SIGCONT, SIGTERM and SIGKILL signal sequence upon reaching its new
> end time.
>
>
> "The job is immediately sent SIGCONT and SIGTERM signals in order to
> provide notification of its imminent termination."
>
>
> Default behavior on SIGTERM is for a program to exit; your program is
> probably ending when it receives that initial SIGTERM.
>
> On Nov 20, 2017, at 10:21 AM, Ailing Zhang <zhangal1992 at gmail.com> wrote:
>
>
> Hi slurm community,
>
> I'm testing partition-based preemption. Partitions test-high and test-low
> share the same nodes. I set GraceTime=600 and PreemptMode=CANCEL on
> test-low, but as soon as I submit a job to test-high, the job in test-low
> is killed immediately, without any grace time.
> Here are my configs:
> PartitionName=test-low
> AllowGroups=admins AllowAccounts=ALL AllowQos=ALL
> AllocNodes=ALL Default=NO QoS=N/A
> DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=600
> Hidden=NO
> MaxNodes=UNLIMITED MaxTime=02:00:00 MinNodes=1 LLN=NO
> MaxCPUsPerNode=UNLIMITED
> Nodes=node[100-102]
> PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO
> OverSubscribe=NO
> OverTimeLimit=NONE PreemptMode=CANCEL
> State=UP TotalCPUs=100 TotalNodes=3 SelectTypeParameters=NONE
> DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
> PartitionName=test-high
> AllowGroups=admins AllowAccounts=ALL AllowQos=ALL
> AllocNodes=ALL Default=NO QoS=N/A
> DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0
> Hidden=NO
> MaxNodes=UNLIMITED MaxTime=02:00:00 MinNodes=1 LLN=NO
> MaxCPUsPerNode=UNLIMITED
> Nodes=node[100-102] PriorityJobFactor=30 PriorityTier=30 RootOnly=NO
> ReqResv=NO OverSubscribe=NO
> OverTimeLimit=NONE PreemptMode=OFF
> State=UP TotalCPUs=100 TotalNodes=3 SelectTypeParameters=NONE
> DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
> Any help will be much appreciated.
>
> Thanks!
> Ailing
>
>
>
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::
> Jeffrey T. Frey, Ph.D.
> Systems Programmer V / HPC Management
> Network & Systems Services / College of Engineering
> University of Delaware, Newark DE 19716
> Office: (302) 831-6034 Mobile: (302) 419-4976
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::