[slurm-users] Reservation to exceed time limit on a partition for a user

Matthew BETTINGER matthew.bettinger at external.total.com
Thu Jan 3 08:23:37 MST 2019


Answering my own question here. I created a hidden partition, which shows up like this:

PartitionName=FOO
   AllowGroups=ALL AllowAccounts=rt AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=YES GraceTime=0 Hidden=YES
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=nid00[192-255]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=1280 TotalNodes=64 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

I can run jobs in there, but if I try to restrict it to just one user (myself), the job does not run. I may have to leave the partition like this until I can figure out the correct way, since we need this to run today.
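
For reference, a slurm.conf partition definition along these lines should reproduce the partition above. This is a sketch reconstructed from the scontrol output, not a copy of our actual config; note that partitions are restricted with AllowGroups/AllowAccounts (here the "rt" account), not with individual user names:

# hidden, no-time-limit partition reconstructed from the scontrol output above
PartitionName=FOO Nodes=nid00[192-255] AllowAccounts=rt ExclusiveUser=YES Hidden=YES Default=NO MaxTime=UNLIMITED State=UP

After editing slurm.conf, "scontrol reconfigure" should pick the new partition up without a full restart, though the file still has to be consistent across nodes.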

On 1/3/19, 8:41 AM, "Matthew BETTINGER" <matthew.bettinger at external.total.com> wrote:

    Hello,
    
    We are running Slurm 17.02.6 with accounting on a Cray CLE system.
    
    We currently have a 24-hour run limit on our partitions, and a user needs to run a job that will exceed 24 hours of runtime. I tried to make a reservation, shown below, allocating the user 36 hours to run his job, but the job was killed at the 24-hour limit. Can someone explain what is going on, and what the proper way is to let a user exceed the partition time limit without having to modify slurm.conf, push it out to all of the nodes, run Ansible plays, reconfigure, and so on? I thought that this is what reservations were for.
    
    Here is the reservation I created; the job was killed when it ran over 24 hours:
    
    scontrol show res
    ReservationName=CoolBreeze StartTime=2018-12-27T10:08:11 EndTime=2018-12-28T22:08:11 Duration=1-12:00:00
    Nodes=nid00[192-239] NodeCnt=48 CoreCnt=480 Features=(null) PartitionName=GPU Flags=
    TRES=cpu=960
    Users=coolbreeze Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
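
    For completeness, the reservation above was created with something along these lines (reconstructed from the scontrol show res output, so treat it as approximate), and the job was then submitted against it with --reservation; "job.sh" is just a placeholder for the user's batch script:

    # create the 36-hour reservation for the user on a subset of the GPU partition nodes
    scontrol create reservation ReservationName=CoolBreeze StartTime=2018-12-27T10:08:11 \
        Duration=1-12:00:00 Users=coolbreeze Nodes=nid00[192-239] PartitionName=GPU

    # submit the job into the reservation, requesting the full 36 hours
    sbatch --reservation=CoolBreeze --time=1-12:00:00 job.sh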
    
    Here is the partition with the resources the user needs to run on:
    
    PartitionName=GPU
       AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
       AllocNodes=ALL Default=NO QoS=N/A
       DefaultTime=01:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
       MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
       Nodes=nid00[192-255]
       PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=EXCLUSIVE
       OverTimeLimit=NONE PreemptMode=OFF
       State=UP TotalCPUs=1280 TotalNodes=64 SelectTypeParameters=NONE
       DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
    
    Thanks!
    
    
    
    


