[slurm-users] Limiting the number of CPUs

Henkel, Andreas henkel at uni-mainz.de
Thu Nov 14 15:01:31 UTC 2019


Hi,

Is lowercase #sbatch really valid?
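
As far as I know it is not: sbatch only treats lines that begin with uppercase #SBATCH as directives, so the lowercase #sbatch lines are plain shell comments and the --time and --mem requests in your script never reach the controller. Without a memory request the job ends up asking for the node's full memory (MinMemoryNode=257758M in your scontrol output below), which is exactly what trips the mem=1G MaxTRESPU limit and leaves the job pending with QOSMaxMemoryPerUser. A corrected header, just a sketch with the values copied from your script, would be:

#!/bin/bash
#SBATCH --job-name=hostname
# the next two lines were lowercase "#sbatch" before and were therefore ignored
#SBATCH --time=00:50
#SBATCH --mem=1M
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --nodelist=cn110

srun hostname

After resubmitting, "scontrol show job <jobid>" should report MinMemoryNode=1M rather than the whole node, and the job should no longer be held back by the QOS.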

> Am 14.11.2019 um 14:09 schrieb Sukman <sukman at pusat.itb.ac.id>:
> 
> Hi Brian,
> 
> thank you for the suggestion.
> 
> It appears that my node is in drain state.
> I rebooted the node and everything became fine.
> 
> However, the QOS still does not seem to be applied properly.
> Do you have any opinion regarding this issue?
> 
> 
> $ sacctmgr show qos where Name=normal_compute format=Name,Priority,MaxWal,MaxTRESPU
>      Name   Priority     MaxWall     MaxTRESPU
> ---------- ---------- ----------- -------------
> normal_co+         10    00:01:00  cpu=2,mem=1G
> 
> 
> When I run the following script:
> 
> #!/bin/bash
> #SBATCH --job-name=hostname
> #sbatch --time=00:50
> #sbatch --mem=1M
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --ntasks-per-node=1
> #SBATCH --cpus-per-task=1
> #SBATCH --nodelist=cn110
> 
> srun hostname
> 
> 
> It turns out that the QOSMaxMemoryPerUser limit has been hit:
> 
> $ squeue
>             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>                88      defq hostname   sukman PD       0:00      1 (QOSMaxMemoryPerUser)
> 
> 
> $ scontrol show job 88
> JobId=88 JobName=hostname
>   UserId=sukman(1000) GroupId=nobody(1000) MCS_label=N/A
>   Priority=4294901753 Nice=0 Account=user QOS=normal_compute
>   JobState=PENDING Reason=QOSMaxMemoryPerUser Dependency=(null)
>   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>   RunTime=00:00:00 TimeLimit=00:01:00 TimeMin=N/A
>   SubmitTime=2019-11-14T19:49:37 EligibleTime=2019-11-14T19:49:37
>   StartTime=Unknown EndTime=Unknown Deadline=N/A
>   PreemptTime=None SuspendTime=None SecsPreSuspend=0
>   LastSchedEval=2019-11-14T19:55:50
>   Partition=defq AllocNode:Sid=itbhn02:51072
>   ReqNodeList=cn110 ExcNodeList=(null)
>   NodeList=(null)
>   NumNodes=1-1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>   TRES=cpu=1,node=1
>   Socks/Node=* NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
>   MinCPUsNode=1 MinMemoryNode=257758M MinTmpDiskNode=0
>   Features=(null) DelayBoot=00:00:00
>   Gres=(null) Reservation=(null)
>   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
>   Command=/home/sukman/script/test_hostname.sh
>   WorkDir=/home/sukman/script
>   StdErr=/home/sukman/script/slurm-88.out
>   StdIn=/dev/null
>   StdOut=/home/sukman/script/slurm-88.out
>   Power=
> 
> 
> $ scontrol show node cn110
> NodeName=cn110 Arch=x86_64 CoresPerSocket=1
>   CPUAlloc=0 CPUErr=0 CPUTot=56 CPULoad=0.01
>   AvailableFeatures=(null)
>   ActiveFeatures=(null)
>   Gres=(null)
>   NodeAddr=cn110 NodeHostName=cn110 Version=17.11
>   OS=Linux 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017
>   RealMemory=257758 AllocMem=0 FreeMem=255742 Sockets=56 Boards=1
>   State=IDLE ThreadsPerCore=1 TmpDisk=268629 Weight=1 Owner=N/A MCS_label=N/A
>   Partitions=defq
>   BootTime=2019-11-14T18:50:56 SlurmdStartTime=2019-11-14T18:53:23
>   CfgTRES=cpu=56,mem=257758M,billing=56
>   AllocTRES=
>   CapWatts=n/a
>   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> 
> 
> ---------------------------------------
> 
> Sukman
> ITB Indonesia
> 
> 
> 
> 
> ----- Original Message -----
> From: "Brian Andrus" <toomuchit at gmail.com>
> To: slurm-users at lists.schedmd.com
> Sent: Tuesday, November 12, 2019 10:41:42 AM
> Subject: Re: [slurm-users] Limiting the number of CPUs
> 
> You are trying to specifically run on node cn110, so you may want to 
> check that out with sinfo
> 
> A quick "sinfo -R" can list any down machines and the reasons.
> 
> Brian Andrus
> 
> 
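
One more note on the drain state: rather than rebooting, the usual way to return a drained node to service (once the reason shown by "sinfo -R" has been dealt with) is, if I remember correctly:

scontrol update NodeName=cn110 State=RESUME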


