[slurm-users] Limiting the number of CPU

Brian Andrus toomuchit at gmail.com
Tue Nov 12 03:41:42 UTC 2019


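Your script mentions node cn110, so you may want to check that node with
sinfo. Note that the --nodelist line is commented out with "##", and
ReqNodeList=(null) in your scontrol output suggests that request is not
being applied.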

A quick "sinfo -R" can list any down machines and the reasons.
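
For example, a minimal sketch of those checks, assuming the Slurm client
tools are on the PATH of the host you run it on. The node name cn110 is
taken from the quoted script; the guard around the commands is only there
so the snippet does nothing harmful on a host without Slurm.

```shell
#!/bin/bash
# Node name comes from the job script quoted below; change it for other nodes.
NODE="cn110"

if command -v sinfo >/dev/null 2>&1; then
    # Down, drained, or failing nodes, with the recorded reason.
    sinfo -R
    # Long per-node listing restricted to the one node.
    sinfo -N -l -n "$NODE"
    # Full node record, including its State and Reason fields.
    scontrol show node "$NODE"
else
    echo "Slurm client tools not found on this host" >&2
fi
```

If the node shows up as down or drained, the Reason column is usually the
first thing to look at.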

Brian Andrus

On 11/10/2019 11:23 PM, Sukman wrote:
> Hi Brian,
>
> I see. Thank you for your suggestion.
> I definitely will try it.
>
> Anyway, I am now facing a new problem.
> The job cannot start; it stays pending with Reason=Resources.
>
> Could anyone help with this issue?
>
>
> I previously enabled these options in slurm.conf:
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
>
> However, since the job didn't run properly with those options enabled, I have disabled them again.
> I then restarted Slurm on both the head node and the compute node.
>
>
> But now, when I submit a job with this script, the job stays pending.
>
>> the script
> #!/bin/bash
> #SBATCH --job-name=hostname
> ##sbatch --time=00:50
> ##sbatch --mem=10M
> ##SBATCH --nodes=1
> ##SBATCH --ntasks=1
> ##SBATCH --ntasks-per-node=1
> ##SBATCH --cpus-per-task=1
> ##SBATCH --nodelist=cn110
>
> srun hostname
>
>
>> scontrol show job 79
> JobId=79 JobName=hostname
>     UserId=sukman(1000) GroupId=nobody(1000) MCS_label=N/A
>     Priority=4294901753 Nice=0 Account=user QOS=normal_compute
>     JobState=PENDING Reason=Resources Dependency=(null)
>     Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>     RunTime=00:00:00 TimeLimit=00:01:00 TimeMin=N/A
>     SubmitTime=2019-11-11T14:10:41 EligibleTime=2019-11-11T14:10:41
>     StartTime=Unknown EndTime=Unknown Deadline=N/A
>     PreemptTime=None SuspendTime=None SecsPreSuspend=0
>     LastSchedEval=2019-11-11T14:18:41
>     Partition=defq AllocNode:Sid=itbhn02:11211
>     ReqNodeList=(null) ExcNodeList=(null)
>     NodeList=(null)
>     NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>     TRES=cpu=1,node=1
>     Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>     MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>     Features=(null) DelayBoot=00:00:00
>     Gres=(null) Reservation=(null)
>     OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
>     Command=/home/sukman/script/test_hostname.sh
>     WorkDir=/home/sukman/script
>     StdErr=/home/sukman/script/slurm-79.out
>     StdIn=/dev/null
>     StdOut=/home/sukman/script/slurm-79.out
>     Power=
>
>
>
> ------------------------------------------
>
> Suksmandhira H
> ITB Indonesia
>
>
>
> ----- Original Message -----
> From: "Brian W. Johanson" <bjohanso at psc.edu>
> To: "Slurm User Community List" <slurm-users at lists.schedmd.com>
> Sent: Friday, November 8, 2019 8:58:40 PM
> Subject: Re: [slurm-users] Limiting the number of CPU
>
> Suksmandhira,
> That QOS specifies walltime, CPU, and memory limits.  From the job script, it appears you are within the CPU limit.  But the job script specifies neither walltime nor memory, and your squeue output is not showing those values (or CPU) for the job.
> 'scontrol show job=JOBID' will show all the values.  Adding flags=DenyOnLimit to the QOS will reject a job at submission when it exceeds a QOS limit, so that jobs which can never run do not sit in the queue.
>
> -b
>
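
One more thing worth checking in the quoted script: sbatch only honors
lines that begin with exactly "#SBATCH" and appear before the first
command, so the "##SBATCH" and "##sbatch" lines are treated as ordinary
comments. That would be consistent with ReqNodeList=(null) in the scontrol
output. A minimal sketch for listing the directives that are actually
active in a script (active_directives is a hypothetical helper name, not a
Slurm tool):

```shell
#!/bin/bash
# Print the directives sbatch would honor in a batch script:
# lines starting with exactly "#SBATCH", up to the first command.
active_directives() {
    awk '
        /^#SBATCH[ \t]/   { print; next }   # honored directive
        /^#/ || /^[ \t]*$/ { next }         # other comment or blank line
        { exit }                            # first command: stop scanning
    ' "$1"
}
```

Running "active_directives test_hostname.sh" on the script as quoted
should print only the --job-name line. Removing one "#" from the lines you
want (and writing SBATCH in upper case) turns them back into directives.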


