<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

    <style type="text/css">body p { margin-bottom: 0cm; margin-top: 0pt; } </style>

  </head>

  <body bidimailui-charset-is-forced="true" text="#000000"

    bgcolor="#FFFFFF">

    <p>Three possible issues, inline below.</p>

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 14/11/2019 14:58:29, Sukman wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:1846287531.339891.1573736309582.JavaMail.root@pusat.itb.ac.id">

      <pre class="moz-quote-pre" wrap="">Hi Brian,

thank you for the suggestion.

It appears that my node is in drain state.

I rebooted the node and everything became fine.

However, the QOS still cannot be applied properly.

Do you have any opinion regarding this issue?

$ sacctmgr show qos where Name=normal_compute format=Name,Priority,MaxWal,MaxTRESPU

      Name   Priority     MaxWall     MaxTRESPU

---------- ---------- ----------- -------------

normal_co+         10    00:01:00  cpu=2,mem=1G

when I run the following script:

#!/bin/bash

#SBATCH --job-name=hostname

#sbatch --time=00:50

#sbatch --mem=1M</pre>

    </blockquote>

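    I believe those should be uppercase #SBATCH. As far as I know, sbatch treats lowercase #sbatch lines as ordinary comments, so --time and --mem would not be applied.<br>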
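    <p>A minimal sketch for spotting directive lines whose #SBATCH prefix is not uppercase. The function name find_ignored_directives is a hypothetical helper, not part of Slurm, and the check only looks at lines that start with the prefix:</p>

    <pre>
```shell
#!/bin/bash
# Sketch: list directive lines that sbatch would most likely treat as plain
# comments because "#SBATCH" is not written in uppercase.
# find_ignored_directives is a hypothetical helper name, not a Slurm tool.
find_ignored_directives() {
  # First grep: -i matches any capitalisation of "#sbatch" at line start,
  #             -n prefixes each match with its line number.
  # Second grep: drops the correctly written "#SBATCH" lines.
  grep -in '^#sbatch' "$1" | grep -v '^[0-9]*:#SBATCH'
}
```
    </pre>

    <p>After sourcing the function, call it with the path of the batch script, for example find_ignored_directives /home/sukman/script/test_hostname.sh. Any line it prints should be rewritten with an uppercase #SBATCH prefix.</p>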

    <blockquote type="cite"

      cite="mid:1846287531.339891.1573736309582.JavaMail.root@pusat.itb.ac.id">

      <pre class="moz-quote-pre" wrap="">

#SBATCH --nodes=1

#SBATCH --ntasks=1

#SBATCH --ntasks-per-node=1

#SBATCH --cpus-per-task=1

#SBATCH --nodelist=cn110

srun hostname

It turns out that the QOSMaxMemoryPerUser has been met

$ squeue

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

                88      defq hostname   sukman PD       0:00      1 (QOSMaxMemoryPerUser)

$ scontrol show job 88

JobId=88 JobName=hostname

   UserId=sukman(1000) GroupId=nobody(1000) MCS_label=N/A

   Priority=4294901753 Nice=0 Account=user QOS=normal_compute

   JobState=PENDING Reason=QOSMaxMemoryPerUser Dependency=(null)

   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0

   RunTime=00:00:00 TimeLimit=00:01:00 TimeMin=N/A

   SubmitTime=2019-11-14T19:49:37 EligibleTime=2019-11-14T19:49:37

   StartTime=Unknown EndTime=Unknown Deadline=N/A

   PreemptTime=None SuspendTime=None SecsPreSuspend=0

   LastSchedEval=2019-11-14T19:55:50

   Partition=defq AllocNode:Sid=itbhn02:51072

   ReqNodeList=cn110 ExcNodeList=(null)

   NodeList=(null)

   NumNodes=1-1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*

   TRES=cpu=1,node=1

   Socks/Node=* NtasksPerN:B:S:C=1:0:*:* CoreSpec=*

   MinCPUsNode=1 MinMemoryNode=257758M MinTmpDiskNode=0</pre>

    </blockquote>

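    MinMemoryNode (257758M) is more than the FreeMem (255742) reported for the node below. It is also far above the QOS limit of mem=1G, which may be what triggers Reason=QOSMaxMemoryPerUser.<br>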

    <blockquote type="cite"

      cite="mid:1846287531.339891.1573736309582.JavaMail.root@pusat.itb.ac.id">

      <pre class="moz-quote-pre" wrap="">

   Features=(null) DelayBoot=00:00:00

   Gres=(null) Reservation=(null)

   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)

   Command=/home/sukman/script/test_hostname.sh

   WorkDir=/home/sukman/script

   StdErr=/home/sukman/script/slurm-88.out

   StdIn=/dev/null

   StdOut=/home/sukman/script/slurm-88.out

   Power=

$ scontrol show node cn110

NodeName=cn110 Arch=x86_64 CoresPerSocket=1

   CPUAlloc=0 CPUErr=0 CPUTot=56 CPULoad=0.01

   AvailableFeatures=(null)

   ActiveFeatures=(null)

   Gres=(null)

   NodeAddr=cn110 NodeHostName=cn110 Version=17.11

   OS=Linux 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017

   RealMemory=257758 AllocMem=0 FreeMem=255742 Sockets=56 Boards=1</pre>

    </blockquote>

    <p>This appears to be wrong: 56 sockets, with CoresPerSocket=1?</p>

    <p>How did you configure the node in slurm.conf?</p>

    <p>FreeMem is lower than MinMemoryNode; I am not sure whether that is relevant.<br>

    </p>

    <blockquote type="cite"

      cite="mid:1846287531.339891.1573736309582.JavaMail.root@pusat.itb.ac.id">

      <pre class="moz-quote-pre" wrap="">

   State=IDLE ThreadsPerCore=1 TmpDisk=268629 Weight=1 Owner=N/A MCS_label=N/A

   Partitions=defq

   BootTime=2019-11-14T18:50:56 SlurmdStartTime=2019-11-14T18:53:23

   CfgTRES=cpu=56,mem=257758M,billing=56

   AllocTRES=

   CapWatts=n/a

   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0

   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

---------------------------------------

Sukman

ITB Indonesia

----- Original Message -----

From: "Brian Andrus" <a class="moz-txt-link-rfc2396E" href="mailto:toomuchit@gmail.com">&lt;toomuchit@gmail.com&gt;</a>

To: <a class="moz-txt-link-abbreviated" href="mailto:slurm-users@lists.schedmd.com">slurm-users@lists.schedmd.com</a>

Sent: Tuesday, November 12, 2019 10:41:42 AM

Subject: Re: [slurm-users] Limiting the number of CPU

You are trying to specifically run on node cn110, so you may want to 

check that out with sinfo

A quick "sinfo -R" can list any down machines and the reasons.

Brian Andrus

</pre>

    </blockquote>

    <pre class="moz-signature" cols="72">-- 

Regards,

Daniel Letai

+972 (0)505 870 456</pre>

  </body>

</html>