[slurm-users] How to look for free nodes of a certain constraint efficiently
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Thu Oct 14 14:34:24 UTC 2021
Hi Matt,
How about this sinfo command:
$ sinfo -O NodeList:30,Features:30,StateLong
NODELIST                      AVAIL_FEATURES                STATE
i023                          xeon2650v2,infiniband,xeon16  draining@
i[004-022,024-050]            xeon2650v2,infiniband,xeon16  allocated
x[001-192]                    xeon2650v4,opa,xeon24         allocated
a[001-128]                    xeon6242r,opa,xeon40          allocated
b[001-012],c[001-196]         xeon6148v5,opa,xeon40         allocated
You can grep for the desired features and use the nodelist in column 1 for
further processing.
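The grep-and-reuse step could be sketched as a small shell script. This is only a minimal sketch under some assumptions: the function name nodes_with_feature and the FEATURE variable are made up for illustration, and it uses sinfo's -o format (%N, %f, %T) with -h rather than the -O form, so that the columns are whitespace-separated and not cut to a fixed width. It matches whole feature names rather than substrings, then passes the resulting nodelist to squeue -w as in Matt's example.

```shell
#!/bin/sh
# nodes_with_feature FEATURE  (hypothetical helper, not part of Slurm)
# Reads "NODELIST FEATURES [STATE]" lines on stdin and prints one
# comma-separated nodelist of all rows whose feature list contains FEATURE.
nodes_with_feature() {
    awk -v feat="$1" '
        {
            # Split the comma-separated feature column and compare whole names,
            # so "xeon" does not accidentally match "xeon40".
            n = split($2, f, ",")
            for (i = 1; i <= n; i++) {
                if (f[i] == feat) {
                    # The same nodelist can show up once per partition or state;
                    # keep only the first occurrence.
                    if (!seen[$1]++)
                        out = (out == "" ? $1 : out "," $1)
                    break
                }
            }
        }
        END { if (out != "") print out }
    '
}

# Only talk to Slurm if the commands are actually available.
if command -v sinfo >/dev/null 2>&1 && command -v squeue >/dev/null 2>&1; then
    FEATURE=${FEATURE:-xeon40}   # assumption: replace with a feature at your site
    nodes=$(sinfo -h -o "%N %f %T" | nodes_with_feature "$FEATURE")
    if [ -n "$nodes" ]; then
        # Running jobs on all nodes that carry the feature.
        squeue -a -w "$nodes" -t R
    else
        echo "no nodes with feature $FEATURE" >&2
    fi
fi
```

To look only at free nodes, it should be possible to add a state filter such as "-t idle" to the sinfo call before piping it into the helper.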
/Ole
On 10/14/21 2:44 PM, Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND
APPLICATIONS INC] wrote:
> All,
>
> I work on a cluster that uses SLURM, which has various types of nodes that
> are controlled via --constraint flags in sbatch.
>
> Now, I started thinking "How can I figure out how many jobs are
> running/pending/etc on a certain type of node?". I first thought obviously
> "squeue --constraint=foo", but...nope. No --constraint flag with squeue.
> Okay. Constraints are just Features by another name, but...you can't seem
> to just squeue a feature either.
>
> I asked a SLURM guru here and they suggested using --nodelist/-w a la:
>
> squeue -a -w nodea[001-100],nodeb[001-100],... -t r
>
> where you pass in all the nodes of a certain type. And, yep, that works!
> But that also means I have to know what nodes are what type. I could
> obviously do a one-time parsing of "scontrol show nodes" and see what each
> chunk is and be done with it...but dangit I'm lazy and SLURM has so many
> programs and options there might just be something and I haven't read the
> right manpage! :)
>
> So I was wondering if anyone out there knows of a cool/elegant/efficient
> way of doing this?
>
> Thanks,
>
> Matt
>
> PS: I still might write a bash script where I've listed which node
> names belong to each constraint, and accept that I might have to update it
> once every year or two. Now time to look at what parser SLURM uses for
> nodelist. Can you use regexes and use *, etc? Or just use
> nodea[001-100]? Time to find out!