[slurm-users] slurm.conf syntax checker?

Marcus Wagner wagner at itc.rwth-aachen.de
Wed Oct 27 05:51:29 UTC 2021


Hi Diego,

Sorry for the delay.


On 10/18/21 14:20, Diego Zuccato wrote:
> Il 15/10/2021 06:02, Marcus Wagner ha scritto:
>
>> Mostly, our problem was that we forgot to add a node to (or remove it
>> from) the partitions or the topology file, which caused slurmctld to
>> refuse to start. So I wrote a simple checker for that. Here is the
>> output of a sample run:
> Even "just" catching syntax errors and the most common errors is 
> already a big help, especially for noobs :)
>
>> [OK]: All nodeweights are correct.
> What do you mean by this? How can weights be "incorrect"?

We use node weights calculated from several factors, such as CPU 
generation, memory, cores and available generic resources.
For example, some of our nodes have additional NVMe disks. These should 
be scheduled later than the nodes without NVMe disks, but a job can 
force scheduling on them by requesting the constraint nvme.
My checker calculates these weights, so I do not have to work them out 
myself; I just insert the calculated value.
Example output (instead of "[OK]: All nodeweights are correct."):

NodeName=lns[07-08] Sockets=8 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=1020000 Feature=broadwell,bwx8860,nvme,hostok,hpcwork Gres=gpu:pascal:1 Weight=111544(was 1) State=UNKNOWN

So the correct weight is 111544, but I had set it to 1 in the config 
file. With "Weight=111544(was 1)" the checker tells me that the correct 
value for this kind of node is 111544, not 1.
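
As a rough illustration of this kind of check, here is a minimal Python sketch. It is not the actual checker: the scoring tables, the formula in expected_weight() and all function names are hypothetical, since the real factors and their scaling are not given in this thread, so the numbers it prints will not match the ones above. It only shows the idea of recomputing a weight from a node line and reporting it in the "Weight=N(was M)" style.

```python
import re

# Hypothetical scoring tables; the real checker's factors are not known.
CPU_GENERATION_SCORE = {"broadwell": 1, "skylake": 2}
PENALTY_FEATURES = {"nvme": 1}  # nodes with extras should be picked later


def parse_node_line(line):
    """Split a 'NodeName=... Key=Value ...' line into a dict."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


def expected_weight(node):
    """Combine the factors into one weight (made-up formula)."""
    features = node.get("Feature", "").split(",")
    cores = int(node["Sockets"]) * int(node["CoresPerSocket"])
    memory_gb = int(node["RealMemory"]) // 1000
    generation = max((CPU_GENERATION_SCORE.get(f, 0) for f in features),
                     default=0)
    extras = sum(PENALTY_FEATURES.get(f, 0) for f in features)
    gres_count = len([g for g in node.get("Gres", "").split(",") if g])
    # Higher weight means the node is allocated later by Slurm.
    return (generation * 100000 + (extras + gres_count) * 10000
            + cores * 10 + memory_gb // 100)


def check_weight(line):
    """Return None if the weight matches, else the line with a correction."""
    node = parse_node_line(line)
    expected = expected_weight(node)
    if "Weight" not in node:
        # Slurm uses a weight of 1 when none is configured.
        return f"{line} Weight={expected}(was 1)"
    configured = int(node["Weight"])
    if configured == expected:
        return None
    return re.sub(r"Weight=\d+", f"Weight={expected}(was {configured})", line)


if __name__ == "__main__":
    sample = ("NodeName=lns[07-08] Sockets=8 CoresPerSocket=18 "
              "ThreadsPerCore=1 RealMemory=1020000 "
              "Feature=broadwell,bwx8860,nvme,hostok,hpcwork "
              "Gres=gpu:pascal:1 Weight=1 State=UNKNOWN")
    print(check_weight(sample) or "[OK]: All nodeweights are correct.")
```

Keeping the formula in a single function means the config file never has to be the source of truth for the weights: the checker recomputes them and only the mismatches need to be pasted back.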

Best
Marcus
>
>> If someone is interested ...
> Surely I am :)
>

-- 
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de



