[slurm-users] slurm.conf syntax checker?
Marcus Wagner
wagner at itc.rwth-aachen.de
Wed Oct 27 05:51:29 UTC 2021
Hi Diego,
Sorry for the delay.
On 10/18/21 14:20, Diego Zuccato wrote:
> On 15/10/2021 06:02, Marcus Wagner wrote:
>
>> Mostly, our problem was that we forgot to add a node to (or remove it
>> from) the partitions or topology file, which caused slurmctld to refuse
>> to start. So I wrote a simple checker for that. Here is the output of a
>> sample run:
> Even "just" catching syntax errors and the most common errors is
> already a big help, especially for noobs :)
>
>> [OK]: All nodeweights are correct.
> What do you mean by this? How can weights be "incorrect"?
We use node weights calculated from several factors, such as CPU
generation, memory, cores, and available generic resources.
For example, we have some nodes with additional NVMe disks. These should be
scheduled later than the nodes without NVMe disks, but a job can force
scheduling on them by requesting the constraint nvme.
My checker calculates these weights, so I do not have to work them out
myself; I just insert the calculated value.
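
A minimal sketch of how a weight could be derived from such factors. The
ranking table, the multipliers, and the function name node_weight are
assumptions made up for illustration; they are not the values or the code
of the checker described here. The only Slurm behavior relied on is that
nodes with a lower Weight are allocated first, so scarce hardware gets a
higher number.

```python
# Hypothetical tables: NOT the real factors used by the checker in this mail.
CPU_GENERATION_RANK = {"broadwell": 1, "skylake": 2}  # newer CPU = higher rank
FEATURE_COST = {"nvme": 1000}  # extra cost for nodes with special hardware
GRES_COST = 500  # extra cost per generic resource entry (e.g. a GPU type)


def node_weight(features, real_memory_mb, cores, gres):
    """Combine several node properties into one integer weight.

    Slurm allocates nodes with the lowest weight first, so anything that
    makes a node more valuable adds to the weight.
    """
    weight = 1
    # Newest CPU generation found among the node's features.
    weight += 10 * max((CPU_GENERATION_RANK.get(f, 0) for f in features), default=0)
    # Larger memory and more cores make the node more expensive to hand out.
    weight += real_memory_mb // 100_000
    weight += cores // 16
    # Special features (such as NVMe disks) and generic resources cost extra.
    weight += sum(FEATURE_COST.get(f, 0) for f in features)
    weight += GRES_COST * len(gres)
    return weight


plain = node_weight(["broadwell"], 1020000, 144, [])
with_nvme = node_weight(["broadwell", "nvme"], 1020000, 144, [])
print(plain < with_nvme)  # the NVMe node is scheduled later
```

With a scheme like this, a job that does not ask for nvme lands on the
plain nodes first, while a job requesting the constraint nvme can only be
placed on the heavier nodes.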
Example output (instead of "[OK]: All nodeweights are correct."):
NodeName=lns[07-08] Sockets=8
CoresPerSocket=18 ThreadsPerCore=1 RealMemory=1020000
Feature=broadwell,bwx8860,nvme,hostok,hpcwork Gres=gpu:pascal:1
Weight=111544(was 1) State=UNKNOWN
So the correct weight is 111544, but I had set it to 1 in the config file.
With "Weight=111544(was 1)", the checker tells me that the correct value
for this kind of node is 111544, not 1.
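
The report format above could be produced roughly as follows. This is only
a sketch: the function name annotate_weight and the "(was unset)" wording
for lines without a Weight entry are assumptions, and the expected weight
is simply passed in rather than calculated.

```python
import re


def annotate_weight(node_line, expected):
    """Return node_line with a mismatching Weight rewritten as
    'Weight=<expected>(was <configured>)'; return it unchanged if it matches.
    """
    match = re.search(r"\bWeight=(\d+)", node_line, flags=re.IGNORECASE)
    if match is None:
        # Assumed wording for a node line that has no Weight entry at all.
        return f"{node_line} Weight={expected}(was unset)"
    configured = int(match.group(1))
    if configured == expected:
        return node_line
    # Replace only the Weight entry and keep the rest of the line as it is.
    return (
        node_line[: match.start()]
        + f"Weight={expected}(was {configured})"
        + node_line[match.end():]
    )


line = "NodeName=lns[07-08] Sockets=8 Weight=1 State=UNKNOWN"
print(annotate_weight(line, 111544))
```

Printing the full node line with the corrected value means the output can
be pasted back into slurm.conf after removing the "(was ...)" note.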
Best
Marcus
>
>> If someone is interested ...
> Surely I am :)
>
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de