Hi all,
I recently wrote a Slurm input plugin [0] for Telegraf [1].
I just wanted to let the community know in case you find it useful.
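If anyone wants to give it a quick try, a one-shot test of just this input should show whether metrics come through (the config path below is only a placeholder; the plugin's own options are documented in its README [0]):

$ telegraf --config /etc/telegraf/telegraf.conf --input-filter slurm --test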
Maybe its existence can also be included in the documentation somewhere?
Anyway, thanks a ton for your time,
Pablo Collado Soto
References:
0: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/slurm
1: https://www.influxdata.com/time-series-platform/telegraf/
+ -------------------------------------- +
| Never let your sense of morals prevent |
| you from doing what is right.          |
|        -- Salvor Hardin, "Foundation"  |
+ -------------------------------------- +
Hello,
We have a new cluster and I'm trying to set up fairshare accounting. I'm trying to track CPU, MEM and GPU. It seems that billing for individual jobs is correct, but billing isn't being accumulated (TRESRunMins is always 0).
In my slurm.conf, I think the relevant lines are
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu
PriorityFlags=MAX_TRES
PartitionName=gpu Nodes=node[1-7] MaxCPUsPerNode=384 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
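(To spell out how I read these weights, e.g. for job 154 in the sacct output below, assuming PriorityFlags=MAX_TRES makes the billing the largest weighted TRES on the node:
  cpu:      2 x 1.0      = 2.0
  mem:      2G x 0.125/G = 0.25
  gres/gpu: 1 x 9.6      = 9.6
so billing = max(2.0, 0.25, 9.6) = 9.6, which sacct appears to report truncated to 9.)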
I currently have one recently finished job and one running job. sacct gives
$ sacct --format=JobID,JobName,ReqTRES%50,AllocTRES%50,TRESUsageInAve%50,TRESUsageInMax%50
JobID JobName ReqTRES AllocTRES TRESUsageInAve TRESUsageInMax
------------ ---------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------
154 interacti+ billing=9,cpu=1,gres/gpu=1,mem=1G,node=1 billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
154.interac+ interacti+ cpu=2,gres/gpu=1,mem=2G,node=1 cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+ cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+
155          interacti+    billing=9,cpu=1,gres/gpu=1,mem=1G,node=1    billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
155.interac+ interacti+    cpu=2,gres/gpu=1,mem=2G,node=1
billing=9 seems correct to me, since I have 1 GPU allocated, which has the largest score of 9.6. However, sshare doesn't show anything in TRESRunMins
sshare --format=Account,User,RawShares,FairShare,RawUsage,EffectvUsage,TRESRunMins%110
Account User RawShares FairShare RawUsage EffectvUsage TRESRunMins
-------------------- ---------- ---------- ---------- ----------- ------------- --------------------------------------------------------------------------------------------------------------
root 21589714 1.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
abrol_group 2000 0 0.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
luchko_group                    2000              21589714     1.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
 luchko_group    tluchko          1   0.333333    21589714     1.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
Why is TRESRunMins all 0 for tluchko but RawUsage is not? I have checked and slurmdbd is running.
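For completeness, the commands I believe are the standard way to inspect this, listed in case I am checking the wrong thing:

$ sacctmgr show tres       # which TRES the database is tracking
$ scontrol show config | grep -E 'AccountingStorageTRES|PriorityDecayHalfLife|PriorityCalcPeriod'
$ sshare -l                # long output, which should include TRESRunMins as well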
Thank you,
Tyler
Sent with [Proton Mail](https://proton.me/) secure email.
Hi SchedMD,
I'm sure they will eventually, but do you know when the slides of the
SLUG'24 presentations will be available online at
https://slurm.schedmd.com/publications.html, like previous editions'?
Thanks!
--
Kilian
Hi
I'm using dynamic nodes with "slurmd -Z" with slurm 23.11.1.
Firstly, I find that when you do "scontrol show node" it shows the NodeAddr as an IP address rather than the NodeName. Because I'm playing around with running this in containers on Docker Swarm, I find this IP can be wrong. I can force it with scontrol update; however, after a while something updates it to something else again. Does anybody know if this is done by slurmd or slurmctld or something else?
How can I stop this from happening?
How can I get the node to register with the hostname rather than the IP?
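Is the right approach something along these lines, i.e. pinning the name at registration time and then correcting the address by hand (the node name below is a placeholder, and I am not sure the -N/--conf combination is the intended way)?

# inside the container
$ slurmd -Z -N dyn-node-01 --conf "Feature=dynamic"
# manual fix, which only sticks for a while
$ scontrol update NodeName=dyn-node-01 NodeAddr=dyn-node-01 NodeHostname=dyn-node-01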
cheers,
Jakub
Hello,
Issue 1:
I am using Slurm version 24.05.1. My slurmd has a single node where I
connect multiple GRES by enabling the oversubscribe feature.
I am able to use the advance reservation of a GRES only by using the GRES
name (tres=gres/gpu:SYSTEM12), i.e., during the reservation period, if
another user submits a job with SYSTEM12, then Slurm places the job in the
queue:
user1@host$ srun --gres=gpu:SYSTEM12:1 hostname
srun: job 333 queued and waiting for resources
But when another user submits a job without any system name, the job goes
through on that GRES immediately even though it is reserved:
user1@host$ srun --gres=gpu:1 hostname
mylinux.wbi.com
Also, I can see GresUsed as busy using "scontrol show node -d", which
means the job is running on the GRES/GPU and not just on CPUs.
In the same way, a job submission based on the Feature ("rev1" in my case)
also goes through even though it is reserved for other users in multiple
partitions.
Snippet of slurm.conf:
NodeName=cluster01 NodeAddr=cluster Port=6002 CPUs=8 Boards=1
SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 Feature="rev1"
Gres=gpu:SYSTEM12:1 RealMemory=64171 State=IDLE
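For reference, the reservation is created roughly along these lines, mirroring the tres= spec mentioned above (user, name and times are placeholders, and I am not certain this is the correct syntax for reserving a typed GRES):

$ scontrol create reservation ReservationName=resv1 Users=user2 Nodes=cluster01 \
      StartTime=now Duration=1-00:00:00 TRES=gres/gpu:SYSTEM12=1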
Issue 2:
During execution, srun prints some extra error messages in its output:
user1@host$ srun --gres=gpu:1 hostname
srun: error: extract_net_cred: net_cred not provided
srun: error: Malformed RPC of type RESPONSE_NODE_ALIAS_ADDRS(3017) received
srun: error: slurm_unpack_received_msg: [[inv1715771615.nxdi.us-aus01.nxp.com]:41242] Header lengths are longer than data received
mylinux.wbi.com
Regards,
MS
Dear slurm-user list,
I have a cloud node that is powered up and down on demand. Rarely it can
happen that Slurm's ResumeTimeout is reached and the node is therefore
powered down. We have set ReturnToService=2 in order to avoid the node
being marked down, because the instance behind that node is created on
demand, and therefore after a failure nothing stops the system from
starting the node again, as it is a different instance.
I thought this would be enough, but apparently the node is still marked
with "NOT_RESPONDING", which leads to Slurm not trying to schedule on it.
After a while NOT_RESPONDING is removed, but I would like to remove it
directly from within my fail script if possible, so that the node can
return to service immediately and not be blocked by "NOT_RESPONDING".
Best regards,
Xaver
OS: CentOS 8.5
Slurm: 22.05
Recently upgraded to 22.05. Upgrade was successful, but after a while I started to see the following messages in the slurmdbd.log file:
error: We have more time than is possible (9344745+7524000+0)(16868745) > 12362400 for cluster CLUSTERNAME(3434) from 2024-09-18T13:00:00 - 2024-09-18T14:00:00 tres 1 (this may happen if oversubscription of resources is allowed without Gang)
We do have partitions with overlapping nodes, but do not have "Suspend,Gang" set as the global PreemptMode; it is currently set to requeue.
I have also checked sacct and there are no runaway jobs listed.
Oversubscription is not enabled on any of the queues either.
Do I need to modify my Slurm config to address this, or is this an error condition caused by the upgrade?
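For reference, the runaway-job check I am referring to is the sacctmgr one, which I believe is the standard command:

$ sacctmgr show runawayjobs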
Thank you,
SS
Hello,
is it possible to change a pending job from --exclusive to
--exclusive=user? I tried scontrol update jobid=... oversubscribe=user,
but it seems to only accept yes or no.
Gerhard
Hello
We have another batch of new users and some more batches of large array jobs with very short runtimes, due to errors in the jobs or just by design. While trying to deal with these issues (setting ArrayTaskThrottle and user education), I had a thought: it would be very nice to have a limit on how many jobs can start in a given minute per user. If someone posted a 200000-task array job with 15-second tasks, the scheduler then wouldn't launch more than 100 or 200 per minute and would be less likely to bog down, but if the tasks had longer runtimes (1 hour +), it would only take a few extra minutes to start using all the resources they are allowed, without adding much overall delay to the whole set of jobs.
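For reference, the per-array throttle we set looks like this (script name and job id are placeholders):

# cap a 200000-task array at 200 simultaneously running tasks
$ sbatch --array=0-199999%200 job.sh
# or adjust an array that is already submitted
$ scontrol update JobId=12345 ArrayTaskThrottle=200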
I thought about adding something to our CLI filter, but usually these jobs ask for a runtime of 3-4 hours even though they run for <30 seconds, so the submit options don't indicate the problem jobs ahead of time.
We currently limit our users to 80% of the available resources, which is more than enough for fast-turnover jobs to bog Slurm down, but we have users who complain that they can't use the other 20% when the cluster is not busy, so putting in lower default restrictions is not currently an option.
Has this already been discussed and found not to be feasible for technical reasons? (I'm not finding anything like this yet searching the archives.)
I think Slurm used to have a feature-request severity on their bug submission site. Is there a severity level they prefer for suggested requests like this?
Thanks
Dear all SLUG attendees!
The information about which buildings/addresses the SLUG reception and
presentations are to be held in is not very visible on
https://slug24.splashthat.com. There is a map there with all locations
(https://www.google.com/maps/d/u/0/edit?mid=1bcGaTiW0TNB5noQsjQ3ulctzKuqlGrQ…),
but I've gotten questions about it, so:
The reception on Wednesday will be held on the top floor of Oslo Science Park
(Forskningsparken). Address: Gaustadalléen 21. There will be someone
in the reception who can point you in the right direction.
The presentations will be held in auditorium 3 in Helga Engs Hus ("Helga
Eng's House"). Address: Sem Sælands vei 7. Lunch will be in the
canteen in the same building.
The closest subway station to both these buildings is Blindern Subway
Station (Blindern T-banestasjon).
Looking forward to seeing you there!
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo