[slurm-users] Re: sinfo not listing any partitions

28 Nov 2024


Hi Kent,
On your management node, could you run:
systemctl status slurmctld
and check the 'NodeName=...' and 'PartitionName=...' lines in
/etc/slurm.conf? My slurm.conf has a more detailed node description,
and the NodeName keyword starts with an upper-case letter (I don't know
whether slurm.conf is case sensitive):
NodeName=kareline-0-[0-3] Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=47900
It looks like your node description is not understood by Slurm.
Patrick
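
To make the check suggested above concrete, here is a minimal Python sketch (not part of the original thread) that pulls the NodeName= and PartitionName= lines out of slurm.conf text and lists node definitions that carry no hardware description. The function names and the set of hardware keys are assumptions chosen for illustration. Keys are matched without regard to case, since the slurm.conf man page describes parameter names as case insensitive. Backslash line continuations are not handled.

```python
# Hypothetical helper names; a sketch, not a Slurm tool.
HARDWARE_KEYS = ("cpus", "sockets", "corespersocket", "threadspercore", "realmemory")


def find_definitions(conf_text):
    """Return (nodes, partitions) parsed from NodeName=/PartitionName= lines.

    Each entry is a dict of lower-cased keys to their values.
    """
    nodes, partitions = [], []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = {}
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                fields[key.lower()] = value
        first_key = line.split("=", 1)[0].lower()
        if first_key == "nodename":
            nodes.append(fields)
        elif first_key == "partitionname":
            partitions.append(fields)
    return nodes, partitions


def nodes_missing_hardware(nodes):
    """Names of node definitions that specify none of the hardware keys."""
    return [n["nodename"] for n in nodes if not any(k in n for k in HARDWARE_KEYS)]
```

Applied to the configuration quoted later in this thread, the bare k[001-448] definition would be reported as having no hardware description, while the kareline example above would not.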
On 27/11/2024 at 17:46, Ryan Novosielski via slurm-users wrote:
...
At this point, I’d probably crank up the logging some and see what 
it’s saying in slurmctld.log.
--
#BlackLivesMatter
____
|| \UTGERS, |---------------------------*O*---------------------------
||_// the State |         Ryan Novosielski - novosirj@rutgers.edu
|| \ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ 
RBHS Campus
||  \    of NJ | Office of Advanced Research Computing - MSB 
A555B, Newark
     `'
...
On Nov 27, 2024, at 11:38, Kent L. Hanson <Kent.Hanson@inl.gov> wrote:
Hey Ryan,
I have restarted the slurmctld and slurmd services several times. I 
hashed the slurm.conf files. They are the same. I ran “sinfo -a” as 
root with the same result.
Thanks,
Kent
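
For reference, comparing two copies of slurm.conf by hash could be sketched in Python as below. This is an illustration only; the thread does not say which tool was used, and the function names are hypothetical. Matching digests show only that the copies are identical, not that the configuration itself is valid.

```python
import hashlib


def digest_bytes(data):
    """SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def digest_file(path, chunk_size=65536):
    """SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files have identical contents (by SHA-256)."""
    return digest_file(path_a) == digest_file(path_b)
```

A typical use would be to copy the compute node's file to the headnode and call same_content() on the two paths.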
From: Ryan Novosielski <novosirj@rutgers.edu>
Sent: Wednesday, November 27, 2024 9:31 AM
To: Kent L. Hanson <Kent.Hanson@inl.gov>
Cc: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] sinfo not listing any partitions
If you’re sure you’ve restarted everything after the config change, 
are you also sure that you don’t have that stuff hidden from your 
current user? You can try -a to rule that out. Or run as root.
On Nov 27, 2024, at 09:56, Kent L. Hanson via slurm-users
<slurm-users@lists.schedmd.com> wrote:
I am doing a new install of Slurm 24.05.3. I have all the packages
built and installed on the headnode and compute node, with the same
munge.key, slurm.conf, and gres.conf files. I was able to run the
munge and unmunge commands to test munge successfully. Time is
synced with chronyd. I can’t seem to find any useful errors in
the logs. For some reason, when I run sinfo, no nodes are listed; I
just see the headers for each column. Has anyone seen this, or does
anyone know what a next troubleshooting step would be? I’m new to
this and not sure where to go from here. Thanks for any and all help!
The odd output I am seeing
[username@headnode ~] sinfo
PARTITION AVAIL    TIMELIMIT NODES   STATE NODELIST
(Nothing is output showing status of partition or nodes)
Slurm.conf
ClusterName=slurmkvasir
SlurmctldHost=kadmin2
MpiDefault=none
ProctrackType=proctrack/cgroup
PrologFlags=contain
ReturnToService=2
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/cgroup
MinJobAge=600
SchedulerType=sched/backfill
SelectType=select/cons_tres
PriorityType=priority/multifactor
AccountingStorageHost=localhost
AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu,cpu,node
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmLogFile=/var/log/slurm/slurmd.log
nodeName=k[001-448]
PartitionName=default Nodes=k[001-448] Default=YES MaxTime=INFINITE State=up
Slurmctld.log
Error: Configured MailProg is invalid
Slurmctld version 24.05.3 started on cluster slurmkvasir
Accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld:
Registering slurmctld at port 8617
Error: read_slurm_conf: default partition not set.
Recovered state of 448 nodes
Down nodes: k[002-448]
Recovered information about 0 jobs
Recovered state of 0 reservations
Read_slurm_conf: backup_controller not specified
Select/cons_tres; select_p_reconfigure: select/cons_tres: reconfigure
Running as primary controller
Slurmd.log
Error: Node configuration differs from hardware: CPUS=1:40(hw)
Boards=1:1(hw) SocketsPerBoard=1:2(hw) CoresPerSocket=1:20(hw)
ThreadsPerCore:1:1(hw)
CPU frequency setting not configured for this node
Slurmd version 24.05.3 started
Slurmd started on Wed, 27 Nov 2024 06:51:03 -0700
CPUS=1 Boards=1 Cores=1 Threads=1 Memory=192030 TmpDisk=95201
uptime 166740 CPUSpecList=(null) FeaturesAvail=(null)
FeaturesActive=(null)
Error: _forward_thread: failed to k019 (10.142.0.119:6818):
Connection timed out
(Above line repeated 20 or so times for different nodes.)
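
The "Node configuration differs from hardware" message in the slurmd log lists each value as configured:detected. As a minimal sketch (helper names are hypothetical, and this is not a Slurm utility), the following Python parses such a message and builds a candidate NodeName line from the detected values. Running `slurmd -C` on a compute node prints the detected configuration directly and is the more reliable source; the sketch only shows how to read the log line.

```python
import re

# Matches fields such as "CPUS=1:40(hw)". The separator after the name is
# accepted as '=' or ':' because the quoted log shows both forms.
MISMATCH = re.compile(r"(\w+)[=:](\d+):(\d+)\(hw\)")


def parse_mismatch(log_line):
    """Map each field name to a (configured, detected) pair of ints."""
    return {name: (int(conf), int(hw)) for name, conf, hw in MISMATCH.findall(log_line)}


def suggest_node_line(node_names, mismatch):
    """Build a candidate NodeName line from the detected (hw) values.

    Assumes the message contains Boards, SocketsPerBoard, CoresPerSocket
    and ThreadsPerCore, as in the log quoted above.
    """
    hw = {name: detected for name, (_, detected) in mismatch.items()}
    sockets = hw["Boards"] * hw["SocketsPerBoard"]
    return (
        f"NodeName={node_names} Sockets={sockets} "
        f"CoresPerSocket={hw['CoresPerSocket']} "
        f"ThreadsPerCore={hw['ThreadsPerCore']}"
    )
```

A RealMemory value is left out on purpose; any value added there should come from the node itself rather than from this sketch.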
Thanks,

Kent Hanson

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
