sinfo not listing any partitions - slurm-users

27 Nov 2024

On Nov 27, 2024, at 09:56, Kent L. Hanson via slurm-users
<slurm-users@lists.schedmd.com> wrote:
I am doing a new install of slurm 24.05.3 I have all the packages
built and installed on headnode and compute node with the same
munge.key, slurm.conf, and gres.conf file. I was able to run
munge and unmunge commands to test munge successfully. Time is
synced with chronyd. I can’t seem to find any useful errors in
the logs. For some reason when I run sinfo no nodes are listed. I
just see the headers for each column. Has anyone seen this or
know what a next step of troubleshooting would be? I’m new to
this and not sure where to go from here. Thanks for any and all help!
The odd output I am seeing
[username@headnode ~] sinfo
PARTITION AVAIL    TIMELIMIT NODES   STATE NODELIST
*/(Nothing is output showing status of partition or nodes)/*
Slurm.conf
ClusterName=slurmkvasir
SlurmctldHost=kadmin2
MpiDefault=none
ProctrackType=proctrack/cgroup
PrologFlags=contain
ReturnToService=2
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/cgroup
MinJobAge=600
SchedulerType=sched/backfill
SelectType=select/cons_tres
PriorityType=priority/multifactor
AccountingStorageHost=localhost
AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu,cpu,node
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmLogFile=/var/log/slurm/slurmd.log
nodeName=k[001-448]
PartitionName=default Nodes=k[001-448] Default=YES
MaxTime=INFINITE State=up
Slurmctld.log
Error: Configured MailProg is invalid
Slurmctld version 24.05.3 started on cluster slurmkvasir
Accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld:
Regisetering slurmctld at port 8617
Error: read_slurm_conf: default partition not set.
Revovered state of 448 nodes
Down nodes: k[002-448]
Recovered information about 0 jobs
Revovered state of 0 reservations
Read_slurm_conf: backup_controller not specified
Select/cons_tres; select_p_reconfigure: select/cons_tres: reconfigure
Running as primary controller
Slurmd.log
Error: Node configuration differs from hardware: CPUS=1:40(hw)
Boards=1:1(hw) SocketsPerBoard=1:2(hw) CoresPerSocket=1:20(hw)
ThreadsPerCore:1:1(hw)
CPU frequency setting not configured for this node
Slurmd version 24.05.3started
Slurmd started on Wed, 27 Nov 2024 06:51:03 -0700
CPUS=1 Boards=1 Cores=1 Threads=1 Memory=192030 TmpDisk=95201
uptime 166740 CPUSpecList=(null) FeaturesAvail=(null)
FeaturesActive=(null)
Error: _/forward/_thread: failed to k019 (10.142.0.119:6818):
Connection timed out
*/(Above line repeated 20 or so times for different nodes.)/*
*//*
Thanks,

Kent Hanson

--
slurm-users mailing list --slurm-users@lists.schedmd.com
<mailto:slurm-users@lists.schedmd.com>
To unsubscribe send an email
toslurm-users-leave@lists.schedmd.com
<mailto:slurm-users-leave@lists.schedmd.com>