If you’re sure you’ve restarted everything after the config change, are you also sure that you don’t have that stuff hidden from your current user? You can try sinfo -a to rule that out, or run it as root.
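
For example, something along these lines (just a sketch; adjust to your site):

    sinfo -a                  # show all partitions, including hidden ones
    sudo sinfo -a             # rule out per-user visibility restrictions
    scontrol show partition   # dump partition definitions straight from slurmctld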

--
#BlackLivesMatter
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novosirj@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB A555B, Newark
     `'

On Nov 27, 2024, at 09:56, Kent L. Hanson via slurm-users <slurm-users@lists.schedmd.com> wrote:

I am doing a new install of Slurm 24.05.3. I have all the packages built and installed on the head node and compute node with the same munge.key, slurm.conf, and gres.conf files. I was able to run the munge and unmunge commands to test munge successfully. Time is synced with chronyd. I can’t seem to find any useful errors in the logs. For some reason, when I run sinfo, no nodes are listed; I just see the headers for each column. Has anyone seen this, or does anyone know a next troubleshooting step? I’m new to this and not sure where to go from here. Thanks for any and all help!
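
For reference, the munge check was roughly the standard one (the compute hostname here is just an example):

    munge -n | unmunge              # local encode/decode check
    munge -n | ssh k001 unmunge     # cross-node check against a compute node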
 
The odd output I am seeing:
[username@headnode ~] sinfo
PARTITION AVAIL    TIMELIMIT NODES   STATE   NODELIST
 
(Nothing is output showing the status of partitions or nodes.)
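
On a working cluster I would expect a row underneath those headers, presumably something like:

    default*     up   infinite    448   idle  k[001-448]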
 
 
slurm.conf
 
ClusterName=slurmkvasir
SlurmctldHost=kadmin2
MpiDefault=none
ProctrackType=proctrack/cgroup
PrologFlags=contain
ReturnToService=2
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/cgroup
MinJobAge=600
SchedulerType=sched/backfill
SelectType=select/cons_tres
PriorityType=priority/multifactor
AccountingStorageHost=localhost
AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu,cpu,node
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm/slurmd.log
NodeName=k[001-448]
PartitionName=default Nodes=k[001-448] Default=YES MaxTime=INFINITE State=up
 
slurmctld.log
 
Error: Configured MailProg is invalid
Slurmctld version 24.05.3 started on cluster slurmkvasir
Accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 8617
Error: read_slurm_conf: default partition not set.
Recovered state of 448 nodes
Down nodes: k[002-448]
Recovered information about 0 jobs
Recovered state of 0 reservations
Read_slurm_conf: backup_controller not specified
Select/cons_tres: select_p_reconfigure: select/cons_tres: reconfigure
Running as primary controller
 
slurmd.log
 
Error: Node configuration differs from hardware: CPUs=1:40(hw) Boards=1:1(hw) SocketsPerBoard=1:2(hw) CoresPerSocket=1:20(hw) ThreadsPerCore=1:1(hw)
CPU frequency setting not configured for this node
Slurmd version 24.05.3 started
Slurmd started on Wed, 27 Nov 2024 06:51:03 -0700
CPUS=1 Boards=1 Cores=1 Threads=1 Memory=192030 TmpDisk=95201 uptime 166740 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
Error: _forward_thread: failed to k019 (10.142.0.119:6818): Connection timed out
(Above line repeated 20 or so times for different nodes.)
 
Thanks,

Kent Hanson

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-leave@lists.schedmd.com