[slurm-users] Removing a node

Mahmood Naderan mahmood.nt at gmail.com
Wed Oct 17 05:14:38 MDT 2018


Hi,
I have removed a node from the cluster, but the squeue command no longer
works, and it seems that Slurm is still looking for the missing node.

[root@rocks7 home]# > /var/log/slurm/slurmctld.log
[root@rocks7 home]# systemctl restart slurmctld
[root@rocks7 home]# systemctl restart slurmd
[root@rocks7 home]# rocks sync slurm
slurm_load_ctl_conf error: Unable to contact slurm controller (connect failure)
[root@rocks7 home]# cat /var/log/slurm/slurmctld.log
[2018-10-17T14:41:36.682] slurmctld version 17.11.5 started on cluster jupiter
[2018-10-17T14:41:37.212] layouts: no layout to initialize
[2018-10-17T14:41:37.216] layouts: loading entities/relations information
[2018-10-17T14:41:37.216] error: _find_node_record(751): lookup failure for compute-0-6
[2018-10-17T14:41:37.216] error: Node compute-0-6 has vanished from configuration
[2018-10-17T14:41:37.216] Recovered state of 7 nodes
[2018-10-17T14:41:37.216] Down nodes: compute-0-4
[2018-10-17T14:41:37.216] Recovered JobID=1440 State=0x1 NodeCnt=0 Assoc=59
[2018-10-17T14:41:37.216] recovered job step 1442.0
[2018-10-17T14:41:37.216] Recovered JobID=1442 State=0x1 NodeCnt=0 Assoc=76
[2018-10-17T14:41:37.216] recovered job step 1443.0
[2018-10-17T14:41:37.216] Recovered JobID=1443 State=0x1 NodeCnt=0 Assoc=77
[2018-10-17T14:41:37.216] Recovered information about 3 jobs
[2018-10-17T14:41:37.216] error: _find_node_record(751): lookup failure for compute-0-6
[2018-10-17T14:41:37.216] error: build_part_bitmap: invalid node name compute-0-6
[2018-10-17T14:41:37.217] fatal: Invalid node names in partition EMERALD
[root@rocks7 home]# cat /etc/slurm/parts
PartitionName=WHEEL RootOnly=yes Priority=1000 Nodes=ALL
PartitionName=DIAMOND AllowAccounts=monthly Nodes=compute-0-[0-1]
PartitionName=EMERALD AllowAccounts=em1,z1,z2,em4,z3,z5,z9 Nodes=compute-0-[2-5],rocks7
PartitionName=RUBY AllowAccounts=y8,y10 Nodes=compute-0-[3-5]
PartitionName=NOLIMIT AllowAccounts=nl Nodes=compute-0-[4-5]
[root@rocks7 home]#

Any ideas? It looks like a stale reference to the removed node is still
lingering somewhere.
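
For reference, here is a minimal sketch of how one might search for such a
leftover reference. It assumes the Slurm configuration lives under /etc/slurm
(as the parts path above suggests). find_node_refs is a hypothetical helper
name, not a Slurm command. It prints every line that names the node literally,
plus every line containing a bracketed range with the same prefix, since a
range such as compute-0-[2-6] would cover the node without spelling it out.

```shell
# Hypothetical helper (not part of Slurm): list config lines that may
# still reference a removed node.
#   $1 = node name, e.g. compute-0-6
#   $2 = config directory (assumed default: /etc/slurm)
find_node_refs() {
    node="$1"
    conf_dir="${2:-/etc/slurm}"
    # Strip the trailing "-<index>" to get the range prefix,
    # e.g. compute-0-6 -> compute-0
    prefix="${node%-*}"
    # -r recurse, -n show line numbers, -F match fixed strings (no regex).
    # First pattern: the literal node name.
    # Second pattern: any bracketed range on the same prefix, e.g. compute-0-[
    grep -rnF -e "$node" -e "${prefix}-[" -- "$conf_dir"
}
```

Called as "find_node_refs compute-0-6", this only narrows down candidate
lines; the bracketed ranges it prints still have to be checked by eye.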

Regards,
Mahmood


