[slurm-users] 4 sockets but "

Diego Zuccato diego.zuccato at unibo.it
Wed Jul 21 09:01:25 UTC 2021


Uff... A bit mangled... Correcting and resending.

On 21/07/2021 08:18, Diego Zuccato wrote:
> On 20/07/2021 18:02, mercan wrote:
> Hi Ahmet.
> 
>> Did you check the slurmctld log for a complaint about the host line? If 
>> slurmctld cannot recognize a parameter, maybe it gives up processing the 
>> whole host line.
> Yup. Nothing there :(
> 
> [2021-07-21T08:13:14.984] slurmctld version 18.08.5-2 started on cluster oph
> [2021-07-21T08:13:16.990] error: _shutdown_bu_thread:send/recv str957-cluster2: Connection timed out
> [2021-07-21T08:13:17.809] layouts: no layout to initialize
> [2021-07-21T08:13:17.828] error: read_slurm_conf: default partition not set.
> [2021-07-21T08:13:17.829] layouts: loading entities/relations information
> [2021-07-21T08:13:17.829] Recovered state of 34 nodes
> [2021-07-21T08:13:17.829] Down nodes: str957-mtx-[21-22]
> [2021-07-21T08:13:17.829] Recovered JobId=33656 Assoc=377
> [...cut...]
> [2021-07-21T08:13:17.831] Recovered information about 45 jobs
> [2021-07-21T08:13:17.831] cons_res: select_p_node_init
> [2021-07-21T08:13:17.831] cons_res: preparing for 8 partitions
> [2021-07-21T08:13:17.832] Recovered state of 0 reservations
> [2021-07-21T08:13:17.833] cons_res: select_p_reconfigure
> [2021-07-21T08:13:17.833] cons_res: select_p_node_init
> [2021-07-21T08:13:17.833] cons_res: preparing for 8 partitions
> [2021-07-21T08:13:17.833] Running as primary controller
> [2021-07-21T08:13:17.833] Registering slurmctld at port 6817 with slurmdbd.
> [2021-07-21T08:13:18.220] No parameter for mcs plugin, default values set
> [2021-07-21T08:13:18.220] mcs: MCSParameters = (null). ondemand set.
> [2021-07-21T08:13:23.226] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2
> [2021-07-21T08:13:23.226] _build_node_list: No nodes satisfy JobId=33762 requirements in partition b6
> [2021-07-21T08:13:23.227] _build_node_list: No nodes satisfy JobId=33808 requirements in partition b4
> 
> (str957-cluster2 is the second frontend/login node that I've had to take offline 
> for an unrelated problem).
And str957-mtx-[21-22] are not yet installed.
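
For anyone who wants to run the same check on their own log, here is a minimal sketch in Python that pulls out the "error:" lines. The line pattern is only inferred from the excerpt quoted above, not taken from the Slurm documentation, and the function name find_error_lines is hypothetical. It assumes each log entry sits on a single line, as in the real file rather than in a wrapped mail.

```python
import re

# Matches lines of the form "[timestamp] error: message".
# Assumption: this layout is inferred from the log excerpt in this thread.
ERROR_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+error:\s*(?P<msg>.*)$")


def find_error_lines(log_text):
    """Return (timestamp, message) pairs for every 'error:' line."""
    hits = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            hits.append((match.group("ts"), match.group("msg")))
    return hits


# Sample lines copied from the log excerpt in this message.
sample = """\
[2021-07-21T08:13:14.984] slurmctld version 18.08.5-2 started on cluster oph
[2021-07-21T08:13:16.990] error: _shutdown_bu_thread:send/recv str957-cluster2: Connection timed out
[2021-07-21T08:13:17.809] layouts: no layout to initialize
[2021-07-21T08:13:17.828] error: read_slurm_conf: default partition not set.
"""

for ts, msg in find_error_lines(sample):
    print(ts, msg)
```

On the excerpt above this reports only the backup-controller timeout and the missing default partition, i.e. nothing about the node definition line.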

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786


