[slurm-users] how to check what slurm is doing when job pending with reason=none?

taleintervenor at sjtu.edu.cn taleintervenor at sjtu.edu.cn
Wed Jun 16 10:39:08 UTC 2021


Hello,

Recently we noticed a strange delay between job submission and job start,
even though the partition certainly has enough idle nodes to meet the job's
demands. To avoid interference, we tested on the 4-node debug partition,
which has no other jobs running. The test job script is also as simple as
possible:

#!/bin/bash

#SBATCH --job-name=test
#SBATCH --partition=debug
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --output=%j.out
#SBATCH --error=%j.err

hostname
sleep 1000
echo end

But after submission, the job still stays in the PENDING state for about
30-60 s, and during that time sacct shows the REASON as "None". We have also
checked slurmctld.log on the server and slurmd.log on the client node at
debug log level. Neither contains anything useful for figuring out why the
job is pending.

So is there any way to make Slurm explain in detail why the job did not
start immediately, or what it was doing while the job was pending?
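
For reference, this is a rough sketch of how the scheduling-related fields
of a pending job could be collected while it waits. The helper name
job_sched_fields is hypothetical. It only filters the key=value text that
"scontrol show job" prints, and the commands in the usage comments need a
running Slurm cluster. It does not explain the delay by itself; it just
gathers the timestamps and reason in one place.

```shell
#!/bin/bash
# Sketch: extract scheduling-related fields from `scontrol show job` output.
# Reads the key=value text on stdin and prints one matching field per line.
# (job_sched_fields is a hypothetical helper name, not part of Slurm.)
job_sched_fields() {
    # Put every whitespace-separated key=value pair on its own line,
    # then keep only the fields that describe scheduling state.
    tr -s ' \t' '\n' |
        grep -E '^(JobState|Reason|SubmitTime|EligibleTime|StartTime|LastSchedEval)='
}

# Usage on a cluster (replace <jobid> with the pending job's ID):
#   scontrol show job <jobid> | job_sched_fields
#   squeue -j <jobid> --start   # expected start time, if the scheduler has one
#   sdiag                       # main and backfill scheduler cycle statistics
```

Comparing SubmitTime, EligibleTime and LastSchedEval against the actual
start time may show whether the job was simply waiting for the next
scheduling cycle.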

 

 

Thanks.
