[slurm-users] Analyzing a stuck job

Christopher Samuel chris at csamuel.org
Thu Feb 14 16:30:42 UTC 2019


On 2/14/19 8:02 AM, Mahmood Naderan wrote:

> One job is in RH state which means JobHoldMaxRequeue.
> The output file, specified by --output shows nothing suspicious.
> Is there any way to analyze the stuck job?

This happens when a job has failed to start MAX_BATCH_REQUEUE times
(currently 5).

Check your controller (slurmctld) log and the slurmd logs on the nodes
the job was sent to, to see what goes wrong when Slurm tries to start it.
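
As a rough illustration of that log check, here is a minimal Python sketch that pulls out the lines of a log file that seem to mention one job. It rests on assumptions rather than anything stated above: the log locations are whatever SlurmctldLogFile and SlurmdLogFile are set to in your slurm.conf, and the spellings of a job reference it looks for ("JobId=N", "job_id N", "job N") are guesses that may not match your Slurm version. The function names are hypothetical helpers, not part of Slurm.

```python
import re
from typing import Iterable, List


def lines_for_job(log_lines: Iterable[str], job_id: int) -> List[str]:
    """Return the log lines that appear to mention the given job ID."""
    # Assumed spellings of a job reference; adjust these to match the
    # way your Slurm version actually writes job IDs in its logs.
    pattern = re.compile(
        r"\b(?:JobId=|job_id[= ]|job )" + str(job_id) + r"\b",
        re.IGNORECASE,
    )
    return [line.rstrip("\n") for line in log_lines if pattern.search(line)]


def lines_for_job_in_file(path: str, job_id: int) -> List[str]:
    """Apply lines_for_job to a log file on disk."""
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, "r", errors="replace") as handle:
        return lines_for_job(handle, job_id)


# Example use (the path and job ID are placeholders, not real values):
#   for line in lines_for_job_in_file("/path/to/slurmctld.log", 1234):
#       print(line)
```

Running it against the controller log first and then against the slurmd log of each node the job was dispatched to should narrow down which launch attempt failed and why.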

All the best,
Chris


