Oct 06 00:57:43 pgpu008.chicagobooth.edu slurmd[3709622]: slurmd: error: prolog failed: rc:230 output:Successfully started proces>
Oct 06 00:57:43 pgpu008.chicagobooth.edu slurmd[3709622]: slurmd: error: [job 20398] prolog failed status=230:0
Oct 06 00:57:43 pgpu008 slurmd[3709622]: slurmd: Job 20398 already killed, do not launch batch job
Oct 06 13:06:23 pgpu008 systemd[1]: Stopping Slurm node daemon...
Oct 06 13:06:23 pgpu008 slurmd[3709622]: slurmd: Caught SIGTERM. Shutting down.
Oct 06 13:06:23 pgpu008 slurmd[3709622]: slurmd: Slurmd shutdown completing
Job 20398, which is the job being killed in the log above, is now in the state "Launch failed requeue held" after I resumed the node.
Fritz Ratnasamy
Data Scientist
Information Technology