[slurm-users] Odd prolog Error?
Jason Simms
jsimms1 at swarthmore.edu
Tue Apr 11 16:48:31 UTC 2023
Hello all,
Regularly I'm seeing array jobs fail, and the only log info from the
compute node is this:
[2023-04-11T11:41:12.336] error: /opt/slurm/prolog.sh: exited with status 0x0100
[2023-04-11T11:41:12.336] error: [job 26090] prolog failed status=1:0
[2023-04-11T11:41:12.336] Job 26090 already killed, do not launch batch job
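For reference when reading these lines: the 0x0100 looks like a raw wait(2)-style status word, where the high byte carries the exit code and the low seven bits carry the terminating signal. Assuming that is how slurmd reports it (an assumption, though it is consistent with the status=1:0 in the log), the decoding can be sketched as below. decode_status is a hypothetical helper name, not something Slurm provides.

```shell
#!/bin/bash
# Split a wait(2)-style status word into "exit_code:signal".
# Assumption: the value slurmd logs is the raw wait status.
decode_status() {
    local status=$(( $1 ))                       # accepts hex such as 0x0100
    local exit_code=$(( (status >> 8) & 0xff ))  # high byte: exit code
    local signal=$(( status & 0x7f ))            # low 7 bits: signal number
    echo "${exit_code}:${signal}"
}

decode_status 0x0100   # prints 1:0
```

Read this way, the prolog script exited normally with code 1 and was not killed by a signal.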
The contents of prolog.sh are incredibly simple:
#!/bin/bash
loginctl enable-linger $SLURM_JOB_USER
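Since slurmd only records the exit status, one way to find out why loginctl fails might be to capture its output from inside the prolog. What follows is a minimal sketch, not what is currently deployed: run_logged, the PROLOG_DEBUG_LOG variable, and the /tmp log path are all hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Hypothetical debug log; adjust to a location root can write on the node.
LOG="${PROLOG_DEBUG_LOG:-/tmp/slurm-prolog-debug.log}"

# Run a command, append its exit code and combined stdout/stderr to $LOG,
# and return the command's own exit code to the caller.
run_logged() {
    local out rc=0
    out=$("$@" 2>&1) || rc=$?
    printf '%s job=%s user=%s rc=%d cmd=[%s] output=[%s]\n' \
        "$(date '+%Y-%m-%dT%H:%M:%S')" "${SLURM_JOB_ID:-unknown}" \
        "${SLURM_JOB_USER:-unknown}" "$rc" "$*" "$out" >> "$LOG"
    return "$rc"
}

# Only call loginctl when the job user is known. As the last command,
# its status becomes the exit status of the prolog script.
if [[ -n "${SLURM_JOB_USER:-}" ]]; then
    run_logged loginctl enable-linger "$SLURM_JOB_USER"
fi
```

With something like this in place, the debug log entry for a failed job should show loginctl's own error message next to its exit code, which the slurmd log alone does not provide.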
I can't work out what is going on here. Here is an example job script
that can trigger the error:
#!/bin/bash
#SBATCH -t 2:00:00
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -p compute
#SBATCH --array=1-100
#SBATCH -o tempOut/MSO-%j-%a.log
module load python3/python3
python3 runVoltage.py $SLURM_ARRAY_TASK_ID
Any insight would be welcome! This is really frustrating because it's
constantly causing nodes to drain.
Warmest regards,
Jason
--
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research Computing
Swarthmore College
Information Technology Services
(610) 328-8102
Schedule a meeting: https://calendly.com/jlsimms