[slurm-users] Jobs Immediately Fail for Certain Users

Jason Simms simmsj at lafayette.edu
Tue Jul 7 15:14:33 UTC 2020

Hello all,

Two users on my system experience job failures every time they submit a job
via sbatch. When I run their exact submission script, or when I create a
local system user and launch from there, the jobs run fine. Here is an
example of what I see in the slurmd log:

[2020-07-06T15:02:41.284] task_p_slurmd_batch_request: 1421
[2020-07-06T15:02:41.284] task/affinity: job 1421 CPU input mask for node:
[2020-07-06T15:02:41.284] task/affinity: job 1421 CPU final HW mask for node: 0x00000F0000
[2020-07-06T15:02:41.295] _run_prolog: prolog with lock for job 1421 ran for 0 seconds
[2020-07-06T15:02:41.295] error: [job 1421] prolog failed status=1:0
[2020-07-06T15:02:41.295] Job 1421 already killed, do not launch batch job

The prolog file is simply:

loginctl enable-linger $SLURM_JOB_USER
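
The log above shows only the prolog's exit status, not the message loginctl printed. A diagnostic variant of the prolog could record that message. The following is a minimal sketch, assuming bash and a log location that the prolog can write to. The names enable_linger_logged and PROLOG_DEBUG_LOG, and the default log path, are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Diagnostic variant of the prolog (a sketch, not the script in use).
# PROLOG_DEBUG_LOG and its default path are hypothetical names.
LOG_FILE="${PROLOG_DEBUG_LOG:-/tmp/slurm_prolog_debug.log}"

enable_linger_logged() {
    local user="$1" output status
    # Capture stdout and stderr so the reason for a failure is not lost.
    output="$(loginctl enable-linger "$user" 2>&1)"
    status=$?
    # Record the user the prolog saw, how that user resolves on this node,
    # and what loginctl reported.
    {
        echo "job=${SLURM_JOB_ID:-unknown} user=${user} status=${status}"
        echo "id: $(id "$user" 2>&1)"
        echo "loginctl: ${output}"
    } >> "$LOG_FILE"
    return "$status"
}

# Run only when slurmd has set SLURM_JOB_USER, as it does for a prolog.
if [ -n "${SLURM_JOB_USER:-}" ]; then
    enable_linger_logged "$SLURM_JOB_USER"
    exit $?
fi
```

The exit status is passed through unchanged, so a failing loginctl call would still fail the prolog and drain the node; the only difference is that the reason ends up in the log file.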

Certain users hit this failure every time, but I can't figure out why. Their
accounts are no different from anyone else's (not in a different group,
etc.), so I don't think permissions are the issue.

Anyway, the job failure immediately puts the node into a DRAINED/DRAINING
state (which is expected). But for now, these users cannot submit any jobs
at all.

Any insights would be welcomed!

Warmest regards,

*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632