[slurm-users] [EXT] Jobs Immediately Fail for Certain Users

Sean Crosby scrosby at unimelb.edu.au
Wed Jul 8 00:43:53 UTC 2020


Hi Jason,

What happens when you run the prolog's loginctl command by hand on the
node, for one of the affected users? Is the exit status of the command 0?

For example, on my servers, where lingering is masked, I get:

[root@thespian-gpgpu001 ~]# loginctl enable-linger scrosby
Could not enable linger: Unit is masked.
[root@thespian-gpgpu001 ~]# echo $?
1
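
If it helps to make that check repeatable, here is a minimal sketch of a
small wrapper that runs a command and prints its exit status. The name
run_and_report is hypothetical (it is not part of Slurm or systemd), and
"someuser" is a placeholder for one of the affected accounts. A bash
script exits with the status of its last command, so a non-zero status
from loginctl is what slurmd would report as a prolog failure.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm or systemd): run the given
# command, print its exit status, and return that same status.
run_and_report() {
    local status=0
    # Capture a non-zero status without aborting the calling shell.
    "$@" || status=$?
    echo "command: $*"
    echo "exit status: ${status}"
    return "${status}"
}

# Example, run as root on the affected compute node, with a real
# username substituted for the placeholder:
#   run_and_report loginctl enable-linger someuser
```

Running it once for an affected user and once for a user whose jobs work
should show whether loginctl itself behaves differently for the two
accounts.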

Sean

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia



On Wed, 8 Jul 2020 at 01:14, Jason Simms <simmsj at lafayette.edu> wrote:

> Hello all,
>
> Two users on my system experience job failures every time they submit a
> job via sbatch. When I run their exact submission script, or when I create
> a local system user and launch from there, the jobs run fine. Here is an
> example of what I see in the slurmd log:
>
> [2020-07-06T15:02:41.284] task_p_slurmd_batch_request: 1421
> [2020-07-06T15:02:41.284] task/affinity: job 1421 CPU input mask for node:
> 0x00000F0000
> [2020-07-06T15:02:41.284] task/affinity: job 1421 CPU final HW mask for
> node: 0x00000F0000
> [2020-07-06T15:02:41.295] _run_prolog: prolog with lock for job 1421 ran
> for 0 seconds
> [2020-07-06T15:02:41.295] error: [job 1421] prolog failed status=1:0
> [2020-07-06T15:02:41.295] Job 1421 already killed, do not launch batch job
>
> The prolog file is simply:
>
> #!/bin/bash
> loginctl enable-linger $SLURM_JOB_USER
>
> There must be some reason these particular users always hit this, but I
> can't figure out what it is. Their accounts are no different from anyone
> else's (not in a different group, etc.), so I don't think permissions are
> the issue.
>
> Anyway, the job failure immediately puts the node into a DRAINED/DRAINING
> state (which is expected). But for now, these users cannot submit any jobs
> at all.
>
> Any insights would be welcomed!
>
> Warmest regards,
> Jason
>
> --
> *Jason L. Simms, Ph.D., M.P.H.*
> Manager of Research and High-Performance Computing
> XSEDE Campus Champion
> Lafayette College
> Information Technology Services
> 710 Sullivan Rd | Easton, PA 18042
> Office: 112 Skillman Library
> p: (610) 330-5632
>