[slurm-users] PrologFlags=Contain significantly changing job activity on compute nodes

Baker D.J. D.J.Baker at soton.ac.uk
Wed Dec 12 03:55:40 MST 2018


Hello,

I wondered if someone could please help us to understand why the PrologFlags=contain flag is causing jobs to fail and compute nodes to be drained. We are, by the way, running Slurm 18.08.0. Has anyone else seen this behaviour?

I'm currently experimenting with PrologFlags=contain. I've found that the addition of this flag in the slurm.conf radically changes the behaviour of jobs on the compute nodes.
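
For reference, the relevant part of our slurm.conf looks roughly like this (the ProctrackType value is just what our test setup uses; only the PrologFlags line is new, and task/cgroup is what appears in the logs below):

ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
PrologFlags=Contain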

When PrologFlags=contain is commented out in slurm.conf, jobs are assigned to the compute node and start/execute as expected. Here is the relevant extract from the slurmd logs on that node:

[2018-12-12T09:51:40.748] _run_prolog: run job script took usec=4
[2018-12-12T09:51:40.748] _run_prolog: prolog with lock for job 243317 ran for 0 seconds
[2018-12-12T09:51:40.748] Launching batch job 243317 for UID 57337
[2018-12-12T09:51:40.762] [243317.batch] task/cgroup: /slurm/uid_57337/job_243317: alloc=0MB mem.limit=193080MB memsw.limit=unlimited
[2018-12-12T09:51:40.763] [243317.batch] task/cgroup: /slurm/uid_57337/job_243317/step_batch: alloc=0MB mem.limit=193080MB memsw.limit=unlimited

When PrologFlags=contain is activated, I find the following:

-- I don't see the "_run_prolog" and "task/cgroup" messages in the slurmd logs.
-- The job prolog fails, the job fails, and the job output is owned by root.
-- The compute node is drained.

sinfo -lN | grep red017 ....
red017         1     batch*     drained   40   2:20:1 190000        0      1   (null) batch job complete f
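
To put red017 back into service between tests I'm just resuming it by hand with something like:

scontrol update NodeName=red017 State=RESUME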

Here is the extract from the slurmd logs:

[2018-12-12T09:56:54.564] error: Waiting for JobId=243321 prolog has failed, giving up after 50 sec
[2018-12-12T09:56:54.565] Could not launch job 243321 and not able to requeue it, cancelling job
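
In case it helps with reproducing this, I'm double-checking the configuration the node is actually running with something like the following (the cgroup.conf path is just where our install keeps it):

scontrol show config | grep -iE 'PrologFlags|ProctrackType|TaskPlugin'
grep -i constrain /etc/slurm/cgroup.conf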

Best regards,

David

