Now what would be causing this? The srun just hangs and these are the only logs from slurmctld:
[2024-02-24T23:23:26.003] error: Orphan StepId=463.extern reported on node node007
[2024-02-24T23:23:26.003] error: Orphan StepId=463.extern reported on node node006
[2024-02-24T23:23:26.003] error: Orphan StepId=463.extern reported on node node005
[2024-02-24T23:23:26.003] error: Orphan StepId=463.extern reported on node node009
[2024-02-24T23:23:26.003] error: Orphan StepId=463.extern reported on node node008

[2024-02-24T23:43:21.183] _slurm_rpc_complete_job_allocation: JobId=563 error Job/step already completing or completed

[465.extern] error: common_file_write_content: unable to open '/sys/fs/cgroup/system.slice/slurmstepd.scope/job_463/step_extern/user/cgroup.freeze' for writing: Permission denied

On Sat, Feb 24, 2024 at 12:09 PM Robert Kudyba <rkudyba@fordham.edu> wrote:
> Traditionally /tmp and /var/tmp have been 1777


Ah yes, thanks for pointing that out. Hope this helps someone down the line. Perhaps the error detection could be more explicit in slurmctld?

On Sat, Feb 24, 2024, 12:07 PM Chris Samuel via slurm-users <slurm-users@lists.schedmd.com> wrote:
On 24/2/24 06:14, Robert Kudyba via slurm-users wrote:

> For now I just set it to chmod 777 on /tmp and that fixed the errors. Is
> there a better option?

Traditionally /tmp and /var/tmp have been 1777 (that "1" being the
sticky bit, originally invented to indicate that the OS should attempt
to keep a frequently used binary in memory but then adopted to indicate
special handling of a world writeable directory so users can only unlink
objects they own and not others).
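To make the difference between 777 and 1777 concrete, here is a minimal Python sketch. It works on a scratch directory rather than the real /tmp, and the names `describe_mode` and `shared` are purely illustrative, not anything from Slurm or from this thread. It only shows how the sticky bit appears in the mode bits.

```python
import os
import stat
import tempfile


def describe_mode(path):
    """Return (octal permission string, sticky-bit flag) for path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return format(mode, "04o"), bool(mode & stat.S_ISVTX)


# Use a throwaway directory so the real /tmp is never touched.
with tempfile.TemporaryDirectory() as scratch:
    shared = os.path.join(scratch, "shared")
    os.mkdir(shared)

    # World-writable without the sticky bit: any user may unlink any entry.
    os.chmod(shared, 0o777)
    before = describe_mode(shared)

    # World-writable with the sticky bit: users may only unlink what they own.
    os.chmod(shared, 0o1777)
    after = describe_mode(shared)

print("before:", before)
print("after: ", after)
```

On an actual node the equivalent fix would be running `chmod 1777 /tmp` as root, after which `ls -ld /tmp` should show the mode as `drwxrwxrwt`.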

Hope that helps!

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA

