[slurm-users] Disable --no-allocate support for a node/SlurmD
alexander.grund at tu-dresden.de
Wed Jun 14 14:28:02 UTC 2023
> job_submit.lua allows you to view (and edit!) all job parameters that
> are known at submit time, including the option to refuse a configuration
> by returning `slurm.ERROR` instead of `slurm.SUCCESS`. The common way to
> filter for interactive jobs in job_submit.lua is checking whether
> job_desc.script is nil or an empty string (i.e. the job submission
> doesn't have a script attached to it). You can do a lot more within
> job_submit.lua - I know of multiple sites (including the cluster I'm
> maintaining) that use it to, for example, automatically sort jobs into
> the correct partition(s) according to their resource requirements.
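(For reference, the check described above can be sketched roughly like this; the function names follow the job_submit/lua plugin interface, and the exact fields available on `job_desc` can vary between Slurm versions, so treat this as a sketch rather than a drop-in plugin:)

```lua
-- job_submit.lua sketch: refuse interactive jobs at submit time.
-- Placed in the same directory as slurm.conf and enabled via
-- JobSubmitPlugins=lua; runs inside slurmctld.

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Interactive submissions (salloc/srun) have no batch script attached,
    -- so job_desc.script is nil or empty.
    if job_desc.script == nil or job_desc.script == "" then
        slurm.log_user("Interactive jobs are not permitted on this cluster")
        return slurm.ERROR
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```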
Thanks for the suggestion.
However, as I understand it, this requires additionally trusting the node
those scripts run on,
which is, I guess, the one running the SlurmCtlD.
> All in all, these two interfaces are (imho) much better suited for the
> kind of task you're suggesting (checking job parameters, refusing
> specific job configurations) than prolog scripts, since technically by
> the time the prolog scripts are starting, the job configuration has
> already been finalized and accepted by the scheduler.
The reason we are using Prolog scripts is that they are running on the
very node the job will be running on.
So we make that one "secure" (or at least harden it by e.g. disabling
SSH access and restricting any other connections).
Then anything running on this node has a high trust level, e.g. the
SlurmD and the Prolog script.
If required, the node could be rebooted with a fixed image after each
job, removing any potential compromise.
That isn't feasible for the SlurmCtlD as that would affect the whole
cluster and unrelated jobs.
Hence the checks (for example filtering out interactive jobs, but also
some additional authentication) should be done on the hardened node(s).
This would work if there weren't a way to circumvent the Prolog, such as
`srun --no-allocate`. So ideally I'd like a configuration option for the
SlurmD such that it refuses such jobs.
As the SlurmD config is on the node it can also be considered secure.
So while I fully agree that those plugins are better suited and likely
easier to use,
I fear it is much easier to prevent them from running, and hence to
bypass those restrictions,
than it would be to bypass something (local) at the level of the SlurmD.
Please correct me if I misunderstood anything.