[slurm-users] Disable --no-allocate support for a node/SlurmD

Wed Jun 14 15:32:13 UTC 2023

Hi,

> Thanks for the suggestion.
>
> However, as I understand it, this requires additionally trusting the 
> node those scripts are running on,
> which is, I guess, the one running SlurmCtlD.
>
> The reason we are using Prolog scripts is that they are running on the 
> very node the job will be running on.
> So we make that one "secure" (or at least harden it by e.g. disabling 
> SSH access and restricting any other connections).
> Then anything running on this node has a high trust level, e.g. the 
> SlurmD and the Prolog script.
> If required the node could be rebooted with a fixed image after each 
> job removing any potential compromise.
> That isn't feasible for the SlurmCtlD as that would affect the whole 
> cluster and unrelated jobs.
>
> Hence the checks (for example filtering out interactive jobs, but also 
> some additional authentication) should be done on the hardened node(s).
>
> It would work if there wasn't a way to circumvent the Prolog. So 
> ideally I'd like to have a configuration option for the SlurmD such 
> that it doesn't accept such jobs.
> As the SlurmD config is on the node it can also be considered secure.
>
> So while I fully agree that those plugins are better suited and likely 
> easier to use
> I fear that it is much easier to prevent them from running and hence 
> bypass those restrictions
> than having something (local) at the level of the SlurmD.
>
> Please correct me if I misunderstood anything.

Ah okay, so your requirements include completely insulating (some) jobs 
from outside access, including root? I've seen this kind of requirement 
for e.g. working with non-defaced medical data. It's generally a tough 
problem imo, because this level of data security seems more or less 
incompatible with the idea of a multi-user HPC system.

I remember that this year's ZKI-AK Supercomputing spring meeting had 
Sebastian Krey from GWDG presenting the KISSKI ("KI-Servicezentrum für 
Sensible und Kritische Infrastrukturen", https://kisski.gwdg.de/ ) 
project, which works in this problem domain. Are you involved in that? 
The setup with containerization and 'node hardening' sounds very similar 
to me.

Re "preventing the scripts from running": I'd say it's about as easy as 
otherwise manipulating any job submission that goes through slurmctld 
(e.g. by editing slurm.conf), so without knowing your exact use case and 
requirements, I can't think of a simple solution.

Kind regards,
René Sitt

-- 
Dipl.-Chem. René Sitt
Hessisches Kompetenzzentrum für Hochleistungsrechnen
Philipps-Universität Marburg
Hans-Meerwein-Straße
35032 Marburg

Tel. +49 6421 28 23523
sittr at hrz.uni-marburg.de
www.hkhlr.de
