[slurm-users] Make "srun --pty bash -i" always schedule immediately
Renfro, Michael
Renfro at tntech.edu
Thu Jun 11 13:28:56 UTC 2020
That’s close to what we’re doing, but without dedicated nodes. We have three back-end partitions (interactive, any-interactive, and gpu-interactive), but the users typically don’t have to consider that, due to our job_submit.lua plugin.
All three partitions have a default of 2 hours, 1 core, 2 GB RAM, but users could request more cores and RAM (but not as much as a batch job — we used https://hpcbios.readthedocs.io/en/latest/HPCBIOS_05-05.html as a starting point).
If a GPU is requested, the job goes into the gpu-interactive partition and is limited to 16 cores per node (we have 28 cores per GPU node, but GPU jobs can’t keep them all busy).
If fewer than 12 cores per node are requested, the job goes into the any-interactive partition and can be handled on any of our GPU or non-GPU nodes.
If more than 12 cores per node are requested, the job goes into the interactive partition and is handled only by a non-GPU node.
I haven’t needed to QOS the interactive partitions, but that’s not a bad idea.
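For anyone curious, the routing rules above could be expressed in a job_submit.lua along these lines. This is only a sketch: field names such as job_desc.gres and job_desc.min_cpus vary across Slurm versions, the no-script test for "interactive" is an assumption, and the 12-core boundary is just our site policy.

```lua
-- Illustrative job_submit.lua sketch (not a drop-in; verify field names
-- against your Slurm version's job_submit plugin documentation).
function slurm_job_submit(job_desc, part_list, submit_uid)
   -- Assume interactive jobs (srun/salloc) have no batch script.
   if job_desc.script == nil or job_desc.script == '' then
      local gres = job_desc.gres or ""      -- may be tres_per_node on newer Slurm
      local cpus = job_desc.min_cpus or 1
      if string.find(gres, "gpu") then
         job_desc.partition = "gpu-interactive"
      elseif cpus < 12 then                 -- boundary is site policy
         job_desc.partition = "any-interactive"
      else
         job_desc.partition = "interactive"
      end
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
```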
> On Jun 11, 2020, at 8:19 AM, Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>
> Generally the way we've solved this is to set aside a specific set of
> nodes in a partition for interactive sessions. We deliberately scale
> the size of the resources so that users will always run immediately and
> we also set a QoS on the partition to make it so that no one user can
> dominate the partition.
>
> -Paul Edmon-
>
> On 6/11/2020 8:49 AM, Loris Bennett wrote:
>> Hi Manuel,
>>
>> "Holtgrewe, Manuel" <manuel.holtgrewe at bihealth.de> writes:
>>
>>> Hi,
>>>
>>> is there a way to make interactive logins where users will use almost no resources "always succeed"?
>>>
>>> In most of these interactive sessions, users will have mostly idle shells running and do some batch job submissions. Is there a way to allocate "infinite virtual cpus" on each node that can only be allocated to
>>> interactive jobs?
>> I have never done this but setting "OverSubscribe" in the appropriate
>> place might be what you are looking for.
>>
>> https://slurm.schedmd.com/cons_res_share.html
>>
>> Personally, however, I would be a bit wary of doing this. What if
>> someone does start a multithreaded process on purpose or by accident?
>>
>> Wouldn't just using cgroups on your login node achieve what you want?
>>
>> Cheers,
>>
>> Loris
>>
>
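For reference, the OverSubscribe setting Loris mentions is configured per partition in slurm.conf. An illustrative fragment (node names, limits, and the FORCE:4 factor are placeholders, not a recommendation):

```
# slurm.conf fragment: let up to 4 jobs share each CPU on the
# interactive partition, with a short time limit.
PartitionName=interactive Nodes=node[01-04] OverSubscribe=FORCE:4 MaxTime=02:00:00 Default=NO
```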
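And the login-node cgroup approach Loris suggests can be done with a systemd resource-control drop-in rather than raw cgroups. A minimal sketch, assuming a systemd-based login node; the quota values are arbitrary examples:

```
# /etc/systemd/system/user-.slice.d/50-login-limits.conf (illustrative)
# Caps every user session on the login node; reload with
# "systemctl daemon-reload" to apply.
[Slice]
CPUQuota=200%
MemoryMax=4G
```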