[slurm-users] Make "srun --pty bash -i" always schedule immediately
Renfro, Michael
Renfro at tntech.edu
Thu Jun 11 13:47:00 UTC 2020
Spare capacity is critical. At our scale, the few dozen cores that are typically left idle on our GPU nodes handle the vast majority of interactive work.
> On Jun 11, 2020, at 8:38 AM, Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>
> That's pretty slick. We just have test, gpu_test, and remotedesktop
> partitions set up for those purposes.
>
> The real trick is making sure you have sufficient spare capacity
> that you can deliberately idle for these purposes. If we were a smaller
> shop with less hardware, I wouldn't be able to set aside as much
> hardware for this. In that case I would likely go the route of a
> single server with OverSubscribe.
>
> You could try to do it with an active partition with no deliberately
> idle resources, but then you will want to make sure that your small jobs
> are really small and won't impact larger work. I don't necessarily
> recommend that. A single node with OverSubscribe should be sufficient.
> If you can't spare a single node, then a VM would do the job.
>
> -Paul Edmon-
>
> On 6/11/2020 9:28 AM, Renfro, Michael wrote:
>> That’s close to what we’re doing, but without dedicated nodes. We have three back-end partitions (interactive, any-interactive, and gpu-interactive), but the users typically don’t have to consider that, due to our job_submit.lua plugin.
>>
>> All three partitions have a default of 2 hours, 1 core, and 2 GB RAM, but users can request more cores and RAM (though not as much as a batch job; we used https://hpcbios.readthedocs.io/en/latest/HPCBIOS_05-05.html as a starting point).
>>
>> If a GPU is requested, the job goes into the gpu-interactive partition and is limited to 16 cores per node (we have 28 cores per GPU node, but GPU jobs can't keep them all busy).
>>
>> If fewer than 12 cores per node are requested, the job goes into the any-interactive partition and can be handled by any of our GPU or non-GPU nodes.
>>
>> If more than 12 cores per node are requested, the job goes into the interactive partition and is handled only by a non-GPU node.
>>
>> I haven't needed to put a QOS on the interactive partitions, but that's not a bad idea.
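>>
>> Roughly, the routing in job_submit.lua looks like the sketch below. This is simplified, not our exact plugin: field names vary a bit between Slurm versions (gres vs. tres_per_node), and the thresholds are just the rules above restated.
>>
>>    -- job_submit.lua (sketch): route interactive jobs (srun/salloc,
>>    -- i.e. no batch script) into the partitions described above.
>>    function slurm_job_submit(job_desc, part_list, submit_uid)
>>       if job_desc.script == nil or job_desc.script == '' then
>>          local cpus = job_desc.min_cpus
>>          -- unset numeric fields show up as a large sentinel value
>>          if cpus == nil or cpus > 1024 then cpus = 1 end
>>          local gres = job_desc.gres or ''  -- tres_per_node on newer Slurm
>>          if string.find(gres, "gpu") then
>>             job_desc.partition = "gpu-interactive"
>>          elseif cpus < 12 then
>>             job_desc.partition = "any-interactive"
>>          else
>>             job_desc.partition = "interactive"
>>          end
>>       end
>>       return slurm.SUCCESS
>>    end
>>
>>    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
>>       return slurm.SUCCESS
>>    end
>>
>> The 16-core cap on the GPU nodes can be handled with MaxCPUsPerNode on the gpu-interactive partition rather than in the plugin itself.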
>>
>>> On Jun 11, 2020, at 8:19 AM, Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>>>
>>> Generally the way we've solved this is to set aside a specific set of
>>> nodes in a partition for interactive sessions. We deliberately scale
>>> the resources so that users will always run immediately, and we also
>>> set a QoS on the partition so that no single user can dominate it.
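>>>
>>> For anyone wanting the bones of that setup, it amounts to something
>>> like the following in slurm.conf plus a partition QOS (node names and
>>> limits here are made up, not ours):
>>>
>>>    # slurm.conf: a few nodes reserved for interactive sessions
>>>    PartitionName=interactive Nodes=node[01-04] MaxTime=08:00:00 QOS=interactive State=UP
>>>
>>>    # QOS that keeps any one user from taking over the partition
>>>    sacctmgr add qos interactive
>>>    sacctmgr modify qos interactive set MaxTRESPerUser=cpu=8 MaxJobsPerUser=2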
>>>
>>> -Paul Edmon-
>>>
>>> On 6/11/2020 8:49 AM, Loris Bennett wrote:
>>>> Hi Manuel,
>>>>
>>>> "Holtgrewe, Manuel" <manuel.holtgrewe at bihealth.de> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is there a way to make interactive logins, where users will use almost no resources, "always succeed"?
>>>>>
>>>>> In most of these interactive sessions, users will have mostly idle shells running and do some batch job submissions. Is there a way to allocate "infinite virtual cpus" on each node that can only be allocated to
>>>>> interactive jobs?
>>>> I have never done this, but setting "OverSubscribe" in the appropriate
>>>> place might be what you are looking for.
>>>>
>>>> https://slurm.schedmd.com/cons_res_share.html
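>>>>
>>>> From the docs, it looks like it would be something like this at the
>>>> partition level (partition and node names invented, and FORCE:4 just
>>>> an example meaning up to four jobs per core):
>>>>
>>>>    # slurm.conf: allow several interactive jobs to share each core
>>>>    PartitionName=interactive Nodes=login[01-02] OverSubscribe=FORCE:4 MaxTime=08:00:00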
>>>>
>>>> Personally, however, I would be a bit wary of doing this. What if
>>>> someone does start a multithreaded process on purpose or by accident?
>>>>
>>>> Wouldn't just using cgroups on your login node achieve what you want?
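>>>>
>>>> (On a systemd-based login node with cgroup v2, one common way is a
>>>> drop-in that applies to every user slice; the limits below are only
>>>> an illustration:
>>>>
>>>>    # /etc/systemd/system/user-.slice.d/50-limits.conf
>>>>    # cap each user at two cores' worth of CPU and 8 GB of RAM
>>>>    [Slice]
>>>>    CPUQuota=200%
>>>>    MemoryMax=8G
>>>>
>>>> with the exact limits obviously depending on the machine.)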
>>>>
>>>> Cheers,
>>>>
>>>> Loris
>>>>
>