[slurm-users] Keep CPU Jobs Off GPU Nodes
René Sitt
sittr at hrz.uni-marburg.de
Wed Mar 29 08:08:12 UTC 2023
Hello,
maybe some additional notes:
While the cited procedure works great in general, it gets more
complicated for heterogeneous setups, i.e. if you have several GPU types
defined in gres.conf, since the 'tres_per_<x>' fields can then take the
form of either 'gres:gpu:N' or 'gres:gpu:<type>:N' - depending on
whether the job script specifies a GPU type or not.
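For illustration, a small helper along these lines could extract the
count from either form (a sketch only; the pattern makes assumptions
about which characters a GPU type name may contain):

    -- tres is nil, "gres:gpu:N", or "gres:gpu:<type>:N"
    local function parse_gpu_count(tres)
        if tres == nil then return 0 end
        -- typed form first: gres:gpu:<type>:N
        local n = string.match(tres, "^gres:gpu:[%w_%-]+:(%d+)$")
        -- then the untyped form: gres:gpu:N
        n = n or string.match(tres, "^gres:gpu:(%d+)$")
        return tonumber(n) or 0
    end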
Of course, you could omit the GPU type definition in gres.conf and
define the type as a node feature instead, as long as no nodes contain
multiple different GPU types.
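As a sketch of that alternative (node names, GPU counts and the feature
name are just examples): keep the gres.conf entry untyped and advertise
the model as a node feature in slurm.conf, so jobs select it via
--constraint instead of a typed gres:

    # gres.conf: untyped GPU definition
    NodeName=gpu[01-04] Name=gpu File=/dev/nvidia[0-3]

    # slurm.conf: model exposed as a node feature
    # (plus the usual CPU/memory parameters)
    NodeName=gpu[01-04] Gres=gpu:4 Features=a100

    # a job script would then request, e.g.
    #   #SBATCH --gres=gpu:2
    #   #SBATCH --constraint=a100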
Since some of our nodes do contain multiple different GPU types, I
instead opted to check only for the presence of 'gpu' in the
'tres_per_<x>' fields and not to bother with parsing the actual number
of GPUs. There is one interesting edge case here, though: users are
free to set --gpus=0, so one either has to filter for that specifically
or instruct one's users not to do it.
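A sketch of that kind of check (field names as in job_desc; the zero
filter covers the --gpus=0 case) could look like this:

    local function job_requests_gpu(job_desc)
        -- fields that can carry a GPU request
        local fields = { 'tres_per_node', 'tres_per_task',
                         'tres_per_socket', 'tres_per_job' }
        for _, f in ipairs(fields) do
            local spec = tostring(job_desc[f])
            -- covers both "gres:gpu:N" and "gres:gpu:<type>:N",
            -- but skips explicit zero requests such as --gpus=0
            if string.find(spec, "gpu") and
               not string.match(spec, ":0+$") then
                return true
            end
        end
        return false
    end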
Kind Regards,
René Sitt
Am 29.03.23 um 08:57 schrieb Ward Poelmans:
> Hi,
>
> We have dedicated partitions for GPUs (their names end with _gpu)
> and simply forbid a job that is not requesting GPU resources from
> using these partitions:
>
> local function job_total_gpus(job_desc)
>     -- return total number of GPUs allocated to the job
>     -- there are many ways to request a GPU; this comes from the
>     -- job_submit example in the slurm source
>     -- a GPU resource is either nil or "gres:gpu:N", with N the number
>     -- of GPUs requested
>
>     -- pick relevant job resources for GPU spec (undefined resources
>     -- can show limit values)
>     local gpu_specs = {
>         ['tres_per_node']   = 1,
>         ['tres_per_task']   = 1,
>         ['tres_per_socket'] = 1,
>         ['tres_per_job']    = 1,
>     }
>
>     -- number of nodes
>     if job_desc['min_nodes'] < 0xFFFFFFFE then
>         gpu_specs['tres_per_node'] = job_desc['min_nodes']
>     end
>     -- number of tasks
>     if job_desc['num_tasks'] < 0xFFFFFFFE then
>         gpu_specs['tres_per_task'] = job_desc['num_tasks']
>     end
>     -- number of sockets
>     if job_desc['sockets_per_node'] < 0xFFFE then
>         gpu_specs['tres_per_socket'] = job_desc['sockets_per_node']
>     end
>     gpu_specs['tres_per_socket'] = gpu_specs['tres_per_socket'] *
>                                    gpu_specs['tres_per_node']
>
>     local gpu_options = {}
>     for tres_name, _ in pairs(gpu_specs) do
>         local num_gpus = string.match(tostring(job_desc[tres_name]),
>                                       "^gres:gpu:([0-9]+)") or 0
>         gpu_options[tres_name] = tonumber(num_gpus)
>     end
>     -- calculate total GPUs
>     for tres_name, job_res in pairs(gpu_specs) do
>         local num_gpus = gpu_options[tres_name]
>         if num_gpus > 0 then
>             local total_gpus = num_gpus * tonumber(job_res)
>             return total_gpus
>         end
>     end
>     return 0
> end
>
>
>
> function slurm_job_submit(job_desc, part_list, submit_uid)
>     local total_gpus = job_total_gpus(job_desc)
>     slurm.log_debug("Job total number of GPUs: %s", tostring(total_gpus))
>
>     if total_gpus == 0 then
>         for partition in string.gmatch(tostring(job_desc.partition), '([^,]+)') do
>             if string.match(partition, '_gpu$') then
>                 slurm.log_user(string.format(
>                     'ERROR: GPU partition %s is not allowed for non-GPU jobs.',
>                     partition))
>                 return ESLURM_INVALID_GRES
>             end
>         end
>     end
>
>     return slurm.SUCCESS
> end
>
>
>
> Ward
>
> On 29/03/2023 01:24, Frank Pari wrote:
>> Well, I wanted to avoid using lua. But it looks like that's going
>> to be the easiest way to do this without having to create a separate
>> partition for the GPUs. Basically: check for at least one GPU in the
>> job submission and, if none is requested, exclude all GPU nodes for
>> the job.
>>
>> [inline screenshot of the job_submit.lua snippet]
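>> Roughly, the snippet does something like this (a sketch; the node
>> list is a placeholder, and keeping non-GPU jobs off those nodes by
>> extending exc_nodes is one way to do it):
>>
>>     function slurm_job_submit(job_desc, part_list, submit_uid)
>>         -- nodes that carry GPUs; currently maintained by hand
>>         local gpu_nodes = 'gpu[01-08]'
>>         -- treat any gres:gpu request in the usual tres fields as a GPU job
>>         local wants_gpu = false
>>         for _, f in ipairs({ 'tres_per_node', 'tres_per_task',
>>                              'tres_per_socket', 'tres_per_job' }) do
>>             if string.find(tostring(job_desc[f]), 'gpu') then
>>                 wants_gpu = true
>>             end
>>         end
>>         -- no GPU requested: exclude the GPU nodes for this job
>>         if not wants_gpu then
>>             if job_desc.exc_nodes == nil or job_desc.exc_nodes == '' then
>>                 job_desc.exc_nodes = gpu_nodes
>>             else
>>                 job_desc.exc_nodes = job_desc.exc_nodes .. ',' .. gpu_nodes
>>             end
>>         end
>>         return slurm.SUCCESS
>>     end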
>>
>> Now I'm wondering how to auto-generate the list of nodes with GPUs,
>> so I don't have to remember to update job_submit.lua every time we
>> get new GPU nodes.
>>
>> -F
>>
>> On Tue, Mar 28, 2023 at 4:06 PM Frank Pari <parif at bc.edu> wrote:
>>
>> Hi all,
>>
>> First, thank you all for participating in this list. I've learned
>> so much just by following others' threads. =)
>>
>> I'm looking at creating a scavenger partition with idle resources
>> from CPU and GPU nodes, and I'd like to keep this to one partition.
>> But I don't want CPU-only jobs using up resources on the GPU nodes.
>>
>> I've seen suggestions for job_submit/Lua scripts. But I'm wondering
>> if there's any other way to ensure a job has requested at least one
>> GPU before the scheduler assigns that job to a GPU node.
>>
>> Thanks in advance!
>>
>> -Frank
>>
>
--
Dipl.-Chem. René Sitt
Hessisches Kompetenzzentrum für Hochleistungsrechnen
Philipps-Universität Marburg
Hans-Meerwein-Straße
35032 Marburg
Tel. +49 6421 28 23523
sittr at hrz.uni-marburg.de
www.hkhlr.de