[slurm-users] Keep CPU Jobs Off GPU Nodes

Wed Mar 29 06:57:43 UTC 2023

Hi,

We have dedicated partitions for GPUs (their names end with _gpu), and in job_submit.lua we simply forbid any job that does not request GPU resources from using these partitions:

local function job_total_gpus(job_desc)
     -- return total number of GPUs allocated to the job
     -- there are many ways to request a GPU. This comes from the job_submit example in the slurm source
     -- a GPU resource is either nil or "gres:gpu:N", with N the number of GPUs requested

     -- pick relevant job resources for GPU spec (undefined resources can show limit values)
     local gpu_specs = {
         ['tres_per_node'] = 1,
         ['tres_per_task'] = 1,
         ['tres_per_socket'] = 1,
         ['tres_per_job'] = 1,
     }

     -- number of nodes
     if job_desc['min_nodes'] < 0xFFFFFFFE then gpu_specs['tres_per_node'] = job_desc['min_nodes'] end
     -- number of tasks
     if job_desc['num_tasks'] < 0xFFFFFFFE then gpu_specs['tres_per_task'] = job_desc['num_tasks'] end
     -- number of sockets
     if job_desc['sockets_per_node'] < 0xFFFE then gpu_specs['tres_per_socket'] = job_desc['sockets_per_node'] end
     gpu_specs['tres_per_socket'] = gpu_specs['tres_per_socket'] * gpu_specs['tres_per_node']

     local gpu_options = {}
     for tres_name, _ in pairs(gpu_specs) do
         local num_gpus = string.match(tostring(job_desc[tres_name]), "^gres:gpu:([0-9]+)") or 0
         gpu_options[tres_name] = tonumber(num_gpus)
     end
     -- calculate total GPUs
     for tres_name, job_res in pairs(gpu_specs) do
         local num_gpus = gpu_options[tres_name]
         if num_gpus > 0 then
             -- pairs() order is unspecified, so the first GPU option found wins
             return num_gpus * tonumber(job_res)
         end
     end
     return 0
end

function slurm_job_submit(job_desc, part_list, submit_uid)
     local total_gpus = job_total_gpus(job_desc)
     slurm.log_debug("Job total number of GPUs: %s", tostring(total_gpus));

     if total_gpus == 0 then
         for partition in string.gmatch(tostring(job_desc.partition), '([^,]+)') do
             if string.match(partition, '_gpu$') then
                 slurm.log_user(string.format('ERROR: GPU partition %s is not allowed for non-GPU jobs.', partition))
                 -- assumes a Slurm version that exposes this error code on the slurm table
                 return slurm.ESLURM_INVALID_GRES
             end
         end
     end

     return slurm.SUCCESS
end
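
One caveat with the pattern used in job_total_gpus: it only matches a string that starts with the untyped form gres:gpu:N. A request that names a GPU type, or that lists the GPU after another resource, would be counted as zero GPUs and rejected. The sketch below is a hypothetical helper (parse_gpu_count is not part of the script above) and assumes typed requests look like gres:gpu:TYPE:N; check what your Slurm version actually passes in job_desc before relying on it.

```lua
-- Hypothetical helper: sum the GPU counts found in a TRES string.
-- Assumed input forms: "gres:gpu:N" and "gres:gpu:TYPE:N", comma-separated.
local function parse_gpu_count(tres)
    if tres == nil then return 0 end
    local total = 0
    -- walk over each comma-separated resource entry
    for item in string.gmatch(tostring(tres), "[^,]+") do
        local n = string.match(item, "^gres:gpu:(%d+)$")
                  or string.match(item, "^gres:gpu:[^:]+:(%d+)$")
        if n then total = total + tonumber(n) end
    end
    return total
end

print(parse_gpu_count(nil))                       -- 0
print(parse_gpu_count("gres:gpu:2"))              -- 2
print(parse_gpu_count("gres:gpu:a100:4"))         -- 4
print(parse_gpu_count("gres:nvme:1,gres:gpu:2"))  -- 2
```

The helper could replace the string.match call inside the loop over gpu_specs; the multiplication by nodes, tasks, or sockets stays as it is.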

Ward

On 29/03/2023 01:24, Frank Pari wrote:
> Well, I wanted to avoid using Lua.  But it looks like that's going to be the easiest way to do this without having to create a separate partition for the GPUs.  Basically, check for at least one GPU in the job submission and, if there is none, exclude all GPU nodes for the job.
> 
> Now I'm wondering how to auto-generate the list of nodes with GPUs, so I don't have to remember to update job_submit.lua every time we get new GPU nodes.
> 
> -F
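
On auto-generating the GPU node list: one possible approach is to parse the output of sinfo -h -N -o "%N %G", which prints each node name followed by its generic resources. The sketch below is only an illustration; gpu_nodes_from_sinfo and the sample text are hypothetical, and it assumes GPU entries in the GRES column start with "gpu".

```lua
-- Hypothetical helper: collect names of nodes whose GRES column mentions a GPU.
-- `text` is expected to look like the output of: sinfo -h -N -o "%N %G"
local function gpu_nodes_from_sinfo(text)
    local seen, nodes = {}, {}
    for line in string.gmatch(text, "[^\r\n]+") do
        local node, gres = string.match(line, "^(%S+)%s+(%S+)")
        -- prefix a comma so "gpu" is matched only at the start of a GRES entry
        if node and string.find("," .. gres, ",gpu", 1, true) and not seen[node] then
            seen[node] = true            -- -N lists a node once per partition
            nodes[#nodes + 1] = node
        end
    end
    return table.concat(nodes, ",")
end

-- Sample text standing in for real sinfo output (made up for illustration).
local sample = "cn01 (null)\ngn01 gpu:a100:4\ngn01 gpu:a100:4\ngn02 gpu:2,nvme:1\n"
print(gpu_nodes_from_sinfo(sample))  -- gn01,gn02

-- On a real cluster the text could come from the command itself:
-- local p = io.popen('sinfo -h -N -o "%N %G"')
-- local text = p:read("*a"); p:close()
```

Running an external command on every submission may slow down slurmctld, so caching the result or generating the list outside the plugin is probably the safer design.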
> 
> On Tue, Mar 28, 2023 at 4:06 PM Frank Pari <parif at bc.edu> wrote:
> 
>     Hi all,
> 
>     First, thank you all for participating in this list.  I've learned so much just by following others' threads.  =)
> 
>     I'm looking at creating a scavenger partition with idle resources from CPU and GPU nodes, and I'd like to keep this to one partition.  But I don't want CPU-only jobs using up resources on the GPU nodes.
> 
>     I've seen suggestions for job/lua scripts, but I'm wondering if there's any other way to ensure a job has requested at least one GPU before the scheduler assigns it to a GPU node.
> 
>     Thanks in advance!
> 
>     -Frank
> 
