I’ll start with the question “why spread the jobs out more than required?” and then move on to why the options you tried didn’t work:
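1. --exclusive=user only ensures that other users’ jobs don’t run on a node with your jobs; it does nothing to keep your own jobs apart.
2. --spread-job distributes the work of one job across multiple nodes, but does nothing about multiple jobs.
3. --distribution likewise only distributes the work of one job.

Each element of a job array is scheduled as a separate job, so none of these options controls where the other array tasks land.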
You might get something similar to what you want by adding CR_LLN (least loaded node) to the scheduler’s SelectTypeParameters alongside CR_Core_Memory (or whatever you’re using), but that changes node selection for the whole cluster and could have serious side effects for other users’ jobs.
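For reference, a sketch of what that change could look like in slurm.conf, assuming the cluster currently uses select/cons_tres with CR_Core_Memory. The partition name and node range in option B are hypothetical; that variant uses the partition-level LLN flag so that only jobs submitted to that partition are affected.

```
# Option A (cluster-wide): add CR_LLN to the existing consumable-resource setting.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_LLN

# Option B (one partition only): enable least-loaded-node selection on a
# dedicated partition; other partitions keep their current packing behavior.
# "spread" and node[01-20] are hypothetical names.
PartitionName=spread Nodes=node[01-20] LLN=YES State=UP
```

Either way, LLN only prefers the node with the most idle CPUs at the moment each job is scheduled. With 20 equally idle nodes it tends to spread the array out, but it does not guarantee one job per node.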
So back to the original question: why *not* pack 20 jobs onto fewer nodes if those nodes have the capacity to run the full set of jobs? You shouldn’t be constrained by memory or CPUs. Are you trying to spread out an I/O load somehow? Networking?
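If spreading really is needed (for example, for I/O), one workaround that leaves the scheduler config alone is to drop the --array line and submit one job per node, pinning each job with --nodelist. A minimal sketch, assuming a hypothetical job script process_images.sbatch that reads its index from a TASK_ID variable instead of SLURM_ARRAY_TASK_ID, and a hypothetical partition name "compute":

```bash
#!/bin/bash
# Sketch: submit one copy of a job script per node, pinning each copy to a
# different node with --nodelist so no two copies can share a node.
# Assumption: the job script (hypothetical name process_images.sbatch) has no
# --array line and reads TASK_ID where it used SLURM_ARRAY_TASK_ID.
# Set SBATCH=echo to print the sbatch commands instead of submitting them.

submit_one_per_node() {
  local script=$1
  shift
  local task_id=0
  local node
  for node in "$@"; do
    # Each submission is a separate job restricted to exactly one node.
    "${SBATCH:-sbatch}" --nodelist="$node" --export=ALL,TASK_ID="$task_id" "$script"
    task_id=$((task_id + 1))
  done
}

# Example on the cluster (partition name "compute" is hypothetical):
# submit_one_per_node process_images.sbatch $(sinfo -h -N -p compute -o '%N' | sort -u)
```

The trade-off is that each job waits for its specific node: if one node is busy or down, that job stays pending while the others run, whereas a packed array would just use whatever capacity is free.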
From: Oren via slurm-users <slurm-users@lists.schedmd.com>
Date: Tuesday, December 3, 2024 at 1:35 PM
To: slurm-users@schedmd.com <slurm-users@schedmd.com>
Subject: [slurm-users] How can I make sure my user have only one job per node (Job array --exclusive=user,)
Hi, I have a 20-node cluster, and I want to run a job array on it with exactly one job per node.
When I submit the following:

#!/bin/bash
#SBATCH --job-name=process_images_train   # Job name
#SBATCH --time=50:00:00                   # Time limit hrs:min:sec
#SBATCH --tasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=50000
#SBATCH --array=0-19                      # Job array with 20 jobs (0 to 19)
I get 10 jobs on node #1 and 10 jobs on node #20; I want one job on each node.
I've tried:
#SBATCH --exclusive=user
Also:
#SBATCH --spread-job
#SBATCH --distribution=cyclic
Nothing changes: node #1 got 10 jobs and node #2 got 10 jobs.