I’ll start with the question of “why spread the jobs out more than required?” and move on to why the other items didn’t work:
- --exclusive=user only ensures that other users' jobs don't run on a node with your jobs; it does nothing about other jobs you own.
- --spread-job distributes the work of one job across multiple nodes, but does nothing about multiple jobs (see the sketch after this list).
- --distribution likewise only controls how a single job's work is laid out.
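To illustrate the distinction, here's a minimal sketch (hypothetical job name, trivial payload): --spread-job spreads the tasks of this one job over nodes, but separate array jobs are still scheduled independently.

#!/bin/bash
#SBATCH --job-name=spread_demo   # hypothetical name, for illustration only
#SBATCH --ntasks=20              # 20 tasks that all belong to ONE job
#SBATCH --spread-job             # spread this job's allocation over as many nodes as possible
srun hostname                    # each task prints its node; other (array) jobs are unaffected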
You might get something similar to what you want by changing the scheduler configuration to use CR_LLN instead of CR_Core_Memory (or whatever SelectTypeParameters you're using), but that could have serious side effects for other users' jobs.
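For reference, that change would look roughly like this in slurm.conf (a sketch, assuming the cons_tres select plugin; adapt to whatever your site actually runs), and it applies cluster-wide, not just to your jobs:

SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_LLN   # CR_LLN: place jobs on the least-loaded nodes first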
So back to the original question: why *not* pack 20 jobs onto fewer nodes if those nodes have the capacity to run the full set? You shouldn't be constrained by memory or CPUs here. Are you trying to spread out an I/O load somehow? Networking?
From:
Oren via slurm-users <slurm-users@lists.schedmd.com>
Date: Tuesday, December 3, 2024 at 1:35 PM
To: slurm-users@schedmd.com <slurm-users@schedmd.com>
Subject: [slurm-users] How can I make sure my user have only one job per node (Job array --exclusive=user,)
Hi,
I have a cluster of 20 nodes, and I want to run a job array on that cluster, but I want each node to get only one job.
#SBATCH --job-name=process_images_train # Job name
#SBATCH --time=50:00:00 # Time limit hrs:min:sec
#SBATCH --cpus-per-task=4
#SBATCH --array=0-19                   # Job array with 20 jobs (0 to 19)
I get 10 jobs on node #1 and 10 jobs on node #20; I want one job on each node.
I've tried:
#SBATCH --exclusive=user
#SBATCH --distribution=cyclic
Nothing changed: node #1 got 10 jobs and node #2 got 10 jobs.
Thanks