[slurm-users] unpredictable behavior of slurm multi-cluster

mohammed shambakey shambakey1 at gmail.com
Sun Oct 8 13:05:56 UTC 2023


I have 2 Slurm clusters: cluster A with 3 compute nodes of 32 CPUs each,
and cluster B with 4 compute nodes of 8 CPUs each. I'm using Slurm
multicluster on clusters A and B, and I tried to run the NAS Parallel
Benchmarks (sp.A.x) on them. First, I measured the execution time and
memory requirements (with valgrind) of "sp.A.x" directly on the compute
nodes of both clusters (without Slurm). The execution times were about 30
and 80 sec on clusters A and B, respectively. Then I submitted 3 sbatch
jobs to each cluster, limiting the memory of each job with the "--mem"
option. Each job runs one instance of sp.A.x. However, the execution times
of the sbatch jobs are much larger than the baseline: 34-67 sec on cluster
A and 100-250 sec on cluster B, compared with 30 and 80 sec in the
original runs. Afterwards, I removed the memory limit from the submitted
jobs, and the execution time of each sbatch job dropped back to the
baseline values. But, of course, each job then allocates a whole node,
which increases the total finish time of all jobs.
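
To put numbers on the slowdown, the ratios implied by the timings above can
be worked out as follows. This is only a sketch of the arithmetic: the
dictionary layout and the function name are illustrative, and the timings
are the approximate values quoted above.

```python
# Approximate timings in seconds, as quoted above.
baseline = {"A": 30, "B": 80}                 # sp.A.x run directly on a compute node
with_mem = {"A": (34, 67), "B": (100, 250)}   # sbatch jobs submitted with --mem


def slowdown_range(cluster):
    """Return (min, max) slowdown factor of the --mem jobs versus the baseline."""
    low, high = with_mem[cluster]
    return low / baseline[cluster], high / baseline[cluster]


for name in ("A", "B"):
    lo, hi = slowdown_range(name)
    print(f"cluster {name}: {lo:.2f}x to {hi:.2f}x the baseline time")
```

So the jobs with a memory limit take roughly 1.1 to 2.2 times the baseline
on cluster A, and roughly 1.25 to 3.1 times the baseline on cluster B.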

I don't understand why the execution time of each job increased in the
first place.

Also, the multicluster documentation at
"https://slurm.schedmd.com/multi_cluster.html" says:
Slurm will immediately submit the job to the cluster that offers the
earliest start time subject its queue of pending and running jobs. Slurm
will make no subsequent effort to migrate the job to a different cluster
(from the list) whose resources become available when running jobs finish
before their scheduled end times.
So, I expected each cluster to receive a number of jobs (approximately)
equal to its number of CPUs before any job was submitted to the other
cluster, but this wasn't the case. I can't relate the observed allocation
of resources to the scheduling rule quoted above. For illustration, I'm
including an Excel sheet with the allocation of clusters and nodes. I just
can't predict how jobs are allocated to resources.
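
The expectation described above can be written down as a small model. This
is a sketch of that assumption only, not of how Slurm actually decides: it
assumes one CPU per job and that a cluster is filled completely before the
next one receives any job. The function name and the cluster order are
illustrative.

```python
# CPU capacity from the description above: nodes * CPUs per node.
clusters = {"A": 3 * 32, "B": 4 * 8}   # A: 96 CPUs, B: 32 CPUs


def expected_placement(num_jobs, order=("A", "B")):
    """Naive model: fill each cluster up to its CPU count, one CPU per job.

    Jobs beyond the total capacity are counted as pending.
    """
    placement = {}
    remaining = num_jobs
    for name in order:
        placed = min(remaining, clusters[name])
        placement[name] = placed
        remaining -= placed
    placement["pending"] = remaining
    return placement


print(expected_placement(100))
```

With 100 jobs this model puts 96 jobs on cluster A and 4 on cluster B. The
placement I observed did not follow this pattern.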

