<div dir="ltr"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Hi<br><br>I have 2 Slurm clusters: cluster A has 3 compute nodes with 32 CPUs each, and cluster B has 4 compute nodes with 8 CPUs each. I'm using Slurm multi-cluster operation on clusters A and B, and I tried to run the NAS Parallel Benchmarks (sp.A.x) on them.<br><br>First, I measured the execution time and the memory requirements (with valgrind) of the "sp.A.x" benchmark directly on the compute nodes of both clusters, without Slurm. The execution times were about 30 sec on cluster A and 80 sec on cluster B.<br><br>Then I ran 3 sbatch jobs on each cluster, each job running one instance of sp.A.x with its memory limited via the "--mem" option. The execution times of these sbatch jobs were much longer than the baseline: 34-67 sec on cluster A and 100-250 sec on cluster B, compared to 30 and 80 sec for the original benchmarks.<br><br>Afterwards, I removed the memory limit from the submitted jobs. The execution time of each sbatch job then dropped back to the baseline values, but of course each job allocated a whole node, which increased the total finish time of all jobs.<br><br>I don't understand why the execution time of each job increased in the first place.<br><br>Also, the multicluster documentation at "<a href="https://slurm.schedmd.com/multi_cluster.html">https://slurm.schedmd.com/multi_cluster.html</a>" says:<br>"<br>Slurm will immediately submit the job to the cluster that offers the earliest start time subject its queue of pending and running jobs. Slurm will make no subsequent effort to migrate the job to a different cluster (from the list) whose resources become available when running jobs finish before their scheduled end times.<br>"<br>So, I expected each cluster to be allocated a number of jobs (approximately) equal to its number of CPUs before any jobs were submitted to the other cluster, but this wasn't the case. 
I can't work out how the scheduling decisions lead to the resource allocation I observed. For illustration, I'm including an Excel sheet showing the clusters and nodes allocated to the jobs. I just can't predict how jobs will be allocated to resources.<br><br>Regards<br></div></div></div>