<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi fellow slurm users - I’ve been using slurm happily for a few months, but now I feel like it’s gone crazy, and I’m wondering if anyone can explain what’s going on.  I have a trivial batch script that I submit multiple times, and identical submissions end up with different numbers of nodes allocated. Does anyone have any idea why?  <div class=""><br class=""></div><div class="">Here’s the output:</div><div class=""><br class=""><div class=""><div class=""><div class=""><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;" class=""><div class="">tin 2028 : cat t<br class="">#!/bin/bash<br class="">#SBATCH --ntasks=72<br class="">#SBATCH --exclusive<br class="">#SBATCH --partition=n2019<br class="">#SBATCH --ntasks-per-core=1<br class="">#SBATCH --time=00:10:00<br class=""><br class="">echo test<br class="">sleep 600</div><div class=""><br class="">tin 2029 : sbatch t<br class="">Submitted batch job 407758<br class="">tin 2030 : sbatch t<br class="">Submitted batch job 407759<br class="">tin 2030 : sbatch t<br class="">Submitted batch job 407760</div><div class=""><br class="">tin 2030 : squeue -l -u bernstei<br class="">Wed Mar 27 17:30:51 2019<br class="">             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)<br class="">            407760     n2019        t bernstei  RUNNING       0:03     10:00      3 compute-4-[16-18]<br class="">            407758     n2019        t bernstei  RUNNING       0:06     10:00      2 compute-4-[29-30]<br class="">            407759     n2019        t bernstei  RUNNING       0:06     10:00      2 compute-4-[21,28]<br class=""></div></blockquote><br class=""></div></div><div class="">All the compute-4-* nodes have 36 physical cores, 72 hyperthreads.<br class=""></div><div class=""><br class=""></div><div 
class=""><br class=""></div><div class="">If I look at the SLURM_* variables, all the jobs show </div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;" class=""><div class="">SLURM_NPROCS=72</div><div class="">SLURM_NTASKS=72</div><div class="">SLURM_CPUS_ON_NODE=72</div><div class="">SLURM_NTASKS_PER_CORE=1</div></blockquote>but for some reason the job that ends up on 3 nodes, and only that one, shows</div></div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;" class=""><div class=""><div class="">SLURM_JOB_CPUS_PER_NODE=72(x3)</div></div></blockquote><div class=""><div class=""><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;" class=""><div class="">SLURM_TASKS_PER_NODE=24(x3)<br class=""></div></blockquote>while the others show the expected</div><div class=""><blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;" class="">SLURM_JOB_CPUS_PER_NODE=72(x2)<br class=""></blockquote><blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;" class=""></blockquote><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;" class=""><div class="">SLURM_TASKS_PER_NODE=36(x2)<br class=""></div></blockquote><br class=""></div><div class="">I’m using CentOS 7 (via NPACI Rocks) and slurm 18.08.0 via the rocks slurm roll.</div><div class=""><br class=""></div><div class=""><span class="Apple-tab-span" style="white-space:pre">                                                                              </span>thanks,</div><div class=""><span class="Apple-tab-span" style="white-space:pre">                                                                           </span>Noam</div></div></body></html>