[slurm-users] Questions about dynamic nodes

Groner, Rob rug262 at psu.edu
Tue Sep 27 15:26:25 UTC 2022


I have two nodes that offer a "gc" feature. Node t-gc-1202 is "normal", and node t-gc-1201 is dynamic. I can successfully remove t-gc-1201 and bring it back dynamically. Once it is back, it appears in the sinfo output JUST LIKE the "normal" node:

[rug262@testsch (RC) slurm] sinfo -o "%20N  %10c  %10m  %25f  %10G "
NODELIST              CPUS        MEMORY      AVAIL_FEATURES             GRES
t-sc-[1101-1104]      48          358400      nogpu,sc                   (null)
t-gc-1201             48          385420      gpu,gc,a100                gpu:2(S:0-
t-gc-1202             48          358400      gpu,gc,a100                gpu:2
t-ic-1051             36          500000      ic,a40                     (null)

When I submit a job requiring 24 CPUs and the gc feature, it runs only on t-gc-1202. If I sbatch three identical jobs at once, two run on t-gc-1202 and the third stays pending for resources:

[rug262@testsch (RC) slurm] squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               405 open-requ gpu_test   rug262 PD       0:00      1 (Resources)
               404 open-requ gpu_test   rug262  R       0:06      1 t-gc-1202
               403 open-requ gpu_test   rug262  R       0:07      1 t-gc-1202
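The job script itself isn't included here. As a hypothetical sketch, a submission along these lines would reproduce the test. The partition (presumably open-requeue, which squeue truncates to "open-requ") and the job name gpu_test are taken from the squeue output. The function name submit_test_jobs, the single-task layout, and the "sleep 300" payload are assumptions. Setting DRY_RUN=1 prints the sbatch commands instead of submitting them.

```bash
# Hypothetical helper (not from the original post): submit COUNT identical
# jobs, each asking for 24 CPUs on one node with the "gc" feature.
# Partition and job name are read off the squeue output; the single-task
# layout and the "sleep 300" payload are assumptions.
# With DRY_RUN=1 the sbatch commands are printed instead of submitted.
submit_test_jobs() {
  local count=${1:-3}
  local i
  local -a cmd
  for ((i = 1; i <= count; i++)); do
    cmd=(sbatch --partition=open-requeue --job-name=gpu_test
      --nodes=1 --ntasks=1 --cpus-per-task=24
      --constraint=gc --wrap="sleep 300")
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      # Preview only: show what would be submitted.
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```

Calling `DRY_RUN=1 submit_test_jobs 3` previews the three commands; `submit_test_jobs 3` submits them.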

Before the jobs start, both nodes appear in the partitions and are shown as idle:

[rug262@testsch (RC) slurm] sinfo
PARTITION     AVAIL  TIMELIMIT  NODES  STATE NODELIST
open*            up 2-00:00:00      4   idle t-sc-[1101-1104]
open-requeue     up 2-00:00:00      6   idle t-gc-[1201-1202],t-sc-[1101-1104]
intr             up 2-00:00:00      1   idle t-ic-1051
sla-prio         up   infinite      6   idle t-gc-[1201-1202],t-sc-[1101-1104]
burst            up   infinite      4   idle t-sc-[1101-1104]
burst-requeue    up   infinite      6   idle t-gc-[1201-1202],t-sc-[1101-1104]
debug            up   infinite      7   idle t-gc-[1201-1202],t-ic-1051,t-sc-[1101-1104]


So, my two questions:

  1.  How do I get the scheduler to use my dynamic node the same way it uses the non-dynamic nodes?
  2.  I want my dynamic node to have a DIFFERENT feature that is not present on the "normal" nodes. When a job is submitted that requires that feature, I need the job to stay pending until the dynamic node becomes available. How do I set that up?

Thanks.
