[slurm-users] slurm only looking in "default" partition during scheduling

Durai Arasan arasan.durai at gmail.com
Tue May 12 14:47:04 UTC 2020


Hi,
We have a cluster with 2 slave nodes. These are the slurm.conf lines
describing the nodes and partitions:

NodeName=slurm-gpu-1 NodeAddr=192.168.0.200 Procs=16 Gres=gpu:2 State=UNKNOWN
NodeName=slurm-gpu-2 NodeAddr=192.168.0.124 Procs=1 Gres=gpu:0 State=UNKNOWN
PartitionName=gpu Nodes=slurm-gpu-1 Default=YES MaxTime=INFINITE State=UP
PartitionName=compute Nodes=slurm-gpu-2 Default=YES MaxTime=INFINITE State=UP
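
For reference, the partition that slurmctld currently treats as the default can be checked like this (a quick sketch; scontrol's per-partition output includes a Default= field):

# List each partition together with its Default= flag
scontrol show partition | grep -E 'PartitionName=|Default='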

Running sinfo gives the following:



PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
gpu          up   infinite      1   idle slurm-gpu-1
compute*     up   infinite      1   idle slurm-gpu-2

When I request a GPU job using the following command:

srun --gres=gpu:2 nvidia-smi

I get the error:


srun: error: Unable to allocate resources: Requested node configuration is not available

and in slurmctld.log these are the entries:


[2020-05-12T14:33:47.578] _pick_best_nodes: JobId=55 never runnable in partition compute
[2020-05-12T14:33:47.578] _slurm_rpc_allocate_resources: Requested node configuration is not available

It seems like Slurm is only considering the "compute" partition and not the others. This matches the sinfo output above, where the trailing * on compute* marks it as the default partition.
Even when I explicitly specify the GPU node to srun, it fails:


srun --nodelist=slurm-gpu-1 nvidia-smi

I get the same error:


srun: error: Unable to allocate resources: Requested node configuration is not available

and in slurmctld.log:


[2020-05-12T14:38:57.242] No nodes satisfy requirements for JobId=56 in partition compute
[2020-05-12T14:38:57.242] _slurm_rpc_allocate_resources: Requested node configuration is not available

Slurm is still looking in the "compute" partition even after I specify the node
to srun.
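
As far as I understand, --nodelist only restricts which nodes are picked within the partition that has already been selected, so a fully explicit request would presumably look like this (a sketch, not verified on our cluster):

# Hypothetical fully explicit request: partition, node, and GPU count all named
srun -p gpu --nodelist=slurm-gpu-1 --gres=gpu:2 nvidia-smi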

But when I specify a partition, it works:

srun -p gpu nvidia-smi

However, I would prefer not to specify the partition; I would like Slurm to
select nodes based on the options given to srun. Does anyone see what is
wrong with this setup?
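
One possible workaround, based on my reading of the srun man page (a sketch, not yet verified here): -p accepts a comma-separated list of partitions, so Slurm can pick whichever partition satisfies the resource request:

# Let Slurm choose between the listed partitions based on --gres
srun -p gpu,compute --gres=gpu:2 nvidia-smi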

Thanks,
Durai

