[slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

Dean Schulze dean.w.schulze at gmail.com
Mon Feb 3 23:59:19 UTC 2020


When I run an sbatch script with the line

#SBATCH --gres=gpu:gp100:1

it runs.  When I change it to

#SBATCH --gres=gpu:gp100:3

it fails with "Requested node configuration is not available", even though
I have a node with 4 gp100s available.  Here's the node definition from my
slurm.conf:

NodeName=liqidos-dean-node1 CPUs=2 Boards=1 SocketsPerBoard=2 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3770 Gres=gpu:gp100:4

That node has a gres.conf with these lines:

Name=gpu Type=gp100  File=/dev/nvidia0
Name=gpu Type=gp100  File=/dev/nvidia1
Name=gpu Type=gp100  File=/dev/nvidia2
Name=gpu Type=gp100  File=/dev/nvidia3

The character devices all exist in /dev.
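
As a sanity check on the two files, here is a minimal sketch that compares
the GPU count declared in the slurm.conf node line with the number of File=
entries in gres.conf.  It is not part of Slurm: the helper names
(parse_pairs, declared_gres, listed_gres) are made up for illustration, and
it assumes the simple name:type:count and one-device-per-line forms shown
above, with no device ranges such as /dev/nvidia[0-3].

```python
from collections import Counter

# Configuration text copied from this post.
SLURM_CONF_NODE = (
    "NodeName=liqidos-dean-node1 CPUs=2 Boards=1 SocketsPerBoard=2 "
    "CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3770 Gres=gpu:gp100:4"
)
GRES_CONF = """\
Name=gpu Type=gp100  File=/dev/nvidia0
Name=gpu Type=gp100  File=/dev/nvidia1
Name=gpu Type=gp100  File=/dev/nvidia2
Name=gpu Type=gp100  File=/dev/nvidia3
"""


def parse_pairs(line):
    """Split a 'Key=Value Key=Value' config line into a dict (hypothetical helper)."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


def declared_gres(node_line):
    """Return {(name, type): count} from the Gres= field of a node line."""
    counts = Counter()
    gres = parse_pairs(node_line).get("Gres", "")
    for item in filter(None, gres.split(",")):
        # Assumes the name:type:count form used in this post.
        name, gres_type, count = item.split(":")
        counts[(name, gres_type)] += int(count)
    return counts


def listed_gres(gres_conf):
    """Return {(name, type): count}, counting one device per File= line."""
    counts = Counter()
    for raw in gres_conf.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        pairs = parse_pairs(line)
        if "File" in pairs:
            counts[(pairs["Name"], pairs.get("Type", ""))] += 1
    return counts


declared = declared_gres(SLURM_CONF_NODE)
listed = listed_gres(GRES_CONF)
print("declared in slurm.conf:", dict(declared))
print("listed in gres.conf:   ", dict(listed))
print("counts agree:", declared == listed)
```

This only checks that the two files agree with each other; it says nothing
about what slurmctld or slurmd actually loaded.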

What's the controller complaining about?