[slurm-users] sbatch script won't accept --gres that requires more than 1 gpu
Dean Schulze
dean.w.schulze at gmail.com
Mon Feb 3 23:59:19 UTC 2020
When I run an sbatch script with the line
#SBATCH --gres=gpu:gp100:1
it runs. When I change it to
#SBATCH --gres=gpu:gp100:3
it fails with "Requested node configuration is not available". But I have
a node with 4 gp100s available. Here's my slurm.conf:
NodeName=liqidos-dean-node1 CPUs=2 Boards=1 SocketsPerBoard=2 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3770 Gres=gpu:gp100:4
That node has a gres.conf with these lines:
Name=gpu Type=gp100 File=/dev/nvidia0
Name=gpu Type=gp100 File=/dev/nvidia1
Name=gpu Type=gp100 File=/dev/nvidia2
Name=gpu Type=gp100 File=/dev/nvidia3
The character devices all exist in /dev.
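As a sanity check on the two files quoted above, here is a minimal sketch that compares the GPU count declared in the slurm.conf node line against the device files listed in gres.conf. The config text is copied from this post. The helper names (`parse_fields`, `declared_gres`, `listed_devices`) are hypothetical and not part of Slurm. The script only checks that the two files agree with each other; it does not query the controller or explain the scheduling decision.

```python
# Config text copied from the post. The helpers below are illustrative
# only; they are not a Slurm API.
SLURM_CONF_NODE = (
    "NodeName=liqidos-dean-node1 CPUs=2 Boards=1 SocketsPerBoard=2 "
    "CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3770 Gres=gpu:gp100:4"
)
GRES_CONF = """\
Name=gpu Type=gp100 File=/dev/nvidia0
Name=gpu Type=gp100 File=/dev/nvidia1
Name=gpu Type=gp100 File=/dev/nvidia2
Name=gpu Type=gp100 File=/dev/nvidia3
"""


def parse_fields(line):
    """Split a 'Key=Value Key=Value' config line into a dict."""
    return dict(tok.split("=", 1) for tok in line.split())


def declared_gres(node_line):
    """Return (name, type, count) from the Gres= field of a node line."""
    name, gtype, count = parse_fields(node_line)["Gres"].split(":")
    return name, gtype, int(count)


def listed_devices(gres_conf, name, gtype):
    """Return the File= paths in gres.conf that match name and type."""
    devices = []
    for line in gres_conf.splitlines():
        # Skip blank lines and comments.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = parse_fields(line)
        if fields.get("Name") == name and fields.get("Type") == gtype:
            devices.append(fields["File"])
    return devices


name, gtype, count = declared_gres(SLURM_CONF_NODE)
devices = listed_devices(GRES_CONF, name, gtype)
node = parse_fields(SLURM_CONF_NODE)

print(f"slurm.conf declares {count} x {name}:{gtype}")
print(f"gres.conf lists {len(devices)} device files: {devices}")
print(f"CPUs configured on the node: {node['CPUs']}")
```

On the lines quoted here the two counts agree, so the mismatch is not between slurm.conf and gres.conf themselves.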
What's the controller complaining about?