[slurm-users] Slurm 20.02.5 problems with --gres=gpu:1 and -c >1

Danny Marc Rotscher danny.rotscher at tu-dresden.de
Fri Nov 6 09:21:48 UTC 2020


Hello,

Yesterday we upgraded our cluster from Slurm 20.02.2 to 20.02.5 and noticed some problems when requesting GPUs together with more than one CPU per task.
I was able to reproduce the problem in a small Docker container; a description is available at the following link.
https://github.com/bikerdanny/docker-centos-slurm/tree/gres-bug

I created a separate branch (gres-bug) for reproducing the problem; please check out the README.md.

Could anyone tell me what we are doing wrong and how we can solve this problem?
We also found that using "--cpus-per-gpu" instead of "--cpus-per-task" works with values greater than 1.
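
For illustration only, the two request forms could be sketched roughly as follows. This is an assumption about what such calls look like, not the exact commands from the README: the CPU count of 2 and the "hostname" payload are hypothetical placeholders. The script only prints the command lines; running them requires a Slurm cluster with a configured GPU GRES.

```shell
#!/bin/sh
# Hypothetical sketch; the exact reproduction steps are in the README.md
# of the gres-bug branch and may differ from these lines.

CPUS=2   # assumed value; any CPU count greater than 1

# Form reported as problematic after the upgrade to 20.02.5
# (-c is the short form of --cpus-per-task):
failing_cmd="srun --gres=gpu:1 --cpus-per-task=${CPUS} hostname"

# Form reported as working with more than one CPU:
working_cmd="srun --gres=gpu:1 --cpus-per-gpu=${CPUS} hostname"

# Print the command lines instead of executing them.
echo "$failing_cmd"
echo "$working_cmd"
```

On a cluster, each line could be run by pasting the printed command into a shell.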

Kind regards and stay healthy
Danny Rotscher

