[slurm-users] gres:gpu managment
Daniel Vecerka
vecerka at fel.cvut.cz
Thu May 23 08:11:26 UTC 2019
Jobs end up on the same GPU. If I run the CUDA deviceQuery sample in the sbatch script, I get:
Device PCI Domain ID / Bus ID / location ID: 0 / 97 / 0
Device PCI Domain ID / Bus ID / location ID: 0 / 97 / 0
Device PCI Domain ID / Bus ID / location ID: 0 / 97 / 0
Device PCI Domain ID / Bus ID / location ID: 0 / 97 / 0
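For reference, a minimal batch script along these lines can be used to compare the logical device index with the physical PCI identity of the GPU each job actually sees (a sketch; the gres count and time limit are placeholders):

#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --time=00:05:00

# Logical index exposed inside the job's cgroup (always starts at 0)
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"

# Physical identity of the visible GPU(s); the PCI bus ID differs per card
nvidia-smi --query-gpu=index,pci.bus_id,uuid --format=csv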
Our cgroup.conf:
/etc/slurm/cgroup.conf
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
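To confirm that ConstrainDevices is actually being applied, the device whitelist of a running job step can be inspected from inside the job (a sketch assuming cgroup v1 and Slurm's default hierarchy under /sys/fs/cgroup/devices; the path may differ on other setups):

# Allowed device nodes for this job step; besides the defaults, only the
# /dev/nvidia* major:minor pair of the allocated GPU should appear.
# Path assumes cgroup v1 with Slurm's slurm/uid_*/job_*/step_* layout.
cat /sys/fs/cgroup/devices/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}/step_*/devices.list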
Daniel
On 23.05.2019 9:54, Aaron Jackson wrote:
> Do jobs actually end up on the same GPU though? cgroups will always
> refer to the first allocated GPU as 0, so it is not unexpected for each
> job to have CUDA_VISIBLE_DEVICES set to 0. Make sure you have the following
> in /etc/cgroup.conf
>
> ConstrainDevices=yes
>
> Aaron
>
>
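Another way to see which physical GPU index the scheduler assigned to each job, independent of the remapped CUDA_VISIBLE_DEVICES, is scontrol's detailed job view (on reasonably recent Slurm releases the per-node GRES index is shown; field names vary between versions, and <jobid> below is a placeholder):

# Detailed view of a running job; look for the GRES index, e.g. "gpu(IDX:1)",
# to see which physical card the job was allocated on each node.
scontrol show job -d <jobid>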