[slurm-users] GPUs not available after making use of all threads?
Sebastian Schmutzhard-Höfler
sebastian.schmutzhard-hoefler at univie.ac.at
Thu Feb 9 23:31:12 UTC 2023
Dear all,
we have a node with 2 x 64 cores (two threads each) and 8 GPUs,
running Slurm 22.05.5.
In order to make use of individual threads, we changed

    SelectTypeParameters=CR_Core
    NodeName=nodename CPUs=256 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2

to

    SelectTypeParameters=CR_CPU
    NodeName=nodename CPUs=256

We are now able to allocate individual threads to jobs, despite the
following error in slurmd.log:
error: Node configuration differs from hardware: CPUs=256:256(hw) Boards=1:1(hw) SocketsPerBoard=256:2(hw) CoresPerSocket=1:64(hw) ThreadsPerCore=1:2(hw)
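
To make the mismatch in that error line easier to read, it can be split into configured and detected values. The following is only a minimal Python sketch: the helper name parse_mismatches is made up for illustration, and it assumes that in each "name=X:Y(hw)" pair the value before the colon is what Slurm took from the configuration and the value marked (hw) is what slurmd detected.

```python
import re

# Error line copied verbatim from slurmd.log above.
line = ("error: Node configuration differs from hardware: "
        "CPUs=256:256(hw) Boards=1:1(hw) SocketsPerBoard=256:2(hw) "
        "CoresPerSocket=1:64(hw) ThreadsPerCore=1:2(hw)")

def parse_mismatches(log_line):
    """Return {field: (configured, hardware)} for fields whose values differ.

    Assumption: the number before ':' is the configured value and the
    number followed by '(hw)' is the detected hardware value.
    """
    pairs = re.findall(r"(\w+)=(\d+):(\d+)\(hw\)", log_line)
    return {name: (int(cfg), int(hw)) for name, cfg, hw in pairs if cfg != hw}

mismatches = parse_mismatches(line)
for name, (cfg, hw) in mismatches.items():
    print(f"{name}: configured={cfg}, hardware={hw}")
```

Read this way, the node defined with only CPUs=256 seems to be treated as 256 sockets with one core and one thread each, whereas the hardware reports 2 sockets with 64 cores and 2 threads per core.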
However, it appears that since this change, we can only make use of 4
out of the 8 GPUs.
The output of "sinfo -o %G" might be relevant.
In the first situation it was
$ sinfo -o %G
GRES
gpu:A100:8(S:0,1)
Now it is:
$ sinfo -o %G
GRES
gpu:A100:8(S:0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126)
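
The difference between the two GRES strings can be quantified with a few lines of Python. This is a sketch under assumptions: gres_sockets is a hypothetical helper, not part of Slurm, and it assumes the "(S:...)" suffix lists the socket indices the GPUs are associated with. The second string is rebuilt programmatically as every even index from 0 to 126, which matches the list shown above.

```python
import re

def gres_sockets(gres):
    """Extract the socket indices from a GRES string like 'gpu:A100:8(S:0,1)'.

    Assumption: the '(S:...)' suffix is a comma-separated list of socket
    indices. Ranges such as '0-3' do not occur in the output above and
    are handled only defensively.
    """
    m = re.search(r"\(S:([\d,\-]+)\)", gres)
    if m is None:
        return []
    sockets = []
    for part in m.group(1).split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            sockets.extend(range(lo, hi + 1))
        else:
            sockets.append(int(part))
    return sockets

before = "gpu:A100:8(S:0,1)"
# Same list as in the second sinfo output: 0,2,4,...,126
after = "gpu:A100:8(S:" + ",".join(str(i) for i in range(0, 127, 2)) + ")"

print(len(gres_sockets(before)), "sockets before the change")
print(len(gres_sockets(after)), "sockets after the change")
```

With the original configuration the GPUs are listed against 2 sockets. After the change they are listed against 64 socket indices, namely every second one of the 256 sockets that appear in the slurmd error.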

Has anyone faced this or a similar issue and can give me some directions?

Best wishes
Sebastian