Thanks, that was the bug. Now it works.


On 16.10.24 15:25, Groner, Rob wrote:
Maybe I'm reading it wrong, but your partition sets DefMemPerGPU at 32000 and the nodes only have 31000 real memory available.

Rob
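
The mismatch Rob points out can be checked with simple arithmetic. The sketch below is only an illustration: it assumes that a job's default memory is the number of requested GPUs times DefMemPerGPU and that this must not exceed the node's RealMemory, which simplifies the scheduler's real checks. The numbers (in MB) come from the slurm.conf excerpts quoted below; the function name fits_default_mem is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: compares the partition's default memory per GPU with each
# node type's RealMemory. This is a simplification, not Slurm's own logic.

def_mem_per_gpu=32000   # DefMemPerGPU on PartitionName=GPU (MB)

# fits_default_mem REAL_MEMORY_MB REQUESTED_GPUS
# Prints "fits" when GPUs x DefMemPerGPU is within RealMemory,
# otherwise "too large".
fits_default_mem() {
  local real_memory=$1 gpus=$2
  local needed=$(( def_mem_per_gpu * gpus ))
  if (( needed <= real_memory )); then
    echo "fits"
  else
    echo "too large"
  fi
}

echo "gpu-[001-003], --gres=gpu:1: $(fits_default_mem 31000 1)"
echo "gpu-[010-019], --gres=gpu:1: $(fits_default_mem 64000 1)"
```

Under these assumptions the one-GPU nodes report "too large" (32000 > 31000) while the two-GPU nodes report "fits", which matches the behaviour described in the original post. A DefMemPerGPU no larger than the smallest RealMemory per GPU among the partition's nodes would avoid the mismatch.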


From: Jörg Striewski via slurm-users <slurm-users@lists.schedmd.com>
Sent: Wednesday, October 16, 2024 4:05 AM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Problem with nodes with 1 gpu
 
I cannot submit jobs to nodes with one GPU, and I can't find the bug in my
configuration. Can someone help me?

In slurm.conf, GresTypes=gpu is set.

These are some of the nodes in slurm.conf:

NodeName=gpu-[001-003]   CPUs=8    SocketsPerBoard=1   CoresPerSocket=4   RealMemory=31000   Gres=gpu:1080:1
NodeName=gpu-[010-019]   CPUs=16   SocketsPerBoard=1   CoresPerSocket=8   RealMemory=64000   Gres=gpu:1080:2

The partition for these GPU nodes is:

# General GPU partitions
PartitionName=GPU   Nodes=gpu-[001-003,010-019]   AllowAccounts=staff   PreemptMode=REQUEUE   PriorityTier=0   DefMemPerGPU=32000   DefCpuPerGPU=8   CpuBind=none   TRESBillingWeights="GRES/gpu=1000"   GraceTime=300

These are the entries for some of the nodes in gres.conf:

NodeName=gpu-[001-003]   Name=gpu   Type=1080   File=/dev/nvidia0
NodeName=gpu-[010-019]   Name=gpu   Type=1080 File=/dev/nvidia[0-1]

When I submit a job with sbatch to gpu-001:

#SBATCH --job-name=hello
#SBATCH --ntasks-per-node=1
#SBATCH --output=hello_%A.out
#SBATCH --time=00:10:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=striewski@ismll.de
#SBATCH --partition=GPU
#SBATCH --nodelist=gpu-001
#SBATCH --gres=gpu:1

[...]

I get the error:

sbatch: error: Batch job submission failed: Requested node configuration is not available
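
If the partition's DefMemPerGPU default is the cause, as the reply at the top of this thread suggests, a job could also request memory explicitly instead of inheriting that default. This is a minimal sketch rather than a tested fix: the value 30000 is an assumption, chosen only because it is below the node's RealMemory=31000, and the final command is a placeholder payload.

```shell
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=GPU
#SBATCH --nodelist=gpu-001
#SBATCH --gres=gpu:1
# Explicit per-GPU memory request in MB. 30000 is an assumed value, not
# taken from the original post; it only needs to fit within RealMemory.
#SBATCH --mem-per-gpu=30000

# Placeholder payload: report which node the job landed on.
echo "running on ${SLURMD_NODENAME:-unknown}"
```

Correcting the default in slurm.conf fixes it for every job in the partition, whereas this per-job option only helps scripts that set it.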

When I submit the job to a node with 2 GPUs, it runs with no error, just by
setting --nodelist=gpu-12.

Does anyone have a hint about what I did wrong?


Kind regards

--
Jörg Striewski

Information Systems and Machine Learning Lab (ISMLL)
Institute of Computer Science
University of Hildesheim Germany
post address: Universitätsplatz 1, D-31141 Hildesheim, Germany
visitor address: Samelsonplatz 1, D-31141 Hildesheim, Germany
Tel.(+49) 05121 / 883-40392
http://www.ismll.uni-hildesheim.de/

