> Every job will need at least 1 core just to run 
> and if there are only 4 cores on the machine,
> one would expect a max of 4 jobs to run.

I have 3500+ GPU cores available. Do you mean that each GPU job requires at least one CPU core? Can't we run a job on the GPU alone, without any CPUs? This sbatch script requests only 100 shards, so can't we run 35 such jobs in parallel?

#!/usr/bin/env bash

#SBATCH --output="%j.out"
#SBATCH --error="%j.error"
#SBATCH --partition=pgpu
#SBATCH --gres=shard:100

sleep 10
echo "Current date and time: $(date +"%Y-%m-%d %H:%M:%S")"
echo "Running..."
sleep 10
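
The limit described in the quoted reply can be sketched as a small calculation. This is only an illustration, assuming that a job which requests no CPUs explicitly (like the script above) is still allocated one CPU. `max_parallel_jobs` is a hypothetical helper written for this message, not a Slurm command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): estimate how many jobs can run
# at once on one node when every job needs both shards and CPUs.
# Assumption: a job that requests no CPUs is still allocated one.
max_parallel_jobs() {
    local total_shards=$1 shards_per_job=$2 total_cpus=$3 cpus_per_job=${4:-1}
    local by_shards=$(( total_shards / shards_per_job ))
    local by_cpus=$(( total_cpus / cpus_per_job ))
    # The scarcer resource sets the limit.
    echo $(( by_shards < by_cpus ? by_shards : by_cpus ))
}

# Values from this thread: 3500 shards, 100 per job, 4 CPUs.
max_parallel_jobs 3500 100 4
```

With the values from this thread the sketch prints 4: the shards alone would allow 35 jobs, but the 4 CPUs run out first.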

On Thu, Jun 20, 2024 at 11:23 PM Brian Andrus via slurm-users <slurm-users@lists.schedmd.com> wrote:
Well, if I am reading this right, it makes sense.

Every job will need at least 1 core just to run and if there are only 4
cores on the machine, one would expect a max of 4 jobs to run.

Brian Andrus

On 6/20/2024 5:24 AM, Arnuld via slurm-users wrote:
> I have a machine with a quad-core CPU and an Nvidia GPU with 3500+
> cores.  I want to run around 10 jobs in parallel on the GPU (mostly
> are CUDA based jobs).
>
> PROBLEM: Each job asks for only 100 shards (runs usually for a minute
> or so), then I should be able to run 3500/100 = 35 jobs in
> parallel but slurm runs only 4 jobs in parallel keeping the rest in
> the queue.
>
> I have this in slurm.conf and gres.conf:
>
> # GPU
> GresTypes=gpu,shard
> # COMPUTE NODES
> PartitionName=pzero Nodes=ALL Default=YES MaxTime=INFINITE State=UP
> PartitionName=pgpu Nodes=hostgpu MaxTime=INFINITE State=UP
> NodeName=hostgpu NodeAddr=x.x.x.x Gres=gpu:gtx_1080_ti:1,shard:3500
> CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1
> RealMemory=64255 State=UNKNOWN
> ----------------------
> Name=gpu Type=gtx_1080_ti File=/dev/nvidia0 Count=1
> Name=shard Count=3500  File=/dev/nvidia0
>
>
>
