[slurm-users] Running 2 jobs on one node uses the same cores, 300x slowdown
Anne Hammond
hammond at txcorp.com
Wed Nov 24 00:33:00 UTC 2021
We are running slurm 20.11.2-1 from CentOS 7 rpms.
The queue is set up to allow OverSubscribe:
NodeName=ne[04-09] CPUs=32 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
PartitionName=neon-noSMT Nodes=ne[04-09] Default=NO MaxTime=3-00:00:00 DefaultTime=4:00:00 State=UP OverSubscribe=YES
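For what it's worth, the live partition settings can be double-checked on the cluster with scontrol, which prints the OverSubscribe value among the partition's other fields:

  scontrol show partition neon-noSMT | grep -i oversubscribe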
I requested a user submit the first job:
#SBATCH --partition=neon-noSMT
#SBATCH --job-name="ns072"
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=24:00:00
#SBATCH --exclusive
#SBATCH --error=ns072.err
#SBATCH --output=ns072.out
#SBATCH --mail-type=ALL # NONE, BEGIN, END, FAIL, REQUEUE, ALL
#SBATCH --mail-user=tgjenkins at txcorp.com
I requested the user submit the second job using the same SBATCH
commands as above, but adding:
#SBATCH --oversubscribe
and the command to run the second job on the same node
as the first job:
sbatch --nodelist={node running first job} run.sbatch
Note each job uses only 8 tasks/cores out of the 32 available.
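To confirm that both jobs land on the same node and that each really gets its own 8-CPU allocation, squeue's format specifiers should show it (assuming, for illustration, that the first job landed on ne04; %C is the allocated CPU count and %N the node list):

  squeue -w ne04 -o "%.10i %.12j %.4C %N"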
When he submits the second job, the first job slows
down by a factor of roughly 300.
If I log in to the node running the 2 jobs, top shows only 8
cores in use, not 8 for each job, so both jobs appear to be
running on the same 8 cores.
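The overlap can be seen directly from the affinity masks. Run on the node as the job owner (taskset is from util-linux; pgrep -u selects that user's processes), this prints the CPU list each task is pinned to; if both jobs report the same 0-7 style range, they are stacked on the same cores:

  # print the allowed-CPU list for every process owned by the job user
  for pid in $(pgrep -u "$USER"); do
      taskset -cp "$pid"
  done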
These are the SCHEDULING parameters from /etc/slurm/slurm.conf:
# SCHEDULING
# out 29Dec20
#FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/linear
SelectTypeParameters=CR_ONE_TASK_PER_CORE
Is there a different parameter I should be looking at?
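One guess, based on my reading of the slurm.conf man page (untested, so treat it as a sketch): select/linear allocates whole nodes and does not track individual cores, so with OverSubscribe=YES both jobs can end up bound to the same CPUs at task launch. Would moving to the consumable-resources plugin, something like the following, be the right direction?

  SchedulerType=sched/backfill
  SelectType=select/cons_tres
  SelectTypeParameters=CR_Core

(My understanding is that changing SelectType requires restarting slurmctld and the slurmds, not just scontrol reconfigure.)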
Thanks in advance,
Anne Hammond