[slurm-users] fail when trying to set up selection=con_res
Williams, Jenny Avis
jennyw at email.unc.edu
Tue Nov 28 18:45:41 MST 2017
We run in that manner using this config on kernel 3.10.0-693.5.2.el7.x86_64. This is Slurm 17.02.4.
Do your compute nodes have hyperthreading enabled?
AuthType=auth/munge
CryptoType=crypto/munge
AccountingStorageEnforce=limits,qos,safe
AccountingStoragePort=ANumber
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment='yes'
AccountingStorageUser=slurm
CacheGroups=0
EnforcePartLimits='yes'
FastSchedule=1
GresTypes=gpu
InactiveLimit=0
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup
KillWait=30
Licenses=mplus:1
MaxArraySize=40001
MaxJobCount=350000
MinJobAge=300
MpiDefault=none
PriorityDecayHalfLife=14-0
PriorityFavorSmall='no'
PriorityFlags=fair_tree
PriorityMaxAge=60-0
PriorityType=priority/multifactor
PriorityWeightAge=1000
PriorityWeightFairshare=10000
PriorityWeightJobSize=1000
PriorityWeightPartition=1000
PriorityWeightQOS=1000
ProctrackType=proctrack/cgroup
RebootProgram=/usr/sbin/reboot
ReturnToService=2
SallocDefaultCommand='"srun -n1 -N1 --gres=gpu:0 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"'
SchedulerPort=ANumber
SchedulerParameters=kill_invalid_depend
SchedulerType=sched/backfill
SelectTypeParameters=CR_CPU_Memory
SelectType=select/cons_res
SlurmctldDebug=3
SlurmctldPort=NumberRange
SlurmctldTimeout=120
SlurmdDebug=3
SlurmdPort=ANumber
SlurmdTimeout=300
SlurmUser=slurm
SwitchType=switch/none
TaskPlugin=task/cgroup
Waittime=0
ANumber and NumberRange are placeholders for the actual port numbers or ranges.
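One quick way to sanity-check the selection settings in a config like this is to read the active Key=Value lines and confirm that SelectType and SelectTypeParameters agree. The sketch below is a minimal illustration, not a Slurm tool. It assumes one setting per line, '#' comments, and optionally quoted values. The set of accepted resource types (CR_CPU, CR_Core, CR_Socket, CR_Memory and their _Memory variants) comes from the slurm.conf documentation, not from this post, and the helper names parse_conf and check_cons_res are hypothetical.

```python
from __future__ import annotations

# Resource types accepted by select/cons_res (per the slurm.conf docs);
# compared case-insensitively in this sketch.
VALID_CONS_RES_TYPES = {
    "cr_cpu", "cr_cpu_memory", "cr_core", "cr_core_memory",
    "cr_socket", "cr_socket_memory", "cr_memory",
}


def parse_conf(text: str) -> dict[str, str]:
    """Return {key: value} for each active Key=Value line (hypothetical helper)."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out setting
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a Key=Value line
        settings[key.strip()] = value.strip().strip("'\"")  # drop outer quotes
    return settings


def check_cons_res(settings: dict[str, str]) -> list[str]:
    """Return a list of problems with the cons_res settings (empty if none)."""
    problems = []
    if settings.get("SelectType") != "select/cons_res":
        problems.append("SelectType is not select/cons_res")
    params = settings.get("SelectTypeParameters", "")
    # Parameters are comma-separated; at least one must name a resource type.
    if not any(p.strip().lower() in VALID_CONS_RES_TYPES for p in params.split(",")):
        problems.append(f"no consumable resource type in SelectTypeParameters={params!r}")
    return problems


EXCERPT = """
SchedulerType=sched/backfill
SelectTypeParameters=CR_CPU_Memory
SelectType=select/cons_res
"""
print(check_cons_res(parse_conf(EXCERPT)))  # -> []
```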
From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of Ethan Van Matre
Sent: Tuesday, November 28, 2017 7:32 PM
To: slurm-users at lists.schedmd.com
Subject: [slurm-users] fail when trying to set up selection=con_res
I've been trying to set up a Slurm cluster with cons_res enabled. No luck.
Running on Ubuntu 16.04.
When using linear selection, everything works as expected: jobs are scheduled, run their course, then exit, and each job is granted exclusive use of its node.
We would like to schedule based on CPUs (cores, actually), so we set:
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
#SchedulerPort=7321
#SelectType=select/linear
SelectType=select/cons_res
SelectTypeParameters=CR_CORE
#SelectTypeParameters=CR_CPU
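Putting the active lines of this fragment next to the matching lines from Jenny's working config above makes the comparison concrete. The sketch below covers only settings that appear in both posts; commented-out lines such as #SelectType=select/linear and #SchedulerPort=7321 are inactive and are left out. The rest of Ethan's slurm.conf is not shown, so differences elsewhere (for example in ProctrackType or TaskPlugin) cannot be ruled out by this comparison. The function name diff_settings is hypothetical.

```python
# Active scheduling settings from Ethan's fragment (commented lines omitted).
ETHAN = {
    "FastSchedule": "1",
    "SchedulerType": "sched/backfill",
    "SelectType": "select/cons_res",
    "SelectTypeParameters": "CR_CORE",
}

# The same keys as they appear in Jenny's working config.
JENNY = {
    "FastSchedule": "1",
    "SchedulerType": "sched/backfill",
    "SelectType": "select/cons_res",
    "SelectTypeParameters": "CR_CPU_Memory",
}


def diff_settings(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for keys whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


print(diff_settings(ETHAN, JENNY))
# {'SelectTypeParameters': ('CR_CORE', 'CR_CPU_Memory')}
```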
When launching jobs, we get more than one at a time per node, but the jobs hang in the COMPLETING state. I'm not sure whether they ever started.
Can anyone point me to how to set up Slurm so that allocation is on a CPU (core) basis, with as many jobs running on each node as there are cores?
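The hyperthreading question matters for the goal of one job per core. Per the slurm.conf documentation, with select/cons_res and CR_Core, multiple jobs are not allocated threads on the same core, so a 1-CPU job takes a whole core; with CR_CPU, the limit is the node's configured CPU count, which may be set to either the core count or the hyper-thread count. The arithmetic below is a rough sketch that ignores memory and other limits. The function name and the 2-socket, 8-core, 2-thread node are hypothetical, and for the CR_CPU case it assumes CPUs is configured as the hyper-thread count.

```python
def max_single_cpu_jobs(sockets: int, cores_per_socket: int,
                        threads_per_core: int, select_param: str) -> int:
    """Upper bound on concurrent 1-CPU jobs on one node (hypothetical helper).

    Ignores memory, partition/QOS limits and anything else that could
    stop a job from starting.
    """
    cores = sockets * cores_per_socket
    param = select_param.upper()
    if param.startswith("CR_CORE"):
        # CR_Core / CR_Core_Memory: jobs never share a core, so each
        # 1-CPU job occupies every hyper-thread of one core.
        return cores
    if param.startswith("CR_CPU"):
        # CR_CPU / CR_CPU_Memory: assumes the node's CPU count is
        # configured as the number of hyper-threads.
        return cores * threads_per_core
    raise ValueError(f"not modelled in this sketch: {select_param}")


# Hypothetical node: 2 sockets x 8 cores x 2 hyper-threads per core.
print(max_single_cpu_jobs(2, 8, 2, "CR_Core"))        # 16
print(max_single_cpu_jobs(2, 8, 2, "CR_CPU_Memory"))  # 32
# Without hyperthreading the two settings give the same answer.
print(max_single_cpu_jobs(2, 8, 1, "CR_CPU"))         # 16
```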
Regards
Ethan VanMatre
Informatics Research Analyst
Institute on Development and Disability
Oregon Health & Science University
CSLU - GH40
3181 SW Sam Jackson Park Rd
Portland, OR 97239
(503) 346-3764
vanmatre at ohsu.edu