[slurm-users] Simple free for all cluster
Renfro, Michael
Renfro at tntech.edu
Fri Oct 2 13:49:44 UTC 2020
Depending on the users who will be on this cluster, I'd probably adjust the partition to have a defined, non-infinite MaxTime, and maybe a lower DefaultTime. Otherwise, it would be very easy for someone to start a job that reserves all cores until the nodes get rebooted, since all they have to do is submit a job with no explicit time limit (which would then use DefaultTime, which itself has a default value of MaxTime).
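For example (the times below are just placeholders to adjust for your workload), something like:

PartitionName=sl Nodes=slnode[1-8] Default=YES DefaultTime=01:00:00 MaxTime=2-00:00:00 State=UP

would cap every job at 2 days, and a job submitted without an explicit time limit would get 1 hour instead of running forever. Users can still request more with "sbatch --time=..." up to MaxTime.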
On 10/2/20, 7:37 AM, John H <jsh at SDF.ORG> wrote:
Hi All
Hope you are all keeping well in these difficult times.
I have set up a small Slurm cluster of 8 compute nodes (4 x 1-core CPUs, 16GB RAM each) without scheduling or accounting, as it isn't really needed.
I'm just looking for confirmation that it's configured correctly to allow the controller to 'see' all resources and allocate incoming jobs to the most readily available node in the cluster. I can see jobs being delivered to different nodes, but I want to ensure I haven't inadvertently done anything to render it suboptimal (even in such a simple use case!)
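For reference, this is roughly how I've been checking what the controller sees (the node name is just an example):

sinfo -N -l                  # per-node state, CPUs, and memory as slurmctld sees them
scontrol show node slnode1   # full node record (CPUTot, RealMemory, State, etc.)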
Thanks very much for any assistance, here is my cfg:
#
# SLURM.CONF
ControlMachine=slnode1
BackupController=slnode2
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm-llnl
SwitchType=switch/none
TaskPlugin=task/none
#
# TIMERS
MinJobAge=86400
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_MEMORY
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
# COMPUTE NODES
NodeName=slnode[1-8] CPUs=4 Boards=1 SocketsPerBoard=4 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=16017
PartitionName=sl Nodes=slnode[1-8] Default=YES MaxTime=INFINITE State=UP
John