[slurm-users] Simple free for all cluster

Renfro, Michael <Renfro at tntech.edu>
Fri Oct 2 13:49:44 UTC 2020


Depending on the users who will be on this cluster, I'd probably adjust the partition to have a defined, non-infinite MaxTime, and maybe a lower DefaultTime. Otherwise, it would be very easy for someone to reserve all cores until the nodes get rebooted: a job submitted with no explicit time limit falls back to DefaultTime, and DefaultTime itself defaults to MaxTime.
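
For example, something along these lines (the specific limits are just placeholders; pick whatever suits your users):

    # Hypothetical limits: 2-day hard cap, 1-hour default for jobs
    # submitted without an explicit -t/--time request.
    PartitionName=sl Nodes=slnode[1-8] Default=YES State=UP MaxTime=2-00:00:00 DefaultTime=01:00:00

With that in place, a job submitted without a time limit gets an hour rather than holding its cores indefinitely, and nothing runs past two days unless you raise the cap.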

On 10/2/20, 7:37 AM, "slurm-users on behalf of John H" <slurm-users-bounces at lists.schedmd.com on behalf of jsh at SDF.ORG> wrote:

    Hi All

    Hope you are all keeping well in these difficult times.

    I have set up a small Slurm cluster of 8 compute nodes (4 x 1-core CPUs, 16GB RAM each) without scheduling or accounting, as they aren't really needed.

    I'm just looking for confirmation that it's configured correctly to allow the controller to 'see' all resources and allocate incoming jobs to the most readily available node in the cluster. I can see jobs being delivered to different nodes, but I want to ensure I haven't inadvertently done anything to render it sub-optimal (even in such a simple use case!)
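
    A handful of throwaway jobs makes the spread easy to eyeball, e.g.:

        for i in $(seq 1 16); do sbatch --wrap="sleep 120"; done
        squeue -o "%.8i %.9P %.8T %R"   # job id, partition, state, node (or pending reason)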

    Thanks very much for any assistance, here is my cfg:

    #
    # SLURM.CONF
    ControlMachine=slnode1
    BackupController=slnode2
    MpiDefault=none
    ProctrackType=proctrack/pgid
    ReturnToService=1
    SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
    SlurmctldPort=6817
    SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
    SlurmdPort=6818
    SlurmdSpoolDir=/var/spool/slurmd
    SlurmUser=slurm
    StateSaveLocation=/var/spool/slurm-llnl
    SwitchType=switch/none
    TaskPlugin=task/none
    #
    # TIMERS
    MinJobAge=86400
    #
    # SCHEDULING
    FastSchedule=1
    SchedulerType=sched/backfill
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU_MEMORY
    #
    # LOGGING AND ACCOUNTING
    AccountingStorageType=accounting_storage/none
    ClusterName=cluster
    JobAcctGatherType=jobacct_gather/none
    SlurmctldDebug=3
    SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
    SlurmdDebug=3
    SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
    #
    # COMPUTE NODES
    NodeName=slnode[1-8] CPUs=4 Boards=1 SocketsPerBoard=4 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=16017
    PartitionName=sl Nodes=slnode[1-8] Default=YES MaxTime=INFINITE State=UP
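
    For reference, the controller's view of the above can be checked with:

        scontrol show partition sl
        scontrol show node slnode1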

    John


