[slurm-users] Scripts run slower in slurm?
Alpha Experiment
projectalpha137 at gmail.com
Tue Dec 15 07:20:04 UTC 2020
Hi,
I wrote a short Python script to test whether Slurm was correctly limiting the
number of CPUs available to each partition. The script is as follows:
import multiprocessing as mp
import time as t

def fibonacci(n):
    # Build the Fibonacci sequence until it reaches n; return the whole list
    n = int(n)
    def fibon(a, b, n, result):
        c = a + b
        result.append(c)
        if c < n:
            fibon(b, c, n, result)
        return result
    return fibon(0, 1, n, [])

def calcnfib(n):
    # Return the first Fibonacci number that reaches n
    res = fibonacci(n)
    return res[-1]

def benchmark(pool):
    # Time how long the pool takes to map calcnfib over the whole range
    t0 = t.time()
    out = pool.map(calcnfib, range(1000000, 1000000000, 1000))
    tf = t.time()
    return str(tf - t0)

pool = mp.Pool(4)
print("4: " + benchmark(pool))
pool = mp.Pool(32)
print("32: " + benchmark(pool))
pool = mp.Pool(64)
print("64: " + benchmark(pool))
pool = mp.Pool(128)
print("128: " + benchmark(pool))
It is called using the following submission script:
#!/bin/bash
#SBATCH --partition=full
#SBATCH --job-name="Large"
source testenv1/bin/activate
python3 multithread_example.py
The Slurm output file reads:
4: 5.660163640975952
32: 5.762076139450073
64: 5.8220226764678955
128: 5.85421347618103
However, if I run the same commands directly, outside of Slurm,
source testenv1/bin/activate
python3 multithread_example.py
I see much faster and more expected behavior:
4: 1.5878620147705078
32: 0.34162330627441406
64: 0.24987316131591797
128: 0.2247719764709472
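To check how many CPUs the batch job is actually allowed to use, I can add a few diagnostic lines to the submission script before the Python call. This is only a minimal sketch; SLURM_CPUS_ON_NODE and the /proc affinity check are simply the first things I would look at, not something I have confirmed on this cluster:

#!/bin/bash
#SBATCH --partition=full
#SBATCH --job-name="Large"
# Report what Slurm allocated and which cores the job may actually run on
echo "SLURM_CPUS_ON_NODE=${SLURM_CPUS_ON_NODE:-unset}"
grep Cpus_allowed_list /proc/self/status
nproc
source testenv1/bin/activate
python3 multithread_example.py

If those lines report far fewer than 128 CPUs, the nearly identical timings above would be consistent with the job being confined to a small core allocation rather than with the partition itself being slow.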
For reference, my Slurm configuration file is:
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
#SlurmctldHost=localhost
ControlMachine=localhost
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/home/slurm/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/home/slurm/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm/slurmd/
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/home/slurm/spool/
SwitchType=switch/none
TaskPlugin=task/affinity
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=info
SlurmctldLogFile=/home/slurm/log/slurmctld.log
#SlurmdDebug=info
#SlurmdLogFile=
# COMPUTE NODES
NodeName=localhost CPUs=128 RealMemory=257682 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN
PartitionName=full Nodes=localhost Default=YES MaxTime=INFINITE State=UP
PartitionName=half Nodes=localhost Default=NO MaxTime=INFINITE State=UP MaxNodes=1 MaxCPUsPerNode=64 MaxMemPerNode=128841
Here is my cgroup.conf file as well:
CgroupAutomount=yes
ConstrainCores=no
ConstrainRAMSpace=no
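One thing I am not sure about: the submission script above does not request any CPUs, and with SelectType=cons_tres, SelectTypeParameters=CR_Core, and TaskPlugin=task/affinity the job may be getting bound to only its default allocation (a single core), which would explain why every pool size takes the same time. A minimal sketch of what I could try, assuming --cpus-per-task is the right option for this:

#!/bin/bash
#SBATCH --partition=full
#SBATCH --job-name="Large"
# Assumption: explicitly ask for the CPUs the multiprocessing pool is meant to use
#SBATCH --cpus-per-task=128
source testenv1/bin/activate
python3 multithread_example.py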
If anyone has any suggestions about what might be going wrong, and why the
script takes so much longer when run under Slurm, please let me know!
Best,
John