Hi Everyone,

I'm new to Slurm administration and looking for a bit of help!

I've just added accounting to an existing cluster, but job information is not being added to the accounting MariaDB database. When I submit a test job it gets scheduled fine and it's visible with squeue, but sacct returns nothing!

I have turned up the logging to debug5 on both slurmctld and slurmdbd and can't see any errors. I believe communication between slurmctld and slurmdbd is OK: when I run sacct I can see the database being queried, but it returns nothing because nothing has been added to the tables. The cluster tables were created fine when I ran:

# sacctmgr add cluster ny5ktt

$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------

# tail -f slurmdbd.log
[2024-10-17T12:34:45.232] debug:  REQUEST_PERSIST_INIT: CLUSTER:ny5ktt VERSION:9216 UID:10001 IP:10.202.233.117 CONN:10
[2024-10-17T12:34:45.232] debug2: accounting_storage/as_mysql: acct_storage_p_get_connection: acct_storage_p_get_connection: request new connection 1
[2024-10-17T12:34:45.233] debug2: Attempting to connect to localhost:3306
[2024-10-17T12:34:45.274] debug2: DBD_GET_JOBS_COND: called
[2024-10-17T12:34:45.317] debug2: DBD_FINI: CLOSE:1 COMMIT:0
[2024-10-17T12:34:45.317] debug4: accounting_storage/as_mysql: acct_storage_p_commit: got 0 commits

MariaDB runs on its own node together with slurmdbd, with munged for authentication. I haven't set up any accounts, users, associations or enforcement yet; on my lab cluster, jobs were visible in the database without any of these being set up. I guess I must be missing something simple in the config that is stopping jobs from being reported to slurmdbd.
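
One way to narrow this down is to check whether slurmdbd ever receives job records at all, as opposed to only queries. Below is a minimal sketch that counts log lines mentioning job/step write RPCs versus job queries. It assumes the RPC names DBD_JOB_START, DBD_JOB_COMPLETE, DBD_STEP_START and DBD_STEP_COMPLETE appear verbatim in the log at debug2 or higher, the way DBD_GET_JOBS_COND does in the excerpt; the exact wording (and whether batched messages are logged individually) may differ between Slurm versions, and `count_dbd_rpcs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count lines in a slurmdbd log that mention job/step
# write RPCs versus job queries. Assumes the DBD_* names are logged verbatim
# (debug2 or higher), as DBD_GET_JOBS_COND is in the excerpt above.
count_dbd_rpcs() {
  local log="${1:-/var/log/slurm/slurmdbd.log}"
  local writes reads
  # Records sent by slurmctld when jobs and steps start or finish
  # (grep -c prints 0 but exits 1 on no match, hence "|| true")
  writes=$(grep -cE 'DBD_(JOB|STEP)_(START|COMPLETE)' "$log" || true)
  # Job queries, e.g. the one sacct triggers
  reads=$(grep -c 'DBD_GET_JOBS_COND' "$log" || true)
  echo "writes=$writes reads=$reads"
}
```

For example, run `count_dbd_rpcs /var/log/slurm/slurmdbd.log` after a test job has finished. A result of `writes=0` with a non-zero `reads` would be consistent with the symptom described here: sacct reaches slurmdbd, but no job records arrive from slurmctld.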

 

Master node packages:

# rpm -qa | grep slurm
slurm-slurmdbd-20.11.9-1.el8.x86_64
slurm-libs-20.11.9-1.el8.x86_64
slurm-20.11.9-1.el8.x86_64
slurm-slurmd-20.11.9-1.el8.x86_64
slurm-perlapi-20.11.9-1.el8.x86_64
slurm-doc-20.11.9-1.el8.x86_64
slurm-contribs-20.11.9-1.el8.x86_64
slurm-slurmctld-20.11.9-1.el8.x86_64

Database node packages:

# rpm -qa | grep slurm
slurm-slurmdbd-20.11.9-1.el8.x86_64
slurm-20.11.9-1.el8.x86_64
slurm-libs-20.11.9-1.el8.x86_64
slurm-devel-20.11.9-1.el8.x86_64
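
Both nodes report 20.11.9. When comparing more nodes, a small filter can reduce `rpm -qa` output to the distinct Slurm versions present. This is a sketch that assumes the `slurm[-component]-VERSION-RELEASE.ARCH` naming seen in these lists; `slurm_versions` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read package names on stdin and print each distinct
# Slurm version once. Assumes names like slurm-slurmdbd-20.11.9-1.el8.x86_64.
slurm_versions() {
  # Drop the optional component names and the release/arch suffix,
  # keeping only the VERSION field, then de-duplicate
  sed -nE 's/^slurm(-[a-z]+)*-([0-9][0-9.]*)-.*/\2/p' | sort -u
}
```

Usage: `rpm -qa | grep slurm | slurm_versions` on each node. A single, identical line on every node means the daemons and client tools come from the same release.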

 

slurm.conf

#
# See the slurm.conf man page for more information.
#
ClusterName=ny5ktt
ControlMachine=ny5-pr-kttslurm-01
ControlAddr=10.202.233.71
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobCheckpointDir=/var/slurm/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
MailProg=/bin/true
MaxJobCount=200000
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/cgroup
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm/d
SlurmUser=slurm
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/spool/slurm/ctld
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#

#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
#MinJobAge=300
#MinJobAge=43200
# CHG0057915
MinJobAge=14400
# CHG0057915
#MaxJobCount=50000
#MaxJobCount=100000
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#

#
# SCHEDULING
DefMemPerCPU=3000
#FastSchedule=1
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_tres
#SelectTypeParameters=CR_Core
#SelectTypeParameters=CR_CPU
SelectTypeParameters=CR_CPU_Memory
# ECR CHG0056915 10/14/2023
MaxArraySize=5001
#

#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#

#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageEnforce=limits
AccountingStorageHost=ny5-pr-kttslurmdb-01.ktt.schonfeld.com
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
#AccountingStorageType=accounting_storage/none
AccountingStorageType=accounting_storage/slurmdbd
#AccountingStorageUser=
AccountingStoreJobComment=YES
#DebugFlags=
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=60
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm/slurmd.log
#SlurmdLogFile=
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#

#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#

#
# COMPUTE NODES
##using fqdn since the ctld domain is different. Can't use regex since it's not at the end
##save 17 and 18 as headnodes
#NodeName=ny5-dv-kttres-17 Sockets=1 CoresPerSocket=18 ThreadsPerCore=2 Feature=HyperThread RealMemory=102400
#NodeName=ny5-dv-kttres-18 Sockets=1 CoresPerSocket=14 ThreadsPerCore=2 Feature=HyperThread RealMemory=102400
NodeName=ny5-dv-kttres-19 Sockets=1 CoresPerSocket=12 ThreadsPerCore=2 Feature=HyperThread RealMemory=102400
NodeName=ny5-dv-kttres-[20-21] Sockets=1 CoresPerSocket=18 ThreadsPerCore=2 Feature=HyperThread RealMemory=102400
NodeName=ny5-dv-kttres-[01-16] Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 Feature=HyperThread RealMemory=233472
NodeName=ny5-dv-kttres-[22-35] Sockets=2 CoresPerSocket=32 ThreadsPerCore=2 Feature=HyperThread RealMemory=346884
PartitionName=ktt_slurm_light_1 Nodes=ny5-dv-kttres-[19-21] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_medium_1 Nodes=ny5-dv-kttres-[01-08] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_medium_2 Nodes=ny5-dv-kttres-[09-16] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_medium_3 Nodes=ny5-dv-kttres-[22-28] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_medium_4 Nodes=ny5-dv-kttres-[29-35] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_large_1 Nodes=ny5-dv-kttres-[01-16] Default=YES MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_large_2 Nodes=ny5-dv-kttres-[22-35] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
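
Several keys in this file appear more than once, some commented out (for example AccountingStorageType, MinJobAge and MaxJobCount), which makes it easy to misread which setting is live. Below is a minimal sketch that prints only the uncommented lines for the keys of interest, matching names case-insensitively as slurm.conf parameter names are. `show_keys` is a hypothetical helper, and it deliberately prints every match rather than guessing which duplicate Slurm applies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the uncommented lines of a slurm.conf-style file
# whose key matches one of the given names (case-insensitive).
# Usage: show_keys FILE KEY [KEY...]
show_keys() {
  local file="$1"
  shift
  awk -v keys="$*" '
    BEGIN {
      n = split(tolower(keys), k, " ")
      for (i = 1; i <= n; i++) want[k[i]] = 1
    }
    /^[ \t]*#/ { next }                  # skip commented-out settings
    {
      split($0, kv, "=")                 # key is the text before the first "="
      key = tolower(kv[1])
      gsub(/[ \t]/, "", key)             # ignore surrounding whitespace
      if (key in want) print
    }
  ' "$file"
}
```

For example, assuming the file lives at /etc/slurm/slurm.conf: `show_keys /etc/slurm/slurm.conf AccountingStorageType AccountingStorageHost JobAcctGatherType`.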

 

slurmdbd.conf

AuthType=auth/munge
DbdAddr=10.202.233.72
DbdHost=ny5-pr-kttslurmdb-01
DebugLevel=debug5
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/tmp/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
#StorageHost=10.234.132.57
StorageUser=slurm
SlurmUser=slurm
StoragePass=xxxxxxx
#StorageUser=slurm
#StorageLoc=slurm_acct_db

Database tables:

MariaDB [slurm_acct_db]> show tables;
+--------------------------------+
| Tables_in_slurm_acct_db        |
+--------------------------------+
| acct_coord_table               |
| acct_table                     |
| clus_res_table                 |
| cluster_table                  |
| convert_version_table          |
| federation_table               |
| ny5ktt_assoc_table             |
| ny5ktt_assoc_usage_day_table   |
| ny5ktt_assoc_usage_hour_table  |
| ny5ktt_assoc_usage_month_table |
| ny5ktt_event_table             |
| ny5ktt_job_table               |
| ny5ktt_last_ran_table          |
| ny5ktt_resv_table              |
| ny5ktt_step_table              |
| ny5ktt_suspend_table           |
| ny5ktt_usage_day_table         |
| ny5ktt_usage_hour_table        |
| ny5ktt_usage_month_table       |
| ny5ktt_wckey_table             |
| ny5ktt_wckey_usage_day_table   |
| ny5ktt_wckey_usage_hour_table  |
| ny5ktt_wckey_usage_month_table |
| qos_table                      |
| res_table                      |
| table_defs_table               |
| tres_table                     |
| txn_table                      |
| user_table                     |
+--------------------------------+
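
To confirm directly that no job rows are being written, the per-cluster job table can be queried. This sketch only builds a read-only SQL statement. It assumes the `<cluster>_job_table` naming seen in this listing, the default `slurm_acct_db` database name (StorageLoc is commented out in slurmdbd.conf), and a `time_submit` column in the job table; `job_count_sql` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a read-only query counting the job records
# stored for one cluster, plus the newest time_submit value.
job_count_sql() {
  local cluster="$1"
  # The cluster name becomes a table-name prefix; reject anything unexpected
  case "$cluster" in
    '' | *[!a-z0-9_]*) echo "invalid cluster name: '$cluster'" >&2; return 1 ;;
  esac
  printf 'SELECT COUNT(*) AS jobs, MAX(time_submit) AS last_submit FROM slurm_acct_db.%s_job_table;\n' "$cluster"
}
```

For example, `job_count_sql ny5ktt | mysql -u slurm -p` on the database node, before and after a test job finishes; an unchanged count of 0 would match what sacct shows.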

 

Many thanks,

Adrian


