[slurm-users] Accounting Information from slurmdbd does not reach slurmctld

Pascal Klink pascal.klink at googlemail.com
Thu Mar 19 11:05:09 UTC 2020


Hi everyone,

We currently have a problem with our SLURM setup for a small cluster of 7 machines: the accounted core usage is not correctly used in the fair-share computation. I have set up a minimal (non-)working example.

In this example, we have one cluster, to which we added an account 'iasteam' as well as some users with the sacctmgr tool.
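
The commands were along the following lines (a rough sketch from memory; the exact invocations may have differed slightly):

        sudo sacctmgr add account iasteam Cluster=iascluster
        sudo sacctmgr add user carvalho Account=iasteam
        sudo sacctmgr add user hany Account=iasteam
        sudo sacctmgr add user pascal Account=iasteam
        sudo sacctmgr add user stark Account=iasteam

Right after executing these commands and running 'sshare -al', we get the following output: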


Account       User        RawShares  NormShares   RawUsage  NormUsage  EffectvUsage  FairShare    LevelFS
------------- ---------- ---------- ----------- ---------- ---------- ------------- ---------- ----------
root                                   0.000000          0                 1.000000
 root         root                1    0.500000          0   0.000000      0.000000   1.000000        inf
 iasteam                          1    0.500000          0   0.000000      0.000000                   inf
  iasteam     carvalho            1    0.250000          0                 0.000000   0.000000   0.000000
  iasteam     hany                1    0.250000          0                 0.000000   0.000000   0.000000
  iasteam     pascal              1    0.250000          0                 0.000000   0.000000   0.000000
  iasteam     stark               1    0.250000          0                 0.000000   0.000000   0.000000

One thing that already seems strange here is that the 'FairShare' value of the users is set to zero and no 'NormUsage' appears. But anyway, after executing the following commands:

	sudo systemctl stop slurmctld
	sudo systemctl restart slurmdbd
	sudo systemctl start slurmctld

I get an output that looks better to me:

Account       User        RawShares  NormShares   RawUsage  NormUsage  EffectvUsage  FairShare    LevelFS
------------- ---------- ---------- ----------- ---------- ---------- ------------- ---------- ----------
root                                   0.000000          0                 1.000000
 root         root                1    0.500000          0   0.000000      0.000000   1.000000        inf
 iasteam                          1    0.500000          0   0.000000      0.000000                   inf
  iasteam     carvalho            1    0.250000          0   0.000000      0.000000   1.000000        inf
  iasteam     hany                1    0.250000          0   0.000000      0.000000   1.000000        inf
  iasteam     pascal              1    0.250000          0   0.000000      0.000000   1.000000        inf
  iasteam     stark               1    0.250000          0   0.000000      0.000000   1.000000        inf


The next thing I did was to run a job as the user pascal and cancel it after roughly 3:33 minutes on a node with 32 cores.
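
The job itself should not matter much; hypothetically, it was submitted and cancelled roughly like this (the real submission was a job array, as the JobID in the sacct output below shows, and the script name here is made up):

        # submitted as user pascal; with SelectType=select/linear the job is allocated the whole 32-core node
        sbatch --account=iasteam --nodes=1 job_script.sh
        # cancelled after roughly 3:33 minutes
        scancel 776

When I then execute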

	sudo sacct -u pascal -o User,UserCPU,CPUTimeRAW,JobID

I get the following output:

User       UserCPU     CPUTimeRAW  JobID
---------  ----------  ----------  ------------
pascal     02:53.154         6816  776_2
           02:53.154         6848  776_2.batch

Dividing 6848 by 32 yields 214 seconds, which is 3:34 minutes (CPUTimeRAW is the elapsed time multiplied by the number of allocated CPUs, and 214 s * 32 CPUs = 6848 CPU-seconds), so this calculation checks out. The problem now is that this data is not reflected in the call to 'sshare -al', which still yields

Account       User        RawShares  NormShares   RawUsage  NormUsage  EffectvUsage  FairShare    LevelFS
------------- ---------- ---------- ----------- ---------- ---------- ------------- ---------- ----------
root                                   0.000000          0                 1.000000
 root         root                1    0.500000          0   0.000000      0.000000   1.000000        inf
 iasteam                          1    0.500000          0   0.000000      0.000000                   inf
  iasteam     carvalho            1    0.250000          0   0.000000      0.000000   1.000000        inf
  iasteam     hany                1    0.250000          0   0.000000      0.000000   1.000000        inf
  iasteam     pascal              1    0.250000          0   0.000000      0.000000   1.000000        inf
  iasteam     stark               1    0.250000          0   0.000000      0.000000   1.000000        inf


Even after waiting overnight (in case the data used by sshare is updated asynchronously), 'sshare -al' still shows the incorrect usage. I think this is due to some communication failure between slurmdbd and slurmctld, as sacct uses the data from slurmdbd while sshare seems to use data from slurmctld (at least it is not possible to run sshare if slurmctld is not running).
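
Roughly, the kind of check behind that last statement was the following (a sketch, not the exact session):

        sudo systemctl stop slurmctld
        sacct -u pascal -o User,CPUTimeRAW,JobID    # still lists the job, so sacct reads from slurmdbd
        sshare -al                                  # refuses to run while slurmctld is down
        sudo systemctl start slurmctld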

Is this a common misconfiguration of our SLURM setup, or is there something else strange going on? We already noticed that a similar question was asked on the developer mailing list 6 years ago:

https://slurm-dev.schedmd.narkive.com/nvLr2Rzl/sshare-and-sacct

However, no real answer was given there as to why this happened, so we thought that maybe this time someone might have an idea.

Best
Pascal


P.S.: Here is the slurm config that we are using, as well as the slurmdbd config:

slurm.conf:
ControlMachine=mn01
ControlAddr=192.168.1.1

MpiDefault=none
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskPlugin=task/none

# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/linear

# ACCOUNTING
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=localhost
AccountingStoragePort=6819
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=10
AccountingStorageEnforce=associations
AccountingStorageUser=slurm
ClusterName=iascluster

# PRIORITY
PriorityType=priority/multifactor
PriorityDecayHalfLife=0
PriorityUsageResetPeriod=MONTHLY
PriorityFavorSmall=NO
PriorityMaxAge=1-0

PriorityWeightAge=500000
PriorityWeightFairshare=1000000
PriorityWeightJobSize=0
PriorityWeightPartition=0
PriorityWeightQOS=0

# LOGGING
SlurmctldDebug=debug
SlurmctldLogFile=var/log/slurm/slurmctld.log

SlurmdDebug=debug
SlurmdLogFile=var/log/slurm/slurmd.log

# COMPUTE NODES
NodeName=cn0[1-7] NodeAddr=192.168.1.1[1-7] RealMemory=64397 Sockets=1 CoresPerSocket=16 ThreadsPerCore=2 Gres=gpu:rtx2080:1
PartitionName=amd Nodes=cn0[1-7] Default=YES MaxTime=INFINITE State=UP


slurmdbd.conf:
AuthType=auth/munge
AuthInfo=/var/run/munge/munge.socket.2
DbdHost=localhost
DbdPort=6819
StorageHost=localhost
StorageLoc=slurm_acct_db
StoragePass=[OURPASSWORD]
StorageType=accounting_storage/mysql
StorageUser=slurm
DebugLevel=debug
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurm-llnl/slurmdbd.pid
SlurmUser=slurm



