[slurm-users] slurmdbd purge not working

Paul Edmon pedmon at cfa.harvard.edu
Thu Apr 4 14:13:47 UTC 2019


We ran into this problem in the past.  I know that fixes for large purges
were put in as a result of our problems, but I don't recall which version
they ended up in; most likely something newer than 15.08.0.

A solution that can work is to walk the purge cutoff forward in steps, so
that instead of one large purge you do several smaller ones.  That at
least worked for us in the past.
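A minimal sketch of the stepped approach, assuming the `sacctmgr archive dump` syntax and the archive directory from the quoted message below. The function names `purge_steps` and `run_staged_purge` are hypothetical, and the step sizes are illustrative assumptions, not tested values. Because each pass starts from a cutoff only one step newer than the last, each dump should only have to archive and delete the slice of jobs between the two cutoffs, rather than the whole backlog at once.

```shell
#!/bin/sh
# Sketch: purge old job records in several smaller passes instead of one
# large purge. Step sizes are assumptions; tune them for your database.

ARCHIVE_DIR=/home/joule/archives/   # ArchiveDir from the quoted slurmdbd.conf

# purge_steps TARGET START STEP   (hypothetical helper)
# Prints retention ages in days, from START down toward TARGET in STEP
# decrements. The last value printed is always TARGET.
purge_steps() {
    target=$1 start=$2 step=$3
    if [ "$step" -le 0 ]; then
        echo "step must be a positive number of days" >&2
        return 1
    fi
    days=$start
    while [ "$days" -gt "$target" ]; do
        echo "$days"
        days=$((days - step))
    done
    echo "$target"
}

# run_staged_purge TARGET START STEP   (hypothetical helper)
# Runs one "sacctmgr archive dump" per retention age and stops at the
# first failure, so a crash does not cascade into further passes.
# -i (--immediate) tells sacctmgr not to ask for confirmation.
run_staged_purge() {
    steps=$(purge_steps "$1" "$2" "$3") || return 1
    for days in $steps; do
        echo "Archiving and purging jobs older than ${days} days" >&2
        sacctmgr -i archive dump Directory="$ARCHIVE_DIR" \
            PurgeJobAfter="${days}days" || return 1
    done
}
```

For example, `run_staged_purge 365 1825 90` walks the cutoff from about five years down to one year in 90-day steps. If a pass still fails, a smaller step shrinks each transaction further.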

-Paul Edmon-

On 4/4/19 9:38 AM, Julien Rey wrote:
> Hello,
>
> Our Slurm accounting database keeps growing (it is now more than
> 100 GB) and is never purged. We are running Slurm 15.08.0-0pre1.
> I would like to upgrade to a more recent version of the slurmdbd, but 
> my fear is that it may break everything during the update of the 
> database.
>
> Here is our slurmdbd.conf:
>
> AuthType=auth/munge
> AuthInfo=/var/run/munge/munge.socket.2
> DbdHost=localhost
> DebugLevel=6
> StorageHost=localhost
> StorageLoc=slurm_acct_db
> StoragePass=shazaam
> StorageType=accounting_storage/mysql
> StorageUser=slurm
> LogFile=/var/log/slurm-llnl/slurmdbd.log
> PidFile=/var/run/slurm-llnl/slurmdbd.pid
> SlurmUser=slurm
> ArchiveDir=/home/joule/archives
> PurgeEventAfter=18
> PurgeJobAfter=18
> PurgeResvAfter=1
> PurgeStepAfter=1
> PurgeSuspendAfter=1
>
> I tried to purge it manually using this command, but the slurmdbd
> daemon ends up crashing and nothing is removed:
>
> sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=365days
>
> sacctmgr: error: slurmdbd: Getting response to message type 1459
> sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
>  Problem dumping archive: Unspecified error
>
> Sometimes I have to restart the MySQL daemon (we are running MySQL
> 5.5.39-1). /var/log/slurm-llnl/slurmdbd.log shows nothing, and the
> MySQL logs are empty.
>
> I tried increasing these values in my.cnf, but with no success so far:
>
> innodb_buffer_pool_size        = 32G
> innodb_lock_wait_timeout    = 3600
>
> Is there any way to solve this issue? Otherwise, what would be the
> procedure for deleting the database records altogether and starting
> over with a fresh database?
>
> Thanks in advance.
> -- 
> Julien REY
>
> Plate-forme RPBS
> Modélisation Computationnelle des Interactions Protéines-Ligand (CMPLI)
> Université Paris Diderot - Paris VII
> tel : 01 57 27 83 95


