[slurm-users] slurmdbd purge not working

Paul Edmon pedmon at cfa.harvard.edu
Fri Apr 5 14:10:32 UTC 2019


Did it just time out, or did the failure happen immediately? If it was
immediate, you may be hitting a bug. It "should" be safe to upgrade to a
later 15.08.* release, which may include fixes related to this. I would
check the changelog first, though, to see whether any database work was
done.

-Paul Edmon-
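Paul's first question can be answered by timing the failing command: did it fail immediately, or only after a long wait? Below is a minimal Python sketch. It assumes Python 3.7+ and that `sacctmgr` is on the PATH. The helper name `timed_run` is hypothetical, and the command is the one Julien ran.

```python
import subprocess
import sys
import time
from typing import Optional, Tuple

# Command copied from the thread; adjust the directory and purge value as needed.
DUMP_CMD = ["sacctmgr", "archive", "dump",
            "Directory=/home/joule/archives/", "PurgeJobAfter=365days"]


def timed_run(cmd, timeout_s: Optional[float] = None) -> Tuple[Optional[int], float, str]:
    """Run cmd and return (exit code, elapsed seconds, combined stdout+stderr).

    The exit code is None if the command timed out or could not be started.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        code, output = result.returncode, result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        code, output = None, f"timed out after {timeout_s} s"
    except FileNotFoundError:
        code, output = None, f"command not found: {cmd[0]}"
    return code, time.monotonic() - start, output


if __name__ == "__main__":
    code, elapsed, output = timed_run(DUMP_CMD)
    print(output, end="")
    print(f"exit code: {code}, elapsed: {elapsed:.1f} s", file=sys.stderr)
```

A failure that comes back within seconds points toward the bug Paul mentions. One that only appears after a long wait suggests a timeout.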

On 4/5/19 9:05 AM, Julien Rey wrote:
> Hi Paul, thanks for your advice. I had actually already tried what you
> suggested. No matter what value I put after PurgeJobAfter, I always
> end up with the same error:
>
> sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=1days
> sacctmgr: error: slurmdbd: Getting response to message type 1459
> sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
>  Problem dumping archive: Unspecified error
>
> sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=48months
> sacctmgr: error: slurmdbd: Getting response to message type 1459
> sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
>  Problem dumping archive: Unspecified error
>
> Has anyone tried truncating the tables by hand directly from the mysql
> command line?
>
> On 04/04/2019 16:13, Paul Edmon wrote:
>> We ran into this problem in the past. I know that fixes were put in
>> to deal with large purges as a result of our problems, but I don't
>> recall which version they ended up in; likely something newer than
>> 15.08.0.
>>
>> One solution that can work is to walk the purge cutoff forward in
>> steps, so that instead of one large purge you do several smaller
>> purges. That at least worked for us in the past.
>>
>> -Paul Edmon-
>>
>> On 4/4/19 9:38 AM, Julien Rey wrote:
>>> Hello,
>>>
>>> Our slurm accounting database keeps growing (it is now more than
>>> 100 GB) and is never purged. We are running slurm 15.08.0-0pre1. I
>>> would like to upgrade to a more recent version of slurmdbd, but I am
>>> afraid the database update during the upgrade may break everything.
>>>
>>> Here is our slurmdbd.conf:
>>>
>>> AuthType=auth/munge
>>> AuthInfo=/var/run/munge/munge.socket.2
>>> DbdHost=localhost
>>> DebugLevel=6
>>> StorageHost=localhost
>>> StorageLoc=slurm_acct_db
>>> StoragePass=shazaam
>>> StorageType=accounting_storage/mysql
>>> StorageUser=slurm
>>> LogFile=/var/log/slurm-llnl/slurmdbd.log
>>> PidFile=/var/run/slurm-llnl/slurmdbd.pid
>>> SlurmUser=slurm
>>> ArchiveDir=/home/joule/archives
>>> PurgeEventAfter=18
>>> PurgeJobAfter=18
>>> PurgeResvAfter=1
>>> PurgeStepAfter=1
>>> PurgeSuspendAfter=1
>>>
>>> I tried to purge it manually with this command, but the slurmdbd
>>> daemon ends up crashing and nothing is removed:
>>>
>>> sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=365days
>>>
>>> sacctmgr: error: slurmdbd: Getting response to message type 1459
>>> sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
>>>  Problem dumping archive: Unspecified error
>>>
>>> Sometimes I have to restart the mysql daemon (we are running mysql
>>> 5.5.39-1). /var/log/slurm-llnl/slurmdbd.log shows nothing, and the
>>> mysql logs are empty.
>>>
>>> I tried increasing these values in my.cnf, but so far without success:
>>>
>>> innodb_buffer_pool_size        = 32G
>>> innodb_lock_wait_timeout    = 3600
>>>
>>> Is there any way to solve this issue? Otherwise, what would be the
>>> procedure for deleting all the database records and starting over
>>> with a fresh database?
>>>
>>> Thanks in advance.
>>> -- 
>>> Julien REY
>>>
>>> Plate-forme RPBS
>>> Modélisation Computationnelle des Interactions Protéines-Ligand (CMPLI)
>>> Université Paris Diderot - Paris VII
>>> tel : 01 57 27 83 95
>>
>
>
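Paul suggested running several smaller purges instead of one large one. That could be scripted roughly as follows. This is a minimal Python sketch, not a tested procedure, and it rests on a few assumptions:

- The helper names `purge_schedule`, `build_command` and `run_stepwise` are hypothetical.
- The 60-to-18-month schedule in 6-month steps is also hypothetical.
- The `sacctmgr archive dump` invocation is the one used in this thread.
- A failed step is assumed to show up as a non-zero exit status or the "Problem dumping archive" message.

Per the slurmdbd.conf documentation, a bare number such as PurgeJobAfter=18 is read as months, so the schedule ends at the value already in the config. The script only prints the commands unless `run_stepwise` is called with `dry_run=False`.

```python
import subprocess
import sys
from typing import List, Optional

# Archive directory taken from the thread; everything else here is illustrative.
ARCHIVE_DIR = "/home/joule/archives/"


def purge_schedule(start_months: int, target_months: int, step_months: int) -> List[int]:
    """Return PurgeJobAfter values in months, oldest cutoff first, ending at target."""
    if step_months <= 0 or start_months < target_months:
        raise ValueError("need step_months > 0 and start_months >= target_months")
    # range() stops before target_months, so append it explicitly as the last step.
    return list(range(start_months, target_months, -step_months)) + [target_months]


def build_command(months: int, directory: str = ARCHIVE_DIR) -> List[str]:
    """Build the sacctmgr archive dump command from the thread with a given cutoff."""
    return ["sacctmgr", "archive", "dump",
            f"Directory={directory}", f"PurgeJobAfter={months}months"]


def run_stepwise(values: List[int], dry_run: bool = True) -> Optional[int]:
    """Run one purge per value; return the first value that failed, or None.

    With dry_run=True the commands are only printed, not executed.
    """
    for months in values:
        cmd = build_command(months)
        print("+", " ".join(cmd))
        if dry_run:
            continue
        result = subprocess.run(cmd, capture_output=True, text=True)
        output = result.stdout + result.stderr
        # Assumption: a non-zero exit or the error text seen in the thread means failure.
        if result.returncode != 0 or "Problem dumping archive" in output:
            print(output, end="", file=sys.stderr)
            return months
    return None


if __name__ == "__main__":
    # Dry run: print the planned purges from 60 months down to 18 in 6-month steps.
    run_stepwise(purge_schedule(60, 18, 6))
```

If a step still fails, a smaller `step_months` makes each purge handle less data.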


