[slurm-users] slurmdbd purge not working

Julien Rey julien.rey at univ-paris-diderot.fr
Fri Apr 5 14:28:53 UTC 2019


The failure occurs after a few minutes (~10).

We are also running out of disk space on the Slurm controller, and the 
mysql daemon is at 100% CPU usage all the time. This issue is becoming 
critical.

On 05/04/2019 16:10, Paul Edmon wrote:
> Did it just time out, or did the failure happen immediately? If it was 
> immediate, you may be hitting a bug. It "should" be safe to upgrade to 
> a later version of 15.08.*, which may include related fixes. I would 
> look at the changelog first, though, to see whether any database work 
> was done.
>
> -Paul Edmon-
>
> On 4/5/19 9:05 AM, Julien Rey wrote:
>> Hi Paul, thanks for your advice. Actually, I already tried what you 
>> suggested. No matter what value I put after PurgeJobAfter, I always 
>> end up with the same error:
>>
>> sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=1days
>> sacctmgr: error: slurmdbd: Getting response to message type 1459
>> sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
>>  Problem dumping archive: Unspecified error
>>
>> sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=48months
>> sacctmgr: error: slurmdbd: Getting response to message type 1459
>> sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
>>  Problem dumping archive: Unspecified error
>>
>> Has anyone tried truncating the tables by hand, directly from the 
>> mysql command line?
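
Before touching any table by hand, it may help to see which tables 
actually hold the bulk of the data. The following is only a sketch: it 
assumes the database name slurm_acct_db from the slurmdbd.conf quoted 
further down, and the helper name table_size_sql is made up for this 
example. It prints a read-only query against information_schema and 
modifies nothing. Note that table_rows is only an estimate for InnoDB 
tables.

```shell
#!/bin/sh
# table_size_sql DB: print a read-only query listing the tables of DB,
# largest first. The function name is hypothetical, not part of Slurm or MySQL.
table_size_sql() {
    db=$1
    cat <<EOF
SELECT table_name,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
WHERE table_schema = '$db'
ORDER BY (data_length + index_length) DESC;
EOF
}

# Print the query only. To run it, pipe it into the client, for example:
#   table_size_sql slurm_acct_db | mysql -u slurm -p
table_size_sql slurm_acct_db
```
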
>>
>> On 04/04/2019 16:13, Paul Edmon wrote:
>>> We ran into this problem in the past.  I know that fixes were put in 
>>> to deal with large purges as a result of our problems but I don't 
>>> recall what version they ended up in, likely newer than 15.08.0.
>>>
>>> A solution that can work is to walk the purge cutoff forward in 
>>> steps, so that instead of one large purge you do several smaller 
>>> purges. That at least worked for us in the past.
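
The stepped purge described above could be sketched roughly as follows. 
This is an illustration, not a tested procedure: the start value (48 
months), the target (18 months, taken from the PurgeJobAfter setting in 
the slurmdbd.conf quoted below), the step size of 6 months, and the 
function names are all assumptions. By default the script only prints 
the commands; it runs them only when RUN=1 is set. Whether smaller 
steps avoid the DBD_ARCHIVE_DUMP failure on 15.08.0-0pre1 is not 
something this thread confirms.

```shell
#!/bin/sh
# purge_steps START END STEP: print month values from START down to END,
# one per line, always finishing on END.
purge_steps() {
    m=$1
    while [ "$m" -gt "$2" ]; do
        echo "$m"
        m=$((m - $3))
    done
    echo "$2"
}

# stepped_purge DIR START END STEP: one archive dump per step, oldest
# cutoff first. Prints the commands unless RUN=1 is set in the environment.
stepped_purge() {
    dir=$1
    for months in $(purge_steps "$2" "$3" "$4"); do
        if [ "${RUN:-0}" = "1" ]; then
            # Stop at the first failing step instead of piling up purges.
            sacctmgr archive dump "Directory=$dir" \
                "PurgeJobAfter=${months}months" || return 1
        else
            echo "sacctmgr archive dump Directory=$dir PurgeJobAfter=${months}months"
        fi
    done
}

# Dry run with illustrative values: 48, 42, 36, 30, 24, then 18 months.
stepped_purge /home/joule/archives/ 48 18 6
```

Stopping at the first failure keeps the last successful cutoff known, 
so the next attempt can resume from there with a smaller step.
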
>>>
>>> -Paul Edmon-
>>>
>>> On 4/4/19 9:38 AM, Julien Rey wrote:
>>>> Hello,
>>>>
>>>> Our Slurm accounting database keeps growing (it is now more than 
>>>> 100 GB) and is never purged. We are running Slurm 15.08.0-0pre1. I 
>>>> would like to upgrade to a more recent version of slurmdbd, but I 
>>>> am afraid that the database update could break everything.
>>>>
>>>> Here is our slurmdbd.conf:
>>>>
>>>> AuthType=auth/munge
>>>> AuthInfo=/var/run/munge/munge.socket.2
>>>> DbdHost=localhost
>>>> DebugLevel=6
>>>> StorageHost=localhost
>>>> StorageLoc=slurm_acct_db
>>>> StoragePass=shazaam
>>>> StorageType=accounting_storage/mysql
>>>> StorageUser=slurm
>>>> LogFile=/var/log/slurm-llnl/slurmdbd.log
>>>> PidFile=/var/run/slurm-llnl/slurmdbd.pid
>>>> SlurmUser=slurm
>>>> ArchiveDir=/home/joule/archives
>>>> PurgeEventAfter=18
>>>> PurgeJobAfter=18
>>>> PurgeResvAfter=1
>>>> PurgeStepAfter=1
>>>> PurgeSuspendAfter=1
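
A note on units: as far as I know from the slurmdbd.conf documentation, 
a bare number in a Purge*After setting is read as months, while a 
suffix such as "days" or "hours" selects a finer unit. So 
PurgeJobAfter=18 above means 18 months, not 18 days. The small helper 
below is a sketch (the name show_purge_units is made up) that lists the 
purge settings of a config file and flags those without an explicit 
unit.

```shell
#!/bin/sh
# show_purge_units FILE: print each Purge*After setting found in FILE and
# mark values that are bare numbers (interpreted by slurmdbd as months).
# The function name is hypothetical, not a Slurm tool.
show_purge_units() {
    grep -E '^Purge[A-Za-z]+After=' "$1" | while IFS='=' read -r key val; do
        case $val in
            *[!0-9]*) echo "$key=$val" ;;                    # has a unit suffix
            *)        echo "$key=$val (no unit: months)" ;;  # digits only
        esac
    done
}

# Example usage (the path is an assumption; adjust to your installation):
#   show_purge_units /etc/slurm-llnl/slurmdbd.conf
```
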
>>>>
>>>> I tried to purge it manually using this command but the slurmdbd 
>>>> daemon ends up crashing and it doesn't remove anything:
>>>>
>>>> sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=365days
>>>>
>>>> sacctmgr: error: slurmdbd: Getting response to message type 1459
>>>> sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
>>>>  Problem dumping archive: Unspecified error
>>>>
>>>> Sometimes I have to restart the mysql daemon (we are running mysql 
>>>> 5.5.39-1). The /var/log/slurm-llnl/slurmdbd.log shows nothing, and 
>>>> the mysql logs are empty.
>>>>
>>>> I tried increasing these values in my.cnf, but so far without success:
>>>>
>>>> innodb_buffer_pool_size        = 32G
>>>> innodb_lock_wait_timeout    = 3600
>>>>
>>>> Is there any way to solve this issue? Otherwise, what would be the 
>>>> procedure for deleting the database records altogether and starting 
>>>> over with a fresh database?
>>>>
>>>> Thanks in advance.
>>>> -- 
>>>> Julien REY
>>>>
>>>> Plate-forme RPBS
>>>> Modélisation Computationnelle des Interactions Protéines-Ligand 
>>>> (CMPLI)
>>>> Université Paris Diderot - Paris VII
>>>> tel : 01 57 27 83 95
>>>
>>
>>
>


-- 
Julien REY

Plate-forme RPBS
Modélisation Computationnelle des Interactions Protéines-Ligand (CMPLI)
Université Paris Diderot - Paris VII
tel : 01 57 27 83 95



