[slurm-users] slurmdbd purge not working
Julien Rey
julien.rey at univ-paris-diderot.fr
Fri Apr 5 14:28:53 UTC 2019
The failure does not happen immediately; it occurs after about 10 minutes.
We are also running out of space on the slurm controller, and the mysql
daemon is at 100% CPU usage all the time. This issue is becoming critical.
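
For reference, the stepwise approach Paul describes in the quoted message below (several smaller purges instead of one large one) could be sketched roughly as follows. This is only an illustration and has not been run against our database: the function name, the 48/18/6 month values and the step size are hypothetical, and the only syntax taken from this thread is the "sacctmgr archive dump Directory=... PurgeJobAfter=Nmonths" form. The script only prints the commands; it does not execute them.

```python
from typing import List


def stepwise_purge_commands(oldest_months: int, target_months: int,
                            step_months: int, archive_dir: str) -> List[str]:
    """Build sacctmgr commands that shrink the retention window step by step.

    All parameters are illustrative assumptions:
      oldest_months -- approximate age of the oldest job records
      target_months -- retention window to end up with
      step_months   -- how much to shrink the window per purge
      archive_dir   -- directory passed to Directory=
    """
    if step_months <= 0:
        raise ValueError("step_months must be positive")
    if oldest_months < target_months:
        raise ValueError("oldest_months must be >= target_months")

    commands = []
    window = oldest_months
    while window > target_months:
        # Never shrink the window below the final target.
        window = max(window - step_months, target_months)
        commands.append(
            f"sacctmgr archive dump Directory={archive_dir} "
            f"PurgeJobAfter={window}months"
        )
    return commands


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed and run one by one.
    for command in stepwise_purge_commands(48, 18, 6, "/home/joule/archives/"):
        print(command)
```

Each printed command would be run by hand, waiting for one purge to finish before starting the next, so that a failure points to the step that was too large.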
On 05/04/2019 16:10, Paul Edmon wrote:
> Did it just time out, or did that failure happen immediately? If
> immediate you may be in a situation where you are hitting a bug. It
> "should" be safe to upgrade to a later version of 15.08.*. There may
> be fixes in there related to that. I would look at the changelog
> though just to see if there is any database work that was done.
>
> -Paul Edmon-
>
> On 4/5/19 9:05 AM, Julien Rey wrote:
>> Hi Paul, thanks for your advice. Actually, I already tried what you
>> suggested. No matter what value I put after PurgeJobAfter, I always
>> end up with the same error:
>>
>> sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=1days
>> sacctmgr: error: slurmdbd: Getting response to message type 1459
>> sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
>> Problem dumping archive: Unspecified error
>>
>> sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=48months
>> sacctmgr: error: slurmdbd: Getting response to message type 1459
>> sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
>> Problem dumping archive: Unspecified error
>>
>> Has anyone tried to truncate tables by hand directly from the mysql
>> command line?
>>
>> On 04/04/2019 16:13, Paul Edmon wrote:
>>> We ran into this problem in the past. I know that fixes were put in
>>> to deal with large purges as a result of our problems but I don't
>>> recall what version they ended up in, likely newer than 15.08.0.
>>>
>>> A solution that can work is to walk up the time so that instead of
>>> one large purge you do several smaller purges. That at least worked
>>> for us in the past.
>>>
>>> -Paul Edmon-
>>>
>>> On 4/4/19 9:38 AM, Julien Rey wrote:
>>>> Hello,
>>>>
>>>> Our slurm accounting database is growing bigger and bigger (more
>>>> than 100 GB) and is never being purged. We are running slurm
>>>> 15.08.0-0pre1. I would like to upgrade to a more recent version of
>>>> the slurmdbd, but my fear is that it may break everything during
>>>> the update of the database.
>>>>
>>>> Here is our slurmdbd.conf:
>>>>
>>>> AuthType=auth/munge
>>>> AuthInfo=/var/run/munge/munge.socket.2
>>>> DbdHost=localhost
>>>> DebugLevel=6
>>>> StorageHost=localhost
>>>> StorageLoc=slurm_acct_db
>>>> StoragePass=shazaam
>>>> StorageType=accounting_storage/mysql
>>>> StorageUser=slurm
>>>> LogFile=/var/log/slurm-llnl/slurmdbd.log
>>>> PidFile=/var/run/slurm-llnl/slurmdbd.pid
>>>> SlurmUser=slurm
>>>> ArchiveDir=/home/joule/archives
>>>> PurgeEventAfter=18
>>>> PurgeJobAfter=18
>>>> PurgeResvAfter=1
>>>> PurgeStepAfter=1
>>>> PurgeSuspendAfter=1
>>>>
>>>> I tried to purge it manually using this command but the slurmdbd
>>>> daemon ends up crashing and it doesn't remove anything:
>>>>
>>>> sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=365days
>>>>
>>>> sacctmgr: error: slurmdbd: Getting response to message type 1459
>>>> sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
>>>> Problem dumping archive: Unspecified error
>>>>
>>>> Sometimes I have to restart the mysql daemon (we are running mysql
>>>> 5.5.39-1). The /var/log/slurm-llnl/slurmdbd.log shows nothing. The
>>>> mysql logs are empty.
>>>>
>>>> I tried increasing these values in my.cnf, but so far without success:
>>>>
>>>> innodb_buffer_pool_size = 32G
>>>> innodb_lock_wait_timeout = 3600
>>>>
>>>> Is there any way to solve this issue? Otherwise, what would be the
>>>> procedure for deleting the database records altogether and starting
>>>> over with a fresh one?
>>>>
>>>> Thanks in advance.
>>>> --
>>>> Julien REY
>>>>
>>>> Plate-forme RPBS
>>>> Modélisation Computationnelle des Interactions Protéines-Ligand
>>>> (CMPLI)
>>>> Université Paris Diderot - Paris VII
>>>> tel : 01 57 27 83 95
>>>
>>
>>
>
--
Julien REY
Plate-forme RPBS
Modélisation Computationnelle des Interactions Protéines-Ligand (CMPLI)
Université Paris Diderot - Paris VII
tel : 01 57 27 83 95