[slurm-users] slurmdbd purge not working
Paul Edmon
pedmon at cfa.harvard.edu
Fri Apr 5 14:10:32 UTC 2019
Did it just time out, or did that failure happen immediately? If it was
immediate, you may be hitting a bug. It "should" be safe to upgrade to a
later version of 15.08.*, which may include fixes related to this. I would
check the changelog first, though, to see whether any database work was
done between those versions.
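
For reference, the "several smaller purges" approach mentioned earlier in
the thread could be sketched roughly as below. This is only an
illustration, not a tested procedure: the function names purge_steps and
walk_purge, the DRY_RUN switch, and the 48/18/6 month values are made up
for the example, and the sacctmgr invocation is simply the one quoted in
this thread with a different PurgeJobAfter value each time.

```shell
#!/bin/sh
# Sketch: purge in several small steps instead of one large one.
# ARCHIVE_DIR defaults to the directory used in this thread (assumption).
ARCHIVE_DIR="${ARCHIVE_DIR:-/home/joule/archives/}"

# Print retention windows (in months) from START down to END,
# decreasing by STEP each time; END is always the last value printed.
purge_steps() {
    start=$1 end=$2 step=$3
    m=$start
    while [ "$m" -gt "$end" ]; do
        echo "$m"
        m=$((m - step))
    done
    echo "$end"
}

# Run one archive dump per window, or only print the commands when
# DRY_RUN is 1 (the default). Stops at the first failing command so a
# crash is not repeated blindly.
walk_purge() {
    for months in $(purge_steps "$1" "$2" "$3"); do
        cmd="sacctmgr archive dump Directory=$ARCHIVE_DIR PurgeJobAfter=${months}months"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            $cmd || return 1
        fi
    done
}
```

Calling "walk_purge 48 18 6" only prints the six commands it would run
(48, 42, 36, 30, 24, 18 months); setting DRY_RUN=0 would actually execute
them, so it is worth checking slurmdbd and mysql between steps.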
-Paul Edmon-
On 4/5/19 9:05 AM, Julien Rey wrote:
> Hi Paul, thanks for your advice. Actually, I already tried what you
> suggested. No matter what value I put after PurgeJobAfter, I always
> end up with the same error:
>
> sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=1days
> sacctmgr: error: slurmdbd: Getting response to message type 1459
> sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
> Problem dumping archive: Unspecified error
>
> sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=48months
> sacctmgr: error: slurmdbd: Getting response to message type 1459
> sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
> Problem dumping archive: Unspecified error
>
> Has anyone tried truncating the tables by hand, directly from the mysql
> command line?
>
> On 04/04/2019 16:13, Paul Edmon wrote:
>> We ran into this problem in the past. I know that fixes were put in
>> to deal with large purges as a result of our problems but I don't
>> recall what version they ended up in, likely newer than 15.08.0.
>>
>> A solution that can work is to step the purge window forward
>> gradually, so that instead of one large purge you do several smaller
>> purges, each removing only the oldest records. That at least worked
>> for us in the past.
>>
>> -Paul Edmon-
>>
>> On 4/4/19 9:38 AM, Julien Rey wrote:
>>> Hello,
>>>
>>> Our slurm accounting database keeps growing (it is now more than
>>> 100 GB) and is never purged. We are running slurm
>>> 15.08.0-0pre1. I would like to upgrade to a more recent version of
>>> slurmdbd, but I am afraid the database update during the upgrade
>>> may break everything.
>>>
>>> Here is our slurmdbd.conf:
>>>
>>> AuthType=auth/munge
>>> AuthInfo=/var/run/munge/munge.socket.2
>>> DbdHost=localhost
>>> DebugLevel=6
>>> StorageHost=localhost
>>> StorageLoc=slurm_acct_db
>>> StoragePass=shazaam
>>> StorageType=accounting_storage/mysql
>>> StorageUser=slurm
>>> LogFile=/var/log/slurm-llnl/slurmdbd.log
>>> PidFile=/var/run/slurm-llnl/slurmdbd.pid
>>> SlurmUser=slurm
>>> ArchiveDir=/home/joule/archives
>>> PurgeEventAfter=18
>>> PurgeJobAfter=18
>>> PurgeResvAfter=1
>>> PurgeStepAfter=1
>>> PurgeSuspendAfter=1
>>>
>>> I tried to purge it manually using this command but the slurmdbd
>>> daemon ends up crashing and it doesn't remove anything:
>>>
>>> sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=365days
>>>
>>> sacctmgr: error: slurmdbd: Getting response to message type 1459
>>> sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
>>> Problem dumping archive: Unspecified error
>>>
>>> Sometimes I have to restart the mysql daemon (we are running mysql
>>> 5.5.39-1). The /var/log/slurm-llnl/slurmdbd.log shows nothing, and
>>> the mysql logs are empty.
>>>
>>> I tried increasing these values in my.cnf, but so far without success:
>>>
>>> innodb_buffer_pool_size = 32G
>>> innodb_lock_wait_timeout = 3600
>>>
>>> Is there any way to solve this issue? Otherwise, what would be the
>>> procedure for deleting the database records altogether and starting
>>> over with a fresh database?
>>>
>>> Thanks in advance.
>>> --
>>> Julien REY
>>>
>>> Plate-forme RPBS
>>> Modélisation Computationnelle des Interactions Protéines-Ligand (CMPLI)
>>> Université Paris Diderot - Paris VII
>>> tel : 01 57 27 83 95
>>
>
>