[slurm-users] slurmdbd purge not working
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Fri Apr 5 15:03:35 UTC 2019
On 4/5/19 4:28 PM, Julien Rey wrote:
> The failure occurs after a few minutes (~10).
>
> And we are running out of space on the slurm controller. The mysql
> daemon is at 100% CPU usage all the time. This issue is becoming critical.
...
>>>>> Our slurm accounting database is growing bigger and bigger (more
>>>>> than 100Gb) and is never being purged. We are running slurm
>>>>> 15.08.0-0pre1. I would like to upgrade to a more recent version of
>>>>> the slurmdbd, but my fear is that it may break everything during
>>>>> the update of the database.
>>>>>
>>>>> Here is our slurmdbd.conf :
>>>>>
>>>>> AuthType=auth/munge
>>>>> AuthInfo=/var/run/munge/munge.socket.2
>>>>> DbdHost=localhost
>>>>> DebugLevel=6
>>>>> StorageHost=localhost
>>>>> StorageLoc=slurm_acct_db
>>>>> StoragePass=shazaam
>>>>> StorageType=accounting_storage/mysql
>>>>> StorageUser=slurm
>>>>> LogFile=/var/log/slurm-llnl/slurmdbd.log
>>>>> PidFile=/var/run/slurm-llnl/slurmdbd.pid
>>>>> SlurmUser=slurm
>>>>> ArchiveDir=/home/joule/archives
>>>>> PurgeEventAfter=18
>>>>> PurgeJobAfter=18
>>>>> PurgeResvAfter=1
>>>>> PurgeStepAfter=1
>>>>> PurgeSuspendAfter=1
>>>>>
>>>>> I tried to purge it manually using this command but the slurmdbd
>>>>> daemon ends up crashing and it doesn't remove anything:
One more observation: You are using the default unit of months (18
means 18 months). A monthly purge operation can be a huge amount of work
for a database of your size, and you certainly want to cut down the
amount of work required during each purge.
It is probably a good idea to try out a series of daily purges starting
with:
PurgeEventAfter=2000days
PurgeJobAfter=2000days
PurgeResvAfter=2000days
PurgeStepAfter=2000days
PurgeSuspendAfter=2000days
If this works well over a few days, decrease the purge interval from
2000days little by little and try again (1800, 1500, etc.) until, after
many iterations, you come down to the desired final purge intervals.
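The step-down procedure above could be sketched as a small script that
prints the slurmdbd.conf purge lines for each iteration. This is only an
illustration: the helper names (purge_lines, step_down), the step size of
200 days and the stopping point of 1500 days are assumptions and not
part of Slurm or of the recommendation above. It uses one value for all
five parameters, as in the example settings; different final targets per
parameter would need a small extension.

```python
# Hypothetical helper, not part of Slurm: generates slurmdbd.conf purge
# settings for a gradual step-down of the purge interval, in days.

PURGE_KEYS = [
    "PurgeEventAfter",
    "PurgeJobAfter",
    "PurgeResvAfter",
    "PurgeStepAfter",
    "PurgeSuspendAfter",
]


def purge_lines(days):
    """Return the five purge settings for one iteration, e.g. '...=2000days'."""
    return [f"{key}={days}days" for key in PURGE_KEYS]


def step_down(start, target, step):
    """Yield decreasing intervals from start down to target (inclusive)."""
    days = start
    while days > target:
        yield days
        days = max(target, days - step)  # never go below the target
    yield target


if __name__ == "__main__":
    # Assumed values for illustration; choose steps that suit your database.
    for days in step_down(start=2000, target=1500, step=200):
        print(f"# iteration with a purge interval of {days} days")
        print("\n".join(purge_lines(days)))
        print()
```

Each printed block is meant to be put into slurmdbd.conf one at a time,
moving to the next one only after the previous purge has completed
without problems.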
See some further details in
https://wiki.fysik.dtu.dk/niflheim/Slurm_database#setting-database-purge-parameters
Best regards,
Ole