[slurm-users] slurmdbd purge not working

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Fri Apr 5 15:03:35 UTC 2019


On 4/5/19 4:28 PM, Julien Rey wrote:
> The failure occurs after a few minutes (~10).
> 
> And we are running out of space on the slurm controller. The mysql 
> daemon is at 100% CPU usage all the time. This issue is becoming critical.
...
>>>>> Our slurm accounting database is growing bigger and bigger (more 
>>>>> than 100Gb) and is never being purged. We are running slurm 
>>>>> 15.08.0-0pre1. I would like to upgrade to a more recent version of 
>>>>> the slurmdbd, but my fear is that it may break everything during 
>>>>> the update of the database.
>>>>>
>>>>> Here is our slurmdbd.conf :
>>>>>
>>>>> AuthType=auth/munge
>>>>> AuthInfo=/var/run/munge/munge.socket.2
>>>>> DbdHost=localhost
>>>>> DebugLevel=6
>>>>> StorageHost=localhost
>>>>> StorageLoc=slurm_acct_db
>>>>> StoragePass=shazaam
>>>>> StorageType=accounting_storage/mysql
>>>>> StorageUser=slurm
>>>>> LogFile=/var/log/slurm-llnl/slurmdbd.log
>>>>> PidFile=/var/run/slurm-llnl/slurmdbd.pid
>>>>> SlurmUser=slurm
>>>>> ArchiveDir=/home/joule/archives
>>>>> PurgeEventAfter=18
>>>>> PurgeJobAfter=18
>>>>> PurgeResvAfter=1
>>>>> PurgeStepAfter=1
>>>>> PurgeSuspendAfter=1
>>>>>
>>>>> I tried to purge it manually using this command but the slurmdbd 
>>>>> daemon ends up crashing and it doesn't remove anything:

One more observation: You are using the default monthly units (18 
means 18 months).  A monthly purge operation can be a huge amount of work 
for a database of your size, and you certainly want to cut down the 
amount of work required during each purge.

It is probably a good idea to try out a series of daily purges starting 
with:

PurgeEventAfter=2000days
PurgeJobAfter=2000days
PurgeResvAfter=2000days
PurgeStepAfter=2000days
PurgeSuspendAfter=2000days

If this works well over a few days, decrease the 2000days value little 
by little and try again (1800, 1500, etc.) until, after many iterations, 
you come down to the desired final purge intervals.
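
To make the step-down idea concrete, here is a minimal Python sketch that 
prints the Purge* lines for each iteration.  The 300-day step and the 
540-day target (roughly 18 months) are my own illustrative assumptions, 
and the helper names are made up.  The script only prints text.  It does 
not touch slurmdbd.conf or the database:

```python
# Sketch only: prints candidate slurmdbd.conf purge settings for a
# gradual step-down. Step size and target are illustrative assumptions.

PURGE_KEYS = [
    "PurgeEventAfter",
    "PurgeJobAfter",
    "PurgeResvAfter",
    "PurgeStepAfter",
    "PurgeSuspendAfter",
]


def step_down_schedule(start_days, target_days, step_days):
    """Return retention values from start_days down to target_days."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    schedule = []
    days = start_days
    while days > target_days:
        schedule.append(days)
        days -= step_days
    # Always finish exactly on the target value.
    schedule.append(target_days)
    return schedule


def purge_lines(days):
    """Return the five Purge* lines for one iteration, in day units."""
    return [f"{key}={days}days" for key in PURGE_KEYS]


if __name__ == "__main__":
    # 2000 days is the starting value suggested above; 540 and 300
    # are hypothetical choices for this example.
    for days in step_down_schedule(2000, 540, 300):
        print(f"# iteration with {days} days retention")
        print("\n".join(purge_lines(days)))
        print()
```

Each printed block would still have to be copied into slurmdbd.conf by 
hand and slurmdbd restarted, moving on to the next block only after the 
previous purge has completed without problems.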

See some further details in 
https://wiki.fysik.dtu.dk/niflheim/Slurm_database#setting-database-purge-parameters

Best regards,
Ole


