[slurm-users] SlurmdSpoolDir full

Xaver Stiensmeier xaverstiensmeier at gmx.de
Sun Dec 10 08:53:33 UTC 2023


Hello Brian Andrus,

we ran 'df -h' to determine the amount of free space I mentioned below.
I should also add that at the time we inspected the node, there were
still around 38 GB of space left - however, we were unable to watch the
remaining space while the error occurred, so perhaps the large file(s)
were removed again immediately afterwards.
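
For future runs, we could keep an eye on that partition while the jobs are
running, e.g. with a small loop like the one below (a rough sketch assuming
the default SlurmdSpoolDir of /var/spool/slurmd and GNU coreutils; the path
has to match whatever slurm.conf sets on the workers):

    # append a timestamped free-space sample for the spool partition every 10 s
    while true; do
        echo "$(date -Is) $(df -h --output=avail /var/spool/slurmd | tail -n 1)" >> /tmp/spooldir_space.log
        sleep 10
    done

That way we would at least see whether the partition really fills up during
the run and how quickly the space is freed again.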

I will take a look at /var/log. That's a good idea. I don't think there
will be anything unusual, but it's something I hadn't considered yet
(that the cause of the error might lie somewhere else).
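
As a first step I will probably just list the biggest consumers under /var
on one of the affected nodes, e.g. with a plain du/sort call (nothing
Slurm-specific here):

    # per-directory usage under /var, largest last, staying on one filesystem
    du -xh --max-depth=1 /var 2>/dev/null | sort -h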

Best regards
Xaver

On 10.12.23 00:41, Brian Andrus wrote:
> Xaver,
>
> It is likely your /var or /var/spool mount.
> That may be a separate partition or part of your root partition. It is
> the partition that is full, not the directory itself. So the cause
> could very well be log files in /var/log. I would check to see what
> (if any) partitions are getting filled on the node. You can run 'df
> -h' and see some info that would get you started.
>
> Brian Andrus
>
> On 12/8/2023 7:00 AM, Xaver Stiensmeier wrote:
>> Dear slurm-user list,
>>
>> during a larger cluster run (the same one I mentioned earlier, 242 nodes), I
>> got the error "SlurmdSpoolDir full". The SlurmdSpoolDir is apparently a
>> directory on the workers that is used for job state information
>> (https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir). However,
>> I was unable to find more precise information on that directory. We
>> compute all data on another volume, so SlurmdSpoolDir has roughly 38 GB
>> of free space and nothing is intentionally placed there during the run.
>> This error only occurred on very few nodes.
>>
>> I would like to understand what Slurmd is placing in this dir that fills
>> up the space. Do you have any ideas? Due to the workflow used, we have a
>> hard time reconstructing the exact scenario that caused this error. I
>> guess the "fix" is simply to pick a somewhat larger disk, but I am unsure
>> whether Slurm is behaving normally here.
>>
>> Best regards
>> Xaver Stiensmeier
>>
>>
>


