[slurm-users] SlurmdSpoolDir full

Brian Andrus toomuchit at gmail.com
Sat Dec 9 23:41:54 UTC 2023


Xaver,

It is likely your /var or /var/spool mount.
That may be a separate partition or part of your root partition. It is
the partition that is full, not the directory itself, so the cause could
very well be log files in /var/log. I would check which (if any)
partitions are filling up on the node. Running 'df -h' will give you
some info to get you started.
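For example (a minimal sketch; paths assume a typical Linux layout where SlurmdSpoolDir defaults to /var/spool/slurmd on the /var partition):

```shell
# Check partition usage on the node; anything near 100% in the Use%
# column is the likely culprit
df -h /var

# If /var is full, find its largest subdirectories (logs under
# /var/log are a common cause); errors from unreadable dirs are hidden
du -xh --max-depth=1 /var/log 2>/dev/null | sort -rh | head
```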

Brian Andrus

On 12/8/2023 7:00 AM, Xaver Stiensmeier wrote:
> Dear slurm-user list,
>
> during a larger cluster run (the same one I mentioned earlier, 242
> nodes), I got the error "SlurmdSpoolDir full". The SlurmdSpoolDir is
> apparently a directory on the workers that is used for job state
> information
> (https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir). However,
> I was unable to find more precise information on that directory. We
> compute all data on another volume, so SlurmdSpoolDir has roughly 38 GB
> of free space where nothing is intentionally put during the run. This
> error only occurred on very few nodes.
>
> I would like to understand what slurmd is placing in this dir that fills
> up the space. Do you have any ideas? Due to the workflow used, we have a
> hard time reconstructing the exact scenario that caused this error. I
> guess the "fix" is to just pick a somewhat larger disk, but I am unsure
> whether Slurm is behaving normally here.
>
> Best regards
> Xaver Stiensmeier
>
>
