[slurm-users] Slurm Crashing - File has zero size

Brian Andrus toomuchit at gmail.com
Thu Oct 28 20:51:37 UTC 2021

You may have space, but do you have enough inodes?

Free space and free inodes are two different things to look at when 
trying to see why you cannot write to a disk.

Also verify that it is writeable by SlurmUser.

If something went wrong and the filesystem automatically remounted 
itself as read-only, that can cause this too.
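
A rough sketch of those three checks (free inodes, read-only mount, 
write permission) in Python, in case it helps. The function name 
check_state_dir and the min_free_inodes threshold are made up for 
illustration, and the path you pass in is whatever your 
StateSaveLocation is. Run it as the SlurmUser account, since 
os.access() only answers for the user running the script.

```python
import os


def check_state_dir(path, min_free_inodes=1000):
    """Collect basic 'can I write here?' indicators for a directory.

    min_free_inodes is an arbitrary illustrative threshold, not a Slurm
    setting.
    """
    st = os.statvfs(path)
    report = {
        # Space available to unprivileged users, in bytes.
        "free_bytes": st.f_bavail * st.f_frsize,
        # Inodes available to unprivileged users, and the total count.
        "free_inodes": st.f_favail,
        "total_inodes": st.f_files,
        # True if the filesystem is currently mounted read-only.
        "read_only": bool(st.f_flag & os.ST_RDONLY),
        # True if the *current* user may create entries in the directory.
        "writable_by_me": os.access(path, os.W_OK | os.X_OK),
    }
    # Some filesystems (and some NFS servers) report 0 total inodes because
    # they allocate them dynamically; only flag a shortage when a real
    # total is reported.
    report["inodes_low"] = st.f_files > 0 and st.f_favail < min_free_inodes
    return report


# Example call (the path here is a placeholder, not a real location):
# print(check_state_dir("/path/to/StateSaveLocation"))
```

On an NFS mount the numbers come from what the server reports, so it 
is worth repeating the check on the NFS server itself if anything 
looks off.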

Brian Andrus

On 10/28/2021 11:57 AM, Pedro Luiz de Castro wrote:
> Hello all
> Since yesterday we’ve been having some trouble with slurm where it 
> crashes and isn’t able to recover.
> I’ve managed to track the fault to a zero-sized file by launching 
> slurmctld -Dvvvv:
> slurmctld: File 
> /mnt/nfs/lobo/IMM-NFS/slurm/hash.4/job.2044004/environment has zero size
> That’s the StateSaveLocation, so the environment file for this 
> particular job is not getting correctly created.
> I don’t believe it’s a space issue as there’s about 2TB of free space 
> on this mountpoint.
> Shouldn’t be permissions either, as other jobs run fine and get completed.
> For now I’ve been launching slurmctld -i to work around this issue, 
> killing the job in question.
> This way slurm can still be running for our users.
> Any ideas where I should look next to try and troubleshoot this issue?
> Thanks for all the help in advance.
> Best regards,
> *Pedro Luiz de Castro*
> IT Support & System Administrator
> Information Systems
> Faculdade de Medicina, Universidade de Lisboa
> Avenida Professor Egas Moniz, 1649-028, Lisboa, Portugal
> iMM Lisboa general contact (+351) 217 999 411 - ext: 47356
> imm.medicina.ulisboa.pt