[slurm-users] srun fails with "srun: error: Security violation, slurm message from uid" if delay in job starting

Mark Dixon mark.c.dixon at durham.ac.uk
Tue Dec 14 08:49:27 UTC 2021


Hi all,

Sorry for the noise, this was down to a problem with our configless setup.

Really must start running slurmd everywhere and get rid of the 
compute-only version of slurm.conf...

Cheers,

Mark

On Mon, 13 Dec 2021, Mark Dixon wrote:

> Hi all,
>
> Just wondering if anyone else had seen this.
>
> Running Slurm 21.08.2, we're seeing srun work normally if it is able to
> run immediately. However, if there is a delay in job start, for example
> after a wait for another job to end, srun fails, e.g.:
>
>   [test@foo ~]$ srun -p test --pty bash
>   [test@bar ~]$ exit
>   exit
>   [test@foo ~]$
>
>   [test@foo ~]$ sbatch -p test --exclusive sleep.sh
>   Submitted batch job 3407
>   [test@foo ~]$ srun -p test --pty bash
>   srun: job 3409 queued and waiting for resources
>   srun: error: Security violation, slurm message from uid 456
>   srun: error: Security violation, slurm message from uid 456
>   srun: error: Job allocation 3409 has been revoked
>   [test@foo ~]$
>
> With --slurmd-debug=verbose, I see:
>
>   srun: job 3390 queued and waiting for resources
>   srun: error: Security violation, slurm message from uid 456
>   srun: error: Security violation, slurm message from uid 456
>   srun: error: Job allocation 3390 has been revoked
>
> Meanwhile, the slurmd log shows:
>
> [2021-12-13T13:08:06.028] Job 3390 already killed, do not launch extern step
>
>
> Any ideas, please?
>
> Thanks!
>
> Mark
>
>
>


