[slurm-users] srun fails with "srun: error: Security violation, slurm message from uid" if delay in job starting

Mark Dixon mark.c.dixon at durham.ac.uk
Mon Dec 13 13:41:26 UTC 2021


Hi all,

Just wondering if anyone else has seen this.

Running Slurm 21.08.2, we're seeing srun work normally when the job is
able to start immediately. However, if the job start is delayed, for
example while it waits for another job to end, srun fails, e.g.:

   [test@foo ~]$ srun -p test --pty bash
   [test@bar ~]$ exit
   exit
   [test@foo ~]$

   [test@foo ~]$ sbatch -p test --exclusive sleep.sh
   Submitted batch job 3407
   [test@foo ~]$ srun -p test --pty bash
   srun: job 3409 queued and waiting for resources
   srun: error: Security violation, slurm message from uid 456
   srun: error: Security violation, slurm message from uid 456
   srun: error: Job allocation 3409 has been revoked
   [test@foo ~]$

With --slurmd-debug=verbose, I see:

   srun: job 3390 queued and waiting for resources
   srun: error: Security violation, slurm message from uid 456
   srun: error: Security violation, slurm message from uid 456
   srun: error: Job allocation 3390 has been revoked

Meanwhile, the slurmd log shows:

[2021-12-13T13:08:06.028] Job 3390 already killed, do not launch extern step


Any ideas, please?

Thanks!

Mark


