[slurm-users] is there a way to delay the scheduling.

Ryan Novosielski novosirj at rutgers.edu
Fri Aug 28 16:45:40 UTC 2020


Sounds like you’re sort of the poster-child for this section of the documentation:

https://slurm.schedmd.com/high_throughput.html (note that this page may be version-specific, so look for it in the “archive” section of the website if you need a version other than 20.02).
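The page above focuses on tuning slurmctld itself. A complementary, client-side measure is to pace the submissions so that 20,000 sbatch calls do not all hit the controller at once. This is not taken from that page. Below is a minimal sketch of that idea. It assumes that sbatch exits with a non-zero status when a submission fails, for example on "Unable to contact slurm controller". The names `submit_throttled` and `sbatch_submit` and the parameters `batch_size`, `pause_s`, `max_retries` and `base_backoff_s` are all hypothetical. Their values are illustrative, not recommendations.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def sbatch_submit(script: str) -> bool:
    """Hypothetical helper: submit one batch script, True if sbatch exits 0."""
    result = subprocess.run(["sbatch", script], capture_output=True, text=True)
    return result.returncode == 0


def submit_throttled(
    scripts: Sequence[str],
    submit: Callable[[str], bool] = sbatch_submit,
    batch_size: int = 100,       # hypothetical: submissions per burst
    pause_s: float = 5.0,        # hypothetical: pause between bursts
    max_retries: int = 5,        # attempts per script before giving up
    base_backoff_s: float = 1.0, # first retry delay, doubled on each retry
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Submit scripts in bursts, retrying failures with exponential backoff.

    Returns the scripts that still failed after max_retries attempts.
    """
    failed: List[str] = []
    for i, script in enumerate(scripts):
        if i > 0 and i % batch_size == 0:
            sleep(pause_s)  # let slurmctld catch up between bursts
        for attempt in range(max_retries):
            if submit(script):
                break
            if attempt < max_retries - 1:
                sleep(base_backoff_s * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
        else:
            failed.append(script)  # every attempt failed
    return failed
```

The `submit` and `sleep` arguments are injectable so that the pacing logic can be exercised without a cluster. If the 20,000 tasks are independent, submitting them as a single job array (`sbatch --array=...`) may also cut the number of submission RPCs, subject to the cluster's `MaxArraySize` limit.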

--
____
|| \\UTGERS,  	 |---------------------------*O*---------------------------
||_// the State	 |         Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ	 | Office of Advanced Research Computing - MSB C630, Newark
     `'

> On Aug 28, 2020, at 6:30 AM, navin srivastava <navin.altair at gmail.com> wrote:
> 
> Hi Team,
> 
> I am facing one issue. Several users are submitting 20,000 jobs from within a single batch job, and these are very short jobs (say 1-2 seconds each). While these jobs are being submitted, slurmctld becomes unresponsive and starts giving this message:
> 
> Sending job 6e508a88155d9bec40d752c8331d7ae8 to queue.
> sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)
> Sending job 6e51ed0e322c87802b0f3a2f23a7967f to queue.
> sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)
> Sending job 6e638939f90cd59e60c23b8450af9839 to queue.
> sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)
> Sending job 6e6acf36bc7e1394a92155a95feb1c92 to queue.
> sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)
> Sending job 6e6c646a29f0ad4e9df35001c367a9f5 to queue.
> sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)
> Sending job 6ebcecb4c27d88f0f48d402e2b079c52 to queue.
> 
> At the same time, the slurmctld process starts consuming more than 100% CPU.
> I found that the nodes are not able to acknowledge the server immediately, so they are slow to move from comp (completing) to idle.
> So I think delaying the scheduling cycle would help here. Any idea how this can be done?
> 
> Is there any other solution available for such issues?
> 
> Regards
> Navin.


