[slurm-users] is there a way to delay the scheduling.
Maciej Pawlik
maciej.pawlik.xml at gmail.com
Fri Aug 28 16:38:53 UTC 2020
Hey,
if you don't need jobs to start immediately, you can use the 'defer'
scheduler parameter (SchedulerParameters=defer in slurm.conf):
https://slurm.schedmd.com/sched_config.html
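As a rough sketch of what that could look like (assuming no other
SchedulerParameters are set yet; if some are, add defer to the existing
comma-separated list rather than adding a second line), the slurm.conf
entry would be:

```
# slurm.conf on the controller (fragment)
# defer: do not try to schedule each job individually at submit time;
# jobs are picked up by a later scheduling pass instead.
SchedulerParameters=defer
```

After editing the file, 'scontrol reconfigure' should make slurmctld
re-read it. The trade-off is that a job submitted to an otherwise idle
cluster may wait a short while before it starts.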
Best regards,
Maciej Pawlik
On Fri, 28 Aug 2020 at 12:32, navin srivastava <navin.altair at gmail.com>
wrote:
> Hi Team,
>
> I am facing an issue. Several users are submitting 20000 jobs in a single
> batch, and the jobs are very short (say 1-2 sec). While more jobs are being
> submitted, slurmctld becomes unresponsive and sbatch starts printing
> messages like these:
>
> Sending job 6e508a88155d9bec40d752c8331d7ae8 to queue.
> sbatch: error: Batch job submission failed: Unable to contact slurm
> controller (connect failure)
> Sending job 6e51ed0e322c87802b0f3a2f23a7967f to queue.
> sbatch: error: Batch job submission failed: Unable to contact slurm
> controller (connect failure)
> Sending job 6e638939f90cd59e60c23b8450af9839 to queue.
> sbatch: error: Batch job submission failed: Unable to contact slurm
> controller (connect failure)
> Sending job 6e6acf36bc7e1394a92155a95feb1c92 to queue.
> sbatch: error: Batch job submission failed: Unable to contact slurm
> controller (connect failure)
> Sending job 6e6c646a29f0ad4e9df35001c367a9f5 to queue.
> sbatch: error: Batch job submission failed: Unable to contact slurm
> controller (connect failure)
> Sending job 6ebcecb4c27d88f0f48d402e2b079c52 to queue.
>
> At the same time, the CPU usage of the slurmctld process goes above 100%.
> I found that the nodes are not able to acknowledge to the server
> immediately while they move from comp (completing) to idle.
> So I think delaying the scheduling cycle would help here. Any idea how
> that can be done?
>
> Or is there any other solution available for such issues?
>
> Regards
> Navin.
>
>
>
>