[slurm-users] Possible bug with Prologslurmctld and Epilogslurmctld scripts?

Brian Andrus toomuchit at gmail.com
Mon Sep 27 17:37:07 UTC 2021


Slurm treats those scripts as separate for each job, so the 
PrologSlurmctld for one job does not wait for another job's 
EpilogSlurmctld to finish.

You may want to have your prolog check whether an epilog is still 
running and wait for it to finish before starting its own work.
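
One way to sketch that check, assuming both scripts are Python and can 
share a directory on the controller: the epilog drops a marker file per 
node while it runs, and the prolog polls until no marker remains for the 
nodes it is about to use. The function names, the marker file naming and 
the lock directory are all hypothetical, not anything Slurm provides, and 
the caller is assumed to pass in an already expanded list of node names.

```python
import os
import time
from contextlib import contextmanager


def marker_path(lock_dir, node):
    """Path of the (hypothetical) per-node marker file."""
    return os.path.join(lock_dir, f"epilog-{node}.running")


@contextmanager
def epilog_marker(lock_dir, nodes):
    """For the epilog: mark each node busy while the with-block runs."""
    os.makedirs(lock_dir, exist_ok=True)
    paths = [marker_path(lock_dir, n) for n in nodes]
    for p in paths:
        with open(p, "w") as fh:
            fh.write(str(os.getpid()))
    try:
        yield
    finally:
        # Remove the markers even if the epilog work raised an exception.
        for p in paths:
            try:
                os.remove(p)
            except FileNotFoundError:
                pass


def wait_for_epilog(lock_dir, nodes, timeout=300.0, poll=1.0):
    """For the prolog: wait until no marker exists for any of the nodes.

    Returns True once the nodes are clear, False if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        busy = [n for n in nodes if os.path.exists(marker_path(lock_dir, n))]
        if not busy:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

The epilog would wrap its work in `with epilog_marker(lock_dir, nodes):` 
and the prolog would call `wait_for_epilog(lock_dir, nodes)` first. This 
only helps if the epilog has created its marker before the prolog checks, 
and an epilog that is killed outright leaves a stale marker behind, so 
the timeout matters.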

Brian Andrus

On 9/27/2021 9:15 AM, Joe Teumer wrote:
> Should the Prologslurmctld script only run after the Epilogslurmctld 
> script finishes?
>
> Below you can see JobA runs and completes.
> While Epilogslurmctld (from JobA Node A) is executing on the Slurm 
> controller the Prologslurmctld script for the next job (from Job B 
> Node A) is also running on the Slurm controller.
>
> This breaks our workflow as we are expecting the next Prologslurmctld 
> script to only run when the prior job is 100% completed (initial 
> Prologslurmctld completes AND Job completes AND Job Epilogslurmctld 
> completes).
>
> Prologslurmctld > Here Prolog is starting for Job B (ID 812)
> *2021-09-27 15:42:58,746* | INFO | Starting...
>
> Epilogslurmctld > Here Epilog is starting and ending for Job A (ID 811)
> *2021-09-27 15:42:56,694* | INFO | Starting...
> *2021-09-27 15:43:01,756* | INFO | Exiting 0 after main
>
> [2021-09-27T15:42:50.224] debug:  sched/backfill: _attempt_backfill: beginning
> [2021-09-27T15:42:50.224] debug:  sched/backfill: _attempt_backfill: 1 jobs to backfill
> [2021-09-27T15:42:56.653] _job_complete: JobId=811 WEXITSTATUS 0
> [2021-09-27T15:42:56.653] debug:  email msg to root: Slurm Job_id=811 Name=JobA_BIOS_fixedfreq_1067mclk_nps1.ini Ended, Run time 00:22:36, COMPLETED, ExitCode 0
> *[2021-09-27T15:42:56.657]* _job_complete: JobId=811 done
> [2021-09-27T15:42:58.703] debug:  sched: Running job scheduler for full queue.
> *[2021-09-27T15:42:58.704]* debug:  email msg to root: Slurm Job_id=812 Name=JobA_BIOS_fixedfreq_1600mclk_nps1.ini Began, Queued time 00:22:38
> [2021-09-27T15:42:58.704] sched: Allocate JobId=812 NodeList=xxx #CPUs=256 Partition=xxx