Hello all,
I've used the "scontrol write batch_script" command to output the job submission script from completed jobs in the past, but for some reason, no matter which job I specify, it tells me it is invalid. Any way to troubleshoot this? Alternatively, is there another way - even if a manual database query - to recover the job script, assuming it exists in the database?
sacct --jobs=38960
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
38960        amr_run_v+ tsmith2lab tsmith2lab         72  COMPLETED      0:0
38960.batch       batch            tsmith2lab         40  COMPLETED      0:0
38960.extern     extern            tsmith2lab         72  COMPLETED      0:0
38960.0      hydra_pmi+            tsmith2lab         72  COMPLETED      0:0

scontrol write batch_script 38960
job script retrieval failed: Invalid job id specified
Warmest regards, Jason
Are you using the job_script storage option? If so then you should be able to get at it by doing:
sacct -B -j JOBID
https://slurm.schedmd.com/sacct.html#OPT_batch-script
-Paul Edmon-
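A quick way to answer the "are you using the job_script storage option" question is to look at the running configuration. The sketch below assumes the option corresponds to job_script being listed in AccountingStoreFlags in slurm.conf, and that "scontrol show config" reports that parameter on a line beginning with AccountingStoreFlags. The function name job_script_storage_enabled is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Slurm command): succeeds if the running
# configuration lists job_script in AccountingStoreFlags.
# Assumption: "scontrol show config" prints a line that starts with
# "AccountingStoreFlags" followed by the configured flags.
job_script_storage_enabled() {
    scontrol show config \
        | grep -i '^AccountingStoreFlags' \
        | grep -qi 'job_script'
}
```

Usage: run "job_script_storage_enabled && echo enabled || echo disabled" on a node where scontrol can reach the controller. Even when the flag is set, only jobs submitted after it was enabled can have a script in the database.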
On 2/16/2024 2:41 PM, Jason Simms via slurm-users wrote:
-- *Jason L. Simms, Ph.D., M.P.H.* Manager of Research Computing Swarthmore College Information Technology Services (610) 328-8102 Schedule a meeting: https://calendly.com/jlsimms
Yes, that is what we are also doing, and it works well. Note that when you request the batch script of another user's job, you see nothing (rather than an error message saying that you do not have permission).
On Fri, Feb 16, 2024 at 12:48 PM Paul Edmon via slurm-users < slurm-users@lists.schedmd.com> wrote:
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
Are you absolutely certain you’ve done it before for completed jobs? I would not expect that to work for completed jobs, with the possible exception of very recently completed jobs (or am I thinking of Torque?).
Other replies mention the relatively new feature (21.08?) to store the job script in the database. Be mindful of the database implications here (I believe I have had conversations about this recently with some experienced sites on this mailing list).
--
#BlackLivesMatter
 ____
|| \UTGERS,     |---------------------------*O*---------------------------
||_// the State |         Ryan Novosielski - novosirj@rutgers.edu
|| \ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \    of NJ  | Office of Advanced Research Computing - MSB A555B, Newark
     `'
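The two approaches discussed in this thread can be combined into one fallback, sketched below under some assumptions. "scontrol write batch_script" asks the controller, which only knows about jobs still in its memory; completed jobs are normally purged after MinJobAge seconds (300 by default, as far as I know), which would explain the "Invalid job id specified" error for an old job. "sacct -B -j" asks the accounting database, which only has the script if job_script storage was enabled when the job was submitted. The function name recover_job_script and the default output file name are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Slurm command): try the controller first,
# then the accounting database. Prints which source supplied the script.
recover_job_script() {
    local jobid="$1"
    local outfile="${2:-job_${jobid}.sh}"
    local script

    if [ -z "$jobid" ]; then
        echo "usage: recover_job_script JOBID [OUTFILE]" >&2
        return 2
    fi

    # 1) Controller: works only while slurmctld still remembers the job
    #    (pending, running, or very recently completed).
    if scontrol write batch_script "$jobid" "$outfile" >/dev/null 2>&1; then
        echo "controller"
        return 0
    fi

    # 2) Accounting database: requires job_script storage to have been
    #    enabled when the job was submitted.
    script="$(sacct -B -j "$jobid" 2>/dev/null)" || script=""

    # Empty output is treated as "nothing recoverable". As noted earlier in
    # the thread, it can also mean the job belongs to another user.
    if [ -z "$script" ]; then
        echo "no batch script recovered for job $jobid" >&2
        return 1
    fi

    printf '%s\n' "$script" > "$outfile"
    echo "database"
}
```

Usage: "recover_job_script 38960 amr_run.sh" (the output file name here is arbitrary). Inspect the resulting file before reusing it: sacct may print a header above the script, and what it prints when no script is stored may differ between Slurm versions.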