IMO the recommended method does not work well for jobs that already have a start time in the future, and it does not change the Reason to something that explicitly tells you the start time was changed to put the job on hold. That makes it hard to identify such jobs and release them later, as the start time might have been set for other reasons. So a "magic number" start time that is easy to identify and unlikely to have been an actual value would be useful instead of something like "now+duration"; alternatively, additionally setting a comment field indicating the job is being held would help.
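A rough sketch of the comment-tag idea, in case it helps the discussion. It assumes the job's Comment field is free to overwrite and that squeue's "%i %k" format prints the job ID followed by the comment. HOLD_TAG, hold_cmd, release_cmd and tagged_jobs are hypothetical names, not part of Slurm. The scontrol commands are printed rather than run so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical marker string; anything unlikely to collide with real comments.
HOLD_TAG="HELD_VIA_STARTTIME"

# Print the command that defers a job (as in the FAQ) and tags it.
# Note: this would overwrite any comment the job already carries.
hold_cmd() {
    printf 'scontrol update JobId=%s StartTime=now+30days Comment=%s\n' "$1" "$HOLD_TAG"
}

# Print the command that lets a job start again (as in the FAQ).
release_cmd() {
    printf 'scontrol update JobId=%s StartTime=now\n' "$1"
}

# Read "jobid comment" lines on stdin and print the IDs of tagged jobs.
# Intended input (assumption): squeue -h -u "$USER" -o "%i %k"
tagged_jobs() {
    awk -v tag="$HOLD_TAG" '$2 == tag { print $1 }'
}

# Example use on a live system (not run here):
#   squeue -h -u "$USER" -o "%i %k" | tagged_jobs | while read -r id; do
#       release_cmd "$id"
#   done
```

The tag stays on the job after release unless the comment is overwritten again, so this only identifies jobs that were held this way at some point.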
I have not used the Priority attribute all that much yet. Is it a bug that releasing a job makes the Priority very high? Do other installations see that behavior? I see several mentions of users only being able to reduce the Priority of their jobs.
There are scontrol subcommands uhold/hold/release/requeuehold that are ignored when FAQ 21 describes how to place a job on hold, and it is never explained why the method described there is the best one; the FAQ just states that it is. Does anyone know why the FAQ method is better than using the subcommands? Is it because the PRIORITY and/or NICE values are not altered (maybe)?

The question is also about Running, but the answer is just about Starting and not Suspending, which is not quite as clear (I think "running" should be "starting" to make that clear, and/or how to suspend should be described as well).

If the answer is not clear to anyone, I might turn this into a documentation change request in the Slurm bugzilla, but I wanted to see whether this was already clear to anyone and I am missing something.

From FAQ:

21. How can I temporarily prevent a job from running (e.g. place it into a hold state)?

The easiest way to do this is to change a job's earliest begin time (optionally set at job submit time using the --begin option). The example below places a job into hold state (preventing its initiation for 30 days) and later permitting it to start now.

<METHOD I>
    $ scontrol update JobId=1234 StartTime=now+30days
    ... later ...
    $ scontrol update JobId=1234 StartTime=now

Note: Empirically, in METHOD I the JobId can be a <job_list>; I initially thought it required single JobIDs.

No explanation is given on why METHOD I is best, and there are other methods that seem more intuitive.
I wonder what is undesirable about the following method, which I have been using: the scontrol(1) subcommands hold/uhold/release/requeuehold.

<METHOD II>
    $ scontrol hold <job_list>     # advantage to administrator as user cannot change
    $ scontrol uhold <job_list>
    $ scontrol release <job_list>

Examples:
    $ scontrol uhold jobname=JOB_NAME
    $ scontrol uhold '[100-200],300,500'

Using uhold, the "Reason" changes to something that easily identifies the job as being held: "Reason=None" became "Reason=JobHeldUser", which seems better than METHOD I in that regard. The downside might be that PRIORITY changed to zero and then went to a very large value when released?

Another method appears to be setting PRIORITY to zero, which also places jobs on hold.

<METHOD III>
    $ scontrol update jobid=373 Priority=0
    $ scontrol release jobid=373                 # sets to a very high value
    $ scontrol update jobid=373 Priority=11111   # put back to lower desired value

Once lowered, does an optional setting prevent a user from raising PRIORITY(?) The manual says:

    Only the Slurm administrator or root can increase job's priority.

At least on my machine, the "release" puts the priority at a very high value, and a regular user can lower the value back to the (probably) lower original value.

I did not see it happening, but there are some statements in the documentation that make me think not only PRIORITY but perhaps also the NICE value might be changed by METHOD II and METHOD III, although I could not get the NICE value to be inadvertently changed.
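For anyone who wants to compare the three methods on their own system, here is a minimal sketch for pulling Reason, Priority and Nice out of "scontrol show job" output before and after each step. job_field is a hypothetical helper, not a Slurm command. It assumes the usual space-separated key=value layout and will not handle values that contain spaces.

```shell
#!/bin/sh
# Print the value of one key=value field from "scontrol show job" output.
#   $1    = field name, e.g. Reason, Priority, Nice
#   stdin = output of: scontrol show job <jobid>
job_field() {
    # Put each key=value pair on its own line, then print the first match.
    tr ' ' '\n' | awk -F= -v k="$1" \
        '!seen && $1 == k { print substr($0, length(k) + 2); seen = 1 }'
}

# Example use on a live system (not run here), with job 373 from above:
#   scontrol show job 373 | job_field Priority   # before hold
#   scontrol uhold 373
#   scontrol show job 373 | job_field Reason     # while held
#   scontrol release 373
#   scontrol show job 373 | job_field Priority   # after release
#   scontrol show job 373 | job_field Nice
```

Recording the three values at each step makes it easy to see whether a given method changes Priority or Nice on a particular installation.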