[slurm-users] How to have an array job name include the array task ID
Alex Reynolds
areynolds at altius.org
Tue Apr 17 01:04:07 MDT 2018
Hello all,
I am submitting a job array, made up of many small jobs, to a SLURM
scheduler.
For example, here's a script that simply prints out the date and hostname
of the compute node from within a heredoc:
-------------------
#!/bin/bash
...(variables)...
sbatch --parsable --partition=${jobPartition} --array=1-${jobArrayCount} \
  --job-name=${jobName}.%a --output=${jobName}.stdout.%a.%j \
  --error=${jobName}.stderr.%a.%j --mem-per-cpu=${jobMem} --export=ALL <<"EOF"
#!/bin/bash
stamp=`date && hostname`
echo -e "Child array job [${SLURM_ARRAY_TASK_ID}]:\n${stamp}"
EOF
exit 0
-------------------
The filenames of the output and error logs from this job contain the
correct array task ID (1 through ${jobArrayCount}, represented with the %a
variable) and the job ID of each task (represented with the %j variable;
%A would give the parent array job ID instead).
However, the job name (${jobName}.%a) only expands the ${jobName} variable,
and it keeps %a as a string literal; that is, %a is not translated to the
array task ID.
For example, if "jobName=foo", then the use of --job-name=${jobName}.%a
results in the scheduler using the job name "foo.%a", instead of "foo.1",
"foo.2", and so on, up to the number of child jobs in the array.
As output and error logs can use the %a array task ID variable, is there a
way to get the job name assignment to use this variable as well?
Another thing I tried was to move the job name assignment within the
heredoc block:
-------------------
#!/bin/bash
...(variables)...
sbatch --parsable --partition=${jobPartition} --array=1-${jobArrayCount} \
  --output=${jobName}.stdout.%a.%j --error=${jobName}.stderr.%a.%j \
  --mem-per-cpu=${jobMem} --export=ALL <<"EOF"
#!/bin/bash
#SBATCH --job-name="${jobName}.${SLURM_ARRAY_TASK_ID}"
stamp=`date && hostname`
echo -e "Child array job [${SLURM_ARRAY_TASK_ID}]:\n${stamp}"
EOF
exit 0
-------------------
In this case, the job name is rendered literally as the string
"${jobName}.${SLURM_ARRAY_TASK_ID}".
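As far as I understand it, two things are at play here. The heredoc
delimiter is quoted ("EOF"), so the submitting shell passes the body through
verbatim, and `#SBATCH` lines are comments to bash that sbatch reads as
plain text without doing shell variable expansion. A small sketch of the
heredoc half, runnable without Slurm (the value "foo" is only illustrative):

```shell
#!/bin/bash
jobName=foo                 # illustrative value only
unset SLURM_ARRAY_TASK_ID   # simulate the submitting shell, where Slurm has not set it

# Quoted delimiter: the body is passed through verbatim, nothing is expanded.
quoted=$(cat <<"EOF"
#SBATCH --job-name="${jobName}.${SLURM_ARRAY_TASK_ID}"
EOF
)

# Unquoted delimiter: the submitting shell expands variables before the text
# is handed on. The task ID does not exist yet, so it expands to nothing.
unquoted=$(cat <<EOF
#SBATCH --job-name="${jobName}.${SLURM_ARRAY_TASK_ID:-}"
EOF
)

printf '%s\n' "$quoted" "$unquoted"
```

So an unquoted delimiter would fill in ${jobName}, but it cannot supply the
array task ID, which is only known once each task starts.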
A third thing that I tried was to rename the job via `scontrol` after
submission. This works, but only while the job is still in the scheduler
and running:
-------------------
$ scontrol update JobId=${arrayJobId} JobName=${jobName}.${jobArrayTaskId}
-------------------
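One variation on this rename, sketched here under assumptions and not tested
on a cluster: issue it from inside the running task, where Slurm sets
SLURM_JOB_ID and SLURM_ARRAY_TASK_ID in the environment. The helper name
`rename_cmd` and its argument are hypothetical; it only prints the command,
so it can be checked before anything is run against the scheduler.

```shell
#!/bin/bash
# Hypothetical helper: print the scontrol command that would rename the
# current array task to <base_name>.<task id>.
# Assumes it is called from inside a running array task, where Slurm has set
# SLURM_JOB_ID and SLURM_ARRAY_TASK_ID.
rename_cmd() {
    local base_name="$1"
    if [ -z "${SLURM_JOB_ID:-}" ] || [ -z "${SLURM_ARRAY_TASK_ID:-}" ]; then
        echo "rename_cmd: not inside a Slurm array task" >&2
        return 1
    fi
    printf 'scontrol update JobId=%s JobName=%s.%s\n' \
        "${SLURM_JOB_ID}" "${base_name}" "${SLURM_ARRAY_TASK_ID}"
}
```

Whether a task is allowed to rename itself this way depends on the site
configuration, and the new name still only appears once the task has started.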
The `sacct` program does not seem to have keywords that grant access to
array job and task IDs, e.g.:
-------------------
$ sacct -j ${arrayJobId} --format=ArrayJobId,ArrayTaskId --noheader \
    --parsable2
sacct: error: Invalid field requested: "ArrayJobId"
-------------------
(Keywords are listed here: https://slurm.schedmd.com/sacct.html)
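A possible workaround, going by the sacct man page: the JobID field reports
array tasks in the form <array job ID>_<array task ID>, so the two parts can
be split out of that one column. A sketch of the splitting step only; the
function name `split_array_jobid` and the sample IDs are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: split a sacct-style JobID such as "100_7" or
# "100_7.batch" into "<array job ID> <array task ID>".
# Returns non-zero for IDs that do not look like array tasks.
split_array_jobid() {
    local jobid="$1"
    case "$jobid" in
        *_*)
            local array_job="${jobid%%_*}"   # text before the first underscore
            local task="${jobid#*_}"         # text after the first underscore
            task="${task%%.*}"               # drop a job-step suffix like ".batch"
            printf '%s %s\n' "$array_job" "$task"
            ;;
        *)
            return 1
            ;;
    esac
}

# Example with made-up IDs:
split_array_jobid "100_7.batch"
```

In practice the input would come from something like
`sacct -j ${arrayJobId} --format=JobID --noheader --parsable2`.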
However, it looks like I can use `scontrol` to get the array job and task
IDs, though it is a bit of a hack:
-------------------
$ scontrol show job ${arrayJobId} | grep ArrayTaskId \
    | awk '{i=split($0,a," "); j=split(a[3],b,"="); k=split(a[4],c,"="); print c[2]"."b[2]; }'
testArrayChild.1
-------------------
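A slightly less position-dependent sketch of the same parsing: look the
fields up by key name, not by column, so the result does not change if
`scontrol` reorders or adds fields on that line. It assumes the line consists
of whitespace-separated key=value pairs, as in the output parsed above; the
function name `array_job_label` and the sample line are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `scontrol show job`-style text on stdin and print
# "<JobName>.<ArrayTaskId>" for each line that carries an ArrayTaskId field.
array_job_label() {
    awk '
        /ArrayTaskId=/ {
            name = ""; task = ""
            for (i = 1; i <= NF; i++) {
                eq = index($i, "=")          # position of the first "="
                if (eq == 0) continue        # not a key=value token
                key = substr($i, 1, eq - 1)
                val = substr($i, eq + 1)
                if (key == "JobName")     name = val
                if (key == "ArrayTaskId") task = val
            }
            print name "." task
        }
    '
}

# Example with a made-up line in the same shape:
printf '   JobId=101 ArrayJobId=100 ArrayTaskId=1 JobName=testArrayChild\n' \
    | array_job_label
```

It would be used as `scontrol show job ${arrayJobId} | array_job_label`.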
There are a few problems with this approach:
1. I can't rename the array of jobs until they are in the scheduler
2. My method for getting the array task ID is a hack that seems fragile
3. I can't rename the job after it is finished
These issues seem to make this approach difficult to implement in a
reliable way.
My question, ultimately, is: Is there an easier way to have an array
job name include the array task ID?
Regards,
Alex