[slurm-users] How to have an array job name include the array task ID

Alex Reynolds areynolds at altius.org
Tue Apr 17 01:04:07 MDT 2018


Hello all,

I am submitting a job array, made up of many small jobs, to a SLURM
scheduler.

For example, here's a script that simply prints out the date and hostname
of the compute node from within a heredoc:

-------------------
#!/bin/bash
...(variables)...
sbatch --parsable --partition=${jobPartition} --array=1-${jobArrayCount} \
    --job-name=${jobName}.%a --output=${jobName}.stdout.%a.%j \
    --error=${jobName}.stderr.%a.%j --mem-per-cpu=${jobMem} --export=ALL <<"EOF"
#!/bin/bash
stamp=$(date && hostname)
echo -e "Child array job [${SLURM_ARRAY_TASK_ID}]:\n${stamp}"
EOF
exit 0
-------------------

The filenames of the output and error logs from this job contain the
correct array task ID (1 through ${jobArrayCount}, represented with the %a
variable) and job ID (represented with the %j variable; for an array task
this is the task's own job ID, while the parent array job ID is %A).

However, in the job name (${jobName}.%a) only the ${jobName} variable is
expanded; %a is kept as a string literal and is not replaced with the
array task ID.

For example, if "jobName=foo", then the use of --job-name=${jobName}.%a
results in the scheduler using the job name "foo.%a", instead of "foo.1",
"foo.2", and so on, up to the number of child jobs in the array.

Since the output and error log filenames can use the %a array task ID
variable, is there a way to make the job name use it as well?
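One workaround I can think of, sketched here under my own assumptions, is to give up the array and submit each task as an ordinary job, so that the submitting shell builds every name before sbatch sees it. The function name submit_named_tasks and the SBATCH_CMD override (set it to echo for a dry run) are hypothetical, and sbatch's --wrap option stands in for the heredoc to keep the sketch short:

```shell
#!/bin/bash
# Sketch (assumption): one ordinary job per task instead of a job array,
# so each job can get its own --job-name. Names here are hypothetical.
# SBATCH_CMD=echo prints the commands instead of submitting them.
SBATCH_CMD="${SBATCH_CMD:-sbatch}"

# Usage: submit_named_tasks NAME COUNT PARTITION MEM
submit_named_tasks() {
    local name="$1" count="$2" partition="$3" mem="$4"
    local i
    for ((i = 1; i <= count; i++)); do
        # ${name}.${i} is expanded here, by the submitting shell;
        # %j in the log filenames is still expanded by Slurm.
        "$SBATCH_CMD" --parsable --partition="$partition" \
            --job-name="${name}.${i}" \
            --output="${name}.stdout.${i}.%j" \
            --error="${name}.stderr.${i}.%j" \
            --mem-per-cpu="$mem" --export=ALL \
            --wrap='echo "Job [${SLURM_JOB_NAME}]:"; date; hostname'
    done
}
```

For example, SBATCH_CMD=echo submit_named_tasks foo 3 debug 1G prints the three sbatch command lines without submitting anything. The trade-off is that the jobs no longer share an array job ID, so %A and any per-array operations are lost.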

Another thing I tried was to move the job name assignment into the
heredoc block:

-------------------
#!/bin/bash
...(variables)...
sbatch --parsable --partition=${jobPartition} --array=1-${jobArrayCount} \
    --output=${jobName}.stdout.%a.%j --error=${jobName}.stderr.%a.%j \
    --mem-per-cpu=${jobMem} --export=ALL <<"EOF"
#!/bin/bash
#SBATCH --job-name="${jobName}.${SLURM_ARRAY_TASK_ID}"
stamp=$(date && hostname)
echo -e "Child array job [${SLURM_ARRAY_TASK_ID}]:\n${stamp}"
EOF
exit 0
-------------------

In this case, the job name is rendered literally as the string
"${jobName}.${SLURM_ARRAY_TASK_ID}".
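My guess at why: the heredoc delimiter is quoted (<<"EOF"), so the submitting shell passes the script through untouched, and sbatch reads #SBATCH lines as options without doing any shell expansion of its own. The sketch below uses cat in place of sbatch to show what the submitting shell would hand over with a quoted and an unquoted delimiter; nothing is submitted:

```shell
#!/bin/bash
# Compare a quoted and an unquoted heredoc delimiter.
# "cat" stands in for sbatch here; nothing is submitted.
jobName=foo
unset SLURM_ARRAY_TASK_ID   # not set at submission time, outside any job

quoted=$(cat <<"EOF"
#SBATCH --job-name="${jobName}.${SLURM_ARRAY_TASK_ID}"
EOF
)

unquoted=$(cat <<EOF
#SBATCH --job-name="${jobName}.${SLURM_ARRAY_TASK_ID}"
EOF
)

echo "$quoted"    # #SBATCH --job-name="${jobName}.${SLURM_ARRAY_TASK_ID}"
echo "$unquoted"  # #SBATCH --job-name="foo."
```

With an unquoted delimiter, ${jobName} is filled in, but SLURM_ARRAY_TASK_ID does not exist yet at submission time, so the name would come out as "foo." for every task. Unquoting would also expand the variables and the $(date && hostname) in the script body too early.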

A third thing I tried was to rename the job after the fact with
`scontrol`. This works, but only if the job is in the scheduler and only
if it is running:

-------------------
$ scontrol update JobId=${arrayJobId} JobName=${jobName}.${jobArrayTaskId}
-------------------
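A variation on this that avoids parsing anything: let each task rename itself as soon as it starts. As far as I know, Slurm sets SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID inside every array task, and scontrol accepts <arrayJobId>_<taskId> to address a single task. The helper name rename_current_task and the SCONTROL override are hypothetical, and I have not checked whether every site lets a running job change its own name:

```shell
#!/bin/bash
# Sketch (assumption): rename the running array task from inside itself.
# rename_current_task and SCONTROL are hypothetical names; set
# SCONTROL=echo to print the command instead of running it.
SCONTROL="${SCONTROL:-scontrol}"

# Usage: rename_current_task BASE_NAME
rename_current_task() {
    local base="$1"
    # Both variables should be set by Slurm inside each array task.
    if [[ -z "${SLURM_ARRAY_JOB_ID:-}" || -z "${SLURM_ARRAY_TASK_ID:-}" ]]; then
        echo "rename_current_task: not inside an array task" >&2
        return 1
    fi
    # <arrayJobId>_<taskId> addresses this one task, not the whole array.
    "$SCONTROL" update JobId="${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}" \
        JobName="${base}.${SLURM_ARRAY_TASK_ID}"
}
```

The function would have to be defined in the heredoc script itself, e.g. submitting with --job-name=${jobName} and calling rename_current_task "$SLURM_JOB_NAME" as the first command. Tasks that are still pending keep the shared name.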

The `sacct` program does not seem to have format fields for the array job
and task IDs, e.g.:

-------------------
$ sacct -j ${arrayJobId} --format=ArrayJobId,ArrayTaskId --noheader --parsable2
sacct: error: Invalid field requested: "ArrayJobId"
-------------------

(Keywords are listed here: https://slurm.schedmd.com/sacct.html)
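Even without ArrayJobId or ArrayTaskId fields, my understanding is that sacct's JobID column reports array tasks as <arrayJobId>_<taskId> (for example 1234_5), and that -X limits the output to job allocations rather than steps. Under that assumption, the IDs can be split out of the JobID column; split_array_ids is a hypothetical helper:

```shell
#!/bin/bash
# Sketch (assumption): sacct reports array tasks in its JobID column as
# <arrayJobId>_<taskId>. split_array_ids is a hypothetical helper that
# reads "JobID|JobName" lines (from --parsable2) on stdin and prints
# "arrayJobId taskId jobName" for each single array task.
split_array_ids() {
    local jobid name
    while IFS='|' read -r jobid name; do
        # Skip plain job IDs and pending ranges such as 1234_[6-10].
        [[ "$jobid" =~ ^([0-9]+)_([0-9]+)$ ]] || continue
        printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "$name"
    done
    return 0
}
```

It would be fed with something like sacct -X -j "${arrayJobId}" --format=JobID,JobName --noheader --parsable2 | split_array_ids. Since sacct reads from the accounting database, this should also cover finished tasks, assuming accounting is enabled.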

However, it looks like I can use `scontrol` to get the array job and task
IDs, though it is a bit of a hack:

-------------------
$ scontrol show job ${arrayJobId} | grep ArrayTaskId | \
    awk '{i=split($0,a," "); j=split(a[3],b,"="); k=split(a[4],c,"="); print c[2]"."b[2]; }'
testArrayChild.1
-------------------
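A less position-dependent version of this hack is to ask scontrol for one record per line with -o (--oneliner) and pick fields by key instead of by position. task_labels is a hypothetical helper; it assumes no value contains spaces, and a still-pending range of tasks may show up with a range in ArrayTaskId:

```shell
#!/bin/bash
# Sketch (assumption): pick fields from `scontrol -o show job` output by
# key rather than by position. task_labels is a hypothetical helper that
# prints "JobName.ArrayTaskId" for every record that has an ArrayTaskId.
# Values containing spaces are not handled.
task_labels() {
    local -a words
    local word name task
    # With -o (--oneliner), each job record is one line of Key=Value tokens.
    while read -r -a words; do
        name="" task=""
        for word in "${words[@]}"; do
            case "$word" in
                JobName=*)     name="${word#JobName=}" ;;
                ArrayTaskId=*) task="${word#ArrayTaskId=}" ;;
            esac
        done
        if [[ -n "$task" ]]; then
            printf '%s.%s\n' "$name" "$task"
        fi
    done
    return 0
}
```

Usage would be scontrol -o show job "${arrayJobId}" | task_labels. It still only works while the job is known to the scheduler.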

There are a few problems with this approach:

1. I can't rename the array of jobs until they are in the scheduler
2. My method for getting the array task ID is a hack that seems fragile
3. I can't rename the job after it is finished

These issues make this approach difficult to implement reliably.

My question, ultimately, is: is there an easier way to have an array
job's name include the array task ID?

Regards,
Alex