[slurm-users] Performance tracking of array tasks

Loris Bennett loris.bennett at fu-berlin.de
Tue May 17 05:46:51 UTC 2022


William Dear <william.dear at i3-corps.com> writes:

> It looks like Brian's suggestion of using sacct will be the quickest answer in the short term, so I'll just have to write my own script to aggregate the output.  I was hoping for a canned solution such as XDMoD but haven't found one that quite
> fits our needs.  If there's a list of recommended supporting applications for Slurm, I would appreciate it.
> One example of how the canned reporting doesn't meet our needs is that my users self-limit their arrays, e.g. "--array=1-12000%100".  Technically, the job isn't waiting on anything but itself, since it only runs 100 tasks at a time, but
> all the pending array tasks still show up as waiting.  If the partition resources are too low and the job is running fewer than 100 tasks, then it really is waiting on another job.  The challenge will be determining when a job is self-limiting versus
> waiting on a different job.

What is the use case for having users self-limit?  We just rely on the
cap on the maximum number of jobs in an array and on fairshare to do
the rest.
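One way to approach the self-limiting question is to parse the throttle out of the `--array` specification and compare it with the number of tasks actually running. If the array is already running as many tasks as its `%` limit allows, the pending tasks are held back by the user's own throttle. If it is running fewer while tasks are still pending, something else (resources or other jobs) is holding them. This is a minimal sketch, assuming the running and pending counts are obtained separately (for example by counting the lines of `squeue --array` output for the job). The function names `parse_array_spec` and `classify` are hypothetical.

```python
from typing import Optional, Tuple


def parse_array_spec(spec: str) -> Tuple[int, Optional[int]]:
    """Return (number_of_tasks, throttle) for an --array value such as '1-12000%100'."""
    throttle = None
    if "%" in spec:                       # '%N' caps simultaneously running tasks
        spec, limit = spec.split("%", 1)
        throttle = int(limit)
    count = 0
    for item in spec.split(","):          # comma-separated indices or ranges
        step = 1
        if ":" in item:                   # optional step, e.g. '0-15:4'
            item, step_text = item.split(":", 1)
            step = int(step_text)
        if "-" in item:
            first, last = (int(x) for x in item.split("-", 1))
            count += len(range(first, last + 1, step))
        else:
            count += 1                    # single index
    return count, throttle


def classify(throttle: Optional[int], running: int, pending: int) -> str:
    """Heuristic: is a throttled array held back by its own limit or by something else?"""
    if pending == 0:
        return "nothing pending"
    if throttle is not None and running >= throttle:
        return "self-limited"
    return "waiting on resources or other jobs"
```

For William's example, `parse_array_spec("1-12000%100")` returns `(12000, 100)`, so an array with 100 running tasks and the rest pending is classified as self-limited, while one with only 37 running is flagged as waiting on something else.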



> Thanks,
> William Dear
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Loris Bennett <loris.bennett at fu-berlin.de>
> Sent: Monday, May 16, 2022 9:04 AM
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: Re: [slurm-users] Performance tracking of array tasks 
> Hi William,
> William Dear <william.dear at i3-corps.com> writes:
>> Could anyone please recommend methods of tracking the performance of individual tasks in a task array job?  I have installed XDMoD but it is focused solely on the job level, with no information about
>> tasks.
>> My users almost exclusively use task arrays to run embarrassingly parallel jobs.  After the job is complete, I would like to see the run time and peak RAM usage per task so that we can correctly size the
>> reservations for future jobs.  It would also be very helpful to break this down by node so that I can identify poorly performing nodes.
>> William Dear
> I'm not sure what you mean by a 'task array job'.  A job can have
> multiple tasks within it - I don't think you will be able to get data on
> such individual tasks very easily.  However, a job array is just a sort
> of convenient wrapper around a bunch of jobs.  Each element of a job
> array still has its own job ID, so you can extract job data the same way
> you do for a non-array job.
> Cheers,
> Loris
> --
> Dr. Loris Bennett (Herr/Mr)
> ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de
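Following Loris's point that every array element has its own job ID, the sacct-based aggregation William describes might look roughly like the sketch below. It assumes output from `sacct -j <jobid> --parsable2 --noheader --format=JobID,NodeList,Elapsed,MaxRSS`, in which array tasks appear as `<jobid>_<index>` and their steps as `<jobid>_<index>.batch` and so on. MaxRSS is typically reported on the steps rather than on the allocation line, so the peak is taken over all steps of a task. The 1024-based unit multipliers, the helper names and the per-node summary (task count, mean elapsed time, peak RSS) are illustrative assumptions, not an existing tool.

```python
import subprocess
from collections import defaultdict

# Assumed 1024-based multipliers for the unit suffixes sacct appends to MaxRSS.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def rss_bytes(field: str) -> int:
    """Convert a MaxRSS value such as '2048K' or '1.5G' to bytes; blank means 0."""
    if not field:
        return 0
    suffix = field[-1].upper()
    if suffix in UNITS:
        return int(float(field[:-1]) * UNITS[suffix])
    return int(float(field))


def elapsed_seconds(field: str) -> int:
    """Convert an Elapsed value in [DD-][HH:]MM:SS form to seconds."""
    days = 0
    if "-" in field:
        day_text, field = field.split("-", 1)
        days = int(day_text)
    seconds = 0
    for part in field.split(":"):
        seconds = seconds * 60 + int(part)
    return days * 86400 + seconds


def per_task(sacct_text: str) -> dict:
    """Group sacct rows (JobID|NodeList|Elapsed|MaxRSS) by array task."""
    tasks = {}
    for line in sacct_text.strip().splitlines():
        job_id, nodes, elapsed, maxrss = line.split("|")
        task = job_id.split(".")[0]              # '12345_7.batch' -> '12345_7'
        rec = tasks.setdefault(task, {"node": "", "elapsed": 0, "max_rss": 0})
        if "." not in job_id:                    # allocation line: node and wall time
            rec["node"] = nodes
            rec["elapsed"] = elapsed_seconds(elapsed)
        rec["max_rss"] = max(rec["max_rss"], rss_bytes(maxrss))  # peak over all steps
    return tasks


def per_node(tasks: dict) -> dict:
    """Summarise tasks per node: count, mean elapsed seconds, peak RSS in bytes."""
    grouped = defaultdict(list)
    for rec in tasks.values():
        grouped[rec["node"]].append(rec)
    return {
        node: {
            "tasks": len(recs),
            "mean_elapsed": sum(r["elapsed"] for r in recs) / len(recs),
            "peak_rss": max(r["max_rss"] for r in recs),
        }
        for node, recs in grouped.items()
    }


def fetch(job_id: str) -> str:
    """Run sacct for one array job; requires sacct on PATH."""
    result = subprocess.run(
        ["sacct", "-j", job_id, "--parsable2", "--noheader",
         "--format=JobID,NodeList,Elapsed,MaxRSS"],
        capture_output=True, text=True, check=True)
    return result.stdout
```

On a live system, `per_node(per_task(fetch("12345")))` would give one summary per node. Comparing `mean_elapsed` across nodes is only a rough way to spot slow nodes, since it is meaningful only when the array tasks do comparable amounts of work.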
