I need to account for jobs composed of multiple jobs launched on multiple federated (and non-federated) clusters, which therefore have different job IDs. What are the best practices to prevent users from bypassing this tracking?
NICE SRL, viale Monte Grappa 3/5, 20124 Milano, Italia, Registro delle Imprese di Milano Monza Brianza Lodi REA n. 2096882, Capitale Sociale: 10.329,14 EUR i.v., Cod. Fisc. e P.IVA 01133050052, Societa con Socio Unico
Hello,
What is meant here by "tracking"? What information are you looking to gather and track?
I'd say the simplest answer is using sacct, but I am not sure how federated/non-federated setups come into play while using it.
David
On Tue, Aug 27, 2024 at 6:23 AM Di Bernardini, Fabio via slurm-users <slurm-users@lists.schedmd.com> wrote:
> I need to account for jobs composed of multiple jobs launched on multiple federated (and non-federated) clusters, which therefore have different job IDs. What are the best practices to prevent users from bypassing this tracking?
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
For example if a job has to use different clusters with Slurm I am forced to launch it with two sbatch commands:
    sbatch -M cluster1 job1
    sbatch -M cluster2 job2
This way I get two different jobids. Using sacct I have not found a way to know that the two jobs were launched within the same workflow. I was hoping not to have to add other components such as Nextflow.
From: David <drhey@umich.edu>
Sent: Thursday, August 29, 2024 2:53 PM
To: Di Bernardini, Fabio <dfabio@amazon.com>
Cc: slurm-users@lists.schedmd.com
Subject: RE: [EXTERNAL] [slurm-users] Best practices for tracking jobs started across multiple clusters for accounting purposes.

> Hello,
> What is meant here by "tracking"? What information are you looking to gather and track?
> I'd say the simplest answer is using sacct, but I am not sure how federated/non-federated setups come into play while using it.
> David
--
David Rhey
Advanced Research Computing
University of Michigan
Can whatever is running those sbatch commands add a --comment with a shared identifier that AccountingStoreFlags=job_comment would make available in sacct?
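A minimal sketch of that idea, assuming the thing running the sbatch commands is a shell wrapper and that AccountingStoreFlags in slurm.conf includes job_comment. The function names, the "wf=" prefix, and the SBATCH/SACCT override variables are illustrative assumptions, not anything Slurm defines; the overrides exist only so the functions can be dry-run (for example with SBATCH=echo).

```shell
#!/usr/bin/env bash
# Sketch only: give every job of one workflow the same --comment tag,
# then look the tagged jobs up in accounting.
# Assumes AccountingStoreFlags=job_comment so the comment reaches slurmdbd.

# Submit one script to one cluster, tagged with the shared workflow ID.
# SBATCH is a hypothetical override (defaults to the real sbatch).
submit_tagged() {
    local cluster="$1" workflow_id="$2" script="$3"
    "${SBATCH:-sbatch}" -M "$cluster" --comment="wf=${workflow_id}" "$script"
}

# Print pipe-separated accounting rows whose Comment equals the workflow tag.
# -X: one row per job allocation, -n: no header, -P: parsable output.
# SACCT is a hypothetical override (defaults to the real sacct).
jobs_for_workflow() {
    local clusters="$1" workflow_id="$2"
    "${SACCT:-sacct}" -M "$clusters" -X -n -P \
        --format=Cluster,JobID,User,State,Comment |
        awk -F'|' -v tag="wf=${workflow_id}" '$5 == tag'
}
```

For example, generating one ID with uuidgen and calling submit_tagged cluster1 "$id" job1 and then submit_tagged cluster2 "$id" job2 gives both jobs the same comment, and jobs_for_workflow cluster1,cluster2 "$id" lists them together. By default sacct only reports jobs from the current day, so a real query would likely also pass a start time with -S. A tag supplied at submission can also be omitted by whoever submits, so on its own this links cooperative workflows rather than enforcing tracking.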
Thanks Laura,
your suggestion is similar to the solution I am implementing, except that I am using the Extra field of slurmdbd, because I think it is less used than the Comment field. On the plus side, the Extra field allows 64K characters compared to 1K for the Comment field; on the other hand, using it this way precludes future use of the "SchedulerParameters=extra_constraints" feature.
Anyway thanks for confirming that the approach is correct.
-- Fabio
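The Extra-field variant described above could look roughly like the following. This is a sketch under assumptions: a Slurm release whose sbatch accepts --extra and whose sacct exposes an Extra format field, with job_extra included in AccountingStoreFlags. The function names, the "wf=" prefix, and the SBATCH/SACCT override variables are hypothetical and only there to allow a dry run. The second function is one possible way to spot jobs that skipped the tag, which is an after-the-fact audit and not prevention.

```shell
#!/usr/bin/env bash
# Sketch only: store the shared workflow ID in the job's Extra field
# and report jobs that carry no workflow tag.
# Assumes AccountingStoreFlags includes job_extra so Extra reaches slurmdbd.

# Submit one script to one cluster with the workflow ID in Extra.
# SBATCH is a hypothetical override (defaults to the real sbatch).
submit_with_extra() {
    local cluster="$1" workflow_id="$2" script="$3"
    "${SBATCH:-sbatch}" -M "$cluster" --extra="wf=${workflow_id}" "$script"
}

# List jobs of all users (-a) started since a given time (-S) whose Extra
# field does not begin with the "wf=" prefix, i.e. jobs without a tag.
# SACCT is a hypothetical override (defaults to the real sacct).
untagged_jobs() {
    local clusters="$1" since="$2"
    "${SACCT:-sacct}" -M "$clusters" -a -X -n -P -S "$since" \
        --format=Cluster,JobID,User,Extra |
        awk -F'|' '$4 !~ /^wf=/'
}
```

A periodic run such as untagged_jobs cluster1,cluster2 2024-08-01 would then show which jobs were submitted outside the tagging wrapper. Note that a plain "wf=..." string is not the format the extra_constraints feature would expect, which matches the trade-off mentioned above.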
-----Original Message-----
From: Laura Hild <lsh@jlab.org>
Sent: Friday, August 30, 2024 8:37 PM
To: Di Bernardini, Fabio <dfabio@amazon.com>
Cc: slurm-users@lists.schedmd.com
Subject: RE: [EXTERNAL] [slurm-users] Re: Best practices for tracking jobs started across multiple clusters for accounting purposes.

> Can whatever is running those sbatch commands add a --comment with a shared identifier that AccountingStoreFlags=job_comment would make available in sacct?