Hello,
I am attempting to implement a billback model and am finding myself stymied by the way that sreport handles job arrays. Basically, when a user submits a large array, their usage includes the time that jobs at the back of the array spend waiting their turn. (My #1 user in “sreport user topusage” shows more “used” CPU*minutes than the cluster physically _has_ during that interval.) Ordinary jobs that sit idle waiting for resources, however, are simply regarded as pending and accrue no usage. As a result, a “polite” user who submits an array of 1000 jobs running N at a time is penalized relative to a user who just dumps 1000 loose jobs into the queue. This incentivizes my users to do exactly what I do not want!
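Chip's parenthetical is a useful sanity check: no user can hold more CPU-minutes in an interval than the cluster physically offers. A minimal sketch of that check follows. The CPU count, interval and per-user figures are hypothetical placeholders, not values from the thread.

```python
from datetime import datetime


def capacity_cpu_minutes(total_cpus: int, start: datetime, end: datetime) -> float:
    """Upper bound on the CPU-minutes the cluster can deliver in [start, end)."""
    return total_cpus * (end - start).total_seconds() / 60


def users_over_capacity(usage: dict[str, float], capacity: float) -> list[str]:
    """Users whose reported 'used' CPU-minutes exceed what physically exists.

    Any name returned means the report is counting something other than
    allocated run time (e.g. waiting array tasks, as described in the question).
    """
    return sorted(user for user, used in usage.items() if used > capacity)


# Hypothetical example: a 100-CPU cluster over one day.
cap = capacity_cpu_minutes(100, datetime(2024, 3, 1), datetime(2024, 3, 2))
reported = {"alice": 200_000.0, "bob": 50_000.0}  # made-up report figures
print(cap)                                 # 144000.0
print(users_over_capacity(reported, cap))  # ['alice']
```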
Has anyone tried to bill their users based on the results of sreport? If so, how did you work around this problem? What did you use to determine the number of CPU*minutes that a user actually had allocated during a given interval?
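One way to answer the last question without relying on sreport's rollups is to bill only the part of each job's actual run time that overlaps the reporting window. The sketch below assumes you already have three values for each job: its allocated CPU count, the time it actually started (not its submit or eligible time), and the time it ended. Under that assumption, pending time contributes nothing whether or not the job is part of an array. The "polite" and "loose" submitters would then be billed the same for the same work. The function name and argument layout are illustrative only.

```python
from datetime import datetime
from typing import Optional


def allocated_cpu_minutes(
    alloc_cpus: int,
    start: Optional[datetime],
    end: Optional[datetime],
    window_start: datetime,
    window_end: datetime,
) -> float:
    """CPU-minutes a job actually held inside [window_start, window_end)."""
    if start is None:
        # The job never started: time spent pending is not billed.
        return 0.0
    if end is None:
        # Still running when the data was pulled; assume the window has
        # already closed and bill through its end.
        end = window_end
    overlap = min(end, window_end) - max(start, window_start)
    return alloc_cpus * max(overlap.total_seconds(), 0.0) / 60


window = (datetime(2024, 3, 4), datetime(2024, 3, 5))
# A 4-CPU job that ran from 23:00 the previous day until 01:00:
# only the hour inside the window is billed.
print(allocated_cpu_minutes(4, datetime(2024, 3, 3, 23), datetime(2024, 3, 4, 1), *window))  # 240.0
# An array task that is still pending has no start time.
print(allocated_cpu_minutes(4, None, None, *window))  # 0.0
```

Summing this value per user over all jobs gives a per-user bill for the window.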
--
Chip Seraphine
Grid Operations
For support please use help-grid in email or slack.

This e-mail and any attachments may contain information that is confidential and proprietary and otherwise protected from disclosure. If you are not the intended recipient of this e-mail, do not read, duplicate or redistribute it by any means. Please immediately delete it and any attachments and notify the sender that you have received it by mistake. Unintended recipients are prohibited from taking action on the basis of information in this e-mail or any attachments. The DRW Companies make no representations that this e-mail or any attachments are free of computer viruses or other defects.
Chip,
I use 'sacct' rather than sreport to get individual job data. That data is ingested into a database and PowerBI, which can then aggregate it as needed.
sreport is pretty general and likely not the best tool for accurate chargeback data.
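Brian's approach can be sketched as follows. The idea is to pull one allocation-level row per job from sacct in pipe-delimited form, then total it per user before loading it into a database. This is a minimal sketch under stated assumptions:

* The field list is one reasonable choice, not necessarily Brian's.
* CPUTimeRAW (allocated CPUs times elapsed time, in CPU-seconds) covers the job's whole run and is not clipped to the query window. A job that spans two query windows would therefore be counted in both unless you clip Start/End yourself.

```python
import subprocess
from collections import defaultdict

# Assumed field list; CPUTimeRAW is AllocCPUS x Elapsed, in CPU-seconds.
FIELDS = ["User", "JobID", "State", "CPUTimeRAW"]


def sacct_command(start: str, end: str) -> list[str]:
    """argv for one allocation-level row per job, '|'-delimited, no header."""
    return [
        "sacct", "--allusers", "--allocations", "--parsable2", "--noheader",
        f"--starttime={start}", f"--endtime={end}",
        "--format=" + ",".join(FIELDS),
    ]


def cpu_minutes_by_user(sacct_output: str) -> dict[str, float]:
    """Sum CPU-minutes per user, skipping jobs that are still pending."""
    totals = defaultdict(float)
    for line in sacct_output.splitlines():
        if not line.strip():
            continue
        user, _job_id, state, cpu_seconds = line.split("|")
        # States may carry a suffix (e.g. "CANCELLED by 1001"), so match the prefix.
        if state.startswith("PENDING"):
            continue
        totals[user] += int(cpu_seconds) / 60
    return dict(totals)


def fetch(start: str, end: str) -> dict[str, float]:
    """Run sacct (needs Slurm on PATH) and aggregate its output."""
    result = subprocess.run(sacct_command(start, end),
                            capture_output=True, text=True, check=True)
    return cpu_minutes_by_user(result.stdout)


# Hypothetical sacct output for illustration.
sample = "alice|101|COMPLETED|7200\nbob|102_[1-100%10]|PENDING|0\nbob|103|CANCELLED by 1001|600\n"
print(cpu_minutes_by_user(sample))  # {'alice': 120.0, 'bob': 10.0}
```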
Brian Andrus
(In reply to Chip Seraphine's message of 3/4/2024, 6:09 AM.)
That's essentially what I've been doing: a daily 'sacct' run that dumps a JSON file, and then I dig through the JSON files. In effect I'm reimplementing what sreport already does. Given the enormous amount of machinery in Slurm for handling rollups, it seems bizarre that the whole thing is made useless by a seemingly odd decision to treat certain pending jobs as if they were running for accounting purposes.
(In reply to Brian Andrus's message of 3/4/24, 12:38 PM.)
Wouldn't using the option "End=now" with sreport exclude the still-pending array jobs while still including data for the completed ones?
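To try Paul's suggestion, the report only needs its end time set to the current moment. A small sketch that builds and runs that sreport call follows. The start date is a placeholder. Whether End=now actually drops the still-pending array tasks in a given Slurm version is exactly the open question, so it seems prudent to compare the result against a sacct-based total before relying on it.

```python
import subprocess


def topusage_command(start: str, end: str = "now") -> list[str]:
    """argv for 'sreport user topusage' with '|'-delimited output in minutes."""
    return [
        "sreport", "--parsable2", "-t", "Minutes",
        "user", "topusage", f"Start={start}", f"End={end}",
    ]


def run_topusage(start: str, end: str = "now") -> str:
    """Run the report (needs Slurm on PATH) and return its raw text."""
    result = subprocess.run(topusage_command(start, end),
                            capture_output=True, text=True, check=True)
    return result.stdout


print(" ".join(topusage_command("2024-03-01")))
# sreport --parsable2 -t Minutes user topusage Start=2024-03-01 End=now
```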
-- Paul Raines (http://help.nmr.mgh.harvard.edu)
(In reply to Chip Seraphine's message of Mon, 4 Mar 2024, 5:18pm.)
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Mass General Brigham Compliance HelpLine at https://www.massgeneralbrigham.org/complianceline. Please note that this e-mail is not secure (encrypted). If you do not wish to continue communication over unencrypted e-mail, please notify the sender of this message immediately. Continuing to send or respond to e-mail after receiving this message means you understand and accept this risk and wish to continue to communicate over unencrypted e-mail.