We are working to make our users more aware of their usage. One of the ideas we came up with was to have some basic usage stats printed at login (usage over the past day, fairshare, job efficiency, etc.). Does anyone have any scripts or methods that they use to do this? Before baking my own I was curious what other sites do and whether they would be willing to share their scripts and methodology.
-Paul Edmon-
You'd have to do this within e.g. the system's bashrc infrastructure. The simplest idea would be to add to e.g. /etc/profile.d/zzz-slurmstats.sh and have some canned commands/scripts running. That does introduce load to the system and Slurm on every login, though, and slows the startup of login shells based on how responsive slurmctld/slurmdbd are at that moment.
Another option would be to run the commands/scripts for all users on some timed schedule — e.g. produce per-user stats every 30 minutes. So long as the stats are publicly-visible anyway, put those summaries in a shared file system with open read access. Name the files by uid number. Now your /etc/profile.d script just cat's ${STATS_DIR}/$(id -u).
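A minimal sketch of both halves of that second approach. STATS_DIR, the uid cutoff, and the exact sreport/sshare fields are assumptions to adapt locally:

```shell
#!/bin/bash
# Cron-side generator (e.g. run every 30 minutes from root's crontab).
# STATS_DIR, the uid cutoff, and the sreport/sshare invocations are
# assumptions -- adjust for your site before using.
STATS_DIR=/shared/slurm-stats
mkdir -p "$STATS_DIR"
since=$(date -d '1 day ago' +%Y-%m-%dT%H:%M:%S)

getent passwd | awk -F: '$3 >= 1000 { print $1 ":" $3 }' |
while IFS=: read -r user uid; do
    {
        echo "=== Slurm usage for ${user} since ${since} ==="
        sreport -nP -t Hours cluster UserUtilizationByAccount \
                Users="$user" Start="$since" Format=Account,Used
        sshare -nUP -u "$user" -o User,FairShare
    } > "${STATS_DIR}/${uid}.tmp" 2>/dev/null
    # Write-then-rename so a login never reads a half-written file
    mv "${STATS_DIR}/${uid}.tmp" "${STATS_DIR}/${uid}"
done

# Login-side half, /etc/profile.d/zzz-slurmstats.sh -- it reads only the
# pre-generated file, so login speed no longer depends on slurmctld:
#
#   stats_file="/shared/slurm-stats/$(id -u)"
#   [ -r "$stats_file" ] && cat "$stats_file"
```

The write-then-rename makes the per-uid file update atomic, so a login arriving mid-refresh still sees a complete summary.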
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
Yeah, I was contemplating doing that so I didn't have a dependency on the scheduler being up or down or busy.
What I was more curious about is whether anyone had any prebaked scripts for that.
-Paul Edmon-
Maybe a heavier lift than you had in mind, but check out xdmod, open.xdmod.org.
It was developed by the NSF as part of the now-shuttered XSEDE program, and is useful for both system and user monitoring.
-- A.
Yup, we have that installed already. It's been very beneficial for overall monitoring.
-Paul Edmon-
I too would be interested in some lightweight scripts. XDMOD in my experience has been very intense in workload to install, maintain and learn. It's great if one needs that level of interactivity, granularity and detail, but for some "quick and dirty" summary in a small dept it's not only overkill, it's also impossible given the available staffing.
> I too would be interested in some lightweight scripts
For lightweight stats I tend to use this excellent script: slurmacct. Its author is a member of this mailing list too (hi):
https://github.com/OleHolmNielsen/Slurm_tools/blob/master/slurmacct/slurmacc...
Currently I am in the process of writing a Prometheus exporter, as the one I've used for years (https://github.com/vpenso/prometheus-slurm-exporter) gives suboptimal results with Slurm 24.04+. (We use looong job arrays on our system, which somehow break the exporter, since it parses the text output of the squeue command.)
cheers
josef
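For what it's worth, when parsing squeue text output, expanding array jobs one element per line with `-r` and forcing an explicit delimiter tends to be more robust than the default fixed-width columns; a hedged sketch (the format string is an assumption):

```shell
# Count jobs per state: one array element per line (-r), no header (-h),
# pipe-delimited fields so long names can't collide with column widths.
squeue -r -h -o '%i|%u|%T|%P' |
awk -F'|' '{ n[$3]++ } END { for (s in n) print s, n[s] }'
```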
This is wonderful, thanks Josef and Ole! I will need to familiarize myself with it, but on a cursory glance it looks like almost exactly what I was looking for!
This thread went in a bunch of different directions. However, I ran with Jeffrey's suggestion and wrote up a profile.d script along with other supporting scripts to pull the data. The setup I put together is here for the community to use as they see fit:
https://github.com/fasrc/puppet-slurm_stats
While this is written as a Puppet module, the scripts therein can be used by anyone, as it's a pretty straightforward setup and the templates have obvious places to do a find-and-replace.
Naturally I'm happy to take additional merge requests. Thanks for all the interesting conversation about this. Lots of great ideas.
-Paul Edmon-
Thanks everybody once again and especially Paul: your job_summary script was exactly what I needed, served on a golden plate. I just had to modify/customize the date range and change the following line (I can make a PR if you want, but it's such a small change that it'd take more time to deal with the PR than just typing it)
- Timelimit = time_to_float(Timelimit.replace('UNLIMITED','365-00:00:00'))
+ Timelimit = time_to_float(Timelimit.replace('UNLIMITED','365-00:00:00').replace('Partition_Limit','365-00:00:00'))
Cheers, Davide
Thanks. I've made that fix.
-Paul Edmon-
Possibly a bit more elaborate than you want, but I wrote a web-based monitoring system for our cluster. It mostly uses standard Slurm commands for job monitoring, but I've also added storage monitoring, which requires a separate cron job to run every night. It was written for our cluster, but probably wouldn't take much work to adapt to another cluster with a similar structure.
You can see the code and some screenshots at:
https://github.com/s-andrews/capstone_monitor
...and there's a video walkthrough at: https://vimeo.com/982985174
We've also got more friendly scripts for monitoring current and past jobs on the command line. These are in a private repository as some of the other information there is more sensitive but I'm happy to share those scripts. You can see the scripts being used in https://vimeo.com/982986202
Simon.
Heavyweight solution (although if you have grafana and prometheus going already a little less so): https://github.com/rivosinc/prometheus-slurm-exporter
Thanks Kevin and Simon,
The full thing that each of you has built is indeed overkill for me; however, I was able to learn how to collect/parse some of the information I need.
What I am still unable to get is:
- utilization by queue (or list of node names), to track actual use of expensive resources such as GPUs, high-memory nodes, etc.
- statistics about wait-in-queue for jobs, due to unavailable resources
hopefully both in a sreport-like format by user and by overall system
I suspect this information is available in sacct, but needs some massaging/consolidation to become useful for what I am looking for. Perhaps either (or both) of your scripts already do that in some place that I did not find? That would be terrific, and I'd appreciate it if you can point me to its place.
Thanks again!
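One possible starting point for the wait-in-queue piece, assuming GNU date and the standard sacct field names (Partition, Submit, Start); jobs that haven't started are skipped:

```shell
# Mean queue wait per partition over the past 7 days, from sacct.
# -a all users, -n no header, -P pipe-delimited, -X allocations only.
since=$(date -d '7 days ago' +%Y-%m-%d)
sacct -a -n -P -X -S "$since" -o Partition,Submit,Start |
while IFS='|' read -r part submit begin; do
    # Skip still-pending or cancelled-before-start jobs
    case $begin in Unknown|None|'') continue ;; esac
    echo "$part $(( $(date -d "$begin" +%s) - $(date -d "$submit" +%s) ))"
done |
awk '{ sum[$1] += $2; n[$1]++ }
     END { for (p in sum) printf "%s: mean wait %.1f min\n", p, sum[p]/n[p]/60 }'
```

Swapping the first awk key from partition to user would give the by-user breakdown; the per-job `date -d` calls are slow for large accounting dumps, so for serious volume the epoch math is better done inside gawk.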
Hi Davide,
Did you already check out what the slurmacct script can do for you? See https://github.com/OleHolmNielsen/Slurm_tools/blob/master/slurmacct/slurmacc...
What you're asking for seems like a pretty heavy task in terms of system resources and Slurm database requests. Surely you don't imagine this running every time a user opens a login shell? Some users might run "bash -l" inside jobs to emulate a login session, causing a heavy load on your servers.
/Ole
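That "bash -l inside a job" case can be guarded against at the top of the profile.d snippet, since Slurm exports SLURM_JOB_ID inside jobs; a sketch (the `$-` check is the usual interactive-shell idiom):

```shell
# Top of /etc/profile.d/zzz-slurmstats.sh: bail out inside Slurm jobs
# and in non-interactive shells, before doing any work at all.
[ -n "$SLURM_JOB_ID" ] && return 0
case $- in *i*) ;; *) return 0 ;; esac
# ...print the stats summary here...
```

`return` is valid here because profile.d snippets are sourced, not executed.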
Hi,
What Ole wrote is exactly what crossed my mind. I had an episode with stats at login too: I put reportseff in the motd script, and it was a bad idea. It turned out that if for any reason the Slurm controller took longer to respond, it delayed user logins, which annoyed users more than they appreciated the command's output. It got even worse when the controller did not respond at all; due to less-than-ideal error handling in reportseff, users got a Python traceback at login. In the end I crafted a simple script called `resused`, which shows the last 15 jobs from the last 7 days via reportseff, and users can run it themselves whenever they need to.
Best regards Patryk.
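Patryk's two failure modes (slow controller delaying login, and a raw traceback when it is down) can both be avoided with a small wrapper. This is only a sketch, not Patryk's actual `resused` script: the function name is invented, and the command to run (e.g. reportseff) is left as a parameter.

```python
#!/usr/bin/env python3
"""Hypothetical login-stats wrapper: run a stats command with a hard
timeout so a slow or unresponsive slurmctld never delays login, and
swallow failures instead of dumping a traceback at the user."""
import subprocess


def print_stats(cmd, timeout=2.0):
    """Run cmd; return its output, or a short notice if it fails or times out."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True,
                             timeout=timeout, check=True)
        return out.stdout
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError,
            FileNotFoundError):
        # Controller slow/down, command missing, or nonzero exit:
        # degrade gracefully rather than block or crash the login shell.
        return "(usage stats unavailable right now)\n"


if __name__ == "__main__":
    # On a real cluster this might be e.g. ["reportseff", "--user", user].
    print(print_stats(["echo", "demo stats"]), end="")
```

Called from a profile.d hook, the worst case is a bounded two-second pause instead of an open-ended hang.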
On 24/08/21 08:17, Ole Holm Nielsen via slurm-users wrote:
Hi Davide,
Did you already check out what the slurmacct script can do for you? See https://github.com/OleHolmNielsen/Slurm_tools/blob/master/slurmacct/slurmacc...
What you're asking for seems like a pretty heavy task regarding system resources and Slurm database requests. Surely you don't intend this to run every time a user starts a login shell? Some users might run "bash -l" inside jobs to emulate a login session, causing a heavy load on your servers.
/Ole
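Ole's "bash -l inside jobs" concern can be handled with a guard in the profile.d hook before any Slurm command runs. A minimal sketch (the function name is illustrative, not from any script in this thread): a login shell started inside a job inherits SLURM_JOB_ID, so checking for it is a cheap way to stay quiet there.

```python
"""Sketch of a guard for a login-stats hook: skip it inside Slurm jobs
and in non-interactive shells, so only real interactive logins pay the
cost of querying the controller."""
import os
import sys


def should_print_stats(environ=None):
    """Return True only for an interactive shell outside any Slurm job."""
    environ = environ if environ is not None else os.environ
    if "SLURM_JOB_ID" in environ:   # "bash -l" inside a job step: stay quiet
        return False
    return sys.stdin.isatty()       # also skip scp/sftp and scripted shells
```

The same two checks are easy to express directly in shell (`[ -n "$SLURM_JOB_ID" ]` and `[ -t 0 ]`) if the hook lives in /etc/profile.d.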
On 8/21/24 01:13, Davide DelVento via slurm-users wrote:
Thanks Kevin and Simon,
The full thing that you do is indeed overkill for my needs; however, I was able to learn how to collect/parse some of the information I need.
What I am still unable to get is:
- utilization by queue (or list of node names), to track actual use of expensive resources such as GPUs, high-memory nodes, etc.
- statistics about wait-in-queue for jobs, due to unavailable resources

hopefully both in an sreport-like format, by user and by overall system
I suspect this information is available in sacct, but needs some massaging/consolidation to become useful for what I am looking for. Perhaps either (or both) of your scripts already do that in some place that I did not find? That would be terrific, and I'd appreciate it if you can point me to its place.
Thanks again!
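Both of the items on the wish list above (per-partition utilization and queue-wait statistics) can indeed be derived from sacct with some massaging. A sketch of that aggregation, not any script from this thread: it assumes pipe-separated output from something like `sacct -a -X -P -n -S <start> -E <end> -o Partition,AllocCPUS,Submit,Start,Elapsed` (field names per the sacct man page).

```python
"""Sketch: aggregate sacct records into per-partition CPU-hours and
average queue wait (start minus submit), the two numbers asked for above."""
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S"  # sacct's default timestamp format


def elapsed_to_hours(s):
    """Convert a Slurm elapsed string '[D-]HH:MM:SS' to hours."""
    days = 0
    if "-" in s:
        d, s = s.split("-")
        days = int(d)
    h, m, sec = (int(x) for x in s.split(":"))
    return days * 24 + h + m / 60 + sec / 3600


def summarize(lines):
    """lines: 'Partition|AllocCPUS|Submit|Start|Elapsed' records."""
    cpu_hours = defaultdict(float)
    waits = defaultdict(list)
    for line in lines:
        part, cpus, submit, start, elapsed = line.strip().split("|")
        cpu_hours[part] += int(cpus) * elapsed_to_hours(elapsed)
        if start not in ("Unknown", "None"):  # jobs never started have no wait
            wait = datetime.strptime(start, FMT) - datetime.strptime(submit, FMT)
            waits[part].append(wait.total_seconds() / 3600)
    avg_wait = {p: sum(w) / len(w) for p, w in waits.items()}
    return dict(cpu_hours), avg_wait
```

Grouping by the user field instead of (or as well as) partition gives the by-user breakdown; the expensive-node case falls out of putting GPU/high-memory nodes in their own partitions.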
Thanks, Ole! Your tools and what you do for the community is fantastic, we all appreciate you!
Of course, I did look (and use) your script. But I need more info.
And no, this is not something that users would run *ever* (let alone at every login). This is something I *myself* (the cluster administrator) need to run, once a quarter or perhaps even just once a year, to inform my managers of cluster utilization, keep them apprised of the state of affairs, and justify changes in funding for future hardware purchases. Sorry for not making this clear, given the initial message I replied to.
Thanks for any suggestion you might have.
Hi Davide,
Thanks, I appreciate your positive feedback! Some comments are below:
On 21-08-2024 15:07, Davide DelVento wrote:
...
> What I am still unable to get is:
>
> - utilization by queue (or list of node names), to track actual use of expensive resources such as GPUs, high memory nodes, etc
The slurmacct script can actually break down statistics by partition, which I guess is what you're asking for? The usage of the command is:
# slurmacct -h
Usage: slurmacct [-s Start_time -e End_time | -c | -w | -m monthyear]
                 [-p partition(s)] [-u username] [-g groupname] [-G]
                 [-W workdir] [-r report-prefix] [-n] [-h]
where:
  -s Start_time [last month]: Starting time of accounting period.
  -e End_time [last month]: End time of accounting period.
  -c: Current month
  -w: Last week
  -m monthyear: Select month and year (like "november2019")
  -p partition(s): Select only Slurm partition <partition>[,partition2,...]
  -u username: Print only user <username>
  -g groupname: Print only users in UNIX group <groupname>
  -G: Print only groupwise summed accounting data
  -W directory: Print only jobs with this string in the job WorkDir
  -r: Report name prefix
  -n: No header information is printed
  -h: Print this help information
The Start_time and End_time values specify the date/time interval of job completion/termination (see "man sacct").
Hint: Specify Start/End time as MMDD (Month and Date)
> - statistics about wait-in-queue for jobs, due to unavailable resources
The slurmacct report prints "Average q-hours" (starttime minus submittime).
> hopefully both in a sreport-like format by user and by overall system
>
> I suspect this information is available in sacct, but needs some massaging/consolidation to become useful for what I am looking for. Perhaps either (or both) of your scripts already do that in some place that I did not find? That would be terrific, and I'd appreciate it if you can point me to its place.
We use the "topreports" script to gather weekly, monthly and yearly reports (using slurmacct) for management (professors at our university).
IHTH, Ole
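A cron-driven wrapper like the topreports setup described above needs the -s/-e arguments computed for each period. A sketch of that date arithmetic, using the MMDD form the slurmacct hint describes (the period names and helper function are illustrative, not part of topreports):

```python
"""Sketch: compute slurmacct-style -s/-e arguments (MMDD) for the
periodic reports a cron job would generate."""
from datetime import date, timedelta


def period_args(period, today=None):
    """Return (start, end) as MMDD strings for a named reporting period."""
    today = today or date.today()
    if period == "lastweek":
        start = today - timedelta(days=7)
        end = today
    elif period == "lastmonth":
        first_this = today.replace(day=1)
        end = first_this - timedelta(days=1)   # last day of previous month
        start = end.replace(day=1)             # first day of previous month
    else:
        raise ValueError(f"unknown period: {period}")
    return start.strftime("%m%d"), end.strftime("%m%d")
```

The MMDD form wraps at year boundaries, which is one reason the MMDDYY suggestion later in this thread is worth adopting.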
Hi Ole,
On Wed, Aug 21, 2024 at 1:06 PM Ole Holm Nielsen via slurm-users < slurm-users@lists.schedmd.com> wrote:
The slurmacct script can actually break down statistics by partition, which I guess is what you're asking for? The usage of the command is:
Yes, this is almost what I was asking for. And admittedly I now realize that, with perhaps some minor algebra (using the TOTAL-all line), I could get what I need. What confused me is that running it for everything or for a single partition reported the same header section, rather than a partition-specific one:
[davide ~]$ slurmacct -s 0101 -e 0202
Start date 0101
End date 0202
Report generated to file /tmp/Slurm_report_acct_0101_0202
[davide ~]$ cat /tmp/Slurm_report_acct_0101_0202
--------------------------------------------------------------------------------
Cluster Utilization 01-Jan-2024_00:00 - 01-Feb-2024_23:59
Usage reported in Percentage of Total
--------------------------------------------------------------------------------
  Cluster  Allocated       Down PLND Dow      Idle  Planned   Reported
--------- ---------- ---------- -------- --------- -------- ----------
  cluster     23.25%     67.85%    0.00%     8.89%    0.01%    100.00%
Usage sorted by top users: (omitted)
[davide ~]$ slurmacct -s 0101 -e 0202 -p gpu
Start date 0101
End date 0202
Print only accounting in Slurm partition gpu
Report generated to file /tmp/Slurm_report_acct_0101_0202
[davide ~]$ cat /tmp/Slurm_report_acct_0101_0202
--------------------------------------------------------------------------------
Cluster Utilization 01-Jan-2024_00:00 - 01-Feb-2024_23:59
Usage reported in Percentage of Total
--------------------------------------------------------------------------------
  Cluster  Allocated       Down PLND Dow      Idle  Planned   Reported
--------- ---------- ---------- -------- --------- -------- ----------
  cluster     23.25%     67.85%    0.00%     8.89%    0.01%    100.00%

Partition selected: gpu
Usage sorted by top users: (omitted)
Also, what you label "Wallclock hours" in the table of users is actually core-hours, correct? Not node-hours, unless I am reading things incorrectly.
The Start_time and End_time values specify the date/time interval of
job completion/termination (see "man sacct").
Hint: Specify Start/End time as MMDD (Month and Date)
Small suggestion: change this to
Hint: Specify Start/End time as MMDD (Month and Day) or as MMDDYY (Month and Day and Year) since sreport accepts it and your tool appears to otherwise understand that format.
> - statistics about wait-in-queue for jobs, due to unavailable resources
The slurmacct report prints "Average q-hours" (starttime minus submittime).
Ahaha! That's it! Super useful, I was wondering what "q" was (wait-in-Queue, I guess). You are super.
We use the "topreports" script to gather weekly, monthly and yearly
reports (using slurmacct) for management (professors at our university).
I knew that I must not have been the only one with this need ;-)
Thanks again!
Those pieces of information are available from squeue / sacct as long as you’re happy to have a wrapper which does the aggregation part for you. The commands I parse for our stat summaries are:
scontrol show nodes
squeue -r -O jobid,username,minmemory,numcpus,nodelist
sacct -a -S [one_month_ago] -o jobid,jobname,alloccpus,cputime%15,reqmem,account,submit,elapsed,state
The only thing which I can’t find an easy way to get is the total requested memory for a job. You’d think this would be simple with squeue minmemory – except that for some jobs that value is the value for the whole job, and for others it’s a value per-cpu, so if you want to know the total you have to multiply by the number of requested CPUs. The only place I’ve managed to find that setting is from
scontrol show jobid -d [jobid]
Where you can examine the “MinMemoryCPU” value – however this is really slow if you’re doing that for thousands of jobs. If anyone knows how to get this to show up correctly in squeue/sacct that would be super helpful.
Simon.
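Simon's MinMemory problem above (per-CPU vs. whole-job values) is partly tractable from sacct alone on Slurm versions whose ReqMem field carries a "c" (per-CPU) or "n" (per-node) suffix, e.g. "4000Mc"; newer releases report a plain total instead. A hedged sketch, assuming that older suffix convention (check your own sacct output first, as this is not a guaranteed interface):

```python
"""Sketch: resolve a job's total requested memory from sacct's ReqMem,
AllocCPUS and node count, handling the old 'c'/'n' suffix convention."""

UNITS = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}  # to MB


def total_req_mem_mb(reqmem, alloc_cpus, alloc_nodes=1):
    """Return total requested memory in MB for one job record."""
    if reqmem[-1] in "cn":            # older sacct: per-CPU or per-node
        per, reqmem = reqmem[-1], reqmem[:-1]
    else:
        per = None                    # newer Slurm: already a job total
    value, unit = float(reqmem[:-1]), reqmem[-1].upper()
    mb = value * UNITS[unit]
    if per == "c":
        return mb * alloc_cpus        # the multiplication Simon describes
    if per == "n":
        return mb * alloc_nodes
    return mb
```

This avoids the slow per-job `scontrol show jobid -d` loop at the cost of trusting the ReqMem suffix, which is why it is worth spot-checking a few jobs against scontrol's MinMemoryCPU before relying on it.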
You can also check https://github.com/prod-feng/slurm_tools
slurm_job_perf_show.py may be helpful.
I used to try to use slurm_job_perf_show_email.py to send emails to users summarizing their usage, e.g. monthly, but some users seemed to get confused, so I stopped.
Best,
Feng