[slurm-users] how to know the real utilization of a node when oversubscribe is set to FORCE (Mark Hahn)

Sebastian T Smith stsmith at unr.edu
Fri Jul 17 20:30:35 UTC 2020


Hi,

I think the `Elapsed` or `ElapsedRaw` field is what you're looking for.  Selected example from my system:

$ sacct -X --allusers --format="AllocCPUS,Elapsed,ElapsedRaw,CPUTime,CPUTimeRAW"

 AllocCPUS     Elapsed ElapsedRaw        CPUTime CPUTimeRAW
---------- ----------- ---------- -------------- ----------
        64 10-00:00:29     864029   640-00:30:56   55297856
       640 10-05:09:08     882548  6537-09:25:20  564830720
       128  5-23:23:21     516201   764-17:48:48   66073728
...
...
...

Divide `CPUTimeRAW` by `ElapsedRaw` and you get the number of CPUs allocated to the job.  Be careful with requested versus allocated resources if you have SMT enabled on your system: the two values can differ depending on the options your users pass to their jobs.
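A minimal sketch of that division, assuming a POSIX shell with awk. The helper name `derived_cpus` is hypothetical, and the sample rows are the three shown above, written in `AllocCPUS|ElapsedRaw|CPUTimeRAW` order (the pipe-separated layout that `sacct --parsable2` produces):

```shell
#!/bin/sh
# derived_cpus (hypothetical helper): read AllocCPUS|ElapsedRaw|CPUTimeRAW
# rows on stdin and print "AllocCPUS derived", where derived is
# CPUTimeRAW / ElapsedRaw.
derived_cpus() {
  # Rows with ElapsedRaw of 0 (e.g. jobs that never started) are skipped
  # to avoid dividing by zero.
  awk -F'|' '$2 > 0 { printf "%s %d\n", $1, $3 / $2 }'
}

# Sample rows copied from the sacct output above.
printf '%s\n' '64|864029|55297856' '640|882548|564830720' '128|516201|66073728' |
  derived_cpus
# prints: "64 64", "640 640", "128 128", one per line

# Live equivalent (field order must match what derived_cpus expects):
#   sacct -X --allusers --noheader --parsable2 \
#     --format=AllocCPUS,ElapsedRaw,CPUTimeRAW | derived_cpus
```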

- Sebastian


--

University of Nevada, Reno
Sebastian Smith
High-Performance Computing Engineer
Office of Information Technology
1664 North Virginia Street
MS 0291

work-phone: 775-682-5050
email: stsmith at unr.edu
website: http://rc.unr.edu

________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of 肖正刚 <guru.novice at gmail.com>
Sent: Thursday, July 16, 2020 8:15 PM
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>; Mark Hahn <hahn at mcmaster.ca>
Subject: Re: [slurm-users] how to know the real utilization of a node when oversubscribe is set to FORCE (Mark Hahn)

Hi Hahn,
I mean the elapsed time.
In the example I mentioned earlier, a job ran for 10 s, but the elapsed time reported by sacct is 640 s (10 s * 64). So how do I get the real elapsed time from sacct or other command-line tools?

As you suggested, I checked UserCPU/SystemCPU/TotalCPU, but they are all zero.
Query command: sacct -T -X -S 2020-07-16T00:00:00 -E 2020-07-16T23:59:59 -r $partition --format=cputimeraw,usercpu,systemcpu,totalcpu,alloccpus,allocnodes,AllocTRES%40
CPUTimeRAW    UserCPU  SystemCPU   TotalCPU  AllocCPUS AllocNodes                                AllocTRES
---------- ---------- ---------- ---------- ---------- ---------- ----------------------------------------
       640   00:00:00   00:00:00   00:00:00         64          1                 billing=64,cpu=64,node=1
       640   00:00:00   00:00:00   00:00:00         64          1                 billing=64,cpu=64,node=1
       640   00:00:00   00:00:00   00:00:00         64          1                 billing=64,cpu=64,node=1
       640   00:00:00   00:00:00   00:00:00         64          1                 billing=64,cpu=64,node=1
       640   00:00:00   00:00:00   00:00:00         64          1                 billing=64,cpu=64,node=1
       640   00:00:00   00:00:00   00:00:00         64          1                 billing=64,cpu=64,node=1
       640   00:00:00   00:00:00   00:00:00         64          1                 billing=64,cpu=64,node=1

The jobs use 1/2/4/8/16/32/64 cores, but sacct reports the same elapsed time for all of them.
The real elapsed time should be 10/20/40/80/160/320/640 s.
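The numbers expected above are simply the requested CPU count times the 10 s runtime. A minimal sketch of computing that figure, assuming a POSIX shell with awk and assuming that sacct's `ReqCPUS` field records what each job asked for (worth checking against your own jobs, particularly with SMT enabled or when the partition hands out whole nodes). `requested_cpu_seconds` is a hypothetical helper, and the rows fed to it are hypothetical jobs modeled on the `sleep 10` test:

```shell
#!/bin/sh
# requested_cpu_seconds (hypothetical helper): read ReqCPUS|ElapsedRaw rows
# on stdin and print ReqCPUS * ElapsedRaw, i.e. CPU seconds counted per
# *requested* CPU instead of per *allocated* CPU.
requested_cpu_seconds() {
  # printf %.0f prints the product as a plain integer, even when it is large.
  awk -F'|' '{ printf "%.0f\n", $1 * $2 }'
}

# Hypothetical rows: jobs requesting 1..64 CPUs, each running `sleep 10`.
for n in 1 2 4 8 16 32 64; do
  printf '%s|10\n' "$n"
done | requested_cpu_seconds
# prints 10 20 40 80 160 320 640, one per line

# Live equivalent (assumes ReqCPUS reflects each job's request):
#   sacct -X --noheader --parsable2 --format=ReqCPUS,ElapsedRaw \
#     | requested_cpu_seconds
```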

regards.



----------------------------------------------------------------------

Date: Thu, 16 Jul 2020 11:03:07 -0400 (EDT)
From: Mark Hahn <hahn at mcmaster.ca>
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] how to know the real utilization of a node
        when oversubscribe is set to FORCE

> srun -N 1 -n 1 -p testA sleep 10
> then the CPUTimeRAW of this job recorded by Slurm is 640 s, but actually
> this job only used 10 s.
> So I want to know whether there is any way to get the real CPU time used by
> this job in Slurm.

if you really mean cpu time (compute-bound, not elapsed),
then don't you just want usercpu, systemcpu and totalcpu from sacct?

cputime/cputimeraw is just ncpus * elapsed.
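That relation can be checked with plain shell arithmetic. A minimal sketch, assuming a POSIX shell; `cputime_raw` and `fmt_secs` are hypothetical helpers, and `fmt_secs` only mimics the `[D-]HH:MM:SS` layout visible in the sacct output in this thread:

```shell
#!/bin/sh
# cputime_raw (hypothetical helper): allocated CPUs * elapsed seconds.
cputime_raw() {
  echo $(( $1 * $2 ))
}

# fmt_secs (hypothetical helper): render seconds as [D-]HH:MM:SS,
# with the day prefix only when there is at least one full day.
fmt_secs() {
  s=$1
  d=$(( s / 86400 ))
  h=$(( s % 86400 / 3600 ))
  m=$(( s % 3600 / 60 ))
  sec=$(( s % 60 ))
  if [ "$d" -gt 0 ]; then
    printf '%d-%02d:%02d:%02d\n' "$d" "$h" "$m" "$sec"
  else
    printf '%02d:%02d:%02d\n' "$h" "$m" "$sec"
  fi
}

# The `sleep 10` job on a 64-CPU allocation:
fmt_secs "$(cputime_raw 64 10)"        # 00:10:40 (640 s)
# First row of the sacct output at the top of the thread: 64 CPUs, 864029 s.
fmt_secs "$(cputime_raw 64 864029)"    # 640-00:30:56
```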

regards,
--
operator may differ from spokesperson.              hahn at mcmaster.ca


