[slurm-users] how to know the real utilization of a node when oversubscribe is set to FORCE (Mark Hahn)
Sebastian T Smith
stsmith at unr.edu
Fri Jul 17 20:30:35 UTC 2020
Hi,
I think the `Elapsed` or `ElapsedRaw` field is what you're looking for. Selected example from my system:
$ sacct -X --allusers --format="AllocCPUS,Elapsed,ElapsedRaw,CPUTime,CPUTimeRAW"
AllocCPUS     Elapsed ElapsedRaw       CPUTime CPUTimeRAW
--------- ----------- ---------- ------------- ----------
       64 10-00:00:29     864029  640-00:30:56   55297856
      640 10-05:09:08     882548 6537-09:25:20  564830720
      128  5-23:23:21     516201  764-17:48:48   66073728
...
...
...
Dividing `CPUTimeRAW` by `ElapsedRaw` gives the number of CPUs allocated to the job. Be careful to distinguish requested resources from allocated resources if you have SMT enabled on your system: the two values can differ depending on the options your users' jobs specify.
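As a rough check of that ratio, here is a minimal Python sketch using the three rows from the output above. It assumes the rows are in the pipe-delimited form that `sacct -X -n -P --format=AllocCPUS,ElapsedRaw,CPUTimeRAW` prints; the names `SAMPLE`, `parse_rows` and `cpus_from_times` are made up for illustration and are not part of Slurm.

```python
# Sample rows copied from the sacct output above, rewritten in the
# pipe-delimited layout assumed for `sacct -n -P` (an assumption of this sketch).
SAMPLE = """\
64|864029|55297856
640|882548|564830720
128|516201|66073728
"""


def parse_rows(text):
    """Yield (alloc_cpus, elapsed_raw, cputime_raw) integer tuples."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        alloc_cpus, elapsed_raw, cputime_raw = (int(f) for f in line.split("|"))
        yield alloc_cpus, elapsed_raw, cputime_raw


def cpus_from_times(elapsed_raw, cputime_raw):
    """Recover the allocated CPU count as CPUTimeRAW / ElapsedRaw."""
    if elapsed_raw == 0:
        return None  # zero-length job: the ratio is undefined
    return cputime_raw // elapsed_raw


# Print the reported AllocCPUS next to the value recovered from the ratio.
for alloc_cpus, elapsed_raw, cputime_raw in parse_rows(SAMPLE):
    print(alloc_cpus, elapsed_raw, cpus_from_times(elapsed_raw, cputime_raw))
```

For these three jobs the recovered value matches the reported `AllocCPUS` column, so `ElapsedRaw` is the wall-clock time and `CPUTimeRAW` is that time multiplied by the allocation.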
- Sebastian
--
[University of Nevada, Reno]<http://www.unr.edu/>
Sebastian Smith
High-Performance Computing Engineer
Office of Information Technology
1664 North Virginia Street
MS 0291
work-phone: 775-682-5050
email: stsmith at unr.edu
website: http://rc.unr.edu
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of 肖正刚 <guru.novice at gmail.com>
Sent: Thursday, July 16, 2020 8:15 PM
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>; Mark Hahn <hahn at mcmaster.ca>
Subject: Re: [slurm-users] how to know the real utilization of a node when oversubscribe is set to FORCE (Mark Hahn)
Hi, Hahn
I mean the elapsed time.
In the example I mentioned earlier, a job runs for 10 s, but the elapsed time from sacct is 640 s (10 s * 64). How do I get the real elapsed time from sacct or other command-line tools?
As you suggested, I checked usercpu/systemcpu/totalcpu, but they are all zero.
Query command: sacct -T -X -S 2020-07-16T00:00:00 -E 2020-07-16T23:59:59 -r $partition --format=cputimeraw,usercpu,systemcpu,totalcpu,alloccpus,allocnodes,AllocTRES%40
CPUTimeRAW    UserCPU  SystemCPU   TotalCPU  AllocCPUS AllocNodes AllocTRES
---------- ---------- ---------- ---------- ---------- ---------- ----------------------------------------
       640   00:00:00   00:00:00   00:00:00         64          1 billing=64,cpu=64,node=1
       640   00:00:00   00:00:00   00:00:00         64          1 billing=64,cpu=64,node=1
       640   00:00:00   00:00:00   00:00:00         64          1 billing=64,cpu=64,node=1
       640   00:00:00   00:00:00   00:00:00         64          1 billing=64,cpu=64,node=1
       640   00:00:00   00:00:00   00:00:00         64          1 billing=64,cpu=64,node=1
       640   00:00:00   00:00:00   00:00:00         64          1 billing=64,cpu=64,node=1
       640   00:00:00   00:00:00   00:00:00         64          1 billing=64,cpu=64,node=1
The jobs use 1/2/4/8/16/32/64 cores, but sacct reports the same elapsed time for all of them.
The real elapsed times should be 10/20/40/80/160/320/640.
regards.
----------------------------------------------------------------------
Message: 1
Date: Thu, 16 Jul 2020 11:03:07 -0400 (EDT)
From: Mark Hahn <hahn at mcmaster.ca>
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] how to know the real utilization of a node when oversubscribe is set to FORCE
Message-ID: <alpine.LFD.2.02.2007161036410.16842 at coffee.psychology.mcmaster.ca>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
> srun -N 1 -n 1 -p testA sleep 10
> then the cpurawtime of this job recorded by slurm is 640s, but actually
> this job only used 10s;
> so, I want to know are there any way to get the real cputime used by this
> job in slurm.
if you really mean cpu time (compute-bound, not elapsed),
then don't you just want usercpu, systemcpu and totalcpu from sacct?
cputime/cputimeraw is just ncpus * elapsed.
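A minimal Python sketch of that relationship, applied to the `sleep 10` job from this thread (CPUTimeRAW = 640, AllocCPUS = 64, TotalCPU = 00:00:00). The function names are hypothetical helpers rather than anything Slurm provides, and TotalCPU is assumed to have already been converted to seconds.

```python
def elapsed_seconds(cputime_raw, alloc_cpus):
    """Invert CPUTimeRAW = AllocCPUS * elapsed to get wall-clock seconds."""
    return cputime_raw / alloc_cpus


def cpu_utilization(total_cpu_seconds, cputime_raw):
    """Fraction of the allocated CPU-seconds that was actually consumed."""
    if cputime_raw == 0:
        return 0.0  # nothing was allocated, so report zero rather than divide
    return total_cpu_seconds / cputime_raw


# Values taken from the sacct rows quoted in this thread.
print(elapsed_seconds(640, 64))  # wall-clock seconds for the job
print(cpu_utilization(0, 640))   # share of the allocated CPU time used
```

Here the 640 CPU-seconds charged to the job correspond to 10 seconds of wall-clock time on 64 allocated CPUs, while the consumed CPU time (TotalCPU) is zero.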
regards,
--
operator may differ from spokesperson. hahn at mcmaster.ca
------------------------------