[slurm-users] Are these threads actually unused?
Mike Cammilleri
mikec at stat.wisc.edu
Tue Feb 13 09:35:43 MST 2018
I should also mention that of course we are aware that R is a single-threaded application, but users can be doing all sorts of things within their R scripts. In this particular case I believe the user is using the FLARE package. Often they are trying to run embarrassingly parallel tasks.
-----Original Message-----
From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of Mike Cammilleri
Sent: Tuesday, February 13, 2018 10:31 AM
To: slurm-users at lists.schedmd.com
Subject: [slurm-users] Are these threads actually unused?
I posted a similar question a couple of months ago about CPU utilization, which we figured out: too many threads on one CPU can create high CPU load, and thus slower compute time, because threads end up waiting for CPU time. A proper allocation should be set in the submit script (e.g. --cpus-per-task). We've been doing pretty well with CPU efficiency since then, as we monitor users' allocations to make sure they're making the most efficient resource reservations.
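As a rough sketch of keeping thread counts in line with the allocation: the helper below derives thread caps from SLURM_CPUS_PER_TASK, which Slurm sets when --cpus-per-task is requested. It assumes the extra threads come from an OpenMP, OpenBLAS or MKL runtime, which I have not confirmed for this job; the function name thread_caps and the fallback of 1 are hypothetical choices for illustration.

```python
import os

# Environment variables read by common threading runtimes (OpenMP, OpenBLAS, MKL).
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def thread_caps(environ, default=1):
    """Return thread-cap variables matching the Slurm CPU allocation.

    SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested,
    so fall back to `default` (an assumed value of 1) otherwise.
    """
    raw = environ.get("SLURM_CPUS_PER_TASK", "")
    cpus = int(raw) if raw.isdigit() and int(raw) > 0 else default
    return {name: str(cpus) for name in THREAD_VARS}


if __name__ == "__main__":
    # Print export lines that a submit script could eval before starting R.
    for name, value in thread_caps(os.environ).items():
        print(f"export {name}={value}")
```

Whether these variables have any effect depends on which libraries the R build is linked against, so this is something to try rather than a guaranteed fix.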
One thing I notice is that sometimes R has 48 threads but only one seems active. Looking at 'top' on a node that has 48 cpus:
10:27:46 up 52 days, 19:40, 1 user, load average: 11.99, 11.98, 12.03
32862 hyunseu+ 20 0 2304464 190184 8500 R 100.0 0.1 767:51.76 R 48 0
32919 hyunseu+ 20 0 2302516 186568 8484 R 100.0 0.1 767:59.15 R 48 6
32932 hyunseu+ 20 0 2303616 187688 8488 R 100.0 0.1 767:59.41 R 48 5
32947 hyunseu+ 20 0 2303508 188028 8484 R 100.0 0.1 767:59.97 R 48 7
32950 hyunseu+ 20 0 2305800 189668 8456 R 100.0 0.1 767:59.73 R 48 2
32964 hyunseu+ 20 0 2303304 187972 8484 R 100.0 0.1 767:59.70 R 48 1
32980 hyunseu+ 20 0 2303396 187284 8500 R 100.0 0.1 767:58.84 R 48 4
The two rightmost columns are "number of threads" and "last CPU used." Each of his R processes, launched using --array, has 48 threads. However, the CPU utilization is a nice 100% per process and the load average on the node is around 12, which is how many array jobs are running on that node (I didn't copy/paste all of his processes listed in 'top'). So it appears that one thread is running for each R process and things are proceeding nicely. But why do we see 48 threads for each R process, and are they truly unused? Would he see a performance increase if these were corrected to a single thread each?
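One way to check whether the extra threads are really idle is to look at per-thread state and CPU time under /proc. The sketch below is only an illustration (the function names parse_task_stat and summarize_threads are made up here); it reads the state field and the utime and stime fields from each /proc/<pid>/task/<tid>/stat file on Linux.

```python
import os


def parse_task_stat(line):
    """Return (state, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line.

    The command name sits in parentheses and may contain spaces, so split
    on the last ')' before reading the remaining fields.
    """
    rest = line[line.rindex(")") + 2:].split()
    state = rest[0]                               # field 3: R, S, D, ...
    utime, stime = int(rest[11]), int(rest[12])   # fields 14 and 15
    return state, utime + stime


def summarize_threads(pid, proc_root="/proc"):
    """Map each thread ID of `pid` to its (state, cpu_ticks)."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    summary = {}
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as fh:
                summary[int(tid)] = parse_task_stat(fh.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited while we were looking
    return summary
```

Calling summarize_threads on one of the R PIDs twice, a few seconds apart, should show which threads are accumulating ticks. Threads that stay in state S with an unchanged tick count are not using the CPU.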
I've noticed this difference with various versions of R. R installed via apt-get in /usr/bin will have many threads like this example, but the R I build for the cluster will list a single thread in 'top' unless another package or a certain method causes it to do otherwise. In this case, the user is using R-3.4.3/bin/Rscript.
Thanks!
mike