[slurm-users] Are these threads actually unused?

Mike Cammilleri mikec at stat.wisc.edu
Tue Feb 13 09:31:12 MST 2018


A couple of months ago I posted a similar question about CPU utilization, which we figured out: too many threads on one CPU can create high CPU load, and thus slower compute time, because threads are waiting for a core. The fix is to request a proper allocation in the submit script (e.g. --cpus-per-task). Since then we've been doing pretty well with CPU efficiency, as we monitor users' allocations to make sure they're making efficient resource reservations.

One thing I notice is that sometimes R has 48 threads but only one seems active. Looking at 'top' on a node that has 48 cpus:

10:27:46 up 52 days, 19:40,  1 user,  load average: 11.99, 11.98, 12.03

32862 hyunseu+  20   0 2304464 190184   8500 R 100.0  0.1 767:51.76 R   48  0
32919 hyunseu+  20   0 2302516 186568   8484 R 100.0  0.1 767:59.15 R   48  6
32932 hyunseu+  20   0 2303616 187688   8488 R 100.0  0.1 767:59.41 R   48  5
32947 hyunseu+  20   0 2303508 188028   8484 R 100.0  0.1 767:59.97 R   48  7
32950 hyunseu+  20   0 2305800 189668   8456 R 100.0  0.1 767:59.73 R   48  2
32964 hyunseu+  20   0 2303304 187972   8484 R 100.0  0.1 767:59.70 R   48  1
32980 hyunseu+  20   0 2303396 187284   8500 R 100.0  0.1 767:58.84 R   48  4

The two rightmost columns are "number of threads" and "last CPU used." Each of his R processes, launched using --array, has 48 threads. However, CPU utilization is a nice 100% per process and the load average on the node is around 12, which matches the number of array jobs running on that node (I didn't copy/paste all of his processes listed in 'top'). So it appears that one thread is running for each R process and things are proceeding nicely - but why do we see 48 threads for each R process, and are they truly unused? Would he see a performance increase if these were corrected to run a single thread each?
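
One way to check whether the extra threads are really idle is to look at per-thread state and accumulated CPU time under /proc. Below is a minimal Python sketch, assuming a Linux node where /proc/<pid>/task/<tid>/stat is readable; the function names parse_stat_line and thread_report are made up for illustration.

```python
import os

def parse_stat_line(line):
    """Parse one /proc/<pid>/task/<tid>/stat line into (state, cpu_ticks)."""
    # The command name sits in parentheses and may itself contain spaces or
    # ')', so split on the last ')' before tokenizing the remaining fields.
    _, _, rest = line.rpartition(")")
    fields = rest.split()
    state = fields[0]        # stat field 3: R = running, S = sleeping, ...
    utime = int(fields[11])  # stat field 14: user-mode clock ticks
    stime = int(fields[12])  # stat field 15: kernel-mode clock ticks
    return state, utime + stime

def thread_report(pid):
    """Return [(tid, state, cpu_seconds)] for every thread of the given pid."""
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    task_dir = f"/proc/{pid}/task"
    report = []
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(f"{task_dir}/{tid}/stat") as fh:
                state, ticks = parse_stat_line(fh.read())
        except FileNotFoundError:
            continue  # the thread exited between listdir() and open()
        report.append((int(tid), state, ticks / ticks_per_sec))
    return report
```

Calling thread_report(32862) on the node would list one row per thread. If 47 of the 48 rows show state S with almost no CPU seconds while one row carries nearly all of the time, the extra threads are effectively idle.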

I've noticed this difference with various versions of R. R installed via apt-get in /usr/bin will have many threads like this example, but the R I build for the cluster lists a single thread in 'top' unless a package or certain method causes it to do otherwise. In this case, the user is using R-3.4.3/bin/Rscript.
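
If the extra threads turn out to come from a threaded math library or OpenMP runtime linked into that R build (an assumption, not something confirmed here), one thing to try is capping the thread pools at the Slurm allocation before launching Rscript. A minimal Python sketch follows; capped_env is a hypothetical helper, while the environment variable names are the ones OpenMP, OpenBLAS and MKL read.

```python
import os

# Environment variables honored by common threaded math runtimes.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")

def capped_env(environ=None):
    """Return a copy of environ with thread pools capped at the Slurm allocation."""
    env = dict(os.environ if environ is None else environ)
    # SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested;
    # fall back to 1, which matches Slurm's default of one CPU per task.
    ncpus = env.get("SLURM_CPUS_PER_TASK", "1")
    for var in THREAD_VARS:
        env[var] = ncpus
    return env
```

A wrapper could then start the job with subprocess.run(["Rscript", "analysis.R"], env=capped_env()), where analysis.R stands in for the user's script. If the thread count in 'top' drops to match the allocation, the threads were a library thread pool rather than anything the R code asked for.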

Thanks!
mike


