[slurm-users] Using oversubscribe to hammer a node
Loris Bennett
loris.bennett at fu-berlin.de
Fri Jan 20 06:48:09 UTC 2023
Hi Rob,
"Groner, Rob" <rug262 at psu.edu> writes:
> I'm trying to set up a specific partition where users can fight with the OS for dominance. The oversubscribe property sounds like what I want, as it says
> "More than one job can execute simultaneously on the same compute resource." That's exactly what I want. I've set up a node with 48 CPUs and
> oversubscribe set to force:4. I then execute a job that requests 48 CPUs, and that starts running. I execute another job asking for 48 cores, and it gets
> assigned to the node...but it is not running, it's suspended. I can execute 2 more jobs, and they'll all go on the node (so, 4x), but 3 will be suspended at
> any time. I see the time slicing going on, but that isn't what I thought it would be...I thought all 4 tasks per CPU would be running at the same time.
> Basically, I want the CPU/OS to work out the sharing of resources. Otherwise, if one of the tasks that is running is just sitting there doing nothing, it's
> going to do that for its 30 seconds while the other tasks are suspended, right?
Is --oversubscribe set for the jobs?
> What I want to see is 4x the node's CPUs in tasks all running at the same time, not time slicing, just for jobs using this partition. Is that a thing?
It might be a thing. I'm not sure it is a very sensible thing. Time
slicing and context switching are still going to take place, with each
process getting a quarter of a core on average. It is not clear that
you will actually increase throughput this way. I would probably first
turn on hyperthreading to deal with jobs which have intermittent
CPU usage.
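To make the trade-off concrete, here is a rough, idealised model of the two schemes in Python. It is only a sketch under loudly stated assumptions: every process of a "busy" job is purely CPU-bound, an "idle" job uses no CPU at all, and context-switch, cache and memory contention costs are ignored entirely. The node size, job count and 30-second slice come from the thread; the function names are hypothetical and nothing here reflects how Slurm itself accounts for time.

```python
"""Idealised comparison of gang time slicing vs. letting the OS share cores.

Assumptions (not from Slurm): busy jobs are purely CPU-bound, idle jobs use
no CPU, and context-switch / cache / memory contention costs are ignored.
"""

CORES = 48          # cores on the node in the thread
JOBS = 4            # OverSubscribe=FORCE:4
CPUS_PER_JOB = 48   # each job requests 48 CPUs
SLICE_S = 30.0      # time slice length (seconds) mentioned in the thread


def share_per_process(cores: int, jobs: int, cpus_per_job: int) -> float:
    """Average fraction of a core per process if the OS shares all cores evenly."""
    return cores / (jobs * cpus_per_job)


def gang_core_seconds(busy: list[bool], cores: int, slice_s: float) -> float:
    """Useful core-seconds over one full cycle where each job runs alone for one slice.

    An idle job still holds the whole node for its slice, so that slice is wasted.
    """
    return sum(cores * slice_s for is_busy in busy if is_busy)


def concurrent_core_seconds(
    busy: list[bool], cores: int, slice_s: float, cpus_per_job: int
) -> float:
    """Useful core-seconds over the same wall-clock window with all jobs running at once.

    The OS hands the cores to whichever processes are runnable, capped by the core count.
    """
    window = len(busy) * slice_s
    runnable = sum(cpus_per_job for is_busy in busy if is_busy)
    return min(cores, runnable) * window


if __name__ == "__main__":
    busy = [True, True, True, False]  # one of the four jobs is just sitting there
    print(share_per_process(CORES, JOBS, CPUS_PER_JOB))
    print(gang_core_seconds(busy, CORES, SLICE_S))
    print(concurrent_core_seconds(busy, CORES, SLICE_S, CPUS_PER_JOB))
```

With one of the four jobs idle, the model gives 4320 useful core-seconds per 120 s cycle under time slicing versus 5760 with concurrent sharing, and each process gets 0.25 of a core. When all four jobs are busy the two figures are identical, and in practice the overheads the model ignores would eat into the concurrent case, which is why any real throughput gain is not clear.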
Still, since Slurm offers the possibility of oversubscription, I assume
there must be a use-case.
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin