[slurm-users] Using oversubscribe to hammer a node

Groner, Rob rug262 at psu.edu
Thu Jan 19 16:23:55 UTC 2023


I'm trying to set up a specific partition where users can fight with the OS for dominance. The oversubscribe property sounds like what I want, as it says "More than one job can execute simultaneously on the same compute resource."  That's exactly what I want.  I've set up a node with 48 CPUs and oversubscribe set to force:4.  I then submit a job that requests 48 CPUs, and that starts running.  I submit another job asking for 48 cores, and it gets assigned to the node...but it is not running, it's suspended.  I can submit 2 more jobs, and they'll all go on the node (so, 4x), but 3 of the 4 will be suspended at any given time.  I see the time slicing going on, but that isn't what I thought it would be...I thought all 4 tasks per CPU would be running at the same time.  Basically, I want the CPU/OS to work out the sharing of resources.  Otherwise, if one of the running tasks is just sitting there doing nothing, it's going to do that for its 30-second slice while the other tasks stay suspended, right?
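
To make the observed behavior concrete, here is a toy model of it in Python. It is only a sketch of round-robin time slicing as I understand it from what I'm seeing, not Slurm's actual scheduling code; the function name `gang_schedule` and the job names are made up for illustration.

```python
from collections import deque


def gang_schedule(jobs, node_cpus, slices):
    """Toy round-robin model: list which jobs run in each time slice.

    jobs is a list of (name, cpus) pairs already assigned to the node.
    In each slice, jobs are granted CPUs in rotation order until the
    node is full; jobs that do not fit are treated as suspended.
    """
    queue = deque(jobs)
    timeline = []
    for _ in range(slices):
        free = node_cpus
        running = []
        for name, cpus in queue:
            if cpus <= free:
                running.append(name)
                free -= cpus
        timeline.append(running)
        queue.rotate(-1)  # the next job gets first claim in the next slice
    return timeline


# Four 48-CPU jobs stacked on one 48-CPU node (the force:4 case above).
jobs = [("job1", 48), ("job2", 48), ("job3", 48), ("job4", 48)]
timeline = gang_schedule(jobs, node_cpus=48, slices=4)
for i, running in enumerate(timeline):
    suspended = [name for name, _ in jobs if name not in running]
    print(f"slice {i}: running={running} suspended={suspended}")
```

In this model exactly one job runs per slice and the other three sit suspended, which matches what I see on the node.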

What I want to see is 4x the node's CPUs in tasks, all running at the same time with no time slicing, just for jobs using this partition.  Is that a thing?

Thanks.
