[slurm-users] Running multiple jobs simultaneously

Matt Jay mattjay at uw.edu
Thu Sep 26 21:15:01 UTC 2019


Matt,

Depending on other parameters for the job, your '--ntasks=30' is probably requesting 30 cores (or more, if '--cpus-per-task' is greater than 1) for that individual job, which likely does not "fit" on an individual node. (OverSubscribe allows multiple jobs to share a resource, but doesn't change the resource request/requirements of an individual job.)

The best approach will depend on the particulars of the job itself, but setting "--ntasks-per-core" in conjunction with the "--ntasks=30" would be one way to allow a job with more tasks than the core count on any of your nodes to run.
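As a rough sketch of that suggestion, a job script might combine the two options as below. The value 2 for '--ntasks-per-core' and the program ./my_program are illustrative assumptions, and the partition name debug is taken from the squeue output in this thread. Whether 30 tasks actually pack onto fewer cores also depends on the nodes' hardware threads per core and the cluster's select plugin, so it is worth checking the resulting allocation with 'scontrol show job <jobid>' after submitting.

```shell
#!/bin/bash
#SBATCH --partition=debug
#SBATCH --ntasks=30
# Hypothetical value: allow up to 2 tasks to be placed on each core.
#SBATCH --ntasks-per-core=2

# ./my_program is a placeholder for the user's actual workload.
srun ./my_program
```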

Matt Jay
HPC Systems Engineer - Hyak
Research Computing
University of Washington Information Technology


From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of Matt Hohmeister
Sent: Thursday, September 26, 2019 1:56 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Running multiple jobs simultaneously

I just did that...beautiful...thanks! The "default" let me run 48 jobs concurrently across two nodes.

I've noticed that when I have "#SBATCH --ntasks=30" in my .sbatch file, the job still refuses to run, and I'm back at the output below. Should I just ask my users not to use --ntasks in their .sbatch files?


[mhohmeis@odin ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     2052_[70-100]     debug whatever mhohmeis PD       0:00      4 (PartitionConfig)
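The (PartitionConfig) reason usually indicates that the job's request conflicts with the partition's configuration, for example asking for more nodes or CPUs than the partition provides. A minimal way to compare the two, assuming the partition name debug and the array job ID 2052 shown above, is to look at both records side by side:

```shell
# Partition limits and size (MaxNodes, TotalNodes, TotalCPUs, OverSubscribe, ...)
scontrol show partition debug

# What the pending array job actually asked for (NumNodes, NumCPUs, NumTasks, ...)
scontrol show job 2052

# One line per node in the partition: node name and CPUs per node
sinfo -p debug -N -o "%N %c"
```

If the job's NumNodes or NumCPUs is larger than what the partition record reports, the job cannot start in that partition as currently configured, regardless of OverSubscribe.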

Matt Hohmeister
Systems and Network Administrator
Department of Psychology
Florida State University
PO Box 3064301
Tallahassee, FL 32306-4301
Phone: +1 850 645 1902
Fax: +1 850 644 7739
Pronouns: he/him/his

From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Matt Jay
Sent: Thursday, September 26, 2019 4:34 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Running multiple jobs simultaneously

Hi Matt,

Check out the "OverSubscribe" partition parameter. Try setting your partition to "OverSubscribe=YES" and then submitting the jobs with the "--oversubscribe" option (or use "OverSubscribe=FORCE" if you want this to happen for all jobs submitted to the partition). Either value can be followed by a colon and the maximum number of jobs that can be assigned to a resource. IIRC it defaults to 4, so you might want to increase it to allow the number of jobs you need, i.e., the maximum number of jobs you need to run simultaneously divided by the number of cores available in the partition.
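The rule of thumb above can be sketched as a small shell calculation. The job and core counts are hypothetical, the helper name oversubscribe_count and the script name job.sbatch are made up for illustration, and the script only prints the commands rather than running them. A permanent change would normally go on the partition's line in slurm.conf as well.

```shell
#!/usr/bin/env bash
# Sketch: estimate an OverSubscribe job count as
#   ceil(jobs needed at once / cores in the partition).
# All numbers below are hypothetical examples.

oversubscribe_count() {
  local jobs=$1 cores=$2
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (jobs + cores - 1) / cores ))
}

jobs_wanted=100      # hypothetical: jobs that should run simultaneously
partition_cores=48   # hypothetical: total cores in the partition
count=$(oversubscribe_count "$jobs_wanted" "$partition_cores")

# Print (do not execute) the commands an admin and a user might run.
echo "scontrol update PartitionName=debug OverSubscribe=YES:${count}"
echo "sbatch --oversubscribe job.sbatch"
```

With these example numbers the helper yields 3, which is below the default count of 4 mentioned above, so in that case the default would already be sufficient.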

Matt Jay
HPC Systems Engineer - Hyak
Research Computing
University of Washington Information Technology