[slurm-users] Slurm crash with --mem-per-cpu=
Park, Gisoo (gp4r)
gp4r at virginia.edu
Mon Mar 2 15:45:06 UTC 2020
Hello,
After we set MaxMemPerCPU=9000 on a partition, Slurm (slurmctld) started crashing when we submitted some jobs with --mem-per-cpu=.
When both -n and --mem-per-cpu= were in the sbatch script,
#SBATCH -n 1
#SBATCH --mem-per-cpu=20000
it worked fine: Slurm automatically increased the number of CPUs per task to 3 so that each CPU stays under the 9000 MB per-CPU limit.
NumNodes=1 NumCPUs=3 NumTasks=1 CPUs/Task=3 ReqB:S:C:T=0:0:*:*
TRES=cpu=3,mem=20001M,node=1,billing=3
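The adjusted values above are consistent with Slurm multiplying CPUs per task by ceil(requested / MaxMemPerCPU) and dividing the per-CPU memory by that same factor, rounding up. The Python sketch below reproduces that arithmetic. The helper names are hypothetical, and the rounding rule is an assumption inferred from the mem=20001M figure, not copied from the Slurm source.

```python
from __future__ import annotations


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def adjust_for_max_mem_per_cpu(mem_per_cpu_mb: int,
                               max_mem_per_cpu_mb: int,
                               cpus_per_task: int = 1) -> tuple[int, int]:
    """Hypothetical helper mimicking the MaxMemPerCPU adjustment (assumed rule).

    Returns (cpus_per_task, mem_per_cpu_mb) after the adjustment.
    """
    if mem_per_cpu_mb <= max_mem_per_cpu_mb:
        # The request already fits under the limit, so nothing changes.
        return cpus_per_task, mem_per_cpu_mb
    # Factor by which CPUs must grow so that each CPU stays under the limit.
    ratio = ceil_div(mem_per_cpu_mb, max_mem_per_cpu_mb)
    # Spread the original per-CPU request over `ratio` CPUs, rounding up.
    return cpus_per_task * ratio, ceil_div(mem_per_cpu_mb, ratio)


if __name__ == "__main__":
    ntasks = 1  # from "#SBATCH -n 1"
    cpus, mem = adjust_for_max_mem_per_cpu(20000, 9000)
    print(f"CPUs/Task={cpus} mem-per-cpu={mem}M total={ntasks * cpus * mem}M")
```

Running it prints CPUs/Task=3 mem-per-cpu=6667M total=20001M, which matches the NumCPUs=3 and mem=20001M values in the job record above. The 1 extra MB comes from rounding 20000/3 up to 6667.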
However, when only --mem-per-cpu= was in the sbatch script,
#SBATCH --mem-per-cpu=20000
Slurm crashed with the following error:
[2020-03-02T10:36:41.677] _slurm_rpc_submit_batch_job: JobId=5190345 InitPrio=1000000 usec=423
[2020-03-02T10:36:42.167] error: _compute_c_b_task_dist: request was for 0 tasks, setting to 1
[2020-03-02T10:36:42.167] error: cons_res: _compute_c_b_task_dist oversubscribe for job 5190345
[2020-03-02T10:36:42.167] fatal: cons_res: cpus computation error
According to the log, Slurm set the task count to 1 but then failed with an oversubscribe error.
Any idea how to fix this issue?
Thanks!