[slurm-users] Slurm crash with --mem-per-cpu=

Park, Gisoo (gp4r) gp4r at virginia.edu
Mon Mar 2 15:45:06 UTC 2020


After we set MaxMemPerCPU=9000 on a partition, we are seeing Slurm crash when we submit a job with --mem-per-cpu=.

When both -n and --mem-per-cpu= were in the sbatch script,
#SBATCH -n 1
#SBATCH --mem-per-cpu=20000

it worked fine and Slurm automatically increased the number of CPUs.
NumNodes=1 NumCPUs=3 NumTasks=1 CPUs/Task=3 ReqB:S:C:T=0:0:*:*
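
The NumCPUs=3 value is consistent with the arithmetic below. This is only a sketch: it assumes Slurm rounds the CPU count up until the memory per CPU fits under MaxMemPerCPU, which I have not confirmed in the source. The variable names are mine.

```shell
# Assumed rule (not confirmed): CPUs = ceil(--mem-per-cpu / MaxMemPerCPU)
max_mem_per_cpu=9000    # MaxMemPerCPU set on the partition, in MB
req_mem_per_cpu=20000   # --mem-per-cpu from the sbatch script, in MB

# Integer ceiling division
cpus=$(( (req_mem_per_cpu + max_mem_per_cpu - 1) / max_mem_per_cpu ))
echo "CPUs per task: $cpus"
```

With the values from our job this gives 3 CPUs, matching the CPUs/Task=3 shown above.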

However, when only --mem-per-cpu= was in the sbatch script,
#SBATCH --mem-per-cpu=20000

Slurm crashed with the following error.
[2020-03-02T10:36:41.677] _slurm_rpc_submit_batch_job: JobId=5190345 InitPrio=1000000 usec=423
[2020-03-02T10:36:42.167] error: _compute_c_b_task_dist: request was for 0 tasks, setting to 1
[2020-03-02T10:36:42.167] error: cons_res: _compute_c_b_task_dist oversubscribe for job 5190345
[2020-03-02T10:36:42.167] fatal: cons_res: cpus computation error

According to the log, Slurm set the task count to 1, but then failed with an oversubscribe error.

Any idea how to fix this issue?


