[slurm-users] Job Step Resource Requests are Ignored
maria at rstudio.com
Tue May 5 23:47:12 UTC 2020
I'd like to set different resource limits for different steps of my job. A
sample script might look like this (e.g. job.sh):
#!/bin/bash
srun --cpus-per-task=1 --mem=1 echo "Starting..."
srun --cpus-per-task=4 --mem=250 --exclusive <do something complicated>
srun --cpus-per-task=1 --mem=1 echo "Finished."
Then I would run the script from the command line using the following
command: sbatch --ntasks=1 job.sh. I have observed that, while none of the
steps appear to have their memory limited (which I'm pretty sure is due to
my proctrack plugin type), scontrol show step <id>.1 reports that the
second step has been allocated 4 CPUs, yet in reality the step is only
able to use 1.
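For what it's worth, one way to see what a step can actually use (this is my suggestion, not from the original script; it assumes a Linux node with /proc and coreutils' nproc available) is to have the step print its own CPU affinity, e.g. via srun:

```shell
# Show the CPUs this process is actually allowed to run on.
# Launched via srun inside a job, this reveals the step's real binding,
# which can be compared against what scontrol show step reports.
grep Cpus_allowed_list /proc/self/status
nproc
```

If nproc inside the step prints 1 while scontrol reports 4 CPUs, the allocation and the actual cgroup/affinity binding disagree.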
I have also observed the opposite. Running the following command, I can see
that the job step is able to use all CPUs allocated to the job, rather than
only the one allocated to the step itself:
sbatch --ntasks=1 --cpus-per-task=2 << EOF
srun --cpus-per-task=1 <do something complicated>
EOF
My goal here is to be able to run a single job with 3 steps where the first
and last step are always executed, even if the second would not be run
because too many resources were requested.
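One way to sketch that structure (my own suggestion, not something from the original script: the middle step's exit code is captured with || so a failure, e.g. from a denied resource request, does not abort the script; ./complicated_task is a placeholder):

```shell
#!/bin/bash
# job.sh -- first and last steps always run; the middle step's failure
# is recorded but does not stop the script.

srun --cpus-per-task=1 --mem=1 echo "Starting..."

# Capture the middle step's exit code instead of letting it end the job.
rc=0
srun --cpus-per-task=4 --mem=250 --exclusive ./complicated_task || rc=$?
if [ "$rc" -ne 0 ]; then
    echo "Middle step failed with exit code $rc; continuing." >&2
fi

srun --cpus-per-task=1 --mem=1 echo "Finished."
exit "$rc"
```

Exiting with the middle step's code still lets the job as a whole be marked failed while guaranteeing the bookend steps execute.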
Here is my slurm.conf, with commented-out lines removed (this is just a
small test cluster with a single node on the same machine as the
controller):
NodeName=ubuntu CPUs=4 RealMemory=500 State=UNKNOWN
PartitionName=main Nodes=ubuntu Default=YES MaxTime=INFINITE State=UP
Any advice would be greatly appreciated! Thanks in advance!