[slurm-users] ActiveFeatures job submission
Alexander Block
Alexander.Block at lrz.de
Tue Feb 1 09:58:07 UTC 2022
Hello experts,
I hope someone out there has some experience with
"ActiveFeatures" and "AvailableFeatures" in the node configuration and
can give some advice.
We have configured 4 nodes with certain features, e.g.
"NodeName=thin1 Arch=x86_64 CoresPerSocket=24
CPUAlloc=0 CPUTot=96 CPULoad=44.98
AvailableFeatures=work,scratch
ActiveFeatures=work,scratch
..."
The features correspond to mounted filesystems. Now we are going to take
one filesystem (work) offline for maintenance, so we wanted to remove
the feature from the nodes. We tried, e.g.,
# scontrol update node=thin1 ActiveFeatures="scratch"
resulting in
"NodeName=thin1 Arch=x86_64 CoresPerSocket=24
CPUAlloc=0 CPUTot=96 CPULoad=44.98
AvailableFeatures=work,scratch
ActiveFeatures=scratch
..."
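To see which features a node has configured but not currently active, one can compare the two fields in the node record. This is a minimal sketch, assuming the plain-text `scontrol show node` output is piped in; the function name `inactive_features` is hypothetical and not part of Slurm, and the sample input is the thin1 record shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): reads `scontrol show node`
# text on stdin and prints "<node> <feature>" for every feature listed
# in AvailableFeatures but missing from ActiveFeatures.
inactive_features() {
  awk '
    # Remember the node name from lines like "NodeName=thin1 Arch=..."
    /NodeName=/ { split($1, kv, "="); node = kv[2] }
    # Keep the comma-separated list of configured features
    /AvailableFeatures=/ { sub(/.*AvailableFeatures=/, ""); avail = $0 }
    # When the active list appears, report configured-but-inactive features
    /ActiveFeatures=/ {
      sub(/.*ActiveFeatures=/, "")
      split("", act)                          # reset the lookup table
      m = split($0, ac, ",")
      for (i = 1; i <= m; i++) act[ac[i]] = 1
      n = split(avail, av, ",")
      for (i = 1; i <= n; i++)
        if (!(av[i] in act)) print node, av[i]
    }'
}

# Example input: the thin1 record after the scontrol update above.
# prints: thin1 work
inactive_features <<'EOF'
NodeName=thin1 Arch=x86_64 CoresPerSocket=24
   CPUAlloc=0 CPUTot=96 CPULoad=44.98
   AvailableFeatures=work,scratch
   ActiveFeatures=scratch
EOF
```

On a live cluster the input would come from `scontrol show node | inactive_features`, which lists every node where a feature has been deactivated.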
The problem now is that no jobs requesting the feature work can be
SUBMITTED; the error we get is
"sbatch: error: Batch job submission failed: Requested node
configuration is not available"
Does this make sense? We want our users to be able to submit jobs
requesting features that are generally available: maintenance usually
doesn't last long, and since our queuing times are rather long, users
want to submit jobs now for the time when the feature is available again.
I understand that jobs might be rejected when the feature is not
available at all, but not when it is merely inactive. Furthermore,
4-node jobs are also rejected at submission when the feature is active
on only 3 nodes. Is this a bug? Wouldn't it make more sense for the job
to sit in the queue until the features/resources are activated again?
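The 4-node case can be stated as a simple node count. This is a minimal sketch, assuming (from the behaviour reported here, not from the Slurm source) that the submit-time check counts nodes whose ActiveFeatures contain the requested feature, whereas the behaviour asked for would count AvailableFeatures instead. The function `count_nodes` and the four sample node records are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical model, not Slurm code: count the nodes whose given field
# (ActiveFeatures or AvailableFeatures) contains the requested feature.
count_nodes() {
  local field=$1 feature=$2
  # Match the feature as a whole comma-separated item in that field
  grep -cE "^[[:space:]]*${field}=(.*,)?${feature}(,|\$)"
}

# Sample data: four nodes, with "work" deactivated on thin1 only.
nodes='NodeName=thin1
   AvailableFeatures=work,scratch
   ActiveFeatures=scratch
NodeName=thin2
   AvailableFeatures=work,scratch
   ActiveFeatures=work,scratch
NodeName=thin3
   AvailableFeatures=work,scratch
   ActiveFeatures=work,scratch
NodeName=thin4
   AvailableFeatures=work,scratch
   ActiveFeatures=work,scratch'

requested=4   # a 4-node job constrained to "work"
active=$(printf '%s\n' "$nodes" | count_nodes ActiveFeatures work)
avail=$(printf '%s\n' "$nodes" | count_nodes AvailableFeatures work)
echo "nodes with work active: $active, available: $avail"

# Observed: a check on the active count (3 < 4) rejects the job.
if (( active < requested )); then
  echo "rejected at submission"
fi
# Desired: a check on the available count (4 >= 4) would let it pend.
if (( avail >= requested )); then
  echo "could pend until work is active again"
fi
```

The gap between the two counts is exactly the case where a pending job would be preferable to a rejection.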
Does anyone have an idea how to handle this problem?
Thanks,
Alexander