[slurm-users] ActiveFeatures job submission

Alexander Block Alexander.Block at lrz.de
Tue Feb 1 09:58:07 UTC 2022


Hello experts,

I hope someone out there has some experience with
"ActiveFeatures" and "AvailableFeatures" in the node configuration and
can give some advice.

We have configured 4 nodes with certain features, e.g.

"NodeName=thin1 Arch=x86_64 CoresPerSocket=24
    CPUAlloc=0 CPUTot=96 CPULoad=44.98
    AvailableFeatures=work,scratch
    ActiveFeatures=work,scratch

..."

The features correspond to mounted filesystems. We are about to take
one filesystem (work) offline for maintenance, so we wanted to remove
that feature from the nodes. We tried e.g.

# scontrol update node=thin1 ActiveFeatures="scratch"

resulting in

"NodeName=thin1 Arch=x86_64 CoresPerSocket=24
    CPUAlloc=0 CPUTot=96 CPULoad=44.98
    AvailableFeatures=work,scratch
    ActiveFeatures=scratch

..."

The problem now is that no jobs requesting the feature work can be
SUBMITTED. The error we get is

"sbatch: error: Batch job submission failed: Requested node 
configuration is not available"


Does this make sense? We want our users to be able to submit jobs
requesting features that are generally available: maintenance windows
usually don't last long, and since our queuing times are rather long,
users want their jobs queued for when the feature is available again.
I understand that jobs might be rejected when the feature is not
available at all, but not when it is merely inactive?! Furthermore,
4-node jobs are also rejected at submission when the feature is active
on only 3 nodes. Is this a bug? Wouldn't it make more sense for the job
to sit in the queue until the features/resources are activated again?

Maybe someone has an idea how to handle this problem?

Thanks,

Alexander