Hello,
What I’m looking for is a way for a node to continue to be in the same partition, and have the same QoS(es), but only be chosen if a particular capability is asked for. This is because we are rolling something (an OS upgrade) out slowly to a small batch of nodes at first, and then to more and more over time, and do not want to interrupt users’ workflows: we want them to default to the ‘current’ nodes and only land on the ‘special’ ones if requested. (At a certain point the ‘special’ ones will become the majority and we’d swap the behaviour.)
Slurm has the well-known Features item that can be put on a node (or nodes):
A comma-delimited list of arbitrary strings indicative of some characteristic associated with the node. There is no value or count associated with a feature at this time, a node either has a feature or it does not. A desired feature may contain a numeric component indicating, for example, processor speed but this numeric component will be considered to be part of the feature string. Features are intended to be used to filter nodes eligible to run jobs via the --constraint argument. By default a node has no features. Also see Gres for being able to have more control such as types and count. Using features is faster than scheduling against GRES but is limited to Boolean operations.
https://slurm.schedmd.com/slurm.conf.html#OPT_Features
So if there are (a bunch of) partitions, and nodes within those partitions, a job can be submitted to a partition and it can be run on any available node, or even be requested to run on a particular node (--nodelist). With the above (and --constraint / --prefer), a particular subset of node(s) can be requested. But (AIUI) that subset is also available generally to everyone, even if a particular feature is not requested.
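For concreteness, here is a sketch of the current situation, with made-up node names and an ‘el9’ feature (the names are illustrative only):

  # slurm.conf
  NodeName=node[01-08] Features=el8
  NodeName=node[09-12] Features=el9

  # Jobs only land on the upgraded nodes if the submitter asks:
  sbatch --constraint=el9 job.sh   # hard requirement
  sbatch --prefer=el9 job.sh       # soft preference

  # With neither option, node[09-12] remain eligible for everyone,
  # which is exactly the behaviour we want to avoid.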
Is there a way to tell Slurm to not schedule a job on a node UNLESS a flag or option is set? Or is it necessary to set up new partition(s) or QoS(es)? I see that AllowAccounts (and AllowGroups) is applicable only to partitions, and not (AFAICT) settable on a per-node basis.
We’re currently on 22.05.x, but upgrading is fine.
Regards, David
We've done this, though, with job_submit.lua, mostly with OS updates. We add a feature to everything and then proceed, telling users that adding a feature gets you on the "new" nodes.
I can send you the snippet if you're using the job_submit.lua script.
Bill
I wrote a job_submit.lua also. It would append "&centos79" to the feature string unless the features already contained "el9", or, if empty, set the features string to "centos79" without the ampersand. I didn't hear from any users doing anything fancy enough with their feature string for the ampersand to cause a problem.
We did something like this in the past, but from C. However, modifying the features was painful if the user used any interesting syntax.
What we are doing now is using --extra for that purpose. The nodes boot up with SLURMD_OPTIONS="--extra {\"os\":\"rhel9\"}" or similar. Users can request --extra=os=rhel9 or whatever if they want to submit across OS versions for some weird reason.
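For reference, roughly what that looks like (the sysconfig path is just an example; use whatever file your slurmd service reads its options from):

  # /etc/sysconfig/slurmd on an already-upgraded node
  SLURMD_OPTIONS="--extra {\"os\":\"rhel9\"}"

  # a user explicitly asking for the rhel9 nodes
  sbatch --extra=os=rhel9 job.sh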
Handling defaults is problematic because there is no way to set a default --extra for people. We had some things working to set an environment variable on the nodes that gets passed by sbatch, et al. and then read it from the submit plugin. We would then set the --extra in the job submit plugin. The problem is that salloc and srun behave differently and you can't access the environment.
Instead, we are now looking up the alloc_node in the plugin and reading its `extra` directly. Here's what the relevant parts look like:

  static void _set_extra_from_alloc_node(job_desc_msg_t *job_desc)
  {
      node_record_t *node_ptr = find_node_record(job_desc->alloc_node);
      char *default_str = "os=rhel7";

      if (node_ptr == NULL) {
          job_desc->extra = xstrdup(default_str);
          info("WARNING: _set_extra_from_alloc_node: node %s not found. Setting job to default '%s'",
               job_desc->alloc_node, default_str);
      } else {
          if (!xstrcmp(node_ptr->extra, "{\"os\":\"rhel7\"}")) {
              job_desc->extra = xstrdup("os=rhel7");
          } else if (!xstrcmp(node_ptr->extra, "{\"os\":\"rhel9\"}")) {
              job_desc->extra = xstrdup("os=rhel9");
          } else {
              job_desc->extra = xstrdup(default_str);
              info("WARNING: _set_extra_from_alloc_node: node %s returned extra of '%s' which did not match known values. Setting job to default '%s'",
                   job_desc->alloc_node, node_ptr->extra, default_str);
          }
      }
  }
...
  if (!job_desc->extra) {
      _set_extra_from_alloc_node(job_desc);
  }
I don't know if you can do it in lua. The easiest way to do this would be if there was an environment variable for a default --extra, but there isn't currently. I've been meaning to ask SchedMD about that but haven't done so yet.
By the way, the nice thing about --extra is that there's no juggling of features in config files. Whatever OS it boots up in, that's what ends up in the extra field. We have a script that populates the relevant file before Slurm boots.
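A minimal version of such a script could look like this (a sketch, not our exact script; it assumes /etc/os-release describes the booted OS and that slurmd reads its options from /etc/sysconfig/slurmd):

  #!/bin/bash
  # Record the booted OS in slurmd's --extra field before slurmd starts.
  . /etc/os-release                 # provides ID (e.g. "rhel") and VERSION_ID (e.g. "9.3")
  os="${ID}${VERSION_ID%%.*}"       # e.g. "rhel9"
  printf 'SLURMD_OPTIONS="--extra {\\"os\\":\\"%s\\"}"\n' "$os" > /etc/sysconfig/slurmd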
The --extra functionality in slurmd that Ryan describes was added in August 2023, so it is not in the version we’re currently running:
https://github.com/SchedMD/slurm/commit/0daa1fda97c125c0b1c48cbdcdeaf1382ed7...
Perhaps something for the future. For now, it looks like job_submit.lua is the best candidate.
Could you post that snippet?
  function slurm_job_submit(job_desc, part_list, submit_uid)
      if job_desc.features then
          if not string.find(job_desc.features, "el9") then
              job_desc.features = job_desc.features .. '&centos79'
          end
      else
          job_desc.features = "centos79"
      end
      return slurm.SUCCESS
  end
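For completeness: the function goes in a file named job_submit.lua in the same directory as slurm.conf, and the plugin is enabled with

  JobSubmitPlugins=lua

in slurm.conf.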