[slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

Robert Kudyba rkudyba at fordham.edu
Wed Apr 8 19:17:23 UTC 2020


>> > and the NVIDIA Management Library (NVML) is installed on the node and
>> > was found during Slurm configuration
>>
>> That's the key phrase - when whoever compiled Slurm ran ./configure
>> *before* compilation it was on a system without the nvidia libraries and
>> headers present, so Slurm could not compile that support in.
>>
>> You'll need to redo the build on a system with the nvidia libraries and
>> headers in order for this to work.
>
>
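One rough way to test the point quoted above is to look for the NVML GPU plugin in Slurm's plugin directory. This is only a sketch: the file name gpu_nvml.so is my assumption about how Slurm 19.05 and later name that plugin, and has_nvml_plugin is a made-up helper, not something Slurm or Bright ships.

```shell
#!/usr/bin/env bash
# Report whether a Slurm plugin directory contains the NVML GPU plugin.
# Assumption: the plugin file is called gpu_nvml.so.
has_nvml_plugin() {
    local plugin_dir="$1"
    if [ -e "${plugin_dir}/gpu_nvml.so" ]; then
        echo "present"
    else
        echo "missing"
    fi
}
```

The directory to pass in should be whatever `scontrol show config | grep PluginDir` reports on the node. A result of "missing" would be consistent with a build made without the NVIDIA headers, though it does not prove it.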
As I wrote, we use Bright Cluster on CentOS 7.7, so we just follow their
instructions
<https://support.brightcomputing.com/manuals/8.2/admin-manual.pdf#subsection.7.5.1>
and install Slurm with yum install slurm20. The manual shows Slurm 19, but
it's the same for 20:
Example
[root@bright82 ~]# rpm -qa | grep slurm | xargs -p rpm -e
[root@bright82 ~]# rpm -qa -r /cm/images/default-image | grep slurm | xargs -p rpm -r /cm/images/default-image -e
[root@bright82 ~]# yum install slurm19-client slurm19-slurmdbd slurm19-perlapi slurm19-contribs slurm19
[root@bright82 ~]# yum install --installroot=/cm/images/default-image slurm19-client
If either slurm or slurm19 is installed, then the administrator can run
wlm-setup using the workload manager name slurm (that is, without the 19
suffix) to set up Slurm. The roles at node level or category level,
slurmserver and slurmclient, work with either Slurm version.
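For Slurm 20 the yum commands above presumably take the same form with the suffix changed. A small sketch of that substitution follows. The slurm20-* package names are inferred from the slurm19 ones in the manual and are an assumption, and slurm_packages is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Build the head-node package list for a given Slurm version suffix.
# Assumption: every slurm19-* package has a same-named slurm<ver>-* counterpart.
slurm_packages() {
    local ver="$1"
    echo "slurm${ver}-client slurm${ver}-slurmdbd slurm${ver}-perlapi slurm${ver}-contribs slurm${ver}"
}
```

On the head node this could be used as `yum install $(slurm_packages 20)`, after checking with `yum list 'slurm20*'` that the packages actually exist in the configured repositories.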
Configuring Slurm
After package setup is done with wlm-setup (section 7.3), Slurm software
components are installed in /cm/shared/apps/slurm/current.
Slurm clients and servers can be configured to some extent via role
assignment (sections 7.4.1 and 7.4.2). Using cmsh, advanced option
parameters can be set under the slurmclient role.
For example, the number of cores per socket can be set:
Example
[bright82->category[default]->roles[slurmclient]]% set corespersocket 2
[bright82->category*[default*]->roles*[slurmclient*]]% commit
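To confirm that a committed value reached Slurm, one option is to read it back from the node record. The sketch below assumes the cmsh setting ends up as CoresPerSocket in the node definition, and cores_per_socket is a hypothetical helper that only parses text on stdin.

```shell
#!/usr/bin/env bash
# Extract the first CoresPerSocket value from `scontrol show node` output
# supplied on stdin.
cores_per_socket() {
    grep -o 'CoresPerSocket=[0-9]*' | head -n 1 | cut -d= -f2
}
```

Typical use would be `scontrol show node node001 | cores_per_socket`, where node001 is a placeholder node name.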
In order to configure generic resources, the genericresources mode can be
used to set a list of objects. Each object then represents one generic
resource available on nodes. Each value of name in genericresources must
already be defined in the list of GresTypes. The list of GresTypes is
defined in the slurmserver role. Several generic resource entries can have
the same value for name (for example gpu), but each must have a unique alias.
The alias is a string that is used to manage the resource entry in cmsh or
in Bright View. The string is enclosed in square brackets in cmsh, and is
used instead of the name for the object. The alias does not affect Slurm
configuration.
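The requirement that each name already appears in GresTypes can be checked against the generated slurm.conf. This is a minimal sketch, assuming a single line of the form GresTypes=gpu,mps in that file. gres_type_defined is a hypothetical helper and the location of slurm.conf is left to the caller.

```shell
#!/usr/bin/env bash
# Succeed if gres name $1 is listed in the GresTypes line of the
# slurm.conf-style file $2. Assumes the key is spelled exactly "GresTypes".
gres_type_defined() {
    local name="$1" conf="$2"
    sed -n 's/^[[:space:]]*GresTypes=\([^#[:space:]]*\).*/\1/p' "$conf" \
        | tr ',' '\n' \
        | grep -qx -- "$name"
}
```

For example, `gres_type_defined mps /path/to/slurm.conf && echo ok` would print ok only if mps is among the listed types.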

For example, to add two GPUs for all the nodes in the default category
which are of type k20xm, and to assign them to different CPU cores, the
following cmsh commands can be run:
Example
[bright82]% category use default
[bright82->category[default]]% roles
[bright82->category[default]->roles]% use slurmclient
[...[slurmclient]]% genericresources
[...[slurmclient]->genericresources]% add gpu0
[...[slurmclient*]->genericresources*[gpu0*]]% set name gpu
[...[slurmclient*]->genericresources*[gpu0*]]% set file /dev/nvidia0
[...[slurmclient*]->genericresources*[gpu0*]]% set cores 0-7
[...[slurmclient*]->genericresources*[gpu0*]]% set type k20xm
[...[slurmclient*]->genericresources*[gpu0*]]% add gpu1
[...[slurmclient*]->genericresources*[gpu1*]]% set name gpu
[...[slurmclient*]->genericresources*[gpu1*]]% set file /dev/nvidia1
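For comparison, the gpu0 object above would correspond to a gres.conf entry along the following lines. Name, Type, File and Cores are standard gres.conf keys, but that Bright writes exactly this line is an assumption on my part, and gres_line is a hypothetical helper used only to show the mapping.

```shell
#!/usr/bin/env bash
# Print one gres.conf-style line from a name, type, device file and core range.
gres_line() {
    local name="$1" type="$2" file="$3" cores="$4"
    printf 'Name=%s Type=%s File=%s Cores=%s\n' "$name" "$type" "$file" "$cores"
}
```

With the gpu0 values from the example, `gres_line gpu k20xm /dev/nvidia0 0-7` gives the line to look for in the gres.conf generated for the node.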