[slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS
Davide.Vanzo at UTSouthwestern.edu
Wed Apr 8 21:57:17 UTC 2020
As I said at the beginning, I have never played with MPS, so my answer is based only on what the Slurm documentation shows.
Apparently MPS does not require NVML, so you can leave out AutoDetect and instead list the GPU resources explicitly in gres.conf, the old-style way. That should get you past that fatal error without having to rebuild Slurm from source.
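For illustration, a gres.conf along those lines might look like the untested sketch below. It assumes a node with two GPUs exposed as /dev/nvidia0 and /dev/nvidia1 (hypothetical device paths, to be checked on the actual node) and borrows the mps count of 200 from the documentation example; the count should match whatever the node's Gres= entry in slurm.conf declares.

```
# gres.conf (sketch): no AutoDetect line, so NVML support is not needed
# Device paths are assumptions; adjust them to the node's actual GPUs
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
# Total gres/mps count for the node, spread evenly over the GPUs above
Name=mps Count=200
```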
Davide Vanzo, PhD
BioHPC – Lyda Hill Dept. of Bioinformatics
UT Southwestern Medical Center
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Robert Kudyba
Sent: Wednesday, April 8, 2020 4:50 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS
> use yum install slurm20, here they show Slurm 19 but it's the same for 20
In that case you'll need to open a bug with Bright to get them to
rebuild Slurm with nvml support.
They told me they don't officially support MPS nor Slurm and to come here to get support (or pay SchedMD).
The vicious cycle continues.
All I want is MPS enabled, as described at https://slurm.schedmd.com/gres.html#MPS_config_example_2
"CUDA Multi-Process Service (MPS) provides a mechanism where GPUs can be shared by multiple jobs, where each job is allocated some percentage of the GPU's resources. The total count of MPS resources available on a node should be configured in the slurm.conf file (e.g. "NodeName=tux[1-16] Gres=gpu:2,mps:200"). Several options are available for configuring MPS in the gres.conf file as listed below with examples following that:
No MPS configuration: The count of gres/mps elements defined in the slurm.conf will be evenly distributed across all GPUs configured on the node. For the example, "NodeName=tux[1-16] Gres=gpu:2,mps:200" will configure a count of 100 gres/mps resources on each of the two GPUs."
Do I even need to edit gres.conf? Can I just leave out AutoDetect=nvml?
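The "No MPS configuration" case quoted above would correspond to slurm.conf entries roughly like the sketch below. The NodeName line is taken verbatim from the documentation, and GresTypes=gpu,mps is what the same documentation page uses when MPS is enabled. The node names and counts are the documentation's example values, not this cluster's.

```
# slurm.conf (sketch, using the documentation's example values)
GresTypes=gpu,mps
NodeName=tux[1-16] Gres=gpu:2,mps:200
# With no mps lines in gres.conf, the 200 is split evenly: 100 per GPU
```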