[slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

Davide Vanzo Davide.Vanzo at UTSouthwestern.edu
Wed Apr 8 21:57:17 UTC 2020


As I said at the beginning, I have never played with MPS, so my answer is based only on what the Slurm documentation shows.
Apparently MPS does not require NVML, so you can omit AutoDetect=nvml and instead list the GPU resources explicitly in gres.conf, the old-style way. That should get you past that fatal error without having to rebuild Slurm from source.
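To make "old style" concrete, here is a small hypothetical Python helper that prints gres.conf lines listing each GPU and its share of gres/mps explicitly, with no AutoDetect line. It is a sketch, not a tested config. It assumes the device files are /dev/nvidia0 through /dev/nvidia{n-1} and that every GPU gets the same MPS Count; check the device paths on your own nodes and the gres.conf man page before using the output.

```python
# Hypothetical helper: build an "old style" gres.conf body that lists GPUs
# explicitly instead of relying on AutoDetect=nvml.
# Assumptions: device files are /dev/nvidia0 .. /dev/nvidia{n-1}, and each
# GPU receives the same gres/mps Count.

def gres_conf_lines(num_gpus: int, mps_per_gpu: int) -> list[str]:
    """Return gres.conf lines for num_gpus GPUs, each with mps_per_gpu MPS units."""
    if num_gpus < 1 or mps_per_gpu < 1:
        raise ValueError("num_gpus and mps_per_gpu must be positive")
    lines = []
    # One gres/gpu line per device file.
    for i in range(num_gpus):
        lines.append(f"Name=gpu File=/dev/nvidia{i}")
    # One gres/mps line per device file, all with the same Count.
    for i in range(num_gpus):
        lines.append(f"Name=mps Count={mps_per_gpu} File=/dev/nvidia{i}")
    return lines


if __name__ == "__main__":
    # Two GPUs with 100 MPS units each, i.e. Gres=gpu:2,mps:200 in slurm.conf.
    print("\n".join(gres_conf_lines(2, 100)))
```

For two GPUs with 100 units each, this prints two `Name=gpu` lines followed by two `Name=mps Count=100` lines, one per device file.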

--
Davide Vanzo, PhD
Computer Scientist
BioHPC – Lyda Hill Dept. of Bioinformatics
UT Southwestern Medical Center

From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Robert Kudyba
Sent: Wednesday, April 8, 2020 4:50 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

> use yum install slurm20, here they show Slurm 19 but it's the same for 20

In that case you'll need to open a bug with Bright to get them to
rebuild Slurm with nvml support.

They told me they officially support neither MPS nor Slurm, and that I should come here for support (or pay SchedMD).

The vicious cycle continues.

Since all I want is MPS enabled, per https://slurm.schedmd.com/gres.html#MPS_config_example_2:
"CUDA Multi-Process Service (MPS) provides a mechanism where GPUs can be shared by multiple jobs, where each job is allocated some percentage of the GPU's resources. The total count of MPS resources available on a node should be configured in the slurm.conf file (e.g. "NodeName=tux[1-16] Gres=gpu:2,mps:200"). Several options are available for configuring MPS in the gres.conf file as listed below with examples following that:

No MPS configuration: The count of gres/mps elements defined in the slurm.conf will be evenly distributed across all GPUs configured on the node. For the example, "NodeName=tux[1-16] Gres=gpu:2,mps:200" will configure a count of 100 gres/mps resources on each of the two GPUs."
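The "No MPS configuration" case in the quoted documentation is plain division: the node's gres/mps count is split evenly across its GPUs, so gpu:2,mps:200 gives 100 per GPU. The Python sketch below reproduces that arithmetic. Its parsing is a simplification that assumes every entry carries an explicit count (e.g. gpu:2 or gpu:tesla:2). The quoted text does not say what Slurm does when the count does not divide evenly, so the sketch just rejects that case; that rule is an assumption, not Slurm behaviour.

```python
# Sketch of the even split described in the quoted Slurm gres.html passage:
# with no MPS lines in gres.conf, the node's gres/mps count is divided evenly
# across its GPUs. Parsing is simplified: every entry must carry an explicit
# count, and an uneven split is rejected (an assumption, not Slurm behaviour).

def parse_gres(spec: str) -> dict[str, int]:
    """Parse 'Gres=gpu:2,mps:200' (or 'gpu:2,mps:200') into {'gpu': 2, 'mps': 200}."""
    if spec.startswith("Gres="):
        spec = spec[len("Gres="):]
    counts: dict[str, int] = {}
    for item in spec.split(","):
        # Split off the trailing count so typed entries like gpu:tesla:2 work.
        name, _, count = item.rpartition(":")
        base = name.split(":")[0]  # drop the optional type, keep 'gpu' or 'mps'
        counts[base] = counts.get(base, 0) + int(count)
    return counts


def mps_per_gpu(spec: str) -> int:
    """Return the gres/mps count each GPU receives under an even split."""
    counts = parse_gres(spec)
    gpus, mps = counts["gpu"], counts["mps"]
    if mps % gpus != 0:
        raise ValueError(f"{mps} MPS units do not split evenly across {gpus} GPUs")
    return mps // gpus


if __name__ == "__main__":
    print(mps_per_gpu("Gres=gpu:2,mps:200"))  # the documentation example
```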

Do I even need to edit gres.conf? Can I just leave out AutoDetect=nvml?



