[slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS
Davide Vanzo
Davide.Vanzo at UTSouthwestern.edu
Tue Apr 7 20:33:28 UTC 2020
Robert,
That error is typically due to a slurmd/slurmctld version mismatch, or to the daemons running with different configurations. I would not be surprised if you need to restart slurmd on the compute nodes too after changing the SelectType configuration.
Also, do not forget this warning from the documentation when it comes to modifying SelectType:
Changing this value can only be done by restarting the slurmctld daemon and will result in the loss of all job information (running and pending) since the job state save format used by each plugin is different.
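A rough way to rule out the version-mismatch cause is sketched below. It assumes that `slurmctld -V` and `slurmd -V` print a string of the form `slurm 20.02.1`. The helper names `release_of` and `same_release` are hypothetical, and treating a difference in major release as something to investigate is a simplification of mine, not a rule from the documentation.

```shell
#!/bin/sh
# Extract the major release (e.g. "20.02") from "slurm 20.02.1"-style output.
# Prints nothing if the string does not have the assumed form.
release_of() {
    printf '%s\n' "$1" | sed -n 's/^slurm \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed only when both version strings parse and share the same major release.
same_release() {
    [ -n "$(release_of "$1")" ] && [ "$(release_of "$1")" = "$(release_of "$2")" ]
}

# Hypothetical usage on a cluster (node001 is a node name from the quoted log;
# ssh access to the node is an assumption):
#   ctld=$(slurmctld -V)
#   node=$(ssh node001 slurmd -V)
#   same_release "$ctld" "$node" || echo "version mismatch: $ctld vs $node"
#
# After a SelectType change, restart the daemons instead of only running
# "scontrol reconfigure" (assumes systemd-managed services):
#   systemctl restart slurmctld      # on the head node
#   systemctl restart slurmd         # on every compute node
```

If the versions agree, comparing the output of `scontrol show config` against the slurm.conf actually present on each node is a reasonable next check for the "different configuration" case.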
--
Davide Vanzo, PhD
Computer Scientist
BioHPC – Lyda Hill Dept. of Bioinformatics
UT Southwestern Medical Center
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Robert Kudyba
Sent: Tuesday, April 7, 2020 3:26 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS
Using Slurm 20.02 on CentOS 7.7 with Bright Cluster. We changed the following options to enable MPS:
SelectType=select/cons_tres
GresTypes=gpu,mic,mps
I restarted slurmctld and ran scontrol reconfigure; however, all jobs now get the error below:
[2020-04-07T15:29:00.741] debug: backfill: no jobs to backfill
[2020-04-07T15:29:03.051] Resending TERMINATE_JOB request JobId=3056 Nodelist=node[001-002]
[2020-04-07T15:29:03.051] Resending TERMINATE_JOB request JobId=3061 Nodelist=node003
[2020-04-07T15:29:03.051] debug: sched: Running job scheduler
[2020-04-07T15:29:03.063] agent/is_node_resp: node:node003 RPC:REQUEST_TERMINATE_JOB : Header lengths are longer than data received
[2020-04-07T15:29:03.071] agent/is_node_resp: node:node002 RPC:REQUEST_TERMINATE_JOB : Header lengths are longer than data received
[2020-04-07T15:29:03.071] agent/is_node_resp: node:node001 RPC:REQUEST_TERMINATE_JOB : Header lengths are longer than data received
Do any other options need changing? What causes these header length errors?