<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"Open Sans";
panose-1:2 11 6 4 2 2 2 2 2 4;}
@font-face
{font-family:"Segoe UI";
panose-1:2 11 5 2 4 2 4 2 2 3;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Sajesh,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">For other users who may have run into this: I found one reason why srun cannot run interactive jobs, and it is not necessarily related to RHEL/CentOS 7.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">If you strace slurmd, you may see a failing chown call like this one (the third argument is the gid):<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:'Open Sans',serif">chown("/dev/pts/1", 1326, 7) = -1 EPERM (Operation not permitted)<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">In my case I saw something similar:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:'Open Sans',serif">chown("/dev/pts/1", 1326, 0) = -1 EPERM (Operation not permitted)<o:p></o:p></span></p>
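<p class="MsoNormal">To pull the uid and gid out of strace lines like the two above without counting arguments by eye, something along these lines may help. It is only a sketch: the function name parse_chown and the regular expression are my own, and it assumes the strace output has the same shape as the lines quoted in this message.<o:p></o:p></p>

```python
import re

# Matches a strace line shaped like the ones quoted in this thread, e.g.
#   chown("/dev/pts/1", 1326, 7) = -1 EPERM (Operation not permitted)
# Assumption: path, uid and gid appear in exactly this order and format.
CHOWN_RE = re.compile(
    r'chown\("([^"]+)",\s*(\d+),\s*(\d+)\)\s*=\s*(-?\d+)(?:\s+(\w+))?'
)


def parse_chown(line):
    """Return path, uid, gid, return code and errno name, or None if no match."""
    m = CHOWN_RE.search(line)
    if m is None:
        return None
    path, uid, gid, rc, err = m.groups()
    return {
        "path": path,
        "uid": int(uid),
        "gid": int(gid),  # third argument of chown: the group slurmd asked for
        "rc": int(rc),
        "errno": err,     # None when the call succeeded
    }


if __name__ == "__main__":
    for line in (
        'chown("/dev/pts/1", 1326, 7) = -1 EPERM (Operation not permitted)',
        'chown("/dev/pts/1", 1326, 0) = -1 EPERM (Operation not permitted)',
    ):
        print(parse_chown(line))
```

<p class="MsoNormal">The gid field is the one to compare against the gid your node reports for the tty group.<o:p></o:p></p>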
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">At our site, this bug report was also helpful:<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:'Segoe UI',sans-serif"><a href="https://bugs.schedmd.com/show_bug.cgi?id=8729" target="_blank" title="https://bugs.schedmd.com/show_bug.cgi?id=8729">https://bugs.schedmd.com/show_bug.cgi?id=8729</a><o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">In Sajesh’s case, tty was mapped to group 7. The tty group should always be gid 5. At our site, the problem was that /etc/group was large and the tty group was not being read in properly.<o:p></o:p></p>
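<p class="MsoNormal">A quick way to compare what /etc/group says with what the node actually resolves for tty might look like the sketch below. The helper names and the EXPECTED_TTY_GID constant are mine; the value 5 is simply the one stated in this thread. Python's grp.getgrnam goes through the system's normal group lookup, so a mismatch between the two numbers would point at the same kind of problem we had.<o:p></o:p></p>

```python
import grp

EXPECTED_TTY_GID = 5  # value stated in this thread, not read from anywhere


def gid_in_group_text(group_text, name="tty"):
    """Return the gid of the first entry called `name` in /etc/group-style text."""
    for line in group_text.splitlines():
        parts = line.strip().split(":")
        # Entry format: name:password:gid:member,member
        if len(parts) != 4:
            continue
        if parts[0] == name and parts[2].isdigit():
            return int(parts[2])
    return None


def resolved_gid(name="tty"):
    """Return the gid the system group lookup reports for `name`, or None."""
    try:
        return grp.getgrnam(name).gr_gid
    except KeyError:
        return None


if __name__ == "__main__":
    try:
        with open("/etc/group") as fh:
            in_file = gid_in_group_text(fh.read())
    except OSError:
        in_file = None
    print("tty gid in /etc/group:", in_file)
    print("tty gid as resolved:  ", resolved_gid())
    print("expected:             ", EXPECTED_TTY_GID)
```

<p class="MsoNormal">Run it on the compute node, since that is where slurmd does the chown.<o:p></o:p></p>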
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The fix for us was to re-sort the group file by gid, so that the tty line would fall on line 5.<o:p></o:p></p>
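<p class="MsoNormal">For anyone who wants to try the same ordering, here is a minimal sketch of sorting group entries by numeric gid. It is not the exact procedure we used; the function name sort_group_text is made up, and it only transforms text, so review the result and keep a backup before replacing a real /etc/group.<o:p></o:p></p>

```python
def sort_group_text(group_text):
    """Return /etc/group-style text with entries ordered by numeric gid."""
    entries, others = [], []
    for line in group_text.splitlines():
        parts = line.split(":")
        if len(parts) == 4 and parts[2].isdigit():
            entries.append((int(parts[2]), line))
        elif line.strip():
            # Anything without a numeric gid (for example "+:::" lines)
            # is kept and placed after the sorted entries.
            others.append(line)
    # sorted() is stable, so entries sharing a gid keep their original order
    ordered = [line for _, line in sorted(entries, key=lambda e: e[0])]
    return "\n".join(ordered + others) + "\n"


if __name__ == "__main__":
    sample = "users:x:100:\ntty:x:5:\nroot:x:0:\n"
    print(sort_group_text(sample), end="")
```

<p class="MsoNormal">Sorting by gid moves the low-numbered system groups, tty among them, to the top of the file.<o:p></o:p></p>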
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hope this helps,<o:p></o:p></p>
<p class="MsoNormal">Kevin<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Sajesh Singh &lt;ssingh@amnh.org&gt;<br>
<b>Date: </b>Wednesday, March 25, 2020 at 2:23 AM<br>
<b>To: </b>slurm-users@schedmd.com &lt;slurm-users@schedmd.com&gt;<br>
<b>Subject: </b>[slurm-users] Cannot run interactive jobs<o:p></o:p></span></p>
</div>
<p class="MsoNormal">CentOS 7.7.1908<o:p></o:p></p>
<p class="MsoNormal">Slurm 18.08.8<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">When trying to run an interactive job I am getting the following error:<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">srun: error: task 0 launch failed: Slurmd could not connect IO<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Checking the log file on the compute node I see the following error:<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">[2020-03-25T01:42:08.262] launch task 13.0 request from UID:1326 GID:50000 HOST:192.168.229.254 PORT:14980<o:p></o:p></p>
<p class="MsoNormal">[2020-03-25T01:42:08.262] lllp_distribution jobid [13] implicit auto binding: cores,one_thread, dist 8192<o:p></o:p></p>
<p class="MsoNormal">[2020-03-25T01:42:08.262] _task_layout_lllp_cyclic<o:p></o:p></p>
<p class="MsoNormal">[2020-03-25T01:42:08.262] _lllp_generate_cpu_bind jobid [13]: mask_cpu,one_thread, 0x0000000000000001<o:p></o:p></p>
<p class="MsoNormal">[2020-03-25T01:42:08.262] _run_prolog: run job script took usec=5<o:p></o:p></p>
<p class="MsoNormal">[2020-03-25T01:42:08.262] _run_prolog: prolog with lock for job 13 ran for 0 seconds<o:p></o:p></p>
<p class="MsoNormal">[2020-03-25T01:42:08.272] [13.0] Considering each NUMA node as a socket<o:p></o:p></p>
<p class="MsoNormal"><b>[2020-03-25T01:42:08.310] [13.0] error: stdin openpty: Operation not permitted</b><o:p></o:p></p>
<p class="MsoNormal"><b>[2020-03-25T01:42:08.311] [13.0] error: IO setup failed: Operation not permitted</b><o:p></o:p></p>
<p class="MsoNormal">[2020-03-25T01:42:08.311] [13.0] error: job_manager exiting abnormally, rc = 4021<o:p></o:p></p>
<p class="MsoNormal">[2020-03-25T01:42:08.315] [13.0] done with job<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">On a cluster running CentOS 7.3 and Slurm 18.08.4, the same interactive job runs as expected.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Any advice on how to remedy this would be appreciated.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">-Sajesh-<o:p></o:p></p>
</div>
</body>
</html>