[slurm-users] Fwd: An issue with HOSTNAME env var when using salloc/srun for interactive job with Slurm 17.11.7

Fullop, Joshi fullop at lanl.gov
Thu Jul 12 11:53:02 MDT 2018


I can confirm the behavior you are reporting; we noticed it a number of months ago as well. HOSTNAME is exported on the front-end/login node, so when the environment is copied out for the salloc, the variable is picked up and replicated on the compute node. The likely cause is that the variable is exported rather than merely set as a shell variable. Additionally, /etc/profile does NOT set HOSTNAME on the compute nodes, because it is already set…
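The difference between an exported variable and one that is merely set can be seen in any POSIX shell. In this sketch, `DEMO_HOST` is a hypothetical stand-in for HOSTNAME, so the real variable is left alone. A child `sh -c` process plays the role of the shell on the compute node. This only illustrates inheritance and is not Slurm's own propagation code.

```shell
# Sketch: a child process only inherits variables that are exported.
# DEMO_HOST is a hypothetical stand-in for HOSTNAME; the child sh -c plays
# the role of the shell started on the compute node.

show_child_view() {
  # Print what the child process sees, or "not-inherited" if nothing arrived.
  sh -c 'printf "%s\n" "${DEMO_HOST:-not-inherited}"'
}

unset DEMO_HOST
DEMO_HOST=login-3    # set in this shell only
show_child_view      # prints: not-inherited

export DEMO_HOST     # now part of the environment passed to children
show_child_view      # prints: login-3
```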

/etc/profile:
…
if test -s /etc/HOSTNAME ; then
    test -z "$HOSTNAME" && HOSTNAME=`cat /etc/HOSTNAME`
else
    test -z "$HOSTNAME" && HOSTNAME=$HOST
fi
…
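The effect of the `test -z` guard can be reproduced by wrapping the fragment above in a function. `profile_hostname` and `HOSTNAME_FILE` are hypothetical names, and a temporary file stands in for /etc/HOSTNAME. The sketch shows that an inherited value wins over the file.

```shell
# Sketch of the /etc/profile guard quoted above. HOSTNAME_FILE is a
# hypothetical stand-in for /etc/HOSTNAME.
profile_hostname() {
  if test -s "$HOSTNAME_FILE"; then
    # Assigns only when HOSTNAME is empty or unset: the guard at issue.
    test -z "$HOSTNAME" && HOSTNAME=$(cat "$HOSTNAME_FILE")
  else
    test -z "$HOSTNAME" && HOSTNAME=$HOST
  fi
  printf '%s\n' "$HOSTNAME"
}

HOSTNAME_FILE=$(mktemp)
printf 'mc-1\n' > "$HOSTNAME_FILE"

# Nothing inherited: the file's value is used.
( unset HOSTNAME; profile_hostname )      # prints: mc-1

# HOSTNAME arrived from the login node: the file is ignored.
( HOSTNAME=login-3; profile_hostname )    # prints: login-3

rm -f "$HOSTNAME_FILE"
```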

I have not tracked down what changed or when, but both of the conditions above must hold for this behavior to appear. The fix, therefore, is to change either one of them:
1) Make sure HOSTNAME is not exported (probably the best way), or
2) Change /etc/profile so that it does not test the HOSTNAME variable and instead always sets it from /etc/HOSTNAME.
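The two fixes can be sketched as shell functions. These are assumptions about how one might apply them, not the change actually made at LANL. Where HOSTNAME gets exported varies by system, so the first function simply strips the export attribute in the current shell. The function names and the optional file argument in the second function are hypothetical.

```shell
# Option 1 (sketch): keep HOSTNAME as a shell variable but drop its export
# attribute, so it is no longer in the environment handed to child processes.
# POSIX sh has no "unexport": save the value, unset the variable (clearing the
# attribute), and reassign it. This assumes allexport (set -a) is off.
# In bash, "export -n HOSTNAME" has the same effect.
unexport_hostname() {
  _saved_hostname=$HOSTNAME
  unset HOSTNAME
  HOSTNAME=$_saved_hostname
  unset _saved_hostname
}

# Option 2 (sketch): the /etc/profile fragment without the "test -z" guard,
# so any inherited value is overwritten. The optional argument (hypothetical,
# for testing) replaces the hard-coded /etc/HOSTNAME.
set_hostname_unconditionally() {
  _hostname_file=${1:-/etc/HOSTNAME}
  if test -s "$_hostname_file"; then
    HOSTNAME=$(cat "$_hostname_file")
  else
    HOSTNAME=$HOST
  fi
  unset _hostname_file
}
```

Option 1 addresses the cause on the login node. Option 2 only makes the compute node's profile ignore what arrives.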

So this is largely an OS/environment issue rather than a Slurm one. I am curious, though, which OS you saw this on.

Hope this helps.

Joshi Fullop
HPC-ENV
Los Alamos National Laboratory




From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of CB
Sent: Tuesday, July 10, 2018 10:55 AM
To: slurm-users at lists.schedmd.com
Subject: [slurm-users] Fwd: An issue with HOSTNAME env var when using salloc/srun for interactive job with Slurm 17.11.7

Hi,

We've recently upgraded to Slurm 17.11.7 from 16.05.8.

We noticed that the HOSTNAME environment variable does not reflect the compute node in an interactive job started with salloc/srun.
Instead, it still points to the submit host, although SLURMD_NODENAME reflects the correct compute node name.

$ salloc --immediate -p manycore --constraint=xeon64c --exclusive -O -N 1 --qos=high  srun --pty bash -i
salloc: Granted job allocation 2291315
salloc: Waiting for resource configuration
salloc: Nodes mc-1 are ready for job

[user1 at mc-1 test]$ echo $HOSTNAME
login-3

[user1 at mc-1 test]$ echo $SLURMD_NODENAME
mc-1

Is this a bug introduced in the 17.11.x series, or has it been there before? According to our user, HOSTNAME used to point to the compute node name.

BTW, when I check the variable in a batch job, HOSTNAME correctly reflects the compute node name.
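A quick way to spot a stale value inside an allocation is to compare HOSTNAME with what the kernel reports. `check_hostname` is a hypothetical helper. `uname -n` may return a fully qualified name, so the sketch compares only the part before the first dot.

```shell
# Hypothetical helper: report whether $HOSTNAME matches the node this shell
# is actually running on. Both values are cut at the first dot so that a
# fully qualified uname -n still matches a short HOSTNAME.
check_hostname() {
  _actual=$(uname -n)
  _actual=${_actual%%.*}
  if [ "${HOSTNAME%%.*}" = "$_actual" ]; then
    printf 'ok: HOSTNAME=%s\n' "$HOSTNAME"
  else
    printf "stale: HOSTNAME='%s' but this node is %s\n" "$HOSTNAME" "$_actual"
    return 1
  fi
}
```

Run inside the shell started by `srun --pty bash -i`. In the session above, it should report a stale value, assuming the node's hostname matches its Slurm node name.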

Thanks,
- Chansup

