[slurm-users] Slurm setup question
Matt Hohmeister
hohmeister at psy.fsu.edu
Wed Apr 11 09:22:43 MDT 2018
Thanks; I just set StateSaveLocation=/var/spool/slurm.state, and that went away. Of course, another error popped up:
Apr 11 11:19:24 psy-slurm slurmctld[1772]: fatal: Invalid node names in partition slurm
Here’s the relevant section from slurm.conf; IP address changed to protect the innocent. This is a single-node cluster that I’m using just to make a working proof-of-concept.
# COMPUTE NODES
NodeName=psy-slurm NodeAddr=192.0.2.157
PartitionName=slurm Nodes= Default=YES MaxTime=INFINITE State=UP
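For what it's worth, this error typically means that a name in the partition's Nodes= value does not match any NodeName= entry. A quick way to compare the two is to print just those lines from slurm.conf. This is only a sketch: the function name is made up for illustration, and the config path in the note below is an assumption that depends on how Slurm was installed.

```shell
# Hypothetical helper: print the NodeName= and PartitionName= lines of a
# slurm.conf so the node names on each can be compared by eye.
show_node_lines() {
    grep -E '^(NodeName|PartitionName)=' "$1"
}
```

Calling `show_node_lines /etc/slurm/slurm.conf` (path assumed) should list both lines. For a single node defined as NodeName=psy-slurm, the partition would be expected to reference it as Nodes=psy-slurm rather than by address.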
Matt Hohmeister
Systems and Network Administrator
Department of Psychology
Florida State University
PO Box 3064301
Tallahassee, FL 32306-4301
Phone: +1 850 645 1902
Fax: +1 850 644 7739
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Douglas Jacobsen
Sent: Wednesday, April 11, 2018 10:40 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Slurm setup question
It looks like your slurm.conf specifies /var/spool as your state save directory (StateSaveLocation), and `fatal: Incorrect permissions on state save loc: /var/spool` indicates that SlurmUser (another setting in slurm.conf) does not have permission to write to it. It might be good to make a directory dedicated to this purpose, e.g. /var/spool/slurm/<clustername>_state, and then make sure that the SlurmUser (usually either "slurm" or root, depending on your needs) can access that directory.
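The suggestion above might be sketched roughly as follows. The function name, the 0755 mode, and the example path in the note below are assumptions for illustration; the owner should be whatever SlurmUser is set to in slurm.conf.

```shell
# Hypothetical helper: create a dedicated state directory and hand it to
# the account that slurmctld runs as (SlurmUser in slurm.conf).
# The 0755 mode is an assumption; tighten it if your site requires.
make_state_dir() {
    dir=$1
    owner=$2
    mkdir -p "$dir" &&
    chown "$owner" "$dir" &&
    chmod 0755 "$dir"
}
```

Run as root, something like `make_state_dir /var/spool/slurm/state slurm` would create the directory, after which StateSaveLocation in slurm.conf would need to point at the same path.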
----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center <http://www.nersc.gov>
dmjacobsen at lbl.gov
------------- __o
---------- _ '\<,_
----------(_)/ (_)__________________________
On Wed, Apr 11, 2018 at 5:44 AM, Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> wrote:
Hi Matt,
You might want to take a look at my Slurm Wiki, which focuses on CentOS/RHEL 7: https://wiki.fysik.dtu.dk/niflheim/SLURM. Complete instructions for Slurm installation, configuration, etc. are in the Wiki.
/Ole
On 04/11/2018 02:26 PM, Matt Hohmeister wrote:
I’m brand-new to Slurm, and setting it up on a single RHEL 7.4 VM as a proof of concept before I deploy it. After following the instructions on https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/ (sorry, site not working now), I can get slurmd to start perfectly, but slurmctld fails to start with the following journalctl -xe output. I was wondering if anyone has run into this or could shed some light on this. Thanks in advance!
Apr 11 08:18:30 psy-slurm polkitd[680]: Registered Authentication Agent for unix-process:1779:31362 (system bus name :1.26 [/usr/bin/pkttyagent --notify-fd 5 --fallbac
Apr 11 08:18:30 psy-slurm systemd[1]: Starting Slurm controller daemon...
-- Subject: Unit slurmctld.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit slurmctld.service has begun starting up.
Apr 11 08:18:30 psy-slurm systemd[1]: PID file /var/run/slurmctld.pid not readable (yet?) after start.
Apr 11 08:18:30 psy-slurm systemd[1]: Started Slurm controller daemon.
-- Subject: Unit slurmctld.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit slurmctld.service has finished starting up.
--
-- The start-up result is done.
Apr 11 08:18:30 psy-slurm polkitd[680]: Unregistered Authentication Agent for unix-process:1779:31362 (system bus name :1.26, object path /org/freedesktop/PolicyKit1/A
Apr 11 08:18:30 psy-slurm slurmctld[1787]: fatal: Incorrect permissions on state save loc: /var/spool
Apr 11 08:18:30 psy-slurm systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
Apr 11 08:18:30 psy-slurm systemd[1]: Unit slurmctld.service entered failed state.
Apr 11 08:18:30 psy-slurm systemd[1]: slurmctld.service failed.
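Most of the journal output above comes from systemd and polkitd; the line that matters is the one slurmctld itself logs as fatal. A small filter like the following can pull those lines out. It is a sketch: the function name is hypothetical, and the pattern assumes the `slurmctld[PID]: fatal:` format seen in the log above.

```shell
# Hypothetical helper: keep only the fatal messages logged by slurmctld.
# Reads journal text on stdin; the pattern assumes the
# "slurmctld[PID]: fatal: ..." format.
slurmctld_fatal_lines() {
    grep -E 'slurmctld\[[0-9]+\]: fatal:'
}
```

For example, `journalctl -u slurmctld --no-pager | slurmctld_fatal_lines` should reduce the output to lines such as the "Incorrect permissions on state save loc" message.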