[slurm-users] FSU & Slurm

Matt Hohmeister hohmeister at psy.fsu.edu
Wed Apr 11 13:38:30 MDT 2018


Actually, I’ve gotten it all working; I had just overlooked some things in slurm.conf.

I had previously been trying to get Son of Grid Engine working, but after ripping out half my hair, I went to Slurm, since it’s what our research computing center uses. 😊

Matt Hohmeister
Systems and Network Administrator
Department of Psychology
Florida State University
PO Box 3064301
Tallahassee, FL 32306-4301
Phone: +1 850 645 1902
Fax: +1 850 644 7739

From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Sean Caron
Sent: Wednesday, April 11, 2018 3:36 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>; Sean Caron <scaron at umich.edu>
Subject: Re: [slurm-users] FSU & Slurm

Hi Matt,

As a protest against people asking questions on this list and getting solicitations for paid support in return, let me give you some advice for free :)

If you look at your slurm.conf you'll see there are two directories that your slurm user and group need to have write access to.

One is whatever you configure as SlurmdSpoolDir. This needs to exist on every worker node that runs slurmd. Set its ownership to the slurm user and slurm group, with mode 755.

The other is StateSaveLocation. This needs to be present only on your controller (where slurmctld runs). Again, it should be owned by the slurm user and slurm group, with mode 755.

You probably want to use something more specific than just /var/spool for your StateSaveLocation.
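
As a rough way to sanity-check the two directories described above, a small Python sketch along these lines could be run on each host. It is not part of Slurm: the function name check_slurm_dir is hypothetical, the default user name "slurm" is taken from the advice above, and checking only the owner and the owner's rwx bits is an assumption about what matters, not a copy of slurmctld's own check.

```python
import os
import pwd
import stat


def check_slurm_dir(path, expected_user="slurm"):
    """Return a list of human-readable problems found with `path`.

    Hypothetical helper: it only inspects the directory's owner and
    whether the owner has read, write, and execute permission.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path}: does not exist"]
    if not stat.S_ISDIR(st.st_mode):
        return [f"{path}: not a directory"]

    problems = []

    # Resolve the numeric owner to a name; fall back to the raw uid
    # if the uid has no passwd entry.
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)
    if owner != expected_user:
        problems.append(f"{path}: owned by {owner}, expected {expected_user}")

    # The owner needs all three of r, w, and x on the directory.
    mode = stat.S_IMODE(st.st_mode)
    if (mode & stat.S_IRWXU) != stat.S_IRWXU:
        problems.append(f"{path}: mode {mode:o} lacks rwx for owner")

    return problems
```

The function would be called with whatever paths slurm.conf actually sets for StateSaveLocation and SlurmdSpoolDir; an empty list means this sketch found nothing to complain about, which is not a guarantee that slurmctld or slurmd will accept the directory.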

Best,

Sean


On Wed, Apr 11, 2018 at 1:48 PM, Jess Arrington <jess at schedmd.com> wrote:
Hi Matt,

I hope your day is treating you well.


Thank you for your posts on the Slurm user list.


By chance, do you work with Paul Van Der Mark?


Would there be interest on your side to see a Slurm support contract for your systems at FSU?

Sites running Slurm with support tell us that it is invaluable and gives a great return to the organization: configs optimized by our experts yield much better system utilization (which pays for the support contract in and of itself), and the sites no longer have to rely on in-house, best-effort support hacks that get very expensive and turn into complicated chaos and potentially down systems.


Additionally, support keeps the Slurm project alive and going strong.


Please let me know your thoughts or if you would like me to reach out to another contact at FSU to chat about this further.



Take care,



Jess Arrington
Director of Sales | 801-616-7823
204 N 1200 E #203 Lehi, UT 84043

On Wed, Apr 11, 2018 at 6:26 AM, Matt Hohmeister <hohmeister at psy.fsu.edu> wrote:
I’m brand-new to Slurm and am setting it up on a single RHEL 7.4 VM as a proof of concept before I deploy it. After following the instructions at https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/ (sorry, the site is not working right now), I can get slurmd to start perfectly, but slurmctld fails to start, with the following output from journalctl -xe. I was wondering if anyone has run into this or could shed some light on it. Thanks in advance!

Apr 11 08:18:30 psy-slurm polkitd[680]: Registered Authentication Agent for unix-process:1779:31362 (system bus name :1.26 [/usr/bin/pkttyagent --notify-fd 5 --fallbac
Apr 11 08:18:30 psy-slurm systemd[1]: Starting Slurm controller daemon...
-- Subject: Unit slurmctld.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit slurmctld.service has begun starting up.
Apr 11 08:18:30 psy-slurm systemd[1]: PID file /var/run/slurmctld.pid not readable (yet?) after start.
Apr 11 08:18:30 psy-slurm systemd[1]: Started Slurm controller daemon.
-- Subject: Unit slurmctld.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit slurmctld.service has finished starting up.
--
-- The start-up result is done.
Apr 11 08:18:30 psy-slurm polkitd[680]: Unregistered Authentication Agent for unix-process:1779:31362 (system bus name :1.26, object path /org/freedesktop/PolicyKit1/A
Apr 11 08:18:30 psy-slurm slurmctld[1787]: fatal: Incorrect permissions on state save loc: /var/spool
Apr 11 08:18:30 psy-slurm systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
Apr 11 08:18:30 psy-slurm systemd[1]: Unit slurmctld.service entered failed state.
Apr 11 08:18:30 psy-slurm systemd[1]: slurmctld.service failed.



