[slurm-users] Need help with controller issues

Dean Schulze dean.w.schulze at gmail.com
Tue Dec 10 21:20:29 UTC 2019


$ systemctl status slurmdbd
● slurmdbd.service - Slurm DBD accounting daemon
   Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2019-12-10 13:33:28 MST; 40min ago
  Process: 787 ExecStart=/usr/sbin/slurmdbd $SLURMDBD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 791 (code=exited, status=1/FAILURE)

Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: Starting Slurm DBD accounting daemon...
Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: Started Slurm DBD accounting daemon.
Dec 10 13:33:28 ubuntu-controller.liqid.com slurmdbd[791]: fatal: Unable to initialize accounting_storage/mysql accounting storage plugin
Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: slurmdbd.service: Main process exited, code=exited, status=1/FAILURE
Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: slurmdbd.service: Failed with result 'exit-code'.
$ systemctl status slurmctld
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2019-12-10 13:33:28 MST; 41min ago
  Process: 788 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 796 (code=exited, status=1/FAILURE)

Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: Starting Slurm controller daemon...
Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: Started Slurm controller daemon.
Dec 10 13:33:28 ubuntu-controller.liqid.com slurmctld[796]: fatal: You are running with a database but for some reason we have no TRES from it.  Th
Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: slurmctld.service: Failed with result 'exit-code'.
$

One issue is with the database plugin: slurmdbd cannot initialize
accounting_storage/mysql.  During database setup this command failed:

sudo systemctl enable mysql

I ran this instead:

sudo systemctl enable mariadb.service

Maybe some config has to be modified to use MariaDB instead of MySQL?
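
MariaDB is normally used through the same accounting_storage/mysql plugin, so a useful first check is which Storage* settings slurmdbd is actually reading. A minimal sketch, assuming the file lives at /etc/slurm/slurmdbd.conf (a guess based on where slurm.conf is on this system); the function name show_storage_settings is made up for illustration:

```shell
#!/bin/sh
# Print the Storage* settings from a slurmdbd.conf, skipping comments.
# The default path below is an assumption; pass your own as the first argument.
show_storage_settings() {
    conf="${1:-/etc/slurm/slurmdbd.conf}"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    # Keep only Storage* keys, strip leading whitespace, and mask the
    # password so the output can be pasted into a list reply.
    grep -E '^[[:space:]]*Storage[A-Za-z]+=' "$conf" \
        | sed -E 's/^[[:space:]]*//; s/^(StoragePass=).*/\1****/'
}
```

Run it as `show_storage_settings` or `show_storage_settings /path/to/slurmdbd.conf`. If StorageType already reads accounting_storage/mysql, the failure may instead come from the plugin not being installed or from the database login, neither of which this sketch checks.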


On Tue, Dec 10, 2019 at 2:13 PM Renfro, Michael <Renfro at tntech.edu> wrote:

> What do you get from
>
> systemctl status slurmdbd
> systemctl status slurmctld
>
> I’m assuming at least slurmdbd isn’t running.
>
> > On Dec 10, 2019, at 3:05 PM, Dean Schulze <dean.w.schulze at gmail.com> wrote:
> >
> > I'm trying to set up my first slurm installation following these instructions:
> >
> > https://github.com/nateGeorge/slurm_gpu_ubuntu
> >
> > I've had to deviate a little bit because I'm using virtual machines that
> > don't have GPUs, so I don't have a gres.conf file and in
> > /etc/slurm/slurm.conf I don't have an entry like Gres=gpu:2 on the last
> > line.
> >
> > On my controller VM I get errors when trying to run simple commands:
> >
> > $ sinfo
> > slurm_load_partitions: Unable to contact slurm controller (connect failure)
> >
> > $ sudo sacctmgr add cluster compute-cluster
> > sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to localhost:6819: Connection refused
> > sacctmgr: error: slurmdbd: Sending PersistInit msg: Connection refused
> > sacctmgr: error: Problem talking to the database: Connection refused
> >
> >
> > Something is supposed to be running on port 6819, but netstat shows
> > nothing using that port.  What is supposed to be running on 6819?
> >
> > My database (MariaDB) is running.  I can connect to it with `sudo mysql -u root`.
> >
> > When I boot my controller which services are supposed to be running and
> > on which ports?
> >
> > Thanks.
> >
>
>
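
On the port question in the quoted message: Slurm's default ports are 6817 for slurmctld, 6818 for slurmd, and 6819 for slurmdbd, so nothing listening on 6819 is consistent with slurmdbd having exited. A minimal sketch, assuming those defaults have not been overridden in slurm.conf or slurmdbd.conf, that reads `ss -ltn` output on stdin; the function name report_slurm_ports is made up for illustration:

```shell
#!/bin/sh
# Read `ss -ltn` style output on stdin and report which of the default
# Slurm ports have a listener. Assumes the default port numbers are in use.
report_slurm_ports() {
    # Column 4 is "Local Address:Port"; keep only the part after the last colon.
    listening=$(awk 'NR > 1 { n = split($4, a, ":"); print a[n] }')
    # slurmd (6818) normally runs on compute nodes, not on the controller.
    for entry in 6817:slurmctld 6818:slurmd 6819:slurmdbd; do
        port=${entry%%:*}
        name=${entry#*:}
        if printf '%s\n' "$listening" | grep -qx "$port"; then
            echo "$port $name: listening"
        else
            echo "$port $name: not listening"
        fi
    done
}
```

Run it as `ss -ltn | report_slurm_ports` on the controller. It only shows whether a port has a listener; it does not confirm which process owns it.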