[slurm-users] Need help with controller issues
Renfro, Michael
Renfro at tntech.edu
Tue Dec 10 21:11:52 UTC 2019
What do you get from:
systemctl status slurmdbd
systemctl status slurmctld
I’m assuming at least slurmdbd isn’t running: 6819 is slurmdbd’s default port, and sacctmgr’s connection to it was refused.
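
If it helps, here is a minimal sketch that runs both checks in one go and also looks for a listener on each daemon's port. It assumes the stock ports (6817 for slurmctld, 6819 for slurmdbd) have not been overridden via SlurmctldPort or DbdPort, that the units are named slurmdbd and slurmctld, and that `ss` is installed. The helper names `default_port` and `check_service` are made up for this example.

```shell
#!/bin/sh
# Assumed defaults: SlurmctldPort=6817, DbdPort=6819.
# Adjust if slurm.conf or slurmdbd.conf sets different values.
default_port() {
    case "$1" in
        slurmctld) echo 6817 ;;
        slurmdbd)  echo 6819 ;;
        *)         return 1 ;;
    esac
}

# Report whether systemd considers the unit active and whether
# anything is listening on the port assumed for it.
check_service() {
    svc=$1
    port=$(default_port "$svc") || { echo "$svc: unknown service"; return 1; }

    if systemctl is-active --quiet "$svc" 2>/dev/null; then
        echo "$svc: active"
    else
        echo "$svc: not active"
    fi

    # ss -ltn lists listening TCP sockets with numeric ports.
    if ss -ltn 2>/dev/null | grep -q ":$port "; then
        echo "$svc: something is listening on port $port"
    else
        echo "$svc: nothing listening on port $port"
    fi
}

# slurmdbd first, since slurmctld talks to it for accounting.
for svc in slurmdbd slurmctld; do
    check_service "$svc"
done
```

If a service shows as not active, `systemctl status` on that unit (as above) should show why it failed to start.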
> On Dec 10, 2019, at 3:05 PM, Dean Schulze <dean.w.schulze at gmail.com> wrote:
>
> I'm trying to set up my first slurm installation following these instructions:
>
> https://github.com/nateGeorge/slurm_gpu_ubuntu
>
> I've had to deviate a little bit because I'm using virtual machines that don't have GPUs, so I don't have a gres.conf file and in /etc/slurm/slurm.conf I don't have an entry like Gres=gpu:2 on the last line.
>
> On my controller VM I get errors when trying to run simple commands:
>
> $ sinfo
> slurm_load_partitions: Unable to contact slurm controller (connect failure)
>
> $ sudo sacctmgr add cluster compute-cluster
> sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to localhost:6819: Connection refused
> sacctmgr: error: slurmdbd: Sending PersistInit msg: Connection refused
> sacctmgr: error: Problem talking to the database: Connection refused
>
>
> Something is supposed to be running on port 6819, but netstat shows nothing using that port. What is supposed to be running on 6819?
>
> My database (MariaDB) is running. I can connect to it with `sudo mysql -u root`.
>
> When I boot my controller which services are supposed to be running and on which ports?
>
> Thanks.
>