Running it manually with sudo slurmdbd -D /path/to/conf is very quick on
my fresh install.
Trying to start slurmdbd through systemctl takes 3 minutes and then
fails.
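
One possible explanation worth checking, offered as a guess rather than a
confirmed diagnosis: slurmdbd -D stays in the foreground, so if the unit file
declares Type=forking, systemd keeps waiting for a fork that never happens and
gives up at its start timeout (the quoted journal shows exactly 90 seconds
between "Starting" and "timed out", which is the systemd default). The sketch
below flags that combination. check_unit is a hypothetical helper name, and it
only inspects the unit text passed to it.

```shell
# Hypothetical helper (not part of Slurm or systemd): report whether a unit
# file combines Type=forking with a foreground "-D" in ExecStart.
check_unit() {
    unit_text="$1"
    if printf '%s\n' "$unit_text" | grep -q '^Type=forking' &&
       printf '%s\n' "$unit_text" | grep -Eq '^ExecStart=.* -D( |$)'; then
        echo "mismatch: Type=forking with a foreground (-D) ExecStart"
    else
        echo "no forking/-D mismatch found"
    fi
}

# On the affected host this could be called as:
#   check_unit "$(systemctl cat slurmdbd.service)"
```

If it reports a mismatch, the usual choices are to drop -D from ExecStart or
to set Type=simple in the unit; which one fits depends on how the packaged
unit is written.
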
Is there an alternative to systemctl for starting slurmdbd in the
background?
But most importantly, I want to know why it takes so long through
systemctl. Maybe I can increase the timeout limit?
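
On the timeout question: systemd's per-unit start timeout is set with
TimeoutStartSec in the [Service] section, and a drop-in file is the usual way
to override it without editing the packaged unit. A minimal sketch, assuming
the unit is named slurmdbd.service; write_timeout_dropin, the file name
timeout.conf, and the 600-second value are illustrative choices, not values
from this thread.

```shell
# Hypothetical helper: write a systemd drop-in that overrides the start timeout.
# $1 = drop-in directory, $2 = timeout in seconds.
write_timeout_dropin() {
    dropin_dir="$1"
    seconds="$2"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nTimeoutStartSec=%s\n' "$seconds" > "$dropin_dir/timeout.conf"
}

# Typical use as root on the affected host:
#   write_timeout_dropin /etc/systemd/system/slurmdbd.service.d 600
#   systemctl daemon-reload
#   systemctl restart slurmdbd.service
```

A longer timeout only helps if slurmdbd really needs more time to start. If
systemd is waiting for a fork or PID file that never appears, the start will
still fail, just later.
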
On Thu, May 30, 2024 at 11:54 PM Ryan Novosielski <novosirj(a)rutgers.edu>
wrote:
> It may take longer to start than systemd allows for. How long does it take
> to start from the command line? It’s common to need to run it manually for
> upgrades to complete.
>
> --
> #BlackLivesMatter
>  ____
>  || \\UTGERS,     |---------------------------*O*---------------------------
>  ||_// the State  |         Ryan Novosielski - novosirj(a)rutgers.edu
>  || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>  ||  \\    of NJ  | Office of Advanced Research Computing - MSB A555B, Newark
>       `'
>
> On May 30, 2024, at 20:24, Radhouane Aniba via slurm-users <
> slurm-users(a)lists.schedmd.com> wrote:
>
> Ok I made some progress here.
>
> I removed and purged slurmdbd, mysql, mariadb, etc. and started from
> scratch.
> I added the recommended mysqld settings.
>
> I started slurmdbd manually with sudo slurmdbd -D /path/to/conf and
> everything worked well.
>
> When I tried to start the service with sudo systemctl start slurmdbd.service,
> it didn't work:
>
> sudo systemctl status slurmdbd.service
> ● slurmdbd.service - Slurm DBD accounting daemon
> Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled; vendor preset: enabled)
> Active: failed (Result: timeout) since Fri 2024-05-31 00:21:30 UTC; 2min 5s ago
> Process: 6258 ExecStart=/usr/sbin/slurmdbd -D /etc/slurm-llnl/slurmdbd.conf (code=exited, status=0/SUCCESS)
>
> May 31 00:20:00 hannibal-hn systemd[1]: Starting Slurm DBD accounting daemon...
> May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: start operation timed out. Terminating.
> May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: Failed with result 'timeout'.
> May 31 00:21:30 hannibal-hn systemd[1]: Failed to start Slurm DBD accounting daemon.
>
> Even though it is the same command ?!
>
> Any idea ?
>
>
> On Thu, May 30, 2024 at 5:02 PM Radhouane Aniba <aradwen(a)gmail.com> wrote:
>
>> Thank you Ahmet and Brian,
>>
>> Ahmet, which conf file in particular is slurmdbd reading from? I searched all
>> the cnf files for mysql and I cannot find the values it is displaying here:
>>
>> slurmdbd: debug2: Attempting to connect to localhost:3306
>> slurmdbd: debug2: innodb_buffer_pool_size: 134217728
>> slurmdbd: debug2: innodb_log_file_size: 50331648
>> slurmdbd: debug2: innodb_lock_wait_timeout: 50
>> slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout
>>
>>
>> sudo tree /etc/mysql/*
>> /etc/mysql/conf.d
>> ├── mysql.cnf
>> └── mysqldump.cnf
>> /etc/mysql/debian.cnf
>> /etc/mysql/debian-start
>> /etc/mysql/FROZEN
>> /etc/mysql/mariadb.cnf
>> /etc/mysql/mariadb.conf.d
>> ├── 50-client.cnf
>> ├── 50-mysql-clients.cnf
>> ├── 50-mysqld_safe.cnf
>> └── 50-server.cnf
>> /etc/mysql/my.cnf
>> /etc/mysql/my.cnf.fallback
>> /etc/mysql/mysql.cnf
>> /etc/mysql/mysql.conf.d
>> ├── mysql.cnf
>> └── mysqld.cnf
>>
>> On Thu, May 30, 2024 at 12:21 PM Brian Andrus via slurm-users <
>> slurm-users(a)lists.schedmd.com> wrote:
>>
>>> That SIGTERM message means something is telling slurmdbd to quit.
>>>
>>> Check your cron jobs, maintenance scripts, etc. Slurmdbd is being told
>>> to shut down. If you are running in the foreground, a ^C does that. If you
>>> run a kill or killall on it, you will get that same message.
>>>
>>> Brian Andrus
>>> On 5/30/2024 6:53 AM, Radhouane Aniba via slurm-users wrote:
>>>
>>> Yes, I can connect to my database using mysql --user=slurm
>>> --password=slurmdbpass slurm_acct_db, and I checked that there is no
>>> firewall blocking mysql.
>>>
>>> Also, here is the output of slurmdbd -D -vvv (note that I can only run this
>>> with sudo):
>>>
>>> sudo slurmdbd -D -vvv
>>> slurmdbd: debug: Log file re-opened
>>> slurmdbd: debug: Munge authentication plugin loaded
>>> slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
>>> slurmdbd: debug2: Attempting to connect to localhost:3306
>>> slurmdbd: debug2: innodb_buffer_pool_size: 134217728
>>> slurmdbd: debug2: innodb_log_file_size: 50331648
>>> slurmdbd: debug2: innodb_lock_wait_timeout: 50
>>> slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout
>>> slurmdbd: Accounting storage MYSQL plugin loaded
>>> slurmdbd: debug2: ArchiveDir = /tmp
>>> slurmdbd: debug2: ArchiveScript = (null)
>>> slurmdbd: debug2: AuthAltTypes = (null)
>>> slurmdbd: debug2: AuthInfo = (null)
>>> slurmdbd: debug2: AuthType = auth/munge
>>> slurmdbd: debug2: CommitDelay = 0
>>> slurmdbd: debug2: DbdAddr = localhost
>>> slurmdbd: debug2: DbdBackupHost = (null)
>>> slurmdbd: debug2: DbdHost = hannibal-hn
>>> slurmdbd: debug2: DbdPort = 7032
>>> slurmdbd: debug2: DebugFlags = (null)
>>> slurmdbd: debug2: DebugLevel = 6
>>> slurmdbd: debug2: DebugLevelSyslog = 10
>>> slurmdbd: debug2: DefaultQOS = (null)
>>> slurmdbd: debug2: LogFile = /var/log/slurmdbd.log
>>> slurmdbd: debug2: MessageTimeout = 100
>>> slurmdbd: debug2: Parameters = (null)
>>> slurmdbd: debug2: PidFile = /run/slurmdbd.pid
>>> slurmdbd: debug2: PluginDir = /usr/lib/x86_64-linux-gnu/slurm-wlm
>>> slurmdbd: debug2: PrivateData = none
>>> slurmdbd: debug2: PurgeEventAfter = 1 months*
>>> slurmdbd: debug2: PurgeJobAfter = 12 months*
>>> slurmdbd: debug2: PurgeResvAfter = 1 months*
>>> slurmdbd: debug2: PurgeStepAfter = 1 months
>>> slurmdbd: debug2: PurgeSuspendAfter = 1 months
>>> slurmdbd: debug2: PurgeTXNAfter = 12 months
>>> slurmdbd: debug2: PurgeUsageAfter = 24 months
>>> slurmdbd: debug2: SlurmUser = root(0)
>>> slurmdbd: debug2: StorageBackupHost = (null)
>>> slurmdbd: debug2: StorageHost = localhost
>>> slurmdbd: debug2: StorageLoc = slurm_acct_db
>>> slurmdbd: debug2: StoragePort = 3306
>>> slurmdbd: debug2: StorageType = accounting_storage/mysql
>>> slurmdbd: debug2: StorageUser = slurm
>>> slurmdbd: debug2: TCPTimeout = 2
>>> slurmdbd: debug2: TrackWCKey = 0
>>> slurmdbd: debug2: TrackSlurmctldDown= 0
>>> slurmdbd: debug2: acct_storage_p_get_connection: request new connection 1
>>> slurmdbd: debug2: Attempting to connect to localhost:3306
>>> slurmdbd: slurmdbd version 19.05.5 started
>>> slurmdbd: debug2: running rollup at Thu May 30 13:50:08 2024
>>> slurmdbd: debug2: Everything rolled up
>>>
>>>
>>> It goes like this for some time and then it crashes with this message
>>>
>>> slurmdbd: Terminate signal (SIGINT or SIGTERM) received
>>> slurmdbd: debug: rpc_mgr shutting down
>>>
>>>
>>> On Thu, May 30, 2024 at 8:18 AM mercan <ahmet.mercan(a)uhem.itu.edu.tr>
>>> wrote:
>>>
>>>> Did you try to connect database using mysql command?
>>>>
>>>> mysql --user=slurm --password=slurmdbpass slurm_acct_db
>>>>
>>>> C. Ahmet Mercan
>>>>
>>>> On 30.05.2024 14:48, Radhouane Aniba via slurm-users wrote:
>>>>
>>>> Thank you Ahmet,
>>>> I don't have a firewall active.
>>>> And because slurmdbd cannot connect to the database, I am not able to
>>>> get it activated through systemctl. I will share the output of
>>>> slurmdbd -D -vvv shortly, but overall it keeps saying it is trying to
>>>> connect to the db, retries a couple of times, and then crashes.
>>>>
>>>> R.
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, May 30, 2024 at 2:51 AM mercan <ahmet.mercan(a)uhem.itu.edu.tr>
>>>> wrote:
>>>>
>>>>> Hi;
>>>>>
>>>>> Did you check whether you can connect to the db with your conf parameters
>>>>> from the head node:
>>>>>
>>>>> mysql --user=slurm --password=slurmdbpass slurm_acct_db
>>>>>
>>>>> Also, check and stop the firewall and SELinux, if they are running.
>>>>>
>>>>> Last, you can stop slurmdbd, then run it in a terminal with:
>>>>>
>>>>> slurmdbd -D -vvv
>>>>>
>>>>> Regards;
>>>>>
>>>>> C. Ahmet Mercan
>>>>>
>>>>> On 30.05.2024 00:05, Radhouane Aniba via slurm-users wrote:
>>>>>
>>>>> Hi everyone,
>>>>> I am trying to get slurmdbd to run on my local home server but I am
>>>>> really struggling.
>>>>> Note: I am a novice Slurm user.
>>>>> My slurmdbd always times out even though all the details in the conf
>>>>> file are correct.
>>>>>
>>>>> My log looks like this:
>>>>>
>>>>> [2024-05-29T20:51:30.088] Accounting storage MYSQL plugin loaded
>>>>> [2024-05-29T20:51:30.088] debug2: ArchiveDir = /tmp
>>>>> [2024-05-29T20:51:30.088] debug2: ArchiveScript = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: AuthAltTypes = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: AuthInfo = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: AuthType = auth/munge
>>>>> [2024-05-29T20:51:30.088] debug2: CommitDelay = 0
>>>>> [2024-05-29T20:51:30.088] debug2: DbdAddr = localhost
>>>>> [2024-05-29T20:51:30.088] debug2: DbdBackupHost = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: DbdHost = head-node
>>>>> [2024-05-29T20:51:30.088] debug2: DbdPort = 7032
>>>>> [2024-05-29T20:51:30.088] debug2: DebugFlags = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: DebugLevel = 6
>>>>> [2024-05-29T20:51:30.088] debug2: DebugLevelSyslog = 10
>>>>> [2024-05-29T20:51:30.088] debug2: DefaultQOS = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: LogFile = /var/log/slurmdbd.log
>>>>> [2024-05-29T20:51:30.088] debug2: MessageTimeout = 100
>>>>> [2024-05-29T20:51:30.088] debug2: Parameters = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: PidFile = /run/slurmdbd.pid
>>>>> [2024-05-29T20:51:30.088] debug2: PluginDir = /usr/lib/x86_64-linux-gnu/slurm-wlm
>>>>> [2024-05-29T20:51:30.088] debug2: PrivateData = none
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeEventAfter = 1 months*
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeJobAfter = 12 months*
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeResvAfter = 1 months*
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeStepAfter = 1 months
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeSuspendAfter = 1 months
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeTXNAfter = 12 months
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeUsageAfter = 24 months
>>>>> [2024-05-29T20:51:30.088] debug2: SlurmUser = root(0)
>>>>> [2024-05-29T20:51:30.089] debug2: StorageBackupHost = (null)
>>>>> [2024-05-29T20:51:30.089] debug2: StorageHost = localhost
>>>>> [2024-05-29T20:51:30.089] debug2: StorageLoc = slurm_acct_db
>>>>> [2024-05-29T20:51:30.089] debug2: StoragePort = 3306
>>>>> [2024-05-29T20:51:30.089] debug2: StorageType = accounting_storage/mysql
>>>>> [2024-05-29T20:51:30.089] debug2: StorageUser = slurm
>>>>> [2024-05-29T20:51:30.089] debug2: TCPTimeout = 2
>>>>> [2024-05-29T20:51:30.089] debug2: TrackWCKey = 0
>>>>> [2024-05-29T20:51:30.089] debug2: TrackSlurmctldDown= 0
>>>>> [2024-05-29T20:51:30.089] debug2: acct_storage_p_get_connection: request new connection 1
>>>>> [2024-05-29T20:51:30.089] debug2: Attempting to connect to localhost:3306
>>>>> [2024-05-29T20:51:30.090] slurmdbd version 19.05.5 started
>>>>> [2024-05-29T20:51:30.090] debug2: running rollup at Wed May 29 20:51:30 2024
>>>>> [2024-05-29T20:51:30.091] debug2: Everything rolled up
>>>>> [2024-05-29T20:51:49.673] Terminate signal (SIGINT or SIGTERM) received
>>>>> [2024-05-29T20:51:49.673] debug: rpc_mgr shutting down
>>>>>
>>>>>
>>>>>
>>>>> my config file looks like this
>>>>>
>>>>> ArchiveEvents=yes
>>>>> ArchiveJobs=yes
>>>>> ArchiveResvs=yes
>>>>> ArchiveSteps=no
>>>>> ArchiveSuspend=no
>>>>> ArchiveTXN=no
>>>>> ArchiveUsage=no
>>>>> PurgeEventAfter=1month
>>>>> PurgeJobAfter=12month
>>>>> PurgeResvAfter=1month
>>>>> PurgeStepAfter=1month
>>>>> PurgeSuspendAfter=1month
>>>>> PurgeTXNAfter=12month
>>>>> PurgeUsageAfter=24month
>>>>> # Authentication info
>>>>> AuthType=auth/munge
>>>>> # slurmDBD info
>>>>> DbdAddr=localhost
>>>>> DbdHost=head-node
>>>>> DbdPort=7032
>>>>> SlurmUser=root
>>>>> MessageTimeout=100
>>>>> DebugLevel=5
>>>>> #DefaultQOS=normal,standby
>>>>> LogFile=/var/log/slurmdbd.log
>>>>> PidFile=/run/slurmdbd.pid
>>>>> #PrivateData=accounts,users,usage,jobs
>>>>> #TrackWCKey=yes
>>>>> #
>>>>> # Database info
>>>>> StorageType=accounting_storage/mysql
>>>>> StorageHost=localhost
>>>>> StoragePort=3306
>>>>> StoragePass=slurmdbpass
>>>>> StorageUser=slurm
>>>>> StorageLoc=slurm_acct_db
>>>>>
>>>>> I used standard names and passwords to get started and I will change them
>>>>> later.
>>>>>
>>>>> But every time I try to start slurmdbd.service it crashes and I get
>>>>> the log that I shared above.
>>>>>
>>>>> I use these versions
>>>>>
>>>>> slurmdbd -V
>>>>> slurm-wlm 19.05.5
>>>>> mysql Ver 15.1 Distrib 10.3.39-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
>>>>>
>>>>> Everything else is working properly except that I cannot get slurmdbd to
>>>>> work, and at this point I have exhausted everything I could think of to
>>>>> try :) Looking for some expert insights :)
>>>>>
>>>>>
>>>>> Any idea what I am doing wrong here? Also, I didn't compile any Slurm
>>>>> package; I used the binaries from the apt repos.
>>>>>
>>>>> Any help will be appreciated
>>>>>
>>>>> Cheers
>>>>>
>>>>> Rad
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> --
>>> *Rad Aniba, PhD*
>>>
>>>
>>>
>>> --
>>> slurm-users mailing list -- slurm-users(a)lists.schedmd.com
>>> To unsubscribe send an email to slurm-users-leave(a)lists.schedmd.com
>>>
>>
>>
>> --
>> *Rad Aniba, PhD*
>>
>>
>
> --
> *Rad Aniba, PhD*
>
>
>
>
>
--
*Rad Aniba, PhD*