Hi all,
We are setting up a simple Slurm cluster (version 23.11.6). Following the documentation, we are starting without accounting or slurmdbd. The slurm.conf is really simple:
```
ClusterName=Develop
SlurmctldHost=head

# Slurm configuration
AuthType=auth/munge
CryptoType=crypto/munge
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld

# Nodes
NodeName=worker1 CoresPerSocket=2 Sockets=1 ThreadsPerCore=1
NodeName=worker2 CoresPerSocket=2 Sockets=1 ThreadsPerCore=1

# Partitions
PartitionName=develop Default=YES MaxTime=UNLIMITED Nodes="worker1,worker2"
```
When running a simple `srun sleep 10`, all works well and the log file shows:
[2024-05-15T12:34:12.741] sched: _slurm_rpc_allocate_resources JobId=1 NodeList=worker1 usec=549
[2024-05-15T12:34:22.775] _job_complete: JobId=1 WEXITSTATUS 0
[2024-05-15T12:34:22.775] _job_complete: JobId=1 done
But when we put the same sleep command in a script and submit it with `sbatch test.sh`, the log shows:
[2024-05-15T12:35:39.916] _slurm_rpc_submit_batch_job: JobId=2 InitPrio=1 usec=368
[2024-05-15T12:35:40.000] error: _refresh_assoc_mgr_qos_list: no new list given back keeping cached one.
[2024-05-15T12:35:40.000] sched: JobId=2 has invalid account
[2024-05-15T12:35:40.145] sched/backfill: _start_job: Started JobId=2 in develop on worker1
[2024-05-15T12:35:50.172] _job_complete: JobId=2 WEXITSTATUS 0
[2024-05-15T12:35:50.172] _job_complete: JobId=2 done
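For reference, the test.sh mentioned above is along these lines. This is only a minimal sketch: the exact file contents are an assumption, and the only part taken from the description above is the `sleep 10` command.

```shell
#!/bin/bash
# Minimal batch script (assumed contents): no #SBATCH options are set,
# so the job runs in the default partition (develop) with default limits.
sleep 10
```

It is submitted with `sbatch test.sh`, exactly as described above.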
We have the same account, with the same UID and GID, as the documentation says. Looking at the function that seems to emit that error (https://github.com/SchedMD/slurm/blob/e9f28ede27795f525e62f998cb2d40931d884e...), it looks as if some accounting setup is expected? We do not have slurmdbd set up, and the documentation states that we should test basic functionality before implementing that daemon.
Any tips? Thanks in advance.
João