Hi all,
We are doing a simple setup for a Slurm cluster (version 23.11.6). We are following the documentation and, for now, trying a setup without accounting or slurmdbd. The slurm.conf is really simple:

```
ClusterName=Develop
SlurmctldHost=head

# Slurm configuration
AuthType=auth/munge
CryptoType=crypto/munge
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld

# Nodes
NodeName=worker1 CoresPerSocket=2 Sockets=1 ThreadsPerCore=1
NodeName=worker2 CoresPerSocket=2 Sockets=1 ThreadsPerCore=1

# Partitions
PartitionName=develop Default=YES MaxTime=UNLIMITED Nodes="worker1,worker2"
```
When running a simple `srun sleep 10`, all works well and the log file shows:
[2024-05-15T12:34:12.741] sched: _slurm_rpc_allocate_resources JobId=1 NodeList=worker1 usec=549
[2024-05-15T12:34:22.775] _job_complete: JobId=1 WEXITSTATUS 0
[2024-05-15T12:34:22.775] _job_complete: JobId=1 done
But when creating a script with the same sleep command and submitting it using `sbatch test.sh`, the log shows:
[2024-05-15T12:35:39.916] _slurm_rpc_submit_batch_job: JobId=2 InitPrio=1 usec=368
[2024-05-15T12:35:40.000] error: _refresh_assoc_mgr_qos_list: no new list given back keeping cached one.
[2024-05-15T12:35:40.000] sched: JobId=2 has invalid account
[2024-05-15T12:35:40.145] sched/backfill: _start_job: Started JobId=2 in develop on worker1
[2024-05-15T12:35:50.172] _job_complete: JobId=2 WEXITSTATUS 0
[2024-05-15T12:35:50.172] _job_complete: JobId=2 done
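For anyone trying to reproduce this: the original test.sh was not posted, so the following is only a sketch of what such a batch script might look like. The job name and the `#SBATCH` options are assumptions chosen for illustration; only the `sleep 10` command and the `develop` partition come from the message above.

```shell
# Write a minimal batch script. The contents are hypothetical: the
# original test.sh was not shared in the thread.
cat > test.sh <<'EOF'
#!/bin/bash
# Illustrative job name (an assumption, not from the original post).
#SBATCH --job-name=sleep-test
# Partition name taken from the slurm.conf shown earlier.
#SBATCH --partition=develop
#SBATCH --ntasks=1

sleep 10
EOF

# Submit only when sbatch is actually available on this machine.
if command -v sbatch >/dev/null 2>&1; then
    sbatch test.sh
else
    echo "sbatch not found; script written to test.sh"
fi
```

Since `develop` is the default partition in the configuration above, the `--partition` line could probably be dropped without changing where the job runs.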
We have the same account, with the same UID and GID, as stated in the documentation. Looking at the function that seems to emit that error (https://github.com/SchedMD/slurm/blob/e9f28ede27795f525e62f998cb2d40931d884e...), does this mean some accounting setup is required? We do not have slurmdbd set up, and the documentation states that we should test basic functionality before implementing that daemon.
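One quick way to double-check that no accounting parameters have crept into the configuration is to scan slurm.conf for them. The helper below is a rough sketch: the function name `list_accounting_settings` is made up for this example, and the default path `/etc/slurm/slurm.conf` is an assumption that may not match every install.

```shell
# Hypothetical helper: list uncommented accounting-related lines in a
# slurm.conf. The default path is an assumption; pass your own as $1.
list_accounting_settings() {
    conf="${1:-/etc/slurm/slurm.conf}"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    # Match parameters whose names start with AccountingStorage or
    # JobAcctGather, ignoring case and leading whitespace.
    if ! grep -i -E '^[[:space:]]*(AccountingStorage|JobAcctGather)' "$conf"; then
        echo "no accounting parameters set in $conf"
    fi
}
```

Calling `list_accounting_settings /path/to/slurm.conf` on the configuration posted above should report that no accounting parameters are set. On a running cluster, `scontrol show config` shows the values the controller is actually using, which can differ from the file if defaults apply.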
Any tips? Thanks in advance. João
I figured out that the mailing list may not be appropriate for this message, so I've created a bug report instead: https://bugs.schedmd.com/show_bug.cgi?id=19894
Hi João,
did you get this problem solved? I have the exact same problem and would be very interested.
Help would be greatly appreciated!
Thank you and best regards, Andi
I had the same issue. After upgrading to slurm-24.05.2, the problem was solved. Try it.
R.

________________________________
From: andreas.wiedholz--- via slurm-users slurm-users@lists.schedmd.com
Sent: Monday, 15 July 2024 14:32
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: _refresh_assoc_mgr_qos_list: no new list given back keeping cached one