Hello,
I’m running some tests with associations using “sacctmgr”. I have created three users (user_1, user_2 and user_3), and for each of them I have created one or more associations:
[root@myserver log]# sacctmgr show user user_1 --associations
      User   Def Acct     Admin    Cluster    Account  Partition     Share   Priority MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                 
 QOS   Def QOS
---------- ---------- --------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- --------- ----------- ----------- --------------------
 ---------
    user_1       test      None     q50004       test    aolin.q         1                  4        2       10                                                
 normal
    user_1       test      None     q50004       test cuda-staf+         1                  4        2       10                                                
 normal
[root@myserver log]# sacctmgr show user user_2 --associations
      User   Def Acct     Admin    Cluster    Account  Partition     Share   Priority MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                 
 QOS   Def QOS
---------- ---------- --------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- --------- ----------- ----------- --------------------
 ---------
    user_2       test      None     q50004       test cuda-int.q         1                                    4                                                
 normal
[root@myserver log]# sacctmgr show user user_3 --associations
      User   Def Acct     Admin    Cluster    Account  Partition     Share   Priority MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                 
 QOS   Def QOS
---------- ---------- --------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- --------- ----------- ----------- --------------------
 ---------
    user_3       test      None     q50004       test research.q         1                           2        1                                                
 normal
    user_3       test      None     q50004       test     xeon.q         1                           2        1                                                
 normal
All users belong to the “test” account:
[root@myserver log]# sacctmgr show account test --association
   Account                Descr                  Org    Cluster ParentName       User     Share   Priority GrpJobs GrpNodes  GrpCPUs  GrpMem GrpSubmit     GrpWall  GrpCPUMins
 MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS
---------- -------------------- -------------------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- ------- --------- ----------- -----------
 ------- -------- -------- --------- ----------- ----------- -------------------- ---------
      test                 test                 test     q50004       root                    1                                                                                                                                                         
 normal
      test                 test                 test     q50004                user_1         1                                                                                     
 4        2       10                                                 normal
      test                 test                 test     q50004                user_1         1                                                                                     
 4        2       10                                                 normal
      test                 test                 test     q50004                user_2         1                                                                                                       
 4                                                 normal
      test                 test                 test     q50004                user_3         1                                                                                              
 2        1                                                 normal
      test                 test                 test     q50004                user_3         1                                                                                              
 2        1                                                 normal
When I submit as “user_1”, everything works as expected: some jobs are queued and executed, and some are rejected because of the limits.
However, with “user_2” and “user_3” I can’t run any job. The jobs are rejected or left pending with messages like these:
     11168 research.     test          user_3  PENDING         0:00  2024-04-17T12:53:21                  N/A    1    1     OK                  N/A
 (AssocMaxCpuPerJo (null)
     11173 research.     test          user_3  PENDING         0:00  2024-04-17T13:06:02                  N/A    1    1     OK                  N/A
 (AssocMaxCpuPerJo (null)
     11174 research.     test          user_3  PENDING         0:00  2024-04-17T13:06:16                  N/A    1    1     OK                  N/A
 (AssocMaxCpuPerJo (null)
     11176 research.     test          user_3  PENDING         0:00  2024-04-17T13:07:23                  N/A    1    1     OK                  N/A
 (AssocMaxCpuPerJo (null)
     11180 research.     test          user_3  PENDING         0:00  2024-04-17T13:08:45                  N/A    1    1     OK                  N/A
 (AssocMaxCpuPerJo (null)
For example, “user_3” is trying to submit jobs like this (the test.sh script is just a simple “sleep 50”):
sbatch -p aolin.q -N 2 ./test.sh
-> sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
sbatch -p aolin.q -N 1 ./test.sh
-> sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
sbatch -p research.q -N 1 ./test.sh
-> submitted but not running
-> nodelist(reason)=(AssocMaxCpuPerJobLimit)
-> WHY???
sbatch -p research.q -N 1 -n 1 ./test.sh
-> submitted but not running
-> nodelist(reason)=(AssocMaxCpuPerJobLimit)
-> WHY???
sbatch -p xeon.q -N 1 -n 1 ./test.sh
-> submitted and running!!
[root@myserver log]# squeue
     JOBID PARTITION     NAME            USER    STATE         TIME          SUBMIT_TIME           START_TIME NODE CPUS OVER_S        TRES_PER_NODE
 NODELIST(REASON)  DEPENDENCY        REQ_NODES   NODELIST
     11202 research.     test          user_3  PENDING         0:00  2024-04-17T13:33:31                  N/A    1    1     OK                  N/A
(AssocMaxCpuPerJo (null)
     11200 research.     test          user_3  PENDING         0:00  2024-04-17T13:33:17                  N/A    1    1     OK                  N/A
(AssocMaxCpuPerJo (null)
     11212    xeon.q     test          user_3  RUNNING         0:18  2024-04-17T13:36:10  2024-04-17T13:36:10    1    1     OK                  N/A
 aolin-cpu-1       (null)             aolin-cpu-1
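The submissions above can be repeated in one pass with a small script. This is only a sketch under a few assumptions: test.sh is in the current directory, the partition names are the ones used in this post, and the function names (job_id_from_parsable, submit_and_report) are made up for illustration. It relies on `sbatch --parsable` to get the job ID and on the `squeue` format fields `%T` (state) and `%r` (pending reason).

```shell
#!/bin/sh
# Sketch: submit the same test job to several partitions and print why each
# job is (not) running. Function names are illustrative only.

# sbatch --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
job_id_from_parsable() {
    printf '%s\n' "${1%%;*}"
}

# Submit test.sh to one partition and report the job state and pending reason.
submit_and_report() {
    partition=$1
    out=$(sbatch --parsable -p "$partition" -N 1 -n 1 ./test.sh) || {
        echo "$partition: submission rejected"
        return 0
    }
    jobid=$(job_id_from_parsable "$out")
    sleep 5   # give the scheduler a moment to evaluate the job
    # %T = job state, %r = reason the job is pending
    echo "$partition: job $jobid -> $(squeue -h -j "$jobid" -o '%T %r')"
}

# Only talk to a cluster when the Slurm client tools are installed.
if command -v sbatch >/dev/null 2>&1; then
    for p in aolin.q research.q xeon.q; do
        submit_and_report "$p"
    done
fi
```

For a job that stays pending, `scontrol show job <jobid>` lists the node and CPU counts recorded for the request, which can then be compared with the MaxNodes and MaxCPUs values of the association.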
Why? What am I doing wrong? Where is the limit that I am not seeing?
Thanks a lot!