[slurm-users] Problem with job allocation

Nicolas Sonoda nicolas.sonoda at versatushpc.com.br
Fri Apr 1 18:21:38 UTC 2022


Hi!

I'm having a problem: when I try to allocate 2 or more nodes for a job, it goes into the CF (CONFIGURING) state and never starts.

In slurmctld.log, a single-node job produces these messages:
[2022-04-01T14:47:43.959] _slurm_rpc_submit_batch_job: JobId=19966 InitPrio=62 usec=998
[2022-04-01T14:47:44.304] sched: Allocate JobId=19966 NodeList=n09 #CPUs=40 Partition=intel_large
[2022-04-01T14:47:44.316] prolog_running_decr: Configuration for JobId=19966 is complete

A 3-node job produces only these:
[2022-04-01T14:46:46.822] _slurm_rpc_submit_batch_job: JobId=19965 InitPrio=62 usec=1148
[2022-04-01T14:46:47.264] sched: Allocate JobId=19965 NodeList=n[07-08,10] #CPUs=120 Partition=intel_large

For the 3-node job, the prolog_running_decr line never appears, so the job's configuration never completes, and slurmd.log on the allocated nodes contains no messages about that job.
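
To make the difference between the two jobs easy to spot in a longer log, the following is a minimal sketch (not part of Slurm) that lists jobs with a "sched: Allocate" line but no matching "prolog_running_decr: Configuration ... is complete" line. The regular expressions are assumptions based only on the log lines quoted above and may need adjusting for other Slurm versions; the function name and the sample list are hypothetical.

```python
import re

# Patterns inferred from the slurmctld.log lines quoted in this message
# (assumption: other Slurm versions may format these lines differently).
ALLOC_RE = re.compile(r"sched: Allocate JobId=(\d+) NodeList=(\S+)")
DONE_RE = re.compile(
    r"prolog_running_decr: Configuration for JobId=(\d+) is complete"
)


def jobs_stuck_configuring(lines):
    """Return {job_id: nodelist} for jobs allocated but never configured."""
    pending = {}
    for line in lines:
        match = ALLOC_RE.search(line)
        if match:
            # Job was allocated; remember it until configuration completes.
            pending[match.group(1)] = match.group(2)
            continue
        match = DONE_RE.search(line)
        if match:
            # Configuration finished, so the job is no longer pending.
            pending.pop(match.group(1), None)
    return pending


# The five log lines from this message, in chronological order.
sample = [
    "[2022-04-01T14:46:46.822] _slurm_rpc_submit_batch_job: JobId=19965 InitPrio=62 usec=1148",
    "[2022-04-01T14:46:47.264] sched: Allocate JobId=19965 NodeList=n[07-08,10] #CPUs=120 Partition=intel_large",
    "[2022-04-01T14:47:43.959] _slurm_rpc_submit_batch_job: JobId=19966 InitPrio=62 usec=998",
    "[2022-04-01T14:47:44.304] sched: Allocate JobId=19966 NodeList=n09 #CPUs=40 Partition=intel_large",
    "[2022-04-01T14:47:44.316] prolog_running_decr: Configuration for JobId=19966 is complete",
]

print(jobs_stuck_configuring(sample))  # {'19965': 'n[07-08,10]'}
```

To run it against a real log, the list can be replaced by an open file object, since the function only iterates over lines.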

Can you help me with that?

Thanks!