[slurm-users] slurm_rpc_node_registration invalid argument

Michael Di Domenico mdidomenico4 at gmail.com
Wed Aug 26 15:20:12 UTC 2020


looks like a similar issue is being tracked by:
https://bugs.schedmd.com/show_bug.cgi?id=9441

On Wed, Aug 26, 2020 at 11:04 AM Michael Di Domenico
<mdidomenico4 at gmail.com> wrote:
>
> sorry i meant to say, our slurm nodehealth script pushed the node to
> failed state.  slurm itself wasn't doing this
>
> On Wed, Aug 26, 2020 at 11:02 AM Michael Di Domenico
> <mdidomenico4 at gmail.com> wrote:
> >
> > i just upgraded from v18 to v20.  Did something change in the node
> > config validation?  it used to be that if i started slurm on a compute
> > node that had lower than expected memory or was missing gpus, slurm
> > would push the node into a failed state that i could see in sinfo -R.
> > now it seems to be logging
> > "slurm_rpc_node_registration invalid argument" to the slurmctld log
> > file every second for each node that's broken
> >
> > Is there some function that got disabled/changed?  i use slurm to
> > ferret out bad hardware, but logging to the logfile every second
> > seems silly, and since i don't routinely watch the log files, things
> > will go unnoticed
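
Until the behavior is sorted out, one way to avoid watching the log by hand is to summarize the repeated registration errors per node. The snippet below is only a rough sketch. The log line shape in the comment is an assumption (it is not quoted anywhere in this thread), and count_registration_errors is a made-up helper name, so adjust the pattern to whatever your slurmctld actually writes.

```python
import re
from collections import Counter

# ASSUMED log line shape (not confirmed by this thread), e.g.:
#   [2020-08-26T11:02:01.123] error: _slurm_rpc_node_registration node=node01: Invalid argument
REGISTRATION_ERROR = re.compile(
    r"slurm_rpc_node_registration\s+node=(?P<node>[^:\s]+):\s*invalid argument",
    re.IGNORECASE,
)


def count_registration_errors(lines):
    """Count registration errors per node name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = REGISTRATION_ERROR.search(line)
        if match:
            counts[match.group("node")] += 1
    return counts
```

Passing it an open slurmctld log file, for example count_registration_errors(open(path)) with path set to your own log location, gives a Counter keyed by node name, which a cron job could mail out or compare against the output of sinfo -R.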


