I made some progress without needing the /etc/passwd sync. sbatch appears to work fine on multi-node jobs. Now only salloc runs fail, which I guess is expected behavior without user account sync and ssh key setup.
On bare-metal OpenHPC, Warewulf handles all that for me. So presumably I have MPI keys, but not user account sync or user ssh keys. Again, on OpenHPC the keys were populated on account creation, and passwd was synced to the boot image.
Can anyone tell me what this is about?
slurmctld[46449]: slurmctld: error: failed to generate json for resume job/node list
Mark Moorcroft
Senior Linux Administrator
Analytical Mechanics Associates