I’d put something in cloud-init or the EC2 user data script to synchronise them as the instance comes up, using whatever your preference is for that sort of thing: Ansible, or simply copying the files (if you’re certain they should be identical on every node, and I’d hope they are!).
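As a rough sketch of the "just copy the files" option, here is a hypothetical Python helper that could be called from user data at boot. It assumes (none of this comes from the thread) that canonical passwd and group files are published somewhere every node can read, that regular accounts use IDs 1000 to 60000, and that system accounts below that range should come from the AMI rather than being overwritten. It merges rather than blindly copies so the AMI's own service accounts survive. It does not touch /etc/shadow.

```python
"""Hypothetical boot-time helper: merge regular user/group entries from a
canonical file into the node's local /etc/passwd or /etc/group.

Assumptions: regular IDs lie in [1000, 60000]; entries outside that range
(root, daemons, nobody=65534) belong to the AMI and are left alone.
"""
import os
import tempfile


def is_regular(line: str, min_id: int = 1000, max_id: int = 60000) -> bool:
    """True if the ID in the third field (UID for passwd, GID for group)
    falls in the regular-account range."""
    fields = line.split(":")
    if len(fields) < 3 or not fields[2].isdigit():
        return False
    return min_id <= int(fields[2]) <= max_id


def merge_id_file(local_text: str, master_text: str) -> str:
    """Keep the local system entries; take every regular entry from master."""
    local = [l for l in local_text.splitlines() if l and not is_regular(l)]
    master = [l for l in master_text.splitlines() if l and is_regular(l)]
    # Drop any local entry whose name collides with a master entry.
    names = {l.split(":", 1)[0] for l in master}
    local = [l for l in local if l.split(":", 1)[0] not in names]
    return "\n".join(local + master) + "\n"


def install(path: str, text: str) -> None:
    """Replace path atomically: write a temp file in the same directory,
    copy the original permission bits, then rename it over the original."""
    st = os.stat(path)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.chmod(tmp, st.st_mode & 0o7777)
    os.replace(tmp, path)
```

The passwd and group formats both keep the numeric ID in the third colon-separated field, so the same merge works for each: read the shared copy and the local /etc copy, then call `install("/etc/passwd", merge_id_file(local, master))`, and do the same for /etc/group.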
Tim
Tim Cutts
Senior Director, R&D IT - Data, Analytics & AI, Scientific Computing Platform
AstraZeneca
From: Feng Zhang via slurm-users <slurm-users@lists.schedmd.com>
Date: Tuesday, 11 February 2025 at 10:10 pm
To: mark.w.moorcroft@ama-inc.com <mark.w.moorcroft@ama-inc.com>
Cc: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: /etc/passwd sync?
Keeping /etc/passwd and /etc/group synced to all the nodes should work. You will also need to set up an SSH key for MPI.
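For the SSH key part, a minimal sketch, assuming home directories live on the shared NFS mount (as described in the question below), so a single key pair per user in ~/.ssh is visible on every node. The helper name is hypothetical. It adds a public key to authorized_keys idempotently, which makes it safe to rerun from Ansible or user data. Generating the key itself could be done once per user with something like `ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519`.

```python
import os


def ensure_authorized_key(home: str, pubkey: str) -> bool:
    """Add pubkey to <home>/.ssh/authorized_keys unless it is already there.

    Returns True if the file was changed. Run it as the user (or chown the
    files afterwards) so that sshd's ownership checks are satisfied.
    """
    ssh_dir = os.path.join(home, ".ssh")
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # sshd rejects group/world-writable .ssh dirs
    path = os.path.join(ssh_dir, "authorized_keys")
    key = pubkey.strip()

    content = ""
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
    if key in (line.strip() for line in content.splitlines()):
        return False  # already present, nothing to do

    # Avoid gluing the new key onto a last line that lacks a newline.
    if content and not content.endswith("\n"):
        content += "\n"
    with open(path, "w") as f:
        f.write(content + key + "\n")
    os.chmod(path, 0o600)
    return True
```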
If you set up Slurm elastic cloud in EC2 without LDAP, what is the recommended method for syncing the passwd/group files? Is this necessary to get Open MPI jobs to run? I would swear I had this working last week on two nodes without synced passwd files, but thinking about it now I'm not sure how that could have worked. My home directories are on an NFS mount, but the user accounts don't exist on the node AMI. I'm using Ansible/Packer to manage the AMIs. When I ran OpenHPC/Slurm on bare metal there was a sync process.

This is my first AWS Slurm cluster rodeo. I can't use the Amazon Parallel Computing tools because we are forced to be in GovCloud. I started with "ClusterInTheCloud", but it's all four years old and semi-broken out of the box. My manager had me ditch a lot of it (including LDAP), so I'm building out a fork that is being heavily modded for our situation.
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity