[slurm-users] Slurm reservation for migrating user home directories
tina.friedrich at it.ox.ac.uk
Fri Apr 16 13:39:36 UTC 2021
We've had to do home directory migrations a couple of times without 'full'
downtimes. Similar process, only I don't think we ever bothered
disabling users in LDAP or blocking their jobs. Generally, we told them
we'd move their directory at time X and asked them to please log out
everywhere; at time X, we killed their jobs & sessions (if any),
migrated everything (including the automount information), and let them know
they could log in again.
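The 'time X' cutover described above (no job blocking; kill whatever is left, then copy) can be sketched as a list of commands. This is a minimal, hypothetical sketch, not the exact procedure used here: the function names, the paths, and the assumption that root can ssh to each login node are illustrative, and updating the automount map is site-specific and left out. `scancel -u`, `pkill -u` and `rsync -aHAX --delete` are standard invocations; `run` prints each command and only executes it when `dry_run=False`.

```python
import shlex
import subprocess


def cutover_commands(user: str, old_home: str, new_home: str,
                     login_nodes: list[str]) -> list[list[str]]:
    """Commands for the 'time X' cutover of one user (hypothetical sketch)."""
    # Trailing slashes make rsync copy the directory's contents, not the directory.
    src = old_home.rstrip("/") + "/"
    dst = new_home.rstrip("/") + "/"
    cmds = [["scancel", "-u", user]]  # cancel all of the user's remaining Slurm jobs
    for node in login_nodes:          # end leftover sessions on every login node
        cmds.append(["ssh", node, "pkill", "-KILL", "-u", user])
    cmds.append(["rsync", "-aHAX", "--delete", src, dst])
    # Updating the user's automount entry is site-specific and not shown here.
    return cmds


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in cmds:
        print(shlex.join(argv))
        if dry_run:
            continue
        # pkill (passed through ssh) exits 1 when no process matched: not an error.
        ok = (0, 1) if "pkill" in argv else (0,)
        rc = subprocess.run(argv).returncode
        if rc not in ok:
            raise RuntimeError(f"{shlex.join(argv)} exited with {rc}")
```

A dry run such as `run(cutover_commands("alice", "/old/home/alice", "newfs:/home/alice", ["login1", "login2"]))` only prints the commands, so they can be reviewed before passing `dry_run=False`.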
Having said that, clearing sssd (and similar) caches sounds like a very good idea :)
Two suggestions to add:
- Make the old home directories read-only/immutable directly after
migration, so that forgotten sessions, or ones picking up stale automount
information, throw errors when they try to use them.
- I'd rsync the whole file system across to the new machines way ahead
of 'migration day', so that during the migration only a 'last pass' sort of
sync is required; that's generally much faster if most of the files are
already in place.
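The two suggestions above can be sketched as command builders. Assumptions (all hypothetical): the script runs as root on the old file server, the new location is reachable as an rsync destination, and the old file system supports the immutable attribute (`chattr +i` works on e.g. ext4 and XFS; elsewhere `chmod -R a-w` or a read-only export would be the fallback). Because rsync by default skips files whose size and modification time already match, repeating `sync_pass` in the weeks before migration day leaves only recent changes for the final pass.

```python
def sync_pass(old_home: str, new_home: str) -> list[str]:
    """One rsync pass. The same command serves for the early bulk copies
    and for the short 'last pass' on migration day."""
    # Trailing slashes make rsync copy the directory's contents into new_home.
    src = old_home.rstrip("/") + "/"
    dst = new_home.rstrip("/") + "/"
    # -a: recursive, preserving permissions, times, owners, groups, symlinks;
    # -H: hard links; -A: ACLs; -X: extended attributes;
    # --numeric-ids: keep UIDs/GIDs as numbers instead of mapping by name;
    # --delete: drop files the user removed since the previous pass.
    return ["rsync", "-aHAX", "--numeric-ids", "--delete", src, dst]


def lock_old_home(old_home: str) -> list[str]:
    """Set the immutable flag recursively on the old copy, so stale sessions
    or automounts fail loudly instead of silently writing to it."""
    return ["chattr", "-R", "+i", old_home]


def migration_day(old_home: str, new_home: str) -> list[list[str]]:
    """Final sync first, then freeze the source."""
    return [sync_pass(old_home, new_home), lock_old_home(old_home)]
```

Each command can be run with `subprocess.run(cmd, check=True)`. The immutable flag has to be cleared again (`chattr -R -i`) before the old copy can be deleted.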
On 16/04/2021 14:20, Ward Poelmans wrote:
> Hi Ole,
> On 16/04/2021 14:23, Ole Holm Nielsen wrote:
>> Question: Does anyone have experiences with this type of scenario? Any
>> good ideas or suggestions for other methods for data migration?
> We once did something like that.
> Basically, it went like this:
> - Process is kicked off per user by some trigger
> - Block all new jobs of the given user
> - Wait until all currently running jobs have finished
> - Disable the user in the LDAP and wipe the sssd cache for the user.
> - Kill all their processes on the login nodes
> - Move the data
> - Re-enable the user in the LDAP
> - Remove any blocks/limits of the user to start new job
> - Mail the user that he/she can continue working again.
> The whole process went pretty smoothly.
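Ward's per-user procedure maps onto a small driver. In this hypothetical sketch the Slurm-side steps use real commands (`sacctmgr ... set MaxSubmitJobs=0` to block submissions and `MaxSubmitJobs=-1` to clear the limit again, `squeue` to poll for remaining jobs, `sss_cache -u` to invalidate the sssd entry, `pkill -u` on each login node), while the LDAP disable/enable and the notification mail are site-specific and appear only as hypothetical no-op hooks. It assumes 'disabled' means the account is locked but still resolves via NSS; if the name no longer resolves, `pkill -u` cannot look it up and the numeric UID would have to be passed instead. Jobs that are still pending are allowed to start and finish before the data is moved, and the move is shown as an rsync copy so the old tree remains for rollback.

```python
import subprocess
import time
from typing import Callable, Sequence


def run(argv: Sequence[str], ok: Sequence[int] = (0,)) -> None:
    """Run one command; raise unless its exit status is in `ok`."""
    rc = subprocess.run(list(argv)).returncode
    if rc not in ok:
        raise RuntimeError(f"{' '.join(argv)} exited with {rc}")


def unfinished_job_ids(user: str) -> list[str]:
    # Without -t, squeue lists the user's unfinished jobs (pending, running, ...).
    out = subprocess.run(["squeue", "-h", "-u", user, "-o", "%i"],
                         capture_output=True, text=True, check=True).stdout
    return out.split()


def _noop(user: str) -> None:
    pass


def migrate_user(user: str, old_home: str, new_home: str,
                 login_nodes: Sequence[str],
                 runner: Callable[..., None] = run,
                 list_jobs: Callable[[str], list[str]] = unfinished_job_ids,
                 sleep: Callable[[float], None] = time.sleep,
                 poll_seconds: float = 300,
                 disable_ldap: Callable[[str], None] = _noop,  # hypothetical site hook
                 enable_ldap: Callable[[str], None] = _noop,   # hypothetical site hook
                 notify: Callable[[str], None] = _noop) -> None:  # hypothetical hook
    # 1. Block new submissions on the user's associations.
    runner(["sacctmgr", "-i", "modify", "user", "where", f"name={user}",
            "set", "MaxSubmitJobs=0"])
    # 2. Wait until all of the user's existing jobs have finished.
    while list_jobs(user):
        sleep(poll_seconds)
    # 3. Disable the account, then, on each login node, drop the cached sssd
    #    entry and kill leftover processes (pkill exits 1 if nothing matched).
    disable_ldap(user)
    for node in login_nodes:
        runner(["ssh", node, "sss_cache", "-u", user])
        runner(["ssh", node, "pkill", "-KILL", "-u", user], ok=(0, 1))
    # 4. Copy the data; the old tree is left in place.
    runner(["rsync", "-aHAX", "--delete",
            old_home.rstrip("/") + "/", new_home.rstrip("/") + "/"])
    # 5. Re-enable the account and clear the submit limit (-1 removes it).
    enable_ldap(user)
    runner(["sacctmgr", "-i", "modify", "user", "where", f"name={user}",
            "set", "MaxSubmitJobs=-1"])
    # 6. Tell the user they can continue working.
    notify(user)
```

The collaborators (`runner`, `list_jobs`, `sleep`) are injectable so the sequence can be exercised without a cluster, for example with a runner that only records the commands it is given.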
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
Research Computing and Support Services
IT Services, University of Oxford