[slurm-users] Execute parallel commands on all nodes running jobs of a particular user
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Tue Aug 7 13:31:09 MDT 2018
On 06-08-2018 12:53, Bjørn-Helge Mevik wrote:
> There is also a Slurm plugin for pdsh (unfortunately not enabled in the
> default redhat/centos RPMs) that lets you run a command on each node
> belonging to a specific job with "pdsh -j <jobid> <command>". Not
> exactly the same, though. :)
Bjørn, that is a different task. I've documented pdsh usage with Slurm
on my Wiki page
https://wiki.fysik.dtu.dk/niflheim/SLURM#pdsh-parallel-distributed-shell.
However, I find it easier to work with ClusterShell, see
https://wiki.fysik.dtu.dk/niflheim/SLURM#clustershell.
The functionality I proposed on this list is to run a command on *all*
nodes belonging to *all* jobs of a particular user:
> If you add a "slurmuser" section to the /etc/clustershell/groups.conf.d/slurm.conf file, you can now run commands such as:
>
> $ clush -bw @su:username 'df -Ph /scratch'
>
> $ clush -bw @su:username 'du -s /scratch/username'
This functionality will be available in the next ClusterShell release,
1.8.1.
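
Until that release is installed, roughly the same effect can be had by
asking squeue for the node lists of a user's running jobs and handing them
to clush directly. The following is only a sketch, not the ClusterShell
implementation: the function names join_nodelists and run_on_user_nodes
are made up for illustration, and it assumes that squeue and clush are in
the PATH and that clush accepts the comma-joined node lists as a node set.

```shell
#!/bin/sh
# Sketch: run a command on all nodes running jobs of a particular user,
# without the "slurmuser" group source. Function names are hypothetical.

# Read per-job node lists (one per line) on stdin and join them into a
# single comma-separated node set, dropping duplicate and empty lines.
join_nodelists() {
    sort -u | grep -v '^$' | paste -sd, -
}

# Usage: run_on_user_nodes USERNAME COMMAND [ARGS...]
run_on_user_nodes() {
    user=$1
    shift
    # -h: no header, -u: select user, -t R: running jobs only,
    # -o '%N': print only the node list of each job
    nodes=$(squeue -h -u "$user" -t R -o '%N' | join_nodelists)
    if [ -z "$nodes" ]; then
        echo "no running jobs found for $user" >&2
        return 1
    fi
    # -b gathers identical output from several nodes, -w sets the targets
    clush -bw "$nodes" "$@"
}
```

After sourcing this, something like

$ run_on_user_nodes username 'df -Ph /scratch'

should behave much like the clush command quoted above. Nodes shared by
several jobs may show up in more than one list; as far as I know clush
treats the node set as a set, so each node is still contacted only once.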
/Ole