[slurm-users] Execute parallel commands on all nodes running jobs of a particular user

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Aug 7 13:31:09 MDT 2018


On 06-08-2018 12:53, Bjørn-Helge Mevik wrote:
> There is also a Slurm plugin for pdsh (unfortunately not enabled in the
> default redhat/centos RPMs) that lets you run a command on each node
> belonging to a specific job with "pdsh -j <jobid> <command>".  Not
> exactly the same, though. :)

Bjørn, that is a different task.  I've documented pdsh usage with Slurm
in my Wiki page
https://wiki.fysik.dtu.dk/niflheim/SLURM#pdsh-parallel-distributed-shell.
However, I find it easier to work with ClusterShell; see
https://wiki.fysik.dtu.dk/niflheim/SLURM#clustershell.

The functionality I proposed on this list is to run a command on *all* 
nodes belonging to *all* jobs of a particular user:

> If you add a "slurmuser" section to the /etc/clustershell/groups.conf.d/slurm.conf file, you can now run commands such as:
> 
> $ clush -bw @su:username 'df -Ph /scratch'
> 
> $ clush -bw @su:username 'du -s /scratch/username'
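
For reference, a minimal sketch of what such a "slurmuser" section might
look like.  This is an assumption built from standard squeue options
(-h no header, -u user, -t R running jobs only, -o "%N" node list,
-o "%u" user name) and the usual ClusterShell group-source keys; the
section actually shipped with ClusterShell may differ in detail:

```ini
# /etc/clustershell/groups.conf.d/slurm.conf
# Illustrative sketch only; verify against the example file shipped
# with your ClusterShell version before using it.
[slurmuser,su]
# Nodes running jobs of the user given as the group name ($GROUP)
map: squeue -h -u $GROUP -o "%N" -t R
# Users that currently have running jobs (the available "groups")
list: squeue -h -o "%u" -t R
# Users with jobs on a given node ($NODE)
reverse: squeue -h -w $NODE -o "%u"
# Cache squeue results for 60 seconds (value chosen for illustration)
cache_time: 60
```

With a section like this, the short alias "su" in the section header is
what the "@su:username" prefix in the clush commands refers to.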

This functionality will be available in the next ClusterShell release,
1.8.1.

/Ole


