[slurm-users] SLURM_JOB_NODELIST not available in prolog / epilog scripts

John Hearns hearnsj at googlemail.com
Sun Mar 4 23:31:50 MST 2018


I completely agree with what Chris says regarding cgroups.  Implement them,
and you will not regret it.

I have worked with other simulation frameworks which work in a similar
fashion to Trick, i.e. a master process which spawns off independent worker
processes on compute nodes. I am thinking of an internal application we
have, and, if I may also say it, Matlab.

In the Trick documentation, under Notes:
<https://github.com/nasa/trick/wiki/UserGuide-Monte-Carlo#notes>

   1. SSH <https://en.wikipedia.org/wiki/Secure_Shell> is used to launch
   slaves across the network
   2. Each slave machine will work in parallel with other slaves, greatly
   reducing the computation time of a simulation

However I must say that there must be plenty of folks at NASA who use this
simulation framework on HPC clusters with batch systems.
It would surprise me if there were no 'adaptation layers' available for
Slurm, SGE, PBS etc.
So in Slurm, you would do an sbatch which reserves your worker nodes, then
run a series of srun commands which launch the worker processes.

(I hope I have that the right way round - I seem to recall doing srun and
then a series of sbatch calls in the past.)
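Something along these lines, as a minimal untested sketch (the node count,
walltime and the ./worker binary are placeholders of mine, not anything
from Trick itself):

    #!/bin/bash
    #SBATCH --job-name=monte-carlo
    #SBATCH --nodes=16             # reserve the worker nodes
    #SBATCH --ntasks-per-node=1
    #SBATCH --time=24:00:00

    # srun launches one worker per allocated node, inside the job's
    # allocation (and therefore inside its cgroups, if you use them).
    srun ./worker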

But looking at the Trick wiki quickly, I am wrong. It does seem to work on
the model of "get a list of hosts allocated by your batch system",
i.e. SLURM_JOB_NODELIST, then Trick sets up simulation queues which spawn
off models using ssh.
Looking at the Advanced Topics guide this does seem to be so:
https://github.com/nasa/trick/blob/master/share/doc/trick/Trick_Advanced_Topics.pdf
The model is that you allocate up to 16 remote worker hosts for a long
time. Then various modelling tasks are started on those hosts via ssh, and
Trick expects those hosts to remain available for more tasks during your
simulation session.
There is also discussion there about turning off irqbalance and cpuspeed,
and disabling unnecessary system services.
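If you do want to hand an ssh-based launcher like Trick a flat host list
from inside a Slurm job, something like this should do it (a sketch; the
monte_carlo.hosts filename is just an example of mine):

    #!/bin/bash
    # Inside an sbatch job: expand the compressed nodelist,
    # e.g. node[01-16], into one hostname per line.
    scontrol show hostnames "$SLURM_JOB_NODELIST" > monte_carlo.hosts

    # Point Trick (or any ssh-based spawner) at monte_carlo.hosts.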


As someone who has spent oodles of hours either killing orphaned processes
on nodes, or seeing rogue-process alarms, or running ps --forest to trace
connections into batch job nodes which bypass the PBS/Slurm daemons, I
despair slightly...
I am probably very wrong, and NASA have excellent Slurm integration.

So I agree with Chris - implement cgroups, and try to make sure your ssh
sessions 'land' in a cgroup.
lscgroup is a nice command to see what cgroups are active on a compute
node.
Also, run an interactive job, ssh into one of your allocated worker nodes,
then cat /proc/self/cgroup shows which cgroups you have landed in.
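For reference, roughly the sort of configuration and checks I mean - a
sketch only, so do check the plugin names and cgroup.conf options against
the documentation for your Slurm version:

    # slurm.conf
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup

    # cgroup.conf
    ConstrainCores=yes
    ConstrainRAMSpace=yes

    # /etc/pam.d/sshd - adopt inbound ssh sessions into the owning job
    account    required    pam_slurm_adopt.so

    # Quick checks on a compute node:
    lscgroup | grep slurm       # cgroups created for running jobs
    cat /proc/self/cgroup       # which cgroup your ssh session landed in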

On 5 March 2018 at 02:20, Christopher Samuel <chris at csamuel.org> wrote:

> On 05/03/18 12:12, Dan Jordan wrote:
>
>> What is the /correct/ way to clean up processes across the nodes
>> given to my program by SLURM_JOB_NODELIST?
>>
>
> I'd strongly suggest using cgroups in your Slurm config to ensure that
> processes are corralled and tracked correctly.
>
> You can use pam_slurm_adopt from the contrib directory to capture
> inbound SSH sessions into a running job on the node (and deny access to
> people who don't).
>
> Then Slurm should take care of everything for you without needing an
> epilog.
>
> Hope this helps!
> Chris
>
>