[slurm-users] Slurm overhead

Mahmood Naderan mahmood.nt at gmail.com
Thu Apr 19 10:05:25 MDT 2018

I have installed a program on all nodes, since it is an rpm. Therefore,
when the program is running, it won't use the shared file system; it
just uses its own /usr/local/program files.

I also set a scratch path in the bashrc, which is actually a path local
to the running node. For example, I set TMPFOLDER=/tmp/mahmood/program
in the bashrc (home is shared), then I ssh to the node and create that
path. Therefore, when the program reads or writes data during
execution, it won't go through the network.
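
A minimal sketch of that scratch setup as a shell function, so the
directory is created on whichever node runs the program instead of by
hand over ssh. The function name ensure_scratch is hypothetical; the
default path is the TMPFOLDER value mentioned above.

```shell
# Sketch only: create the node-local scratch directory if it is missing
# and print its path. ensure_scratch is an illustrative name, not part
# of Slurm or of the program.
ensure_scratch() {
    # Use the first argument if given, else TMPFOLDER from the bashrc,
    # else the example path from this message.
    scratch_dir="${1:-${TMPFOLDER:-/tmp/mahmood/program}}"
    # mkdir -p succeeds if the directory already exists.
    mkdir -p "$scratch_dir" && [ -w "$scratch_dir" ] && printf '%s\n' "$scratch_dir"
}
```

Calling ensure_scratch at the top of the job script, before the program
starts, would fail early if the local disk is not writable.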

The thing is that when I ssh directly to the node and run the program
with the time command, I see

real    7m34.738s

However, when I submit the job via Slurm on the head node, I see

[mahmood at rocks7 g]$ sacct -X -j 66 --format=elapsed

So I think the Slurm overhead is large (about 50%). Is that correct?
How can I reduce that overhead?
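
For reference, the comparison between the two timings can be sketched
as below. The helper names (time_to_seconds, elapsed_to_seconds,
overhead_percent) are hypothetical. The sketch assumes the "real" value
is in the NmS.SSSs form printed by time, and that sacct reports Elapsed
as MM:SS, HH:MM:SS or DD-HH:MM:SS.

```shell
# Convert a `time` value such as "7m34.738s" to seconds.
time_to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Convert an sacct Elapsed value to whole seconds.
# Assumes MM:SS, HH:MM:SS or DD-HH:MM:SS.
elapsed_to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '{
        if (NF == 4)      print $1 * 86400 + $2 * 3600 + $3 * 60 + $4
        else if (NF == 3) print $1 * 3600 + $2 * 60 + $3
        else              print $1 * 60 + $2
    }'
}

# Relative overhead in percent: (slurm - direct) / direct * 100.
overhead_percent() {
    awk -v d="$1" -v s="$2" 'BEGIN { printf "%.1f\n", (s - d) / d * 100 }'
}

# Usage on the head node (left as a comment, since it needs sacct):
#   direct=$(time_to_seconds 7m34.738s)      # → 454.738
#   slurm=$(elapsed_to_seconds "$(sacct -X -n -j 66 --format=elapsed | tr -d ' ')")
#   overhead_percent "$direct" "$slurm"
```

Note that Elapsed covers the whole job allocation, so anything the job
script does before or after the program is counted as well; running the
program under time inside the job script gives a like-for-like number.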


