[slurm-users] Slurm overhead
Mahmood Naderan
mahmood.nt at gmail.com
Thu Apr 19 10:05:25 MDT 2018
Hi,
I have installed a program on all nodes, since it is an rpm. Therefore,
when the program is running, it doesn't use the shared file system;
it just uses its own /usr/local/program files.
I also set a scratch path in the bashrc, which is a local path on the
running node. For example, I set TMPFOLDER=/tmp/mahmood/program in
the bashrc (home is shared), then I ssh to the node and create that
path. Therefore, when the program reads or writes data during
execution, it doesn't go through the network.
The thing is, when I ssh directly to the node and run the program
with the time command, I see
real 7m34.738s
However, when I submit the job via slurm on the head node, I see
[mahmood@rocks7 g]$ sacct -X -j 66 --format=elapsed
Elapsed
----------
00:11:28
So, I think the Slurm overhead is large (about 50%: roughly 11.5 minutes
under Slurm versus 7.5 minutes when run directly). Is that correct?
How can I reduce that overhead?
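
To spell out the arithmetic behind that figure, here is a small Python
sketch. The function names are my own, and it assumes the real value is in
bash's default time format and that Elapsed uses sacct's [DD-]HH:MM:SS
layout. Note that Elapsed only covers the span from job start to job end,
so time spent waiting in the queue is not part of it.

```python
def parse_time_real(real: str) -> float:
    """Convert bash `time` output such as '7m34.738s' to seconds.

    Assumes the default bash format (minutes, then seconds, no hours).
    """
    minutes, seconds = real.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def parse_sacct_elapsed(elapsed: str) -> int:
    """Convert a sacct Elapsed value ([DD-]HH:MM:SS) to seconds."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in elapsed.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Values taken from the two runs quoted above.
direct = parse_time_real("7m34.738s")        # run over ssh on the node
via_slurm = parse_sacct_elapsed("00:11:28")  # job 66 as reported by sacct

extra = via_slurm - direct
overhead_pct = 100 * extra / direct

print(f"direct run : {direct:.1f} s")
print(f"via Slurm  : {via_slurm} s")
print(f"difference : {extra:.1f} s ({overhead_pct:.1f}% of the direct run)")
```

This gives a difference of a bit under four minutes, which is where the
"about 50%" comes from.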
Regards,
Mahmood