[slurm-users] Too many single-stream jobs?
Andy Riebs
andy.riebs at hpe.com
Mon Feb 12 17:01:49 MST 2018
Many thanks Matthieu!
Andy
On 02/12/2018 06:42 PM, Matthieu Hautreux wrote:
> Hi,
>
> Your login node may be under heavy load while starting such a large
> number of independent sruns.
>
> This may induce issues not seen under normal load, such as partial
> reads/writes on sockets, which can trigger bugs in Slurm functions
> that are not properly protected against such events.
>
> Quickly looking at the source code of the function generating the
> "io_init_msg_read too small" message, it seems that at least this one
> is not properly protected against partial writes:
>
> 217 | int
> 218 | io_init_msg_write_to_fd(int fd, struct slurm_io_init_msg *msg)
> 219 | {
> 220 | Buf buf;
> 221 | void *ptr;
> 222 | int n;
> 223 |
> 224 | xassert(msg);
> 225 |
> 226 | debug2("Entering io_init_msg_write_to_fd");
> 227 | msg->version = IO_PROTOCOL_VERSION;
> 228 | buf = init_buf(io_init_msg_packed_size());
> 229 | debug2(" msg->nodeid = %d", msg->nodeid);
> 230 | io_init_msg_pack(msg, buf);
> 231 |
> 232 | ptr = get_buf_data(buf);
> 233 | again:
> 234 | => if ((n = write(fd, ptr, io_init_msg_packed_size())) < 0) {
> 235 | if (errno == EINTR)
> 236 | goto again;
> 237 | free_buf(buf);
> 238 | return SLURM_ERROR;
> 239 | }
> 240 | if (n != io_init_msg_packed_size()) {
> 241 | error("io init msg write too small");
> 242 | free_buf(buf);
> 243 | return SLURM_ERROR;
> 244 | }
> 245 |
> 246 | free_buf(buf);
> 247 | debug2("Leaving io_init_msg_write_to_fd");
> 248 | return SLURM_SUCCESS;
> 249 | }
>
> A proper way to handle partial writes is the following (from somewhere
> else in the Slurm codebase):
>
> 188 | ssize_t fd_write_n(int fd, void *buf, size_t n)
> 189 | {
> 190 | size_t nleft;
> 191 | ssize_t nwritten;
> 192 | unsigned char *p;
> 193 |
> 194 | p = buf;
> 195 | nleft = n;
> 196 | while (nleft > 0) {
> 197 | => if ((nwritten = write(fd, p, nleft)) < 0) {
> 198 | if (errno == EINTR)
> 199 | continue;
> 200 | else
> 201 | return(-1);
> 202 | }
> 203 | nleft -= nwritten;
> 204 | p += nwritten;
> 205 | }
> 206 | return(n);
> 207 | }
>
>
> It seems that some code cleanup/refactoring could be performed in
> Slurm to limit the risk of this kind of issue. I am not sure that it
> would resolve your problem, but it seems harmful to keep that kind of
> code around regardless.
>
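> Just as an illustration, and only a rough, untested sketch: the write
> path above could be funneled through fd_write_n(), assuming that
> helper is reachable from this file (presumably via src/common/fd.h):
>
>     int
>     io_init_msg_write_to_fd(int fd, struct slurm_io_init_msg *msg)
>     {
>         Buf buf;
>         int rc = SLURM_SUCCESS;
>
>         xassert(msg);
>
>         debug2("Entering io_init_msg_write_to_fd");
>         msg->version = IO_PROTOCOL_VERSION;
>         buf = init_buf(io_init_msg_packed_size());
>         debug2(" msg->nodeid = %d", msg->nodeid);
>         io_init_msg_pack(msg, buf);
>
>         /* fd_write_n() retries on EINTR and keeps writing after a
>          * partial write until the whole message has been sent. */
>         if (fd_write_n(fd, get_buf_data(buf),
>                        io_init_msg_packed_size()) < 0) {
>             error("io_init_msg_write_to_fd: write failed");
>             rc = SLURM_ERROR;
>         }
>
>         free_buf(buf);
>         debug2("Leaving io_init_msg_write_to_fd");
>         return rc;
>     }
>
> The read side that actually emits the "io_init_msg_read too small"
> message would presumably need the same kind of treatment.
>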
> You should file a bug for that.
>
> HTH
> Matthieu
>
>
> 2018-02-12 22:42 GMT+01:00 Andy Riebs <andy.riebs at hpe.com>:
>
> We have a user who wants to run multiple instances of a single
> process job across a cluster, using a loop like
>
> -----
> for N in $nodelist; do
> srun -w $N program &
> done
> wait
> -----
>
> This works up to a thousand nodes or so (jobs are allocated by
> node here), but as the number of jobs submitted increases, we
> periodically see a variety of different error messages, such as
>
> * srun: error: Ignoring job_complete for job 100035 because our
> job ID is 102937
> * srun: error: io_init_msg_read too small
> * srun: error: task 0 launch failed: Unspecified error
> * srun: error: Unable to allocate resources: Job/step already
> completing or completed
> * srun: error: Unable to allocate resources: No error
> * srun: error: unpack error in io_init_msg_unpack
> * srun: Job step 211042.0 aborted before step completely launched.
>
> We have tried setting
>
> ulimit -n 500000
> ulimit -u 64000
>
> but that wasn't sufficient.
>
> The environment:
>
> * CentOS 7.3 (x86_64)
> * Slurm 17.11.0
>
> Does this ring any bells? Any thoughts about how we should proceed?
>
> Andy
>
> --
> Andy Riebs
> andy.riebs at hpe.com
> Hewlett-Packard Enterprise
> High Performance Computing Software Engineering
> +1 404 648 9024
> My opinions are not necessarily those of HPE
> May the source be with you!
>
>
--
Andy Riebs
andy.riebs at hpe.com
Hewlett-Packard Enterprise
High Performance Computing Software Engineering
+1 404 648 9024
My opinions are not necessarily those of HPE
May the source be with you!