[slurm-users] ntasks and gres question

Wed Apr 6 20:48:54 UTC 2022

Hello,

In my cluster, every node has one instance of a gres called ‘io_nic’. Its purpose is to make it easy for users to ensure that jobs performing heavy network I/O are not scheduled on the same machine at the same time.

$ sinfo -N -o '%N %G'
NODELIST GRES
chhq-supgcmp001 disk:300000,io_disk:1,io_nic:1
chhq-supgcmp002 disk:300000,io_disk:1,io_nic:1
[…]
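For reference, the per-node io_nic count can be pulled out of the GRES column with a small script. This is a minimal sketch: `io_nic_per_node` is a hypothetical helper name, and it assumes plain `name[:type]:count` GRES entries like the ones above (it strips a trailing `(S:...)` socket suffix if one is present). Fed with `sinfo -N -h -o '%N %G'` (`-h` suppresses the header), it should print one "node count" line per row. Because `-N` lists a node once per partition, piping through `sort -u` may be needed if nodes belong to several partitions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "NODE IO_NIC_COUNT" for each "NODE GRES_LIST"
# line on stdin, e.g. the output of: sinfo -N -h -o '%N %G'
io_nic_per_node() {
  local node gres item count
  local -a items
  while read -r node gres; do
    count=0
    # GRES entries are comma-separated "name[:type]:count" items.
    IFS=',' read -r -a items <<< "$gres"
    for item in "${items[@]}"; do
      if [[ $item == io_nic:* ]]; then
        item=${item%%\(*}      # drop an optional "(S:...)" socket suffix
        count=${item##*:}      # keep the text after the last colon
      fi
    done
    printf '%s %s\n' "$node" "$count"
  done
}
```

Nodes whose GRES list has no io_nic entry (including "(null)") are reported with a count of 0.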

The desired behavior when a user runs a job like 'srun --ntasks=10 --gres=io_nic:1 foo.sh' would be for the 10 instances of foo.sh to be forced onto separate nodes, since there is only one io_nic per node. However, that is not what we see: the tasks run concurrently on the same node. From reading the docs, it appears this is because the io_nic is allocated to the job rather than to each task. This is supported by the fact that running several such sruns concurrently (or using -N instead of -n) invariably puts them on separate nodes, or leaves them PENDING on resources (as expected).
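To make the job-versus-task distinction concrete, here is a toy model, not Slurm's actual scheduler. It assumes tasks are packed onto nodes in order, every node has the same CPU count and exactly one io_nic, and `place_tasks` is a hypothetical function. When the io_nic is charged once per node (what we observe), only CPUs limit packing; if it were charged per task, each node could host at most one task.

```shell
#!/usr/bin/env bash
# Toy model (hypothetical): place_tasks NTASKS CPUS_PER_NODE MODE
#   MODE=per_node : the node's single io_nic is charged once, so only CPUs limit packing
#   MODE=per_task : every task needs its own io_nic, so at most one task per node
# Prints one "nodeNNN TASKS" line per node used.
place_tasks() {
  local ntasks=$1 cpus=$2 mode=$3
  local per_node node=1 placed=0 n
  if [[ $mode == per_task ]]; then
    per_node=1          # one io_nic per node caps tasks at 1
  else
    per_node=$cpus      # io_nic already satisfied; CPUs are the only cap
  fi
  (( per_node >= 1 )) || return 1
  while (( placed < ntasks )); do
    # Put as many remaining tasks as the cap allows on the current node.
    n=$(( ntasks - placed < per_node ? ntasks - placed : per_node ))
    printf 'node%03d %d\n' "$node" "$n"
    placed=$(( placed + n ))
    node=$(( node + 1 ))
  done
}
```

For example, `place_tasks 10 16 per_node` prints a single line, `node001 10`, while `place_tasks 10 16 per_task` prints ten lines with one task on each node.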

Is there a way to tie the gres reservation to each task instead of to the entire job? Ideally my users would not need to worry about the implementation details and could simply add '--gres=io_nic:1' to high-I/O jobs.

--

Chip Seraphine
Linux Admin (Grid)

E: cseraphine at drwholdings.com
M: 773 412 2608
