inconsistent 'Requested node configuration is not available' error
In my cluster right now we have:

  $ sinfo
  ...
  pubgpu-req  up  7-00:00:00  7  mix   cobra,l40s-[01,03],luisa,rtx-03,sevilla,shelob
  pubgpu-req  up  7-00:00:00  5  idle  fiber,glaurung,gothmog,l40s-02,leo

The following works fine:

  $ srun -p pubgpu-req -A sysadm --nodelist=cobra,l40s-02 --gres=gpu:1 -N 1 \
      --ntasks-per-node=1 --mem=1G --time=1:00:00 --cpus-per-task=4 --pty /bin/bash
  srun: tres_per_node => gres/gpu:1
  cobra[0]:~$ exit
  exit

However just change the order of the nodelist and you get

  $ srun -p pubgpu-req -A sysadm --nodelist=l40s-02,cobra --gres=gpu:1 -N 1 \
      --ntasks-per-node=1 --mem=1G --time=1:00:00 --cpus-per-task=4 --pty /bin/bash
  srun: tres_per_node => gres/gpu:1
  srun: error: Unable to create step for job 8255390: Requested node configuration is not available

More experimentation with this and it appears nodelist HAS to be in
alphabetical order

  $ srun -p pubgpu-req -A sysadm --nodelist=l40s-02,leo,fiber \
      --gres=gpu:1 -N 1 --ntasks-per-node=1 --mem=1G --time=1:00:00 \
      --cpus-per-task=4 --pty /bin/bash
  srun: tres_per_node => gres/gpu:1
  srun: error: Unable to create step for job 8255401: Requested node configuration is not available

  $ srun -p pubgpu-req -A sysadm --nodelist=fiber,l40s-02,leo \
      --gres=gpu:1 -N 1 --ntasks-per-node=1 --mem=1G --time=1:00:00 \
      --cpus-per-task=4 --pty /bin/bash
  srun: tres_per_node => gres/gpu:1
  fiber[0]:~$

Surely there is no good reason for this?

---------------------------------------------------------------
Paul Raines                     http://help.nmr.mgh.harvard.edu
MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
149 (2301) 13th Street     Charlestown, MA 02129            USA
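Until the cause is understood, one possible stopgap is to sort the node list before handing it to srun. This is only a sketch under the assumption that the ordering sensitivity seen above holds on this Slurm version; it sidesteps the error and does not explain it. `sort_nodelist` is a hypothetical helper name, not a Slurm command, and it assumes plain hostnames with no bracketed ranges such as l40s-[01,03].

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): put a comma-separated node list
# into byte-wise sorted order so --nodelist always sees the same ordering.
# Assumption: plain hostnames only, no bracketed ranges like l40s-[01,03].
sort_nodelist() {
    # Split on commas, sort in the C locale, then rejoin with commas.
    printf '%s\n' "$1" | tr ',' '\n' | LC_ALL=C sort | paste -sd, -
}
```

It would be used in place of the literal list, for example --nodelist="$(sort_nodelist l40s-02,cobra)" in the failing srun command above, which passes the nodes in the same order as the invocation that worked.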