I have some users that are using Ray on Slurm.
I will preface by saying we are new Slurm users, so we may not be doing everything exactly correctly.

The only issue we have come across so far was something somewhat Ray-specific.
Specifically (and pardon my lack of specificity, as the Ray user I worked on this with is on vacation at the moment), there was an environment variable that needed to be unset so that Ray wouldn't kneecap itself when it hit a cpuset corner case in cgroup fencing.

In this workload, the user spawns a "ray head", and it is important to mention that this head may not have the same resources allocated to it as the "ray workers".
TL;DR: the ray head would be given fewer CPUs than the worker(s), and in some corner cases a worker PID would inherit the head's smaller cpuset through an environment variable passed along when the head spawned workers via srun.

The user noticed that some workers could hit 100% utilization of their allocated CPU resources, while other workers running identical workloads ended up at partial usage. We discovered this was because the cpuset was being inherited in a way we did not intend.
I'll have to follow up with the environment variable we had to unset when that user is back.

But here is my quick and dirty bash script that shows the CPUs allocated to each job's cgroup and the CPUs actually allowed for the PIDs inside that cgroup. These should match, but didn't always, which was our discovery.
Just pass it the uid of the user submitting the jobs.

    #!/bin/bash
    # Pass the uid of the user submitting the jobs as $1.
    # Note: bash reserves $UID as read-only, so use a different variable name.
    TARGET_UID=$1

    for JOB in $(ls /sys/fs/cgroup/cpuset/slurm/uid_${TARGET_UID}/ | grep job | awk -F'_' '{print $2}' | xargs)
    do
        echo "Slurm JobID: $JOB"
        echo -n "Cgroup CPU set: "
        cat /sys/fs/cgroup/cpuset/slurm/uid_${TARGET_UID}/job_${JOB}/cpuset.cpus

        for PID in $(cat /sys/fs/cgroup/cpuset/slurm/uid_${TARGET_UID}/job_${JOB}/step_0/cgroup.procs | xargs)
        do
            echo -n "CPUs allocated for PID $PID: "
            grep Cpus_allowed_list /proc/$PID/status | awk '{print $2}'
        done
        echo ""
    done

Example output from our nodes (note jobs 409 and 408, where some PIDs are confined to a smaller cpuset than the job's cgroup):

    slurmd3:
     Slurm Job: 409
     Cgroup CPU set: 0-7
     CPUs allocated for PID 7907: 0-7
     CPUs allocated for PID 7912: 0-3
     CPUs allocated for PID 7931: 0-3
    slurmd1:
     Slurm Job: 406
     Cgroup CPU set: 0-3
     CPUs allocated for PID 7409: 0-3
     CPUs allocated for PID 7414: 0-3
     CPUs allocated for PID 7425: 0-3
    slurmd2:
     Slurm Job: 408
     Cgroup CPU set: 0-7
     CPUs allocated for PID 7491: 0-7
     CPUs allocated for PID 7496: 0-3
     CPUs allocated for PID 7515: 0-3
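The fix itself was essentially just keeping that variable out of the environment the workers see. Since I don't have the actual variable name handy until that user is back, the sketch below only shows the rough shape of it: RAY_CPUSET_SUSPECT_VAR is a placeholder, and the worker launch line is paraphrased from the Ray Slurm template linked below.

    # RAY_CPUSET_SUSPECT_VAR is a placeholder -- not the real variable name; I'll follow up with that.

    # Option 1: unset it before any srun that launches workers runs.
    unset RAY_CPUSET_SUSPECT_VAR

    # Option 2: strip it only for the worker launch, leaving the rest of the environment alone.
    srun --nodes=1 --ntasks=1 -w "$node_i" \
        env -u RAY_CPUSET_SUSPECT_VAR \
        ray start --address "$ip_head" --block &

srun's --export flag is another way to control exactly which variables propagate into the step's environment, if whitelisting is easier than unsetting.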
type="cite" class=""><div class=""><font face="Menlo" class="">slurmd3:</font></div><div class=""><font face="Menlo" class=""> Slurm Job: 409</font></div><div class=""><font face="Menlo" class=""> Cgroup CPU set: 0-7</font></div><div class=""><font face="Menlo" class=""> CPUs allocated for PID 7907: 0-7</font></div><div class=""><font face="Menlo" class=""> CPUs allocated for PID 7912: 0-3</font></div><div class=""><font face="Menlo" class=""> CPUs allocated for PID 7931: 0-3</font></div><div class=""><font face="Menlo" class="">slurmd1:</font></div><div class=""><font face="Menlo" class=""> Slurm Job: 406</font></div><div class=""><font face="Menlo" class=""> Cgroup CPU set: 0-3</font></div><div class=""><font face="Menlo" class=""> CPUs allocated for PID 7409: 0-3</font></div><div class=""><font face="Menlo" class=""> CPUs allocated for PID 7414: 0-3</font></div><div class=""><font face="Menlo" class=""> CPUs allocated for PID 7425: 0-3</font></div><div class=""><font face="Menlo" class="">slurmd2:</font></div><div class=""><font face="Menlo" class=""> Slurm Job: 408</font></div><div class=""><font face="Menlo" class=""> Cgroup CPU set: 0-7</font></div><div class=""><font face="Menlo" class=""> CPUs allocated for PID 7491: 0-7</font></div><div class=""><font face="Menlo" class=""> CPUs allocated for PID 7496: 0-3</font></div><div class=""><font face="Menlo" class=""> CPUs allocated for PID 7515: 0-3</font></div></blockquote><br class=""></div><div class="">But otherwise, I’ve not had issues with users spawning jobs from within jobs, but I’m not a seasoned slurm admin, so that may not hold up with others.</div><div class=""><br class=""></div><div class="">Reed</div><div class=""><br class=""></div><div class=""><div><blockquote type="cite" class=""><div class="">On Jul 15, 2022, at 4:17 AM, Kamil Wilczek <<a href="mailto:kmwil@mimuw.edu.pl" class="">kmwil@mimuw.edu.pl</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">Dear Slurm Users,<br class=""><br class="">one of my cluster users would like to run a Ray cluster on Slurm.<br class="">I noticed that the batch script example requires running the "srun"<br class="">command on a compute node, which already is allocated:<br class=""><a href="https://docs.ray.io/en/latest/cluster/examples/slurm-template.html#slurm-template" class="">https://docs.ray.io/en/latest/cluster/examples/slurm-template.html#slurm-template</a><br class=""><br class="">This is the first time I see or hear about this type of usage<br class="">and I have problems wrapping my head around this.<br class="">Is there anything wrong or unusual about this? I understand that<br class="">this would allocate some resources on other nodes. Would<br class="">Slurm enforce limits properly ("qos" or "partition" limits)?<br class=""><br class="">Kind Regards<br class="">-- <br class="">Kamil Wilczek [https://keys.openpgp.org/]<br class="">[D415917E84B8DA5A60E853B6E676ED061316B69B]<br class=""></div></div></blockquote></div><br class=""></div></body></html>