Bill, would this allow allocating all the remaining harts when the node is initially half full? How would the parameters be set up for that? The cluster has 14 machines with 56 harts and 128 GB of RAM, and 12 machines with 104 harts and 256 GB of RAM.
Some of the algorithms have hot loops that scale close to, or beyond, the number of harts, so it is always beneficial to use every available hart in an opportunistic, best-effort way. The algorithms train photometric galaxy redshift estimators (galaxy distance calculators), and training will be repeated fairly often because of the large number of available physical parameters. The memory required right now appears to be under 10 GB, but I can't say the same for every algorithm that will be used (at least 6 different ones), nor for the different parameter sets that are expected to be required.
On Thu, Aug 1, 2024 at 4:27 PM Bill via slurm-users slurm-users@lists.schedmd.com wrote:
Either allocate all of the node's cores or all of the node's memory? Both will effectively allocate the node exclusively for you.
So you'll need to know what a node looks like. For a homogeneous cluster, this is straightforward. For a heterogeneous cluster, you may also need to specify a nodelist for, say, those 28-core nodes versus those 64-core nodes.
But going back to the original answer: --exclusive is the answer here. You DO know how many cores you need, right? (A scaling study should give you that.) And you DO know the memory footprint from past jobs with similar inputs, I hope.
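For example, something along these lines (a sketch only; the 56-core count, the node names, and the script name are placeholders you'd adjust to your cluster):

sbatch -N 1 --ntasks-per-node=1 --cpus-per-task=56 job.batch   # ask for every core on a 56-core node
sbatch -N 1 --ntasks-per-node=1 --mem=0 job.batch              # or ask for all of a node's memory instead

On a heterogeneous cluster you could add something like --nodelist=node[01-14] (hypothetical node names) so the requested core count matches the nodes you're targeting.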
Bill
On 8/1/24 3:17 PM, Henrique Almeida via slurm-users wrote:
Hello, maybe I should rephrase the question as: how to fill a whole node?
On Thu, Aug 1, 2024 at 3:08 PM Jason Simms jsimms1@swarthmore.edu wrote:
On the one hand, you say you want "to allocate a whole node for a single multi-threaded process," but on the other you say you want to allow it to "share nodes with other running jobs." Those seem like mutually exclusive requirements.
Jason
On Thu, Aug 1, 2024 at 1:32 PM Henrique Almeida via slurm-users slurm-users@lists.schedmd.com wrote:
Hello, I'm testing it right now and it's working pretty well in a normal situation, but that's not exactly what I want. The --exclusive documentation says that the job allocation cannot share nodes with other running jobs, but I want to allow it to do so if that's unavoidable. Are there other ways to configure it?
The current parameters I'm testing:
sbatch -N 1 --exclusive --ntasks-per-node=1 --mem=0 pz-train.batch
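Inside pz-train.batch the idea is just to point the threading runtime at whatever Slurm actually granted, roughly like this sketch (the environment variable and the training entry point are placeholders; the real script differs):

#!/bin/bash
# Size the thread pool from the CPUs Slurm allocated to the job on this node
export OMP_NUM_THREADS=${SLURM_CPUS_ON_NODE}
python train_pz.py   # placeholder for the actual training command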
On Thu, Aug 1, 2024 at 12:29 PM Davide DelVento davide.quantum@gmail.com wrote:
In part, it depends on how it's been configured, but have you tried --exclusive?
On Thu, Aug 1, 2024 at 7:39 AM Henrique Almeida via slurm-users slurm-users@lists.schedmd.com wrote:
Hello everyone, with Slurm, how do I allocate a whole node for a single multi-threaded process?
https://stackoverflow.com/questions/78818547/with-slurm-how-to-allocate-a-wh...
-- Henrique Dante de Almeida hdante@gmail.com
-- Henrique Dante de Almeida hdante@gmail.com
-- Jason L. Simms, Ph.D., M.P.H. Manager of Research Computing Swarthmore College Information Technology Services (610) 328-8102 Schedule a meeting: https://calendly.com/jlsimms