[slurm-users] ulimit in sbatch script

Bill Barth bbarth at tacc.utexas.edu
Sun Apr 15 13:32:57 MDT 2018


Specifying --mem to Slurm only tells it to find a node that has that much memory, not to enforce a limit as far as I know. That node has that much, so it finds it. You probably want to enable UsePAM and set up the pam.d slurm files and /etc/security/limits.conf to keep users under the 64000MB of physical memory that the node has (minus some padding for the OS, etc.). Is UsePAM enabled in your slurm.conf? Maybe that's what is changing the limits.
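
Roughly, the moving parts look like this (just a sketch; the exact module names, paths, and values depend on your distro and Slurm build, so treat them as placeholders):

    # slurm.conf
    UsePAM=1

    # /etc/pam.d/slurm  (PAM stack that slurmd applies to job tasks when UsePAM=1)
    session    required    pam_limits.so

With pam_limits.so in that stack, whatever caps you put in /etc/security/limits.conf get applied to job processes too.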

Best,
Bill.

-- 
Bill Barth, Ph.D., Director, HPC
bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435            |   Fax:   (512) 475-9445
 
 

On 4/15/18, 2:28 PM, "slurm-users on behalf of Mahmood Naderan" <slurm-users-bounces at lists.schedmd.com on behalf of mahmood.nt at gmail.com> wrote:

    Bill,
    The thing is that both the user and root see unlimited virtual memory
    when they ssh directly to the node. However, when the job is submitted,
    the user's limits change. That means Slurm modifies something.
    
    The script is
    
    #!/bin/bash
    #SBATCH --job-name=hvacSteadyFoam
    #SBATCH --output=hvacSteadyFoam.log
    #SBATCH --ntasks=32
    #SBATCH --time=100:00:00
    #SBATCH --mem=64000M
    ulimit -a
    mpirun hvacSteadyFoam -parallel
    
    
    The physical memory on the node is 64GB, so I specified 64000M for
    --mem. Is that correct? The only thing I can guess is that --mem also
    modifies the virtual memory limit, though I am not sure.
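
    For example, this is how I can compare the two environments (compute-0-0
    below is just a placeholder for the node name):

        # limits as seen inside a Slurm job (output goes to slurm-<jobid>.out)
        sbatch --mem=64000M --wrap='ulimit -a'

        # limits from a direct ssh to the same node
        ssh compute-0-0 'ulimit -a'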
    
    
    Regards,
    Mahmood
    
    
    
    
    On Sun, Apr 15, 2018 at 11:32 PM, Bill Barth <bbarth at tacc.utexas.edu> wrote:
    > Mahmood, sorry to presume. I meant to address the root user and your ssh to the node in your example.
    >
    > At our site, we use UsePAM=1 in our slurm.conf, and our /etc/pam.d/slurm and slurm.pam files both contain pam_limits.so, so it could be that way for you, too. I.e., Slurm could be setting the limits for your users' job scripts, while for root SSHes the limits are set by PAM through another config file. Also, root's limits are potentially set differently by PAM (in /etc/security/limits.conf) or by the kernel at boot time.
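    >
    > For illustration only (placeholder values, in KB), a limits.conf along these lines would give root and regular users different caps:
    >
    >     # /etc/security/limits.conf
    >     # regular users: cap address space below the node's physical RAM
    >     *      hard    as      62000000
    >     # root: left unlimited
    >     root   hard    as      unlimited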
    >
    > Finally, users should be careful using ulimit in their job scripts b/c that can only change the limits for that shell script's process and not across nodes. That jobscript appears to apply to only one node, but if they want different limits for jobs that span nodes, they may need to use other features of Slurm to get them across all the nodes their job uses (cgroups, perhaps?).
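    >
    > Just as a sketch of what I mean, inside a multi-node job script the difference would show up like this:
    >
    >     ulimit -a                             # limits of the batch shell, first node only
    >     srun bash -c 'hostname; ulimit -a'    # limits seen by a task on every allocated node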
    >
    > Best,
    > Bill.
    >
    > --
    > Bill Barth, Ph.D., Director, HPC
    > bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
    > Office: ROC 1.435            |   Fax:   (512) 475-9445
    >
    
    


