<div dir="ltr"><div>Hi <span class="gmail-il">Mahmood</span>,</div><div><br></div><div>If you want the virtual memory size to be unrestricted by slurm, set VSizeFactor to 0 in slurm.conf, which according to the documentation disables virtual memory limit enforcement.</div><div><br></div><div><a href="https://slurm.schedmd.com/slurm.conf.html#OPT_VSizeFactor">https://slurm.schedmd.com/slurm.conf.html#OPT_VSizeFactor</a></div><div><br></div><div>-Sean<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 27, 2020 at 11:47 PM Mahmood Naderan <<a href="mailto:mahmood.nt@gmail.com">mahmood.nt@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div style="font-family:tahoma,sans-serif" class="gmail_default">
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)">>This line is probably what is limiting you to around 40gb.</span></p>
<p class="MsoNormal"><span style="font-family:"Tahoma",sans-serif">>#SBATCH --mem=38GB</span></p>

</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">Yes. If I change that value, the "ulimit -v" also changes. See below</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">[shams@hpc ~]$ cat slurm_blast.sh | grep mem<br>#SBATCH --mem=50GB<br>[shams@hpc ~]$ cat my_blast.log<br>virtual memory          (kbytes, -v) 57671680<br>/var/spool/slurmd/job00306/slurm_script: line 13: ulimit: virtual memory: cannot modify limit: Operation not permitted<br>virtual memory          (kbytes, -v) 57671680<br>Error memory mapping:/home/shams/ncbi-blast-2.9.0+/bin/nr.69.psq openedFilesCount=168 threadID=0<br>Error: NCBI C++ Exception:</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">However, the solution is not to change that parameter. There are two issue with that:</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">1) --mem belongs to the physical memory which is requested by job and is later reserved for the job by slurm.</div><div class="gmail_default" style="font-family:tahoma,sans-serif">So, on a 64GB node, if a user requests --mem=50GB, actually no one else can run a job with 10GB memory need.</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">2) The virtual size of the program (according) to the top is about 140GB. So, if I set --mem=140GB, the job stuck in the queue because requested information is invalid (node has 64GB of memory).</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">I really think there is a problem with slurm but can not find the root of the problem. 
The slurm config parameters are</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">Configuration data as of 2020-01-28T08:04:55<br>AccountingStorageBackupHost = (null)<br>AccountingStorageEnforce = associations,limits,qos,safe,wckeys<br>AccountingStorageHost   = hpc<br>AccountingStorageLoc    = N/A<br>AccountingStoragePort   = 6819<br>AccountingStorageTRES   = cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu<br>AccountingStorageType   = accounting_storage/slurmdbd<br>AccountingStorageUser   = N/A<br>AccountingStoreJobComment = Yes<br>AcctGatherEnergyType    = acct_gather_energy/none<br>AcctGatherFilesystemType = acct_gather_filesystem/none<br>AcctGatherInterconnectType = acct_gather_interconnect/none<br>AcctGatherNodeFreq      = 0 sec<br>AcctGatherProfileType   = acct_gather_profile/none<br>AllowSpecResourcesUsage = 0<br>AuthAltTypes            = (null)<br>AuthInfo                = (null)<br>AuthType                = auth/munge<br>BatchStartTimeout       = 10 sec<br>BOOT_TIME               = 2020-01-27T09:53:58<br>BurstBufferType         = (null)<br>CheckpointType          = checkpoint/none<br>CliFilterPlugins        = (null)<br>ClusterName             = jupiter<br>CommunicationParameters = (null)<br>CompleteWait            = 0 sec<br>CoreSpecPlugin          = core_spec/none<br>CpuFreqDef              = Unknown<br>CpuFreqGovernors        = Performance,OnDemand,UserSpace<br>CredType                = cred/munge<br>DebugFlags              = Backfill,BackfillMap,NO_CONF_HASH,Priority<br>DefMemPerNode           = UNLIMITED<br>DisableRootJobs         = No<br>EioTimeout              = 60<br>EnforcePartLimits       = NO<br>Epilog                  = (null)<br>EpilogMsgTime           = 2000 usec<br>EpilogSlurmctld         = (null)<br>ExtSensorsType          = ext_sensors/none<br>ExtSensorsFreq          = 0 sec<br>FairShareDampeningFactor = 5<br>FastSchedule            = 0<br>FederationParameters    = (null)<br>FirstJobId              = 1<br>GetEnvTimeout           = 2 sec<br>GresTypes               = gpu<br>GpuFreqDef              = high,memory=high<br>GroupUpdateForce        = 1<br>GroupUpdateTime         = 600 sec<br>HASH_VAL                = Match<br>HealthCheckInterval     = 0 sec<br>HealthCheckNodeState    = ANY<br>HealthCheckProgram      = (null)<br>InactiveLimit           = 30 sec<br>JobAcctGatherFrequency  = 30<br>JobAcctGatherType       = jobacct_gather/linux<br>JobAcctGatherParams     = (null)<br>JobCheckpointDir        = /var/spool/slurm.checkpoint<br>JobCompHost             = hpc<br>JobCompLoc              = /var/log/slurm_jobcomp.log<br>JobCompPort             = 0<br>JobCompType             = jobcomp/none<br>JobCompUser             = root<br>JobContainerType        = job_container/none<br>JobCredentialPrivateKey = (null)<br>JobCredentialPublicCertificate = (null)<br>JobDefaults             = (null)<br>JobFileAppend           = 0<br>JobRequeue              = 1<br>JobSubmitPlugins        = (null)<br>KeepAliveTime           = SYSTEM_DEFAULT<br>KillOnBadExit           = 0<br>KillWait                = 60 sec<br>LaunchParameters        = (null)<br>LaunchType              = launch/slurm<br>Layouts                 =<br>Licenses                = (null)<br>LicensesUsed            = (null)<br>LogTimeFormat           = iso8601_ms<br>MailDomain              = (null)<br>MailProg                = /bin/mail<br>MaxArraySize            = 1001<br>MaxJobCount             = 10000<br>MaxJobId                = 
67043328<br>MaxMemPerNode           = UNLIMITED<br>MaxStepCount            = 40000<br>MaxTasksPerNode         = 512<br>MCSPlugin               = mcs/none<br>MCSParameters           = (null)<br>MessageTimeout          = 10 sec<br>MinJobAge               = 300 sec<br>MpiDefault              = none<br>MpiParams               = (null)<br>MsgAggregationParams    = (null)<br>NEXT_JOB_ID             = 305<br>NodeFeaturesPlugins     = (null)<br>OverTimeLimit           = 0 min<br>PluginDir               = /usr/lib64/slurm<br>PlugStackConfig         = /etc/slurm/plugstack.conf<br>PowerParameters         = (null)<br>PowerPlugin             =<br>PreemptMode             = OFF<br>PreemptType             = preempt/none<br>PreemptExemptTime       = 00:00:00<br>PriorityParameters      = (null)<br>PrioritySiteFactorParameters = (null)<br>PrioritySiteFactorPlugin = (null)<br>PriorityDecayHalfLife   = 14-00:00:00<br>PriorityCalcPeriod      = 00:05:00<br>PriorityFavorSmall      = No<br>PriorityFlags           =<br>PriorityMaxAge          = 1-00:00:00<br>PriorityUsageResetPeriod = NONE<br>PriorityType            = priority/multifactor<br>PriorityWeightAge       = 10<br>PriorityWeightAssoc     = 0<br>PriorityWeightFairShare = 10000<br>PriorityWeightJobSize   = 100<br>PriorityWeightPartition = 10000<br>PriorityWeightQOS       = 0<br>PriorityWeightTRES      = cpu=2000,mem=1,gres/gpu=400<br>PrivateData             = none<br>ProctrackType           = proctrack/linuxproc<br>Prolog                  = (null)<br>PrologEpilogTimeout     = 65534<br>PrologSlurmctld         = (null)<br>PrologFlags             = (null)<br>PropagatePrioProcess    = 0<br>PropagateResourceLimits = ALL<br>PropagateResourceLimitsExcept = (null)<br>RebootProgram           = (null)<br>ReconfigFlags           = (null)<br>RequeueExit             = (null)<br>RequeueExitHold         = (null)<br>ResumeFailProgram       = (null)<br>ResumeProgram           = /etc/slurm/resumehost.sh<br>ResumeRate              = 4 nodes/min<br>ResumeTimeout           = 450 sec<br>ResvEpilog              = (null)<br>ResvOverRun             = 0 min<br>ResvProlog              = (null)<br>ReturnToService         = 2<br>RoutePlugin             = route/default<br>SallocDefaultCommand    = (null)<br>SbcastParameters        = (null)<br>SchedulerParameters     = (null)<br>SchedulerTimeSlice      = 30 sec<br>SchedulerType           = sched/backfill<br>SelectType              = select/cons_res<br>SelectTypeParameters    = CR_CORE_MEMORY<br>SlurmUser               = root(0)<br>SlurmctldAddr           = (null)<br>SlurmctldDebug          = info<br>SlurmctldHost[0]        = hpc(10.1.1.1)<br>SlurmctldLogFile        = /var/log/slurm/slurmctld.log<br>SlurmctldPort           = 6817<br>SlurmctldSyslogDebug    = unknown<br>SlurmctldPrimaryOffProg = (null)<br>SlurmctldPrimaryOnProg  = (null)<br>SlurmctldTimeout        = 300 sec<br>SlurmctldParameters     = (null)<br>SlurmdDebug             = info<br>SlurmdLogFile           = /var/log/slurm/slurmd.log<br>SlurmdParameters        = (null)<br>SlurmdPidFile           = /var/run/slurmd.pid<br>SlurmdPort              = 6818<br>SlurmdSpoolDir          = /var/spool/slurmd<br>SlurmdSyslogDebug       = unknown<br>SlurmdTimeout           = 300 sec<br>SlurmdUser              = root(0)<br>SlurmSchedLogFile       = (null)<br>SlurmSchedLogLevel      = 0<br>SlurmctldPidFile        = /var/run/slurmctld.pid<br>SlurmctldPlugstack      = (null)<br>SLURM_CONF              = /etc/slurm/slurm.conf<br>SLURM_VERSION           = 19.05.2<br>SrunEpilog              = 
(null)<br>SrunPortRange           = 0-0<br>SrunProlog              = (null)<br>StateSaveLocation       = /var/spool/slurm.state<br>SuspendExcNodes         = (null)<br>SuspendExcParts         = (null)<br>SuspendProgram          = /etc/slurm/suspendhost.sh<br>SuspendRate             = 4 nodes/min<br>SuspendTime             = NONE<br>SuspendTimeout          = 45 sec<br>SwitchType              = switch/none<br>TaskEpilog              = (null)<br>TaskPlugin              = task/affinity<br>TaskPluginParam         = (null type)<br>TaskProlog              = (null)<br>TCPTimeout              = 2 sec<br>TmpFS                   = /state/partition1<br>TopologyParam           = (null)<br>TopologyPlugin          = topology/none<br>TrackWCKey              = Yes<br>TreeWidth               = 50<br>UsePam                  = 0<br>UnkillableStepProgram   = (null)<br>UnkillableStepTimeout   = 60 sec<br>VSizeFactor             = 110 percent<br>WaitTime                = 60 sec<br>X11Parameters           = (null)<br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div><div dir="ltr"><div dir="ltr"><font face="tahoma,sans-serif">Regards,<br>Mahmood</font><br><br><br></div></div></div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div lang="EN-US"><div><div>
</div>
</div>
</div>

</blockquote></div></div>
</blockquote></div>
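<div dir="ltr"><div>A minimal sketch of the change suggested above (the slurm.conf path is the one shown in the scontrol output; the exact reload step may differ on your site):</div><div><br></div><div># /etc/slurm/slurm.conf<br>VSizeFactor=0          # 0 disables virtual memory limit enforcement<br><br># then push the change to the daemons, e.g.<br>scontrol reconfigure<br><br># and verify the running value<br>scontrol show config | grep VSizeFactor<br></div><div><br></div><div>With VSizeFactor=0, --mem still controls only the physical memory Slurm reserves for the job, so it can stay at a realistic value (below the node's 64 GB) while BLAST's large memory-mapped virtual size is no longer capped.</div></div>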