<div dir="ltr"><div dir="ltr"><div style="font-family:tahoma,sans-serif" class="gmail_default">
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)">>This line is probably what is limiting you to around 40gb.</span></p>
<p class="MsoNormal"><span style="font-family:"Tahoma",sans-serif">>#SBATCH --mem=38GB</span></p>
Yes. If I change that value, the "ulimit -v" value also changes. See below:

[shams@hpc ~]$ cat slurm_blast.sh | grep mem
#SBATCH --mem=50GB
[shams@hpc ~]$ cat my_blast.log
virtual memory (kbytes, -v) 57671680
/var/spool/slurmd/job00306/slurm_script: line 13: ulimit: virtual memory: cannot modify limit: Operation not permitted
virtual memory (kbytes, -v) 57671680
Error memory mapping:/home/shams/ncbi-blast-2.9.0+/bin/nr.69.psq openedFilesCount=168 threadID=0
Error: NCBI C++ Exception:

However, the solution is not to change that parameter. There are two issues with that:

1) --mem refers to physical memory, which is requested by the job and then reserved for it by Slurm. So on a 64GB node, if a user requests --mem=50GB, nobody else can run even a job that needs only 10GB.

2) According to top, the virtual size of the program is about 140GB. So if I set --mem=140GB, the job gets stuck in the queue because the request is invalid (the node has only 64GB of memory).

I really think there is a problem with Slurm, but I cannot find the root of it.
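The closest thing to a pattern I can see (so this is only a guess): the config below has VSizeFactor = 110 percent, and if I understand the slurm.conf man page correctly, Slurm derives a job's virtual-memory limit as --mem times VSizeFactor. The arithmetic matches my job exactly; as a sanity check (the first command is just shell math, the second is a real query):

[shams@hpc ~]$ echo $((50 * 1024 * 1024 * 110 / 100))
57671680
[shams@hpc ~]$ scontrol show config | grep VSizeFactor
VSizeFactor             = 110 percent

That would also explain the "Operation not permitted" message above: line 13 of the batch script tries to raise the limit with "ulimit -v", and if Slurm has imposed it as a hard limit, an unprivileged process cannot raise it.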
The Slurm config parameters are:

Configuration data as of 2020-01-28T08:04:55
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = associations,limits,qos,safe,wckeys
AccountingStorageHost = hpc
AccountingStorageLoc = N/A
AccountingStoragePort = 6819
AccountingStorageTRES = cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu
AccountingStorageType = accounting_storage/slurmdbd
AccountingStorageUser = N/A
AccountingStoreJobComment = Yes
AcctGatherEnergyType = acct_gather_energy/none
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInterconnectType = acct_gather_interconnect/none
AcctGatherNodeFreq = 0 sec
AcctGatherProfileType = acct_gather_profile/none
AllowSpecResourcesUsage = 0
AuthAltTypes = (null)
AuthInfo = (null)
AuthType = auth/munge
BatchStartTimeout = 10 sec
BOOT_TIME = 2020-01-27T09:53:58
BurstBufferType = (null)
CheckpointType = checkpoint/none
CliFilterPlugins = (null)
ClusterName = jupiter
CommunicationParameters = (null)
CompleteWait = 0 sec
CoreSpecPlugin = core_spec/none
CpuFreqDef = Unknown
CpuFreqGovernors = Performance,OnDemand,UserSpace
CredType = cred/munge
DebugFlags = Backfill,BackfillMap,NO_CONF_HASH,Priority
DefMemPerNode = UNLIMITED
DisableRootJobs = No
EioTimeout = 60
EnforcePartLimits = NO
Epilog = (null)
EpilogMsgTime = 2000 usec
EpilogSlurmctld = (null)
ExtSensorsType = ext_sensors/none
ExtSensorsFreq = 0 sec
FairShareDampeningFactor = 5
FastSchedule = 0
FederationParameters = (null)
FirstJobId = 1
GetEnvTimeout = 2 sec
GresTypes = gpu
GpuFreqDef = high,memory=high
GroupUpdateForce = 1
GroupUpdateTime = 600 sec
HASH_VAL = Match
HealthCheckInterval = 0 sec
HealthCheckNodeState = ANY
HealthCheckProgram = (null)
InactiveLimit = 30 sec
JobAcctGatherFrequency = 30
JobAcctGatherType = jobacct_gather/linux
JobAcctGatherParams = (null)
JobCheckpointDir = /var/spool/slurm.checkpoint
JobCompHost = hpc
JobCompLoc = /var/log/slurm_jobcomp.log
JobCompPort = 0
JobCompType = jobcomp/none
JobCompUser = root
JobContainerType = job_container/none
JobCredentialPrivateKey = (null)
JobCredentialPublicCertificate = (null)
JobDefaults = (null)
JobFileAppend = 0
JobRequeue = 1
JobSubmitPlugins = (null)
KeepAliveTime = SYSTEM_DEFAULT
KillOnBadExit = 0
KillWait = 60 sec
LaunchParameters = (null)
LaunchType = launch/slurm
Layouts =
Licenses = (null)
LicensesUsed = (null)
LogTimeFormat = iso8601_ms
MailDomain = (null)
MailProg = /bin/mail
MaxArraySize = 1001
MaxJobCount = 10000
MaxJobId = 67043328
MaxMemPerNode = UNLIMITED
MaxStepCount = 40000
MaxTasksPerNode = 512
MCSPlugin = mcs/none
MCSParameters = (null)
MessageTimeout = 10 sec
MinJobAge = 300 sec
MpiDefault = none
MpiParams = (null)
MsgAggregationParams = (null)
NEXT_JOB_ID = 305
NodeFeaturesPlugins = (null)
OverTimeLimit = 0 min
PluginDir = /usr/lib64/slurm
PlugStackConfig = /etc/slurm/plugstack.conf
PowerParameters = (null)
PowerPlugin =
PreemptMode = OFF
PreemptType = preempt/none
PreemptExemptTime = 00:00:00
PriorityParameters = (null)
PrioritySiteFactorParameters = (null)
PrioritySiteFactorPlugin = (null)
PriorityDecayHalfLife = 14-00:00:00
PriorityCalcPeriod = 00:05:00
PriorityFavorSmall = No
PriorityFlags =
PriorityMaxAge = 1-00:00:00
PriorityUsageResetPeriod = NONE
PriorityType = priority/multifactor
PriorityWeightAge = 10
PriorityWeightAssoc = 0
PriorityWeightFairShare = 10000
PriorityWeightJobSize = 100
PriorityWeightPartition = 10000
PriorityWeightQOS = 0
PriorityWeightTRES = cpu=2000,mem=1,gres/gpu=400
PrivateData = none
ProctrackType = proctrack/linuxproc
Prolog = (null)
PrologEpilogTimeout = 65534
PrologSlurmctld = (null)
PrologFlags = (null)
PropagatePrioProcess = 0
PropagateResourceLimits = ALL
PropagateResourceLimitsExcept = (null)
RebootProgram = (null)
ReconfigFlags = (null)
RequeueExit = (null)
RequeueExitHold = (null)
ResumeFailProgram = (null)
ResumeProgram = /etc/slurm/resumehost.sh
ResumeRate = 4 nodes/min
ResumeTimeout = 450 sec
ResvEpilog = (null)
ResvOverRun = 0 min
ResvProlog = (null)
ReturnToService = 2
RoutePlugin = route/default
SallocDefaultCommand = (null)
SbcastParameters = (null)
SchedulerParameters = (null)
SchedulerTimeSlice = 30 sec
SchedulerType = sched/backfill
SelectType = select/cons_res
SelectTypeParameters = CR_CORE_MEMORY
SlurmUser = root(0)
SlurmctldAddr = (null)
SlurmctldDebug = info
SlurmctldHost[0] = hpc(10.1.1.1)
SlurmctldLogFile = /var/log/slurm/slurmctld.log
SlurmctldPort = 6817
SlurmctldSyslogDebug = unknown
SlurmctldPrimaryOffProg = (null)
SlurmctldPrimaryOnProg = (null)
SlurmctldTimeout = 300 sec
SlurmctldParameters = (null)
SlurmdDebug = info
SlurmdLogFile = /var/log/slurm/slurmd.log
SlurmdParameters = (null)
SlurmdPidFile = /var/run/slurmd.pid
SlurmdPort = 6818
SlurmdSpoolDir = /var/spool/slurmd
SlurmdSyslogDebug = unknown
SlurmdTimeout = 300 sec
SlurmdUser = root(0)
SlurmSchedLogFile = (null)
SlurmSchedLogLevel = 0
SlurmctldPidFile = /var/run/slurmctld.pid
SlurmctldPlugstack = (null)
SLURM_CONF = /etc/slurm/slurm.conf
SLURM_VERSION = 19.05.2
SrunEpilog = (null)
SrunPortRange = 0-0
SrunProlog = (null)
StateSaveLocation = /var/spool/slurm.state
SuspendExcNodes = (null)
SuspendExcParts = (null)
SuspendProgram = /etc/slurm/suspendhost.sh
SuspendRate = 4 nodes/min
SuspendTime = NONE
SuspendTimeout = 45 sec
SwitchType = switch/none
TaskEpilog = (null)
TaskPlugin = task/affinity
TaskPluginParam = (null type)
TaskProlog = (null)
TCPTimeout = 2 sec
TmpFS = /state/partition1
TopologyParam = (null)
TopologyPlugin = topology/none
TrackWCKey = Yes
TreeWidth = 50
UsePam = 0
UnkillableStepProgram = (null)
UnkillableStepTimeout = 60 sec
VSizeFactor = 110 percent
WaitTime = 60 sec
X11Parameters = (null)

Regards,
Mahmood