[slurm-users] Getting current memory size of a job

Mahmood Naderan mahmood.nt at gmail.com
Sun Apr 7 17:13:49 UTC 2019


Hi again and sorry for the delay....



>When I was at Swinburne we asked for this as an enhancement here:
>https://bugs.schedmd.com/show_bug.cgi?id=4966

The output of sstat shows the following error:

# squeue -j 821
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               821   EMERALD g09-test shakerza  R   21:07:18      1 compute-0-0
# sstat -j 821 --format="MaxVMSize,AveRSS,AveVMSize,MaxRSS"
 MaxVMSize     AveRSS  AveVMSize     MaxRSS
---------- ---------- ---------- ----------
sstat: error: no steps running for job 821
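(If I understand the sstat man page correctly, it only reports on job *steps*, and a batch script that runs its program directly, without srun, has only the implicit "batch" step, which sstat skips by default. So something like the following might work; job ID 821 is the one from the session above, and the guard is only there so the snippet exits cleanly on a machine without Slurm:)

```shell
# sstat reports per-step statistics; a batch job that does not launch
# its work with srun has only the implicit "batch" step, which sstat
# hides unless asked for explicitly.
command -v sstat >/dev/null 2>&1 || { echo "sstat not available here"; exit 0; }

# Query the batch step of job 821 directly:
sstat -j 821.batch --format=MaxVMSize,AveRSS,AveVMSize,MaxRSS \
  || echo "(job 821 is not running on this cluster)"

# Or ask for all steps, batch and extern included, with --allsteps (-a):
sstat -a -j 821 --format=MaxVMSize,AveRSS,AveVMSize,MaxRSS || true
```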



> The /proc/<pid>/cgroup file indicates to which cgroups a process is
> assigned, so:

While the job is running, I ssh to the node and see:

[root@compute-0-0 11220]# cat cgroup
11:hugetlb:/
10:devices:/system.slice/slurmd.service
9:cpuset:/
8:freezer:/
7:cpuacct,cpu:/system.slice/slurmd.service
6:net_prio,net_cls:/
5:blkio:/system.slice/slurmd.service
4:memory:/system.slice/slurmd.service
3:pids:/
2:perf_event:/
1:name=systemd:/system.slice/slurmd.service

But I didn't understand the rest of the message. Could you explain a bit more?
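(My guess at what was meant, in case it helps: the "4:memory:" line names the memory cgroup the process belongs to, and appending that path under /sys/fs/cgroup/memory and reading memory.usage_in_bytes gives the current memory use. With the task/cgroup plugin active, Slurm usually puts each job in its own cgroup with a uid/job path like the one below (cgroup v1 layout); my output above shows system.slice/slurmd.service instead, which may mean I looked at the wrong pid or the plugin is not confining the job. The uid and job id below are placeholders:)

```shell
# Hypothetical path of Slurm's per-job memory cgroup under cgroup v1;
# adjust the uid and job id to the job in question.
JOBID=821
JOBUID=$(id -u 2>/dev/null)   # placeholder: should be the job owner's uid
CG=/sys/fs/cgroup/memory/slurm/uid_${JOBUID}/job_${JOBID}

# memory.usage_in_bytes holds the cgroup's current memory use in bytes.
if [ -r "$CG/memory.usage_in_bytes" ]; then
    cat "$CG/memory.usage_in_bytes"
else
    echo "no job cgroup at $CG on this machine"
fi
```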



> We use a simple web interface <https://github.com/shawarden/simple-web>
> which is OK for our small cluster.

I moved the files to the following paths and restarted httpd:

[root@rocks7 var]# ls -l www/html/simple-web/
total 44
-rwxrwxr-x 1 mahmood mahmood 20729 Apr  7 20:52 code.js
-rwxrwxr-x 1 mahmood mahmood  1406 Apr  7 20:52 favicon.ico
-rwxrwxr-x 1 mahmood mahmood   911 Apr  7 20:52 index.html
-rwxrwxr-x 1 mahmood mahmood  1557 Apr  7 20:52 settings.js
-rwxrwxr-x 1 mahmood mahmood  1322 Apr  7 20:52 style.css
-rwxrwxr-x 1 mahmood mahmood    72 Apr  7 20:52 userlist.txt
[root@rocks7 var]# ls -l bin/
total 44
-rwxrwxr-x 1 mahmood mahmood  2058 Apr  7 20:52 myfuncs.py
-rwxrwxr-x 1 mahmood mahmood  1462 Apr  7 20:52 settings.py
-rwxrwxr-x 1 mahmood mahmood  3084 Apr  7 20:52 slurm_cluster_stats.py
-rwxrwxr-x 1 mahmood mahmood  1635 Apr  7 20:52 slurm_pending_tasks.py
-rwxrwxr-x 1 mahmood mahmood   146 Apr  7 20:52 slurm_report_usage_from.sh
-rwxrwxr-x 1 mahmood mahmood   743 Apr  7 20:52 slurm_report_usagepercent_from.sh
-rwxrwxr-x 1 mahmood mahmood    95 Apr  7 20:52 slurm_report_years_from.sh
-rwxrwxr-x 1 mahmood mahmood 12617 Apr  7 20:52 slurm_task_tracker.py

However, when I browse to 10.1.1.1/simple-web, only a black screen is shown.
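(Some checks I can try in the meantime; the log path and URL below are the defaults on my CentOS/Rocks setup, so adjust as needed. Since the page is served but black, a JavaScript error in the browser is also possible, so the browser's developer console (F12) is worth a look:)

```shell
# See what Apache logged most recently:
tail -n 20 /var/log/httpd/error_log 2>/dev/null || echo "error_log not readable"

# Fetch the page directly to see what status the server returns:
if command -v curl >/dev/null 2>&1; then
    curl -s --max-time 3 -o /dev/null -w 'HTTP status: %{http_code}\n' \
        http://10.1.1.1/simple-web/ || echo "curl could not reach the server"
else
    echo "curl not installed"
fi

# On SELinux systems, freshly copied files may carry the wrong context;
# restoring the defaults sometimes fixes silently blocked content:
#   restorecon -Rv /var/www/html/simple-web    (run as root)
```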

Regards,
Mahmood