[slurm-users] About x11 support

Mahmood Naderan mahmood.nt at gmail.com
Sat Nov 17 10:24:08 MST 2018


> What does this command say?
> scontrol show config | fgrep PrologFlags

[root@rocks7 ~]# scontrol show config | fgrep PrologFlags
PrologFlags             = Alloc,Contain,X11

That means X11 support was compiled into the code (back when Werner created the roll).
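
For reference, my understanding (please correct me if I am wrong) is that a single X11 flag expands to exactly these three values, since X11 implicitly enables Contain and Contain implies Alloc; a minimal slurm.conf sketch of that would be:

# slurm.conf (sketch): one flag, three values reported by scontrol
PrologFlags=X11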




> Check your slurmd logs on the compute node. What errors are there?

In one terminal, I run the following command:

[mahmood@rocks7 ~]$ srun --nodelist=compute-0-5 -n 1 -c 6 --mem=8G -A y8 -p RUBY xclock
Error: Can't open display :1
srun: error: compute-0-5: task 0: Exited with exit code 1

At the same time, in another terminal I see this:

[root@compute-0-5 ~]# tail -f /var/log/slurm/slurmd.log
[2018-11-17T20:47:23.017] _run_prolog: run job script took usec=4
[2018-11-17T20:47:23.017] _run_prolog: prolog with lock for job 1580 ran for 0 seconds
[2018-11-17T20:47:23.131] launch task 1580.0 request from UID:1000 GID:1000 HOST:10.1.1.1 PORT:54950
[2018-11-17T20:47:23.131] lllp_distribution jobid [1580] implicit auto binding: sockets,one_thread, dist 1
[2018-11-17T20:47:23.131] _task_layout_lllp_cyclic
[2018-11-17T20:47:23.131] _lllp_generate_cpu_bind jobid [1580]: mask_cpu,one_thread, 0x00000070000007
[2018-11-17T20:47:23.204] [1580.0] task_p_pre_launch: Using sched_affinity for tasks
[2018-11-17T20:47:23.231] [1580.0] done with job
[2018-11-17T20:47:23.263] [1580.extern] done with job
^C
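
Before involving Slurm at all, this is the sanity check I can run on the frontend (a sketch; the DISPLAY value shown is only an example), to confirm that the submitting session has a display and an xauth cookie that slurmd could forward, and to request forwarding explicitly with --x11:

[mahmood@rocks7 ~]$ echo $DISPLAY           # must be set, e.g. :1 or localhost:10.0
[mahmood@rocks7 ~]$ xauth list $DISPLAY     # should list a MIT-MAGIC-COOKIE-1 entry for that display
[mahmood@rocks7 ~]$ xclock                  # X must work locally before it can be forwarded
[mahmood@rocks7 ~]$ srun --x11 --nodelist=compute-0-5 -n 1 -c 6 --mem=8G -A y8 -p RUBY xclock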



Also, at the same time as the srun attempt, I see this in the frontend log:

[root@rocks7 ~]# tail -f /var/log/slurm/slurmctld.log
[2018-11-17T20:52:10.908] Fairhare priority of job 1582 for user mahmood in acct y8 is 0.242424
[2018-11-17T20:52:10.908] Weighted Age priority is 0.000000 * 10 = 0.00
[2018-11-17T20:52:10.908] Weighted Fairshare priority is 0.242424 * 10000 = 2424.24
[2018-11-17T20:52:10.908] Weighted JobSize priority is 0.097756 * 100 = 9.78
[2018-11-17T20:52:10.908] Weighted Partition priority is 0.001000 * 10000 = 10.00
[2018-11-17T20:52:10.908] Weighted QOS priority is 0.000000 * 0 = 0.00
[2018-11-17T20:52:10.908] Weighted TRES:cpu is 0.041667 * 2000.00 = 83.33
[2018-11-17T20:52:10.908] Weighted TRES:mem is 0.031884 * 1.00 = 0.03
[2018-11-17T20:52:10.908] Job 1582 priority: 0.00 + 2424.24 + 9.78 + 10.00 + 0.00 + 83 - 0 = 2527.38
[2018-11-17T20:52:10.909] BillingWeight: JobId=1582 is either new or it was resized
[2018-11-17T20:52:10.909] sched: _slurm_rpc_allocate_resources JobId=1582 NodeList=compute-0-5 usec=977
[2018-11-17T20:52:11.123] _job_complete: JobId=1582 WEXITSTATUS 1
[2018-11-17T20:52:11.123] priority_p_job_end: called for job 1582
[2018-11-17T20:52:11.123] job 1582 ran for 1 seconds with TRES counts of
[2018-11-17T20:52:11.123] TRES cpu: 6
[2018-11-17T20:52:11.123] TRES mem: 8192
[2018-11-17T20:52:11.123] TRES node: 1
[2018-11-17T20:52:11.123] TRES billing: 6
[2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 15552000 unused seconds from QOS normal TRES cpu grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 21233664000 unused seconds from QOS normal TRES mem grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 2592000 unused seconds from QOS normal TRES node grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 15552000 unused seconds from QOS normal TRES billing grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 0 unused seconds from QOS normal TRES fs/disk grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 0 unused seconds from QOS normal TRES vmem grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 0 unused seconds from QOS normal TRES pages grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 0 unused seconds from QOS normal TRES gres/gpu grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] Adding 5.999997 new usage to assoc 42 (y8/mahmood/ruby) raw usage is now 437603.824918.  Group wall added 0.999999 making it 72831.944878.
[2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 15552000 unused seconds from assoc 42 TRES cpu grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 21233664000 unused seconds from assoc 42 TRES mem grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 2592000 unused seconds from assoc 42 TRES node grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 15552000 unused seconds from assoc 42 TRES billing grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 42 TRES fs/disk grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 42 TRES vmem grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 42 TRES pages grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 42 TRES gres/gpu grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] Adding 5.999997 new usage to assoc 41 (y8/(null)/(null)) raw usage is now 28311279.361228.  Group wall added 0.999999 making it 1466496.669595.
[2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 15552000 unused seconds from assoc 41 TRES cpu grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 21233664000 unused seconds from assoc 41 TRES mem grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 2592000 unused seconds from assoc 41 TRES node grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 15552000 unused seconds from assoc 41 TRES billing grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 41 TRES fs/disk grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 41 TRES vmem grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 41 TRES pages grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 41 TRES gres/gpu grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.124] Adding 5.999997 new usage to assoc 1 (root/(null)/(null)) raw usage is now 107651994.109022.  Group wall added 0.999999 making it 4989938.597661.
[2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 15552000 unused seconds from assoc 1 TRES cpu grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 21233664000 unused seconds from assoc 1 TRES mem grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 2592000 unused seconds from assoc 1 TRES node grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 15552000 unused seconds from assoc 1 TRES billing grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 1 TRES fs/disk grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 1 TRES vmem grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 1 TRES pages grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 1 TRES gres/gpu grp_used_tres_run_secs = 0
[2018-11-17T20:52:11.124] _job_complete: JobId=1582 done





All of this happened with the following two entries in slurm.conf:

PrologFlags=x11
X11Parameters=local_xauthority
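
As a follow-up test (just a sketch of mine, not taken from any documentation), I can ask the step itself which display and cookie it receives on compute-0-5 when forwarding is requested explicitly; with X11Parameters=local_xauthority the cookie should, as far as I understand, land in a temporary xauthority file on the node rather than in my ~/.Xauthority:

[mahmood@rocks7 ~]$ srun --x11 --nodelist=compute-0-5 -n 1 -c 6 --mem=8G -A y8 -p RUBY \
    bash -c 'echo DISPLAY=$DISPLAY XAUTHORITY=$XAUTHORITY; xauth list; xclock'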



Regards,
Mahmood