[slurm-users] An observation on SLURM's logging

Janne Blomqvist janne.blomqvist at aalto.fi
Mon Dec 3 00:41:03 MST 2018


On 27/11/2018 12.18, Kevin Buckley wrote:
> Here are our usual Slurm log-related config settings
> 
> (where we've increased the default "Syslog" level as we're trying
>   to get the Crays to funnel the slurmd log messages all the way up
>   to the machine that syslog messages go to on Crays)
> 
> 
> SlurmctldDebug=info
> SlurmctldLogFile=/var/spool/slurm/slurmctld.log
> SlurmctldSyslogDebug=info
> SlurmdDebug=info
> SlurmdLogFile=/var/spool/slurmd/%h.log
> SlurmdSyslogDebug=info
> 
> 
> and here are the "logged" outputs to both, the systemd journal,
> and slurmd's dedicated log file, channels, for a particular job:
> 
> 
> SYSTEMD JOURNAL
> 
> 
> Nov 27 16:57:32 nid00024 slurmd[18994]: task_p_slurmd_batch_request: 796
> Nov 27 16:57:32 nid00024 slurmd[18994]: task/affinity: job 796 CPU input mask for node: 0x00FF
> Nov 27 16:57:32 nid00024 slurmd[18994]: task/affinity: job 796 CPU final HW mask for node: 0x0F0F
> Nov 27 16:57:32 nid00024 slurmd[18994]: _run_prolog: run job script took usec=10517
> Nov 27 16:57:32 nid00024 slurmd[18994]: _run_prolog: prolog with lock for job 796 ran for 0 seconds
> Nov 27 16:57:33 nid00024 slurmd[18994]: Launching batch job 796 for UID 20480
> Nov 27 16:57:33 nid00024 slurmd[18994]: launch task 796.0 request from 20480.20480@10.128.0.25 (port 25773)
> Nov 27 16:57:33 nid00024 slurmd[18994]: lllp_distribution jobid [796] auto binding off: mask_cpu,one_thread
> Nov 27 16:57:34 nid00024 slurmd[18994]: launch task 796.1 request from 20480.20480@10.128.0.25 (port 30381)
> Nov 27 16:57:34 nid00024 slurmd[18994]: lllp_distribution jobid [796] auto binding off: mask_cpu,one_thread
> 
> 
> SLURMD LOG FILE
> 
> 
> [2018-11-27T16:57:32.850] task_p_slurmd_batch_request: 796
> [2018-11-27T16:57:32.850] task/affinity: job 796 CPU input mask for node: 0x00FF
> [2018-11-27T16:57:32.850] task/affinity: job 796 CPU final HW mask for node: 0x0F0F
> [2018-11-27T16:57:32.862] _run_prolog: run job script took usec=10517
> [2018-11-27T16:57:32.862] _run_prolog: prolog with lock for job 796 ran for 0 seconds
> [2018-11-27T16:57:32.874] [796.extern] core_spec/cray: init
> [2018-11-27T16:57:32.898] [796.extern] task/cgroup: /slurm/uid_20480/job_796: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> [2018-11-27T16:57:32.898] [796.extern] task/cgroup: /slurm/uid_20480/job_796/step_extern: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> [2018-11-27T16:57:33.666] Launching batch job 796 for UID 20480
> [2018-11-27T16:57:33.680] [796.batch] core_spec/cray: init
> [2018-11-27T16:57:33.690] [796.batch] task/cgroup: /slurm/uid_20480/job_796: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> [2018-11-27T16:57:33.690] [796.batch] task/cgroup: /slurm/uid_20480/job_796/step_batch: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> [2018-11-27T16:57:33.696] [796.batch] task_p_pre_launch: Using sched_affinity for tasks
> [2018-11-27T16:57:33.871] launch task 796.0 request from 20480.20480@10.128.0.25 (port 25773)
> [2018-11-27T16:57:33.871] lllp_distribution jobid [796] auto binding off: mask_cpu,one_thread
> [2018-11-27T16:57:33.882] [796.0] core_spec/cray: init
> [2018-11-27T16:57:33.918] [796.0] (switch_cray.c: 654: switch_p_job_init) gres_cnt: 0 1
> [2018-11-27T16:57:33.934] [796.0] task/cgroup: /slurm/uid_20480/job_796: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> [2018-11-27T16:57:33.934] [796.0] task/cgroup: /slurm/uid_20480/job_796/step_0: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> [2018-11-27T16:57:33.940] [796.0] task_p_pre_launch: Using sched_affinity for tasks
> [2018-11-27T16:57:33.940] [796.0] task_p_pre_launch: Using sched_affinity for tasks
> [2018-11-27T16:57:33.940] [796.0] task_p_pre_launch: Using sched_affinity for tasks
> [2018-11-27T16:57:33.940] [796.0] task_p_pre_launch: Using sched_affinity for tasks
> [2018-11-27T16:57:34.580] [796.0] done with job
> [2018-11-27T16:57:34.940] launch task 796.1 request from 20480.20480@10.128.0.25 (port 30381)
> [2018-11-27T16:57:34.941] lllp_distribution jobid [796] auto binding off: mask_cpu,one_thread
> [2018-11-27T16:57:34.951] [796.1] core_spec/cray: init
> [2018-11-27T16:57:34.994] [796.1] (switch_cray.c: 654: switch_p_job_init) gres_cnt: 0 1
> [2018-11-27T16:57:35.006] [796.1] task/cgroup: /slurm/uid_20480/job_796: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> [2018-11-27T16:57:35.006] [796.1] task/cgroup: /slurm/uid_20480/job_796/step_1: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> [2018-11-27T16:57:35.012] [796.1] task_p_pre_launch: Using sched_affinity for tasks
> [2018-11-27T16:57:35.012] [796.1] task_p_pre_launch: Using sched_affinity for tasks
> [2018-11-27T16:57:35.012] [796.1] task_p_pre_launch: Using sched_affinity for tasks
> [2018-11-27T16:57:35.013] [796.1] task_p_pre_launch: Using sched_affinity for tasks
> [2018-11-27T16:57:46.460] [796.1] done with job
> [2018-11-27T16:57:46.602] [796.batch] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 0
> [2018-11-27T16:57:46.608] [796.batch] done with job
> [2018-11-27T16:57:46.760] [796.extern] done with job
> 
> 
> where we see that SOME, but not ALL, of the log messages appear on both 
> of the channels.
> 
> 
> 
> Now let's comment out the use of a dedicated slurmd log file and run exactly the same job,
> 
> FWIW:
> 
> SlurmctldDebug=info
> SlurmctldLogFile=/var/spool/slurm/slurmctld.log
> SlurmctldSyslogDebug=info
> SlurmdDebug=info
> #SlurmdLogFile=/var/spool/slurmd/%h.log
> SlurmdSyslogDebug=info
> 
> 
> SYSTEMD JOURNAL
> 
> 
> Nov 27 16:51:51 nid00024 slurmd[16415]: task_p_slurmd_batch_request: 794
> Nov 27 16:51:51 nid00024 slurmd[16415]: task/affinity: job 794 CPU input mask for node: 0x00FF
> Nov 27 16:51:51 nid00024 slurmd[16415]: task/affinity: job 794 CPU final HW mask for node: 0x0F0F
> Nov 27 16:51:51 nid00024 slurmd[16415]: _run_prolog: run job script took usec=10472
> Nov 27 16:51:51 nid00024 slurmd[16415]: _run_prolog: prolog with lock for job 794 ran for 0 seconds
> Nov 27 16:51:51 nid00024 slurmstepd[18143]: core_spec/cray: init
> Nov 27 16:51:51 nid00024 slurmstepd[18143]: task/cgroup: /slurm/uid_20480/job_794: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> Nov 27 16:51:51 nid00024 slurmstepd[18143]: task/cgroup: /slurm/uid_20480/job_794/step_extern: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> Nov 27 16:51:52 nid00024 slurmd[16415]: Launching batch job 794 for UID 20480
> Nov 27 16:51:52 nid00024 slurmstepd[18411]: core_spec/cray: init
> Nov 27 16:51:52 nid00024 slurmstepd[18411]: task/cgroup: /slurm/uid_20480/job_794: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> Nov 27 16:51:52 nid00024 slurmstepd[18411]: task/cgroup: /slurm/uid_20480/job_794/step_batch: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> Nov 27 16:51:52 nid00024 slurmstepd[18419]: task_p_pre_launch: Using sched_affinity for tasks
> Nov 27 16:51:52 nid00024 slurmd[16415]: launch task 794.0 request from 20480.20480@10.128.0.25 (port 13485)
> Nov 27 16:51:52 nid00024 slurmd[16415]: lllp_distribution jobid [794] auto binding off: mask_cpu,one_thread
> Nov 27 16:51:52 nid00024 slurmstepd[18520]: core_spec/cray: init
> Nov 27 16:51:52 nid00024 slurmstepd[18520]: (switch_cray.c: 654: switch_p_job_init) gres_cnt: 0 1
> Nov 27 16:51:52 nid00024 slurmstepd[18520]: task/cgroup: /slurm/uid_20480/job_794: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> Nov 27 16:51:52 nid00024 slurmstepd[18520]: task/cgroup: /slurm/uid_20480/job_794/step_0: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> Nov 27 16:51:52 nid00024 slurmstepd[18527]: task_p_pre_launch: Using sched_affinity for tasks
> Nov 27 16:51:52 nid00024 slurmstepd[18529]: task_p_pre_launch: Using sched_affinity for tasks
> Nov 27 16:51:52 nid00024 slurmstepd[18528]: task_p_pre_launch: Using sched_affinity for tasks
> Nov 27 16:51:52 nid00024 slurmstepd[18530]: task_p_pre_launch: Using sched_affinity for tasks
> Nov 27 16:51:53 nid00024 slurmstepd[18520]: done with job
> Nov 27 16:51:53 nid00024 slurmd[16415]: launch task 794.1 request from 20480.20480@10.128.0.25 (port 18093)
> Nov 27 16:51:53 nid00024 slurmd[16415]: lllp_distribution jobid [794] auto binding off: mask_cpu,one_thread
> Nov 27 16:51:53 nid00024 slurmstepd[18551]: core_spec/cray: init
> Nov 27 16:51:53 nid00024 slurmstepd[18551]: (switch_cray.c: 654: switch_p_job_init) gres_cnt: 0 1
> Nov 27 16:51:53 nid00024 slurmstepd[18551]: task/cgroup: /slurm/uid_20480/job_794: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> Nov 27 16:51:53 nid00024 slurmstepd[18551]: task/cgroup: /slurm/uid_20480/job_794/step_1: alloc=28000MB mem.limit=26600MB memsw.limit=26600MB
> Nov 27 16:51:53 nid00024 slurmstepd[18558]: task_p_pre_launch: Using sched_affinity for tasks
> Nov 27 16:51:53 nid00024 slurmstepd[18559]: task_p_pre_launch: Using sched_affinity for tasks
> Nov 27 16:51:53 nid00024 slurmstepd[18560]: task_p_pre_launch: Using sched_affinity for tasks
> Nov 27 16:51:53 nid00024 slurmstepd[18561]: task_p_pre_launch: Using sched_affinity for tasks
> Nov 27 16:52:04 nid00024 slurmstepd[18551]: done with job
> Nov 27 16:52:05 nid00024 slurmstepd[18411]: sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 0
> Nov 27 16:52:05 nid00024 slurmstepd[18411]: done with job
> Nov 27 16:52:05 nid00024 slurmstepd[18143]: done with job
> 
> 
> SLURMD LOG FILE
> 
> 
> Well, nothing: obviously.
> 
> 
> The thing to note here is that log messages that ONLY appeared in the
> dedicated log file, when there was one, now show up in the systemd
> journal (and of course, have that nice "slurmstepd" tag that makes
> identifying them much easier) along with the log messages that ALWAYS
> appear on BOTH channels.
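(Side note: this is also where the journal starts to pay off. With both daemons tagging their messages, something along these lines should pull the whole job's trail back out, assuming journalctl on the compute node:

    journalctl -t slurmd -t slurmstepd
    # or, matching on the executable name rather than the syslog identifier:
    journalctl _COMM=slurmd _COMM=slurmstepd

Either form picks up the slurmd and slurmstepd entries together.)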
> 
> 
> 
> So, whilst it seems to be the case that,
> 
>     you will get SOME of the messages on ALL of the channels, ALL of the time
> 
> and even though it seems that,
> 
>     you can get ALL of the messages on SOME of the channels, SOME of the time
> 
> sadly it would appear that,
> 
>     you can't get ALL of the messages on ALL of the channels, ALL of the time
> 
> 
> 
> Am I a fool for thinking that you should be able to?

We do a variation of your second approach: we leave the *LogFile config 
options unset and configure rsyslog both to log the Slurm messages to a 
local file under /var/log and to forward them to the central syslog 
server.
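
Roughly, the rsyslog side looks something like the following minimal 
sketch (rainerscript syntax; the file name, the loghost and the TCP 
transport are placeholders for whatever your site actually uses):

    # keep a local copy of what slurmd/slurmstepd/slurmctld emit,
    # and forward it to the central syslog server as well
    if ($programname == "slurmd") or ($programname == "slurmstepd") or ($programname == "slurmctld") then {
        action(type="omfile" file="/var/log/slurm.log")
        action(type="omfwd" target="loghost.example.com" port="514" protocol="tcp")
        stop
    }

The stop at the end just keeps the messages from also landing in the 
generic catch-all logs; drop it if you want them there too.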

To some extent I think what you're seeing is the result of the Slurm 
logging system being a bit ad hoc, with various options heaped on top of 
each other and little thought given to how they should interact. In 
particular, logging directly to a file is just reinventing the syslog 
wheel, badly. IMHO, daemons, in general, should offer logging

1) to stderr, for debugging purposes when running in the foreground (see 
the example just after this list).

2) to syslog when running daemonized.
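
For point 1, Slurm already more or less has you covered: both daemons 
take a -D flag to stay in the foreground and copy their messages to 
stderr, so interactive debugging is just something like

    slurmd -D -vvv

(the number of -v's only sets the verbosity, and slurmctld accepts the 
same flags).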

PS.: And if one doesn't care about non-systemd users, one can drop 
option #2 and let systemd forward stderr to syslog.
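
As a sketch of what that could look like, assuming a distribution-style 
slurmd.service and the usual /usr/sbin path (both assumptions; adjust to 
your setup), a drop-in along these lines should do it:

    # /etc/systemd/system/slurmd.service.d/logging.conf
    [Service]
    # run in the foreground so slurmd writes to stderr instead of daemonizing;
    # if the stock unit is Type=forking you also need Type=simple here
    Type=simple
    ExecStart=
    ExecStart=/usr/sbin/slurmd -D
    # have systemd capture stdout/stderr into the journal under a sensible tag
    StandardOutput=journal
    StandardError=journal
    SyslogIdentifier=slurmd

With ForwardToSyslog=yes in /etc/systemd/journald.conf (set it if it 
isn't already), journald then hands those messages on to the local 
syslog daemon, which can forward them wherever they need to go.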

PS2.: If one needs some kind of more structured and/or binary logging, 
then yes, syslog doesn't cut it.


-- 
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqvist at aalto.fi


