The log files use many different strings to identify a job, including some messages with no job ID at all. The best I could come up with was:
NUMBER=$SLURM_JOBID
egrep "\.\<$NUMBER\>\] |\<$NUMBER\>\.batch|jobid \<$NUMBER\>|JObId=\<$NUMBER\>|job id \<$NUMBER\>|job\.\<$NUMBER\>|job \<$NUMBER\>|jobid \[\<$NUMBER\>\]|task_p_slurmd_batch_request: \<$NUMBER\>" /var/log/slurm*
Even that misses crucial data, because some messages do not contain the job ID at all, for example:
[2024-02-03T11:50:33.052] _get_user_env: get env for user jsu here
[2024-02-03T11:52:33.152] timeout waiting for /bin/su to complete
[2024-02-03T11:52:34.152] error: Failed to load current user environment variables
[2024-02-03T11:52:34.153] error: _get_user_env: Unable to get user's local environment, running only with passed environment
It would be very useful if all messages related to a job had a consistent string in them for grepping the log files;
even better might be a command like "scontrol show jobid=NNNN log_messages".
But I could not find what I wanted: an easy way to find all daemon log messages related to a specific job. It would be particularly useful if there were a way to automatically append such information to the job's stdout at job termination, so users would see any job failures or warnings without having to ask an administrator to search the logs.
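To make the idea concrete, what I am imagining is something along the lines of the sketch below. This is purely illustrative, not a feature I know to exist; I am assuming (perhaps wrongly) that the job's StdOut path is still resolvable via scontrol at the time an Epilog script runs.

#!/bin/bash
# Sketch of a hypothetical Epilog script (Epilog= in slurm.conf).
# SLURM_JOB_ID is set in the epilog environment.
jobid="$SLURM_JOB_ID"

# Assumption: the job record is still queryable here and StdOut= is a real path.
stdout_path=$(scontrol show job "$jobid" | sed -n 's/.*StdOut=\([^ ]*\).*/\1/p')

if [ -n "$stdout_path" ] && [ -w "$stdout_path" ]; then
    {
        echo "=== daemon log messages mentioning job $jobid ==="
        egrep "\<${jobid}\>" /var/log/slurm* 2>/dev/null
    } >> "$stdout_path"
fi

Of course, even that grep would still miss messages like the _get_user_env ones above that never mention the job ID, which is really the heart of my question.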
Is there such a feature available that I have missed?