TL;DR: I have a centralized syslog server that suddenly stopped receiving syslog messages from the application servers. Messages only resumed when I restarted each individual service running on every app server. Why?
Each application server runs N different apps, each defined as a systemd service. They all log to STDOUT, and their ~.service~ files are configured like this:
SyslogFacility=local0
SyslogIdentifier=app_name
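The facility chosen here ends up encoded in each message's PRI field, which is what the ~local0.*~ selector on the forwarding side filters on. A minimal Python sketch of that encoding, assuming the standard RFC 5424 facility and severity codes and systemd's default ~SyslogLevel=info~ for STDOUT lines that carry no level prefix (the helper names are illustrative only):

```python
# Subset of the numeric codes defined in RFC 5424, section 6.2.1.
FACILITIES = {"user": 1, "daemon": 3, **{f"local{i}": 16 + i for i in range(8)}}
SEVERITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}


def pri(facility: str, severity: str = "info") -> int:
    """Return the syslog PRI value: facility code * 8 + severity code.

    The default severity mirrors systemd's default SyslogLevel=info
    (an assumption: it applies only to lines without a <N> level prefix).
    """
    return FACILITIES[facility] * 8 + SEVERITIES[severity]


def decode_pri(value: int) -> tuple[int, int]:
    """Split a PRI value back into (facility code, severity code)."""
    return divmod(value, 8)
```

With ~SyslogFacility=local0~, an ordinary STDOUT line would therefore be sent with PRI 134 (16 * 8 + 6), and an ~err~ line with PRI 131.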
On each app server I have rsyslog running and configured to forward to the centralized syslog server:
local0.* action(type="omfwd"
                queue.type="LinkedList"
                action.resumeRetryCount="-1"
                queue.size="10000"
                queue.saveonshutdown="on"
                target="syslog_server1" Port="514" Protocol="tcp")
This works and has worked so far with no problems.
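Conceptually, the ~local0.*~ selector hands every local0 message, whatever its severity, to the omfwd action, which then writes it to the TCP connection as one newline-terminated line (rsyslog's default "traditional" TCP framing). A simplified Python model of those two steps, assuming only the plain ~facility.*~ and ~facility.severity~ selector forms (real rsyslog selectors support far more) and a hand-built line that only approximates the traditional forward template; ~selector_matches~ and ~traditional_line~ are hypothetical names, not rsyslog APIs:

```python
# Severity names ordered from most to least severe (codes 0..7).
SEVERITY_ORDER = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def selector_matches(selector: str, facility: str, severity: str) -> bool:
    """Simplified model of a classic 'facility.severity' selector.

    'local0.*' matches any severity; 'local0.info' matches info and
    anything more severe. Wildcard facilities are not modelled here.
    """
    sel_facility, sel_severity = selector.split(".", 1)
    if sel_facility != facility:
        return False
    if sel_severity == "*":
        return True
    return SEVERITY_ORDER.index(severity) <= SEVERITY_ORDER.index(sel_severity)


def traditional_line(pri: int, timestamp: str, hostname: str, tag: str, msg: str) -> bytes:
    """Build one LF-terminated line resembling '<PRI>TIMESTAMP HOSTNAME TAG: MSG'."""
    return f"<{pri}>{timestamp} {hostname} {tag}: {msg}\n".encode()
```

Because the TCP payload is plain text in this shape, a tcpdump capture on port 514 should show lines starting with ~<134>~ whenever the forwarder is actually sending.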
Today, after what appears to have been an automated apt-get update (it's the only thing that coincides with the time), no more messages were logged to the centralized syslog server.
I ran tcpdump on the syslog server, and no messages were arriving from the app servers. Messages from a different, outside network were still being logged, just not from the app servers in the same VPC. I suspected a network issue, but connectivity was working; I even disabled the firewall temporarily.
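Testing the TCP path separately from rsyslog helps tell a network problem apart from a sender that simply isn't writing anything. A minimal Python sketch that only checks whether a TCP handshake to the collector succeeds; ~tcp_reachable~ is a hypothetical helper, and a successful handshake says nothing about whether rsyslog is actually forwarding:

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and performs the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, calling ~tcp_reachable("syslog_server1", 514)~ from an app server would return True in the situation described here, which matches the observation that connectivity was fine while tcpdump showed nothing being sent.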
I then ran tcpdump on an app server to see whether syslog messages were being sent to port 514, and none were. Restarting rsyslog on the app servers made no difference.
The only thing that made messages flow again was to restart each individual systemd service on those app servers.
It is worth noting that during all this time, messages were being logged to the local syslog on each app server; they were just not being forwarded to the centralized syslog server.
And this is the confusing part for me: if the messages were in the local syslog, rsyslog must have had them at some point, right? And if it did, then restarting ~rsyslog~ might have had some effect. But restarting ~rsyslog~ changed nothing.
Restarting the individual app services did have an effect. But then, how does systemd work such that a STDOUT message can end up in ~/var/log/syslog~ but not be forwarded by rsyslog? Does systemd write directly to ~/var/log/syslog~ and, at the same time, to rsyslog proper, and is that why a restart was needed? And why was the restart needed at all? What was freed or reset by the restart?