Score:4

InfluxDB has been restarting constantly since my last reboot

Since my last reboot, I am seeing the following every 1-2 minutes:

Aug 02 13:53:00 monitor systemd[1]: influxdb.service: start operation timed out. Terminating.
Aug 02 13:53:00 monitor systemd[1]: influxdb.service: Failed with result 'timeout'.
Aug 02 13:53:00 monitor systemd[1]: Failed to start InfluxDB is an open-source, distributed, time series database.
Aug 02 13:53:00 monitor systemd[1]: influxdb.service: Scheduled restart job, restart counter is at 4.
Aug 02 13:53:00 monitor systemd[1]: Stopped InfluxDB is an open-source, distributed, time series database.
Aug 02 13:53:00 monitor systemd[1]: Starting InfluxDB is an open-source, distributed, time series database...
Aug 02 13:53:00 monitor influxd-systemd-start.sh[3539]: Merging with configuration at: /etc/influxdb/influxdb.conf

On 29/07/2021, InfluxDB was updated from 1.8.6-1 to 1.8.7-1. The OS is Ubuntu 20.04 server. The issues started at the first reboot after this update.
Initially there was a permissions issue with /usr/lib/influxdb/scripts/influxd-systemd-start.sh, which prevented it from starting. I changed the permissions to 0755 and it started, but it keeps restarting. It appears to accept connections and data between the restarts: telegraf is still populating the database, and Grafana can display the stats as long as a query doesn't coincide with a restart.

I am also seeing this message:

influxd-systemd-start.sh[12171]: [tcp] 2021/08/02 14:21:40 tcp.Mux: Listener at 127.0.0.1:8088 failed failed to accept a connection, closing all listeners

It is listening on those ports:

root@monitor$ ss -ilpn | grep influx
tcp     LISTEN   0        4096                                        127.0.0.1:8088                                              0.0.0.0:*                      users:(("influxd",pid=15115,fd=3))
tcp     LISTEN   0        4096                                                *:8086                                                    *:*                      users:(("influxd",pid=15115,fd=32))

As far as I am aware, there have been no config changes. There are no firewall rules active.

Anybody have any idea why it started misbehaving?

digijay avatar
Do you maybe get a hint when you do `sudo service influxdb status`?
SlyOne avatar
It looks like it is constantly trying to start, but not detecting that it has started. `systemctl status influxdb` shows it as activating or inactive, even though it is running and is receiving and serving data between the automatic restarts.
Score:3

This is a bug introduced in InfluxDB v1.8.7 (see the GitHub issue).

There are a variety of ways to fix this, your solution being one of them. In our case InfluxDB took a bit longer to start up than the 10-second window the startup script allows, so I simply changed the line `sleep 1` in the file /usr/lib/influxdb/scripts/influxd-systemd-start.sh to `sleep 2` to give InfluxDB more time to start up.

MikeKulls avatar
This doesn't work. It still times out after the same amount of time; this just reduces how often it checks. I changed the sleep to 60s and it didn't help at all.
Score:3

It looks like /usr/lib/influxdb/scripts/influxd-systemd-start.sh is trying to do a health check:

 while [ "$result" != "200" ]; do
   sleep 1
   result=$(curl -s -o /dev/null http://$HOST:$PORT/health -w %{http_code})
 done

This is failing. Judging by the file date, the start wrapper was only created on 21 July, so it looks like the start check is new.

If I manually try I get:

root@monitor$ curl https://127.0.0.1:8088/health
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to 127.0.0.1:8088 

It fails for several reasons:

  1. Because I have configured TLS, the URL needs to be https.
  2. Because I am using the default bind port and have not defined it explicitly, the script gets the wrong port.
  3. Because TLS is enabled, it needs the FQDN rather than localhost, or the certificate validation fails.
  4. The permissions were also wrong on the default startup script.
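
To make the first three points concrete, here is a minimal sketch of a health check that takes the scheme, host and port explicitly and gives up after a fixed number of attempts. It is not the packaged script: the function names `health_url` and `wait_for_health` and the hostname `influx.example.com` are made up for illustration, and 8086 is assumed as the fallback because it is InfluxDB's default HTTP port.

```shell
# Build the health-check URL; falls back to port 8086 when no port is given.
health_url() {
  scheme="$1"; host="$2"; port="${3:-8086}"
  printf '%s://%s:%s/health\n' "$scheme" "$host" "$port"
}

# Poll the URL until it returns HTTP 200, giving up after max_attempts tries.
wait_for_health() {
  url="$1"; max_attempts="${2:-10}"
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    # -w '%{http_code}' prints only the status code; the body is discarded.
    result=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")
    [ "$result" = "200" ] && return 0
    attempt=$((attempt + 1))
    sleep 1
  done
  return 1
}
```

With TLS enabled, the host passed in should be the name on the certificate, for example `wait_for_health "$(health_url https influx.example.com)" 30 || echo "InfluxDB not healthy"`.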

To resolve it, I edited the /lib/systemd/system/influxdb.service file and:

  1. changed Type=forking to Type=simple
  2. changed ExecStart to: ExecStart=/usr/bin/influxd -config /etc/influxdb/influxdb.conf --pidfile /var/lib/influxdb/influxd.pid $INFLUXD_OPTS
Ginnungagap avatar
FFS, please stop suggesting editing files in /lib, `systemctl edit influxdb.service` will allow you to override settings just as well, won't mess with package manager managed files, and will survive upgrades. There is not a single valid reason to edit files in /lib.
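
For reference, a rough sketch of what such an override could look like, applying the same two changes from the answer as a drop-in rather than editing the packaged unit. The helper name `write_influxdb_override` is made up for illustration; the directory shown in the usage note is where `systemctl edit influxdb.service` normally stores its override, and the ExecStart line is copied from the answer above.

```shell
# Write a drop-in override file into the given directory.
write_influxdb_override() {
  dropin_dir="$1"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
Type=simple
# An empty ExecStart= clears the packaged command before the new one is set.
ExecStart=
ExecStart=/usr/bin/influxd -config /etc/influxdb/influxdb.conf --pidfile /var/lib/influxdb/influxd.pid $INFLUXD_OPTS
EOF
}
```

It would be run as root with `write_influxdb_override /etc/systemd/system/influxdb.service.d`, followed by `systemctl daemon-reload` and `systemctl restart influxdb`. Pasting the same `[Service]` section into the editor opened by `systemctl edit influxdb.service` should have the same effect.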