Since I updated from 1.7.3 to 1.7.4, I've had runaway memory consumption from the influxdb process, steadily growing to consume all available memory until restarted. This seems to be new behavior, but I don't see any other threads discussing this. See annotated screenshot from my system monitoring below. (Restarts free up all locked memory -- the chart doesn't show this because of datapoint decimation.)
Data source is Icinga 2 performance data.
Any similar observations, or pointers on options to debug?
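In case it helps others watch the same thing: InfluxDB 1.x serves Go expvar output at /debug/vars on the HTTP port, which includes the runtime's heap stats. Here's a minimal sketch of a poller that makes the growth visible over hours (assuming the default port 8086 and that the endpoint hasn't been disabled in your config):

```python
import json
import time
from urllib.request import urlopen

# Assumes a stock InfluxDB 1.x install: HTTP API on localhost:8086 with the
# /debug/vars expvar endpoint reachable. The memstats block is the Go
# runtime's view of the heap, so steady growth here points at influxd itself.
VARS_URL = "http://localhost:8086/debug/vars"

while True:
    with urlopen(VARS_URL) as resp:
        stats = json.load(resp)
    mem = stats["memstats"]
    # HeapAlloc/HeapInuse are in bytes; one sample per minute is enough to
    # see a leak ramp over a few hours.
    print(f"{time.strftime('%Y-%m-%dT%H:%M:%S')} "
          f"HeapAlloc={mem['HeapAlloc']} HeapInuse={mem['HeapInuse']}")
    time.sleep(60)
```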
We are seeing a similar pattern after updating to 1.7.4. Memory consumption has been ramping up, causing the host to run out of memory periodically. Unfortunately we are not sure how to debug or circumvent this. Any hints are very welcome!
Thanks Marc. Someone else with a similar use case commented and then deleted their post; I don't know if they resolved their problem. I wonder if an Icinga2 update (concurrent with the InfluxDB update) is now submitting data in a way that is problematic, but I can't come up with a likely scenario.
I've been serially updating since 1.0.2 (or possibly earlier), so TSI did not yet exist when I started. I'll switch and see if that helps, but I think this is still indicative of a 1.7.4 bug because my dataset is time-limited. Thanks for the suggestion, though -- I'm sure it will help!
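For the record, here's roughly what the switch looks like -- a sketch only, assuming default paths under /var/lib/influxdb and a service unit named influxdb; adjust for your install:

```
# Stop influxd, then build TSI indexes for the existing shards offline.
sudo systemctl stop influxdb
sudo -u influxdb influx_inspect buildtsi \
  -datadir /var/lib/influxdb/data \
  -waldir /var/lib/influxdb/wal
```

Plus setting `index-version = "tsi1"` under `[data]` in influxdb.conf before restarting, so new shards get TSI indexes too.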
Nothing else is using an unusual amount of memory. The largest is Icinga2. Beyond that there are small things to support monitoring and visualization: mysql (for icinga's config), apache, saslauthd, postfix, grafana.
The culprit is clearly influxd -- it grows slowly from about 6% of memory to 60-70%. I haven't let the OOM-killer get it yet because I get alerting on low memory, and I've added a cron job that just restarts influxd every 6 hours (which has solved the problem for very low values of "solved").
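For anyone wanting the same stopgap, the cron entry is nothing clever (assuming systemd and a unit named influxdb -- adjust for your distro):

```
# /etc/cron.d/influxd-restart -- a stopgap, not a fix
0 */6 * * * root systemctl restart influxdb
```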
Neither the influx nor the icinga logs say much of interest. Influx is doing detailed httpd logging right now -- I'll turn that off.
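(For reference, that's the per-request access log in the `[http]` section of influxdb.conf -- a sketch of the change, assuming a stock 1.x config:)

```toml
[http]
  # Stop logging every request; restart influxd to apply.
  log-enabled = false
```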
A few years ago I ran into a problem where a version of Icinga2 wouldn't reconnect after losing an SSL connection to influxd; it would queue up data and eventually blow up. That's not the case here -- influxd is the big process, not icinga2.
*sigh* This may yet be Icinga2-related. They just released 2.10.4 today with a changelog entry of "Fix TLS connections in Influxdb/Elasticsearch features leaking file descriptors (#6989 #7018 ref/IP/12219)". I'll report back if this resolves the problem.