I continue to observe an Elasticsearch-on-Docker instance that grabs more and more off-heap memory until Mesos kills the Docker container.
Since no data is going in except for the .monitor index, and searches on this ES instance are pretty quiet, this is really peculiar. Note: the heap stays constant at just under 1GB.
Wild guess: the JVM's ergonomic choice for the maximum allowed direct memory is too high.
Unfortunately, it is a bit hard to find out which value the JVM has chosen (you can write a small Java program which reads `sun.misc.VM.maxDirectMemory()` and invoke it in the Docker container).
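A minimal sketch of such a program (assuming Java 8, where `sun.misc.VM` is still reachable via reflection; on Java 9+ the class was moved to `jdk.internal.misc`, so this falls back to reporting that it's unavailable):

```java
import java.lang.reflect.Method;

public class MaxDirectMemory {

    // Returns the JVM's effective max direct memory in bytes,
    // or -1 if sun.misc.VM is not accessible (Java 9+).
    static long maxDirectMemory() {
        try {
            Class<?> vm = Class.forName("sun.misc.VM");
            Method m = vm.getMethod("maxDirectMemory");
            return (Long) m.invoke(null);
        } catch (ReflectiveOperationException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        long max = maxDirectMemory();
        if (max >= 0) {
            System.out.println("maxDirectMemory = " + max
                    + " bytes (" + (max >> 20) + " MB)");
        } else {
            System.out.println("sun.misc.VM not accessible on this JVM");
        }
    }
}
```

Compile it, copy the class into the container, and run it with the same JVM flags the ES process uses, so the ergonomic defaults match.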
You can restrict direct memory explicitly e.g. to 2G with `-XX:MaxDirectMemory=2G`. Be aware that you will see more frequent garbage collections if the value you choose is too small. Also, ensure that `-XX:+DisableExplicitGC` is **not** set in your `jvm.options` (it was set by default before Elasticsearch 5.5.2).
Thanks for your response! Um, I believe you mean -XX:MaxDirectMemorySize=2G. :slight_smile: I tried that and it had no effect, with off-heap memory usage growing until Mesos killed the docker container.
Yes, I meant that parameter. Did you check what direct memory size the JVM has chosen for your container? I don't know anything about your environment and 2G might be the wrong choice as well; it was just an example of how to set it.
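For reference, here's what that looks like as a `jvm.options` fragment (the 2G value is just the example from above, not a recommendation; size it for your container):

```
# Cap direct (off-heap) memory explicitly instead of relying on JVM ergonomics
-XX:MaxDirectMemorySize=2G
```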
Very interesting blog post, thanks for sharing that! BTW, regarding jvm.options: please share your recommended settings (at least the ones you start out with). It's understood that the Xms and Xmx settings will be different.
After monitoring for 1 hour-ish, it appears that the Thread stack memory is increasing way faster than Internal.
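Those category names look like JVM Native Memory Tracking output; if so (an assumption on my part), the breakdown can be sampled like this — note that NMT must be enabled at startup via jvm.options:

```
# In jvm.options (adds some overhead):
-XX:NativeMemoryTracking=summary

# Then, inside the container, against the ES PID:
jcmd <pid> VM.native_memory summary
# Optionally record a baseline and diff later to see what's growing:
jcmd <pid> VM.native_memory baseline
jcmd <pid> VM.native_memory summary.diff
```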
A couple of things: 1. -Xss: I have this set to 1MB (-Xss1m). 2. Activity: this is a one-node cluster that has very few reads and even fewer writes.
I wonder if I have long-lived threads that are not being cleaned up due to a dearth of activity, and whether setting the thread stack size to a relatively high value accounts for more and more off-heap being used. I further wonder whether removing -Xss1m would make a difference. I am gonna try that and see what happens.
Definitely spit-balling here, but hey, what's the worst that can happen? :rofl:
Cool, thanks, makes sense re: thread stack size. That test didn't go well, though: off-heap memory usage actually appeared to accelerate.
A co-worker of mine had his thread_pool.generic.keep_alive set to 30s. I set that, kept the default thread stack size by removing -Xss1m, and let this run overnight. Interesting: whereas total native memory usage went up 521MB, the Thread stack portion only went up 197MB. Setting keep_alive to 30s appears to have helped, and *appears* to point to thread stacks gobbling up memory.
I restarted with a thread_pool.generic.max set to 100 and I am monitoring.
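For anyone following along, here's what those two settings look like in `elasticsearch.yml` (the values are just the ones tried in this thread, not recommendations):

```
# Reap idle generic-pool threads after 30s
thread_pool.generic.keep_alive: 30s
# Cap the generic pool so idle threads can't pile up indefinitely
thread_pool.generic.max: 100
```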
At this point, ES keeps grabbing more and more off-heap memory until Mesos kills the Docker container. However, as I noted above, the share of memory taken by Thread stack is way down from the first configuration I reported (35MB of 226MB total, or 15%). Internal is 154MB, or 68%.