I have a cluster of 5 Graylog nodes connected to 5 Elasticsearch nodes, all with similar disk size, RAM, CPU, and network. Elasticsearch is version 2.3.4.
Message traffic is roughly constant at around 30 GB per hour, a bit less during the night.
I use 2 index sets, Default and Secondary. Each index uses 5 shards and 0 replicas.
It worked well for months, but it looks like something happened a few days ago.
I checked the log files of both Elasticsearch and Graylog, but found nothing relevant.
Messages are still being processed and stored fine; I just noticed that one (primary) server is getting disk alarms all the time (over 90% used), while the other nodes have plenty of space (under 70%).
The reason for these disk alarms is that all indices created in the Secondary index set have 5 shards, but all of those shards are located on the primary node.
Interestingly enough, the shards of the Default index set are still well distributed across the 5 nodes.
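You can see the skew with the `_cat/shards` API; for example (the index name pattern here is just a placeholder for my secondary indices):

```
GET /_cat/shards/secondary_*?v&h=index,shard,prirep,state,node
```

All rows show the primary node in the `node` column, while the same query for the default indices shows all 5 nodes.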
First I tried setting the disk allocation watermarks, but it made no difference:

```
cluster.routing.allocation.disk.watermark.low: "76%"
cluster.routing.allocation.disk.watermark.high: "84%"
```
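For reference, these watermark settings can also be applied at runtime through the cluster settings API instead of `elasticsearch.yml` (a sketch; note they only influence future allocation decisions and do not rebalance shards that are already placed):

```
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "76%",
    "cluster.routing.allocation.disk.watermark.high": "84%"
  }
}
```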
I also tried rotating the active write index. A new index was created, but it too had all 5 shards on the primary node.
I managed to manually run a command to move some shards to other nodes, but that is not enough, since new messages keep arriving and new shards pile up on the primary node faster than I can reallocate them. Any suggestions on what I can check or set to "force" this Secondary index set to create new indices with shards spread across different nodes?
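For reference, the manual move I ran looks like the following (index and node names here are placeholders, not the real ones):

```
POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "secondary_42",
        "shard": 0,
        "from_node": "node-primary",
        "to_node": "node-2"
      }
    }
  ]
}
```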
[quote="lecko, post:1, topic:100361"] Interestingly enough, the shards in the Default index are still well distributed over 5 nodes. [/quote]
That is how Elasticsearch does allocation: it balances shard counts per node, not disk usage (at least for now).
[quote="lecko, post:1, topic:100361"] what can I check or set to "force" this Secondary index set to create new indexes on different shards ? [/quote]
You mean different hosts? You could use forced allocation.
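For example, forced allocation can be done with index-level shard allocation filtering, something like this sketch (the node name and index pattern are placeholders; applied via an index template, it would also affect newly created indices in the set):

```
PUT /secondary_*/_settings
{
  "index.routing.allocation.exclude._name": "node-primary"
}
```

This tells Elasticsearch to move the matching shards off the excluded node and keep them off it.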
But perhaps you need to look at this in a different way and reduce your shard count. 5 shards for 30GB is a bit wasteful; I'd look at doing just 2, and then using `_shrink` to reduce the old indices to a single primary shard each.
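A minimal `_shrink` sketch (index names are placeholders; note the shrink API was introduced in Elasticsearch 5.0, so it is not available on 2.3.4 without an upgrade). Before shrinking, the source index must be write-blocked and have a copy of every shard on a single node:

```
PUT /secondary_42/_settings
{
  "index.blocks.write": true
}

POST /secondary_42/_shrink/secondary_42_shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  }
}
```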