Monitoring Elasticsearch

Integration

Metrics

| Metric Name | Key (Type) (Unit) | Description |
| --- | --- | --- |
| fieldData max size | es.circuitBreaker.fieldData.size.max (long gauge) (bytes) | max fieldData size |
| fieldData estimated size | es.circuitBreaker.fieldData.size.estimate (long gauge) (bytes) | estimated fieldData size |
| fieldData overhead | es.circuitBreaker.fieldData.size.overhead (double gauge) | fieldData overhead |
| request maximum size | es.circuitBreaker.request.size.max (long gauge) (bytes) | max request size |
| request estimated size | es.circuitBreaker.request.size.estimate (long gauge) (bytes) | estimated request size |
| request overhead | es.circuitBreaker.request.size.overhead (double gauge) | request overhead |
| ES nodes | es.cluster.nodes (long gauge) | Number of nodes in the ES cluster |
| ES data nodes | es.cluster.nodes.data (long gauge) | Number of data nodes in the ES cluster |
| active primary shards | es.cluster.health.shards.active.primary (long gauge) | Number of active primary shards |
| active shards | es.cluster.health.shards.active (long gauge) | Number of active shards |
| relocating shards | es.cluster.health.shards.relocating (long gauge) | Number of currently relocating shards |
| initializing shards | es.cluster.health.shards.initializing (long gauge) | Number of currently initializing shards |
| unassigned shards | es.cluster.health.shards.unassigned (long gauge) | Number of currently unassigned shards |
| open HTTP conns | es.connection.http.current.open (long gauge) | open HTTP conns (current_open) |
| total opened HTTP conns | es.connection.http.total.opened (long gauge) | total opened HTTP conns (total_opened) |
| open TCP conns | es.connection.tcp.server.open (long gauge) | open TCP conns (server_open) |
| network received packets | es.transport.rx.packets (long counter) | network received packets count (rx_count) |
| network received size | es.transport.rx.bytes (long counter) (bytes) | network received size (rx_size) |
| network transmitted packets | es.transport.tx.packets (long counter) | network transmitted packets count (tx_count) |
| network transmitted size | es.transport.tx.bytes (long counter) (bytes) | network transmitted size (tx_size) |
| active conn openings | es.connection.tcp.active.opens (long counter) | active conn openings (active_opens) |
| passive conn openings | es.connection.tcp.passive.opens (long counter) | passive conn openings (passive_opens) |
| open sockets | es.connection.tcp.current.estab (long gauge) | open sockets (current_estab) |
| inbound segments (in_segs) | es.connection.in.segs (long counter) | inbound segments (in_segs) |
| outbound segments (out_segs) | es.connection.out.segs (long counter) | outbound segments (out_segs) |
| retransmitted segments (retrans_segs) | es.connection.retrans.segs (long counter) | retransmitted segments (retrans_segs) |
| socket resets (estab_resets) | es.connection.tcp.estab.resets (long counter) | socket resets (estab_resets) |
| failed socket open (attempt_fails) | es.connection.tcp.attempt.fails (long counter) | failed socket open (attempt_fails) |
| connection errors | es.connection.in.errors (long counter) | connection errors |
| socket resets sent (out_rsts) | es.connection.tcp.out.rsts (long counter) | socket resets sent (out_rsts) |
| docs count (prim) | es.index.docs.primaries (long gauge) | docs count on primary shards |
| docs deleted (prim) | es.index.docs.deleted.primaries (long gauge) | docs deleted on primary shards |
| docs count (all) | es.index.docs.totals (long gauge) | docs count on all (primary and replica) shards |
| docs deleted (all) | es.index.docs.deleted.total (long gauge) | docs deleted on all (primary and replica) shards |
| size on disk (prim) | es.index.files.size.primaries (long gauge) (bytes) | size on the disk of primary shards |
| size on disk (all) | es.index.files.size.total (long gauge) (bytes) | size on the disk of all (primary and replica) shards |
| indexed docs (prim) | es.indexing.docs.added.primaries (long counter) | docs indexed on primary shards |
| deleted docs (prim) | es.indexing.docs.deleted.primaries (long counter) | docs deleted on primary shards |
| indexing time (prim) | es.indexing.time.added.primaries (long counter) (ms) | time spent indexing on primary shards |
| deleting time (prim) | es.indexing.time.deleted.primaries (long counter) (ms) | time spent deleting on primary shards |
| indexed docs (all) | es.indexing.docs.added.total (long counter) | docs indexed on all (primary and replica) shards |
| deleted docs (all) | es.indexing.docs.deleted.total (long counter) | docs deleted on all (primary and replica) shards |
| indexing time (all) | es.indexing.time.added.total (long counter) (ms) | time spent indexing on all (primary and replica) shards |
| deleting time (all) | es.indexing.time.deleted.total (long counter) (ms) | time spent deleting on all (primary and replica) shards |
| gc collection count | jvm.gc.collection.count (long counter) | count of GC collections |
| gc collection time | jvm.gc.collection.time (long counter) (ms) | duration of GC collections |
| open files | jvm.files.open (long gauge) | jvm currently open files |
| max open files | jvm.files.max (long gauge) | jvm max open files limit |
| used | jvm.pool.used (long gauge) (bytes) | jvm pool used memory |
| max | jvm.pool.max (long gauge) (bytes) | jvm pool max memory |
| thread count | jvm.threads (long gauge) | current jvm thread count |
| peak thread count | jvm.threads.peak (long gauge) | peak jvm thread count |
| merge count (prim) | es.indexing.merges.primaries (long counter) | merge count on primary shards |
| merge time (prim) | es.indexing.merges.time.primaries (long counter) (ms) | merge time on primary shards |
| merged docs count (prim) | es.indexing.merges.docs.primaries (long counter) | merged docs count on primary shards |
| merged docs size (prim) | es.indexing.merges.docs.size.primaries (long counter) (bytes) | merged docs size on primary shards |
| merge count (all) | es.indexing.merges.total (long counter) | merge count on all (primary and replica) shards |
| merge time (all) | es.indexing.merges.time.total (long counter) (ms) | merge time on all (primary and replica) shards |
| merged docs count (all) | es.indexing.merges.docs.total (long counter) | merged docs count on all (primary and replica) shards |
| merged docs size (all) | es.indexing.merges.docs.size.total (long counter) (bytes) | merged docs size on all (primary and replica) shards |
| field cache evictions | es.cache.field.evicted (long counter) | Field cache evictions |
| field cache size | es.cache.field.size (long gauge) (bytes) | Field cache size |
| filter cache evictions | es.cache.filter.evicted (long counter) | Filter cache evictions |
| filter cache size | es.cache.filter.size (long gauge) (bytes) | Filter cache size |
| warmer current | es.cache.warmer.current (long gauge) | Warmer current |
| warmer total | es.cache.warmer.total (long counter) (bytes) | Warmer total |
| warmer total time | es.cache.warmer.time (long counter) (ms) | Warmer total time |
| filter/query cache count | es.cache.filter.size.count (long counter) | Filter/query cache count of elements |
| refresh count (prim) | es.indexing.refreshes.primaries (long counter) | refresh count on primary shards |
| refresh time (prim) | es.indexing.refreshes.time.primaries (long counter) (ms) | refresh time on primary shards |
| refresh count (all) | es.indexing.refreshes.total (long counter) | refresh count on all (primary and replica) shards |
| refresh time (all) | es.indexing.refreshes.time.total (long counter) (ms) | refresh time on all (primary and replica) shards |
| flush count (prim) | es.indexing.flushes.primaries (long counter) | flush count on primary shards |
| flush time (prim) | es.indexing.flushes.time.primaries (long counter) (ms) | flush time on primary shards |
| flush count (all) | es.indexing.flushes.total (long counter) | flush count on all (primary and replica) shards |
| flush time (all) | es.indexing.flushes.time.total (long counter) (ms) | flush time on all (primary and replica) shards |
| query count (prim) | es.query.count.primaries (long counter) | query count on primary shards |
| query latency (prim) | es.query.latency.time.primaries (long counter) (ms) | query latency on primary shards |
| fetch count (prim) | es.fetch.count.primaries (long counter) | fetch count on primary shards |
| fetch latency (prim) | es.fetch.latency.time.primaries (long counter) (ms) | fetch latency on primary shards |
| avg. query latency (primaries) | es.query.latency.primaries.avg (long gauge) (ms) | avg. query latency on primary shards |
| query count (all) | es.query.count.total (long counter) | query count on all (primary and replica) shards |
| query latency (all) | es.query.latency.time.total (long counter) (ms) | query latency on all (primary and replica) shards |
| fetch count (all) | es.fetch.count.total (long counter) | fetch count on all (primary and replica) shards |
| fetch latency (all) | es.fetch.latency.time.total (long counter) (ms) | fetch latency on all (primary and replica) shards |
| avg. query latency (all) | es.query.latency.total.avg (long gauge) (ms) | avg. query latency on all (primary and replica) shards |
| real-time get count (prim) | es.request.rtg.primaries (long counter) | real-time get count on primary shards |
| real-time get latency (prim) | es.request.rtg.time.primaries (long counter) (ms) | real-time get latency on primary shards |
| real-time get exists count (prim) | es.request.rtg.exists.primaries (long counter) | real-time get exists count on primary shards |
| real-time get exists latency (prim) | es.request.rtg.exists.time.primaries (long counter) (ms) | real-time get exists latency on primary shards |
| real-time get missing count (prim) | es.request.rtg.missing.primaries (long counter) | real-time get missing count on primary shards |
| real-time get missing latency (prim) | es.request.rtg.missing.time.primaries (long counter) (ms) | real-time get missing latency on primary shards |
| real-time get count (all) | es.request.rtg.total (long counter) | real-time get count on all (primary and replica) shards |
| real-time get latency (all) | es.request.rtg.time.total (long counter) (ms) | real-time get latency on all (primary and replica) shards |
| real-time get exists count (all) | es.request.rtg.exists.total (long counter) | real-time get exists count on all (primary and replica) shards |
| real-time get exists latency (all) | es.request.rtg.exists.time.total (long counter) (ms) | real-time get exists latency on all (primary and replica) shards |
| real-time get missing count (all) | es.request.rtg.missing.total (long counter) | real-time get missing count on all (primary and replica) shards |
| real-time get missing latency (all) | es.request.rtg.missing.time.total (long counter) (ms) | real-time get missing latency on all (primary and replica) shards |
| active shards | es.cluster.shards.active (long gauge) | Number of active shards |
| active primary shards | es.cluster.shards.active.primary (long gauge) | Number of active primary shards |
| initializing shards | es.cluster.shards.initializing (long gauge) | Number of initializing shards |
| relocating shards | es.cluster.shards.relocating (long gauge) | Number of relocating shards |
| unassigned shards | es.cluster.shards.unassigned (long gauge) | Number of unassigned shards |
| active threads | es.thread.pool.active (long gauge) | active threads |
| thread pool size | es.thread.pool.size (long gauge) | thread pool size |
| thread pool queue | es.thread.pool.queue (long gauge) | thread pool queue |
| thread pool queue size | es.thread.pool.queue.size (long gauge) | thread pool queue size |
| rejected threads | es.thread.pool.rejected (long counter) | rejected threads |
| thread pool largest | es.thread.pool.largest (long gauge) | thread pool largest |
| completed threads | es.thread.pool.completed (long counter) | completed threads |
| thread pool min | es.thread.pool.min (long gauge) | thread pool min |
| thread pool max | es.thread.pool.max (long gauge) | thread pool max |
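
Many of the metrics above are cumulative counters (for example es.query.count.total and es.query.latency.time.total), so a per-interval figure has to be derived from the difference between two samples. The sketch below shows one common way to do that in Python. It is only an illustration: the function name and the sample values are hypothetical, and this page does not describe how SPM itself computes its avg. query latency gauges.

```python
def average_latency_ms(prev_count, prev_time_ms, curr_count, curr_time_ms):
    """Average latency per operation between two samples of cumulative counters.

    Returns None when no operations happened in the interval or when the
    counters went backwards (for example after a node restart).
    """
    delta_count = curr_count - prev_count
    delta_time = curr_time_ms - prev_time_ms
    if delta_count <= 0 or delta_time < 0:
        return None
    return delta_time / delta_count


# Two hypothetical samples of (query count, query latency time in ms).
print(average_latency_ms(1000, 5000, 1200, 6000))  # 5.0
```

The same delta-based approach applies to any count/time counter pair in the table, such as the fetch, refresh, flush, and merge metrics.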

FAQ

Why doesn't the number of documents I see in SPM match the number of documents in my Elasticsearch index?

SPM collects index stats from primary shards only. To see the total number of documents in an index, select all shards in that index and choose "sum". The list of shards and the "sum" function can be found in the "Shard filter" in the Index Stats report.
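
To see how the primary-only and all-shards document counts differ for an index, you can compare the two figures that the Elasticsearch index stats API returns. The Python sketch below reads them from an already parsed stats response (for example from GET /my-index/_stats/docs). The index name and the sample numbers are made up, and the helper name is hypothetical.

```python
def doc_counts(index_stats):
    """Return (primaries, total) document counts from a parsed _stats response."""
    all_stats = index_stats["_all"]
    primaries = all_stats["primaries"]["docs"]["count"]
    total = all_stats["total"]["docs"]["count"]
    return primaries, total


# Hypothetical response for an index with one replica per primary shard.
sample = {
    "_all": {
        "primaries": {"docs": {"count": 1000, "deleted": 5}},
        "total": {"docs": {"count": 2000, "deleted": 10}},
    }
}

primaries, total = doc_counts(sample)
print(primaries, total)  # 1000 2000
```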

Can SPM collect metrics even when Elasticsearch HTTP API is disabled?

Each SPM agent collects Elasticsearch metrics only from the local node, by accessing the Stats API via HTTP, so the HTTP API must remain reachable from the local machine. To allow only local access, add the following to elasticsearch.yml. Don't forget to restart each ES node whose elasticsearch.yml you change.

http.host: "127.0.0.1"

Can I point SPM monitor to a non-localhost Elasticsearch node?

Yes. Edit /opt/spm/spm-monitor/conf/spm-monitor-config-TOKEN_HERE-default.properties and change the SPM_MONITOR_ES_NODE_HOSTPORT property from the default localhost:9200 value to an alternative hostname:port. After that, restart the SPM monitor (if you are running the standalone version) or the Elasticsearch process(es) with the embedded SPM monitor.
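
If you manage many nodes, the property change can be scripted. The following Python sketch rewrites a key in the text of a .properties file and leaves comments and other keys alone. The helper name and the target host es-node-1.example.com:9200 are hypothetical. Only SPM_MONITOR_ES_NODE_HOSTPORT and its default value come from this page, and reading or writing the actual file is left to you.

```python
def set_property(properties_text, key, value):
    """Return properties_text with key set to value, appending it if absent."""
    lines = properties_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # leave comment lines untouched
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


original = "# SPM monitor settings\nSPM_MONITOR_ES_NODE_HOSTPORT=localhost:9200\n"
updated = set_property(
    original, "SPM_MONITOR_ES_NODE_HOSTPORT", "es-node-1.example.com:9200"
)
print(updated, end="")
```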

My Elasticsearch is protected by basic HTTP authentication. Can I use SPM?

Yes. Edit the /opt/spm/spm-monitor/conf/spm-monitor-config-TOKEN_HERE-default.properties file and add the following two properties (replace the values with your real username and password):

SPM_MONITOR_ES_NODE_BASICAUTH_USERNAME=yourUsername
SPM_MONITOR_ES_NODE_BASICAUTH_PASSWORD=yourPassword

Restart your SPM monitor after this change (either with sudo service spm-monitor restart for the standalone monitor, or by restarting the Elasticsearch node if you are using the in-process javaagent).
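
Before restarting, it can help to confirm that the credentials are accepted by the node. The Python sketch below builds a standard HTTP Basic Authorization header and, optionally, sends one request to the node's root endpoint. It is an independent check and not part of SPM: both function names are hypothetical, and plain http:// on the given host:port is an assumption.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the HTTP Basic Authorization header for the given credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def node_accepts_credentials(hostport, username, password, timeout=5):
    """Send one GET to the node root; urlopen raises HTTPError on 401/403."""
    request = urllib.request.Request(
        f"http://{hostport}/", headers=basic_auth_header(username, password)
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200


print(basic_auth_header("yourUsername", "yourPassword"))
```

Calling node_accepts_credentials("localhost:9200", "yourUsername", "yourPassword") performs the actual request against a running node.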

I am using SPM for Elasticsearch monitor and I don't see Index (and/or Refresh/Flush/Merge) stats. Why is that?

The SPM for Elasticsearch monitor collects Index stats only from primary shards, so you may have installed the SPM monitor on an Elasticsearch node that hosts only replicas. The same applies to Refresh/Flush and Merge stats. Also note that the SPM Elasticsearch monitor should be installed on all your Elasticsearch nodes to get the complete picture of your cluster in the SPM Reports UI.
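
To check whether a given node hosts any primary shards, you can count primaries per node from the output of Elasticsearch's cat shards API, for example GET /_cat/shards?h=index,shard,prirep,state,node. The Python sketch below parses that text. The helper name, index name, and node names are hypothetical, and the parsing assumes node names contain no whitespace.

```python
from collections import Counter


def primaries_per_node(cat_shards_text):
    """Count started primary shards per node from _cat/shards text output.

    Expects the columns index, shard, prirep, state, node in that order.
    """
    counts = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # unassigned shards have no node column
        _index, _shard, prirep, state, node = fields[:5]
        if prirep == "p" and state == "STARTED":
            counts[node] += 1
    return counts


sample_output = """\
logs 0 p STARTED node-a
logs 0 r STARTED node-b
logs 1 p STARTED node-a
logs 1 r UNASSIGNED
"""

print(dict(primaries_per_node(sample_output)))  # {'node-a': 2}
```

In this made-up output node-b holds only a replica, so a monitor installed only on node-b would be in the replicas-only situation described above.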