Couchbase Monitoring Integration
Integration¶
- Instructions: https://apps.sematext.com/ui/howto/couchbase/overview
Important Metrics to Watch and Alert on¶
View Operations¶
The View Operations metric measures the number of view queries executed by the Couchbase cluster.
It is an important metric for monitoring the performance and throughput of view queries in your Couchbase cluster. A rising view operation rate means the view query load is increasing, and the design of the views may need to be optimized to keep up. On the other hand, a persistently low view operation rate may indicate that the views are not being utilized effectively and may need to be re-evaluated.
Resident Item Ratio¶
The Resident Item Ratio metric indicates the percentage of active items in a bucket that are currently residing in the memory of the Couchbase server. In other words, it is the ratio of the number of active items residing in memory to the total number of active items in the bucket.
A high ratio means that a large portion of the active items in the bucket are resident in memory, which can lead to faster read and write performance. On the other hand, a low ratio indicates that most of the active items are residing on disk, which can lead to increased disk I/O and slower response times.
The Resident Item Ratio is an important metric for Couchbase performance tuning, as it can help identify if a bucket has sufficient memory resources to accommodate its working set. It can also help to determine if the working set of data is too large for the available memory, which can result in a high number of disk I/O operations and reduced performance.
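As a rough illustration of the ratio described above, the following minimal Python sketch computes the percentage from two item counts. The function name and its parameters are hypothetical and are not part of the Couchbase or Sematext APIs; in practice the counts would come from whatever stats source you use.

```python
def resident_item_ratio(resident_active_items: int, total_active_items: int) -> float:
    """Return the percentage of active items that are resident in memory.

    Both arguments are plain item counts (hypothetical inputs for this sketch).
    """
    if resident_active_items < 0 or total_active_items < 0:
        raise ValueError("item counts must be non-negative")
    if resident_active_items > total_active_items:
        raise ValueError("resident items cannot exceed total active items")
    if total_active_items == 0:
        # An empty bucket has nothing on disk only, so treat it as fully resident.
        return 100.0
    return 100.0 * resident_active_items / total_active_items


# Example: 750,000 of 1,000,000 active items are in memory.
print(resident_item_ratio(750_000, 1_000_000))  # 75.0
```

A value well below 100% suggests the working set may not fit in the memory allocated to the bucket.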
Cache Miss Rate¶
Cache Miss Rate is a performance metric that indicates the percentage of times a requested item is not found in the cache and must be retrieved from the disk. A high cache miss rate can indicate that the working set of data is too large for the available cache size or that the cache eviction policy is not effective. It can lead to increased disk I/O, longer response times, and reduced throughput. On the other hand, a low cache miss rate means that most requested items are found in the cache, which results in faster response times and better overall performance. This number should be as close to zero as possible.
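The percentage described above can be sketched as a simple calculation over two counts taken for the same time window. This is a minimal sketch under assumptions: the function and parameter names are hypothetical, and using disk fetches (such as `ep.background.fetched`) against get commands (such as `cmd.get`) as the inputs is an assumption rather than a documented formula.

```python
def cache_miss_rate(disk_fetches: int, total_requests: int) -> float:
    """Return the percentage of requests that had to be served from disk.

    disk_fetches:   requests that missed the cache in the window (assumed input)
    total_requests: all read requests in the same window (assumed input)
    """
    if disk_fetches < 0 or total_requests < 0:
        raise ValueError("counts must be non-negative")
    if total_requests == 0:
        # No requests in the window means there were no misses either.
        return 0.0
    return 100.0 * disk_fetches / total_requests


# Example: 50 disk fetches out of 10,000 reads.
print(cache_miss_rate(50, 10_000))  # 0.5
```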
Total Items¶
This metric counts the total number of current items stored in a bucket including those not active (replica, dead and pending states). It is an important indicator of the size and growth of a Couchbase bucket, as well as the overall workload on the cluster. It can be used to monitor and manage the storage capacity and performance of a Couchbase deployment, and to optimize the allocation of resources such as memory, disk space, and network bandwidth.
Memory Usage High Watermark¶
The Memory High Watermark metric is a configurable threshold that determines the maximum amount of memory that the data service will allocate for storing active data in a bucket. When the active data reaches the Memory High Watermark, the data service will begin to evict items from memory to maintain the threshold.
The Memory High Watermark is expressed as a percentage of the memory quota available to the bucket. By default, the Memory High Watermark is set to 85%, which means that when the data reaches 85% of that memory, the data service will begin to evict items.
This is an important metric for monitoring the memory usage of your Couchbase cluster. It allows you to control the allocation of resources and prevent the data service from consuming all available memory, which could result in performance degradation or even crashes.
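To make the threshold concrete, here is a minimal Python sketch that converts the percentage into bytes and checks current usage against it. The function names, parameters, and the idea of feeding them from metrics such as `ep.max.size` and `mem.used` are illustrative assumptions; only the 85% default comes from the description above.

```python
DEFAULT_HIGH_WATERMARK_PCT = 85.0  # default stated in the text above


def high_watermark_bytes(memory_limit_bytes: int,
                         pct: float = DEFAULT_HIGH_WATERMARK_PCT) -> int:
    """Return the high watermark in bytes for a given memory limit."""
    if memory_limit_bytes < 0:
        raise ValueError("memory limit must be non-negative")
    if not 0.0 <= pct <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    return int(memory_limit_bytes * pct / 100.0)


def exceeds_high_watermark(used_bytes: int, memory_limit_bytes: int,
                           pct: float = DEFAULT_HIGH_WATERMARK_PCT) -> bool:
    """Return True when usage has reached the watermark, i.e. eviction is expected."""
    return used_bytes >= high_watermark_bytes(memory_limit_bytes, pct)


# Example: a 1,000,000,000-byte limit with the default 85% watermark.
print(high_watermark_bytes(1_000_000_000))                 # 850000000
print(exceeds_high_watermark(900_000_000, 1_000_000_000))  # True
```

An alert that fires somewhat below the watermark gives time to react before eviction starts.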
Current Connections¶
The Current Connections metric indicates the number of active network connections between clients and the Couchbase cluster. This includes connections for data access, management operations, and other network traffic.
It's used for monitoring the load on the Couchbase cluster and for identifying potential bottlenecks or capacity issues. A high number of current connections can indicate that the cluster is experiencing heavy load and may need additional resources, such as increased network bandwidth or additional nodes.
Metrics¶
Metric Name Key (Type) (Unit) |
description |
---|---|
average background wait background.wait.time.avg (double_gauge) (sec) |
Average background wait time |
background wait time background.wait.total (double_counter) (sec) |
Total background wait time |
average commit time disk.commit.time.avg (double_gauge) (sec) |
Average disk commit time |
average update time disk.update.time.avg (double_gauge) (microsec) |
Average disk update time |
bytes read bytes.read (double_counter) (bytes) |
Number of bytes per second sent into a bucket. |
bytes written bytes.written (double_counter) (bytes) |
Number of bytes per second sent from a bucket |
cas bad values cas.badval (double_counter) () |
Compare and Swap bad values |
cas hits cas.hits (double_counter) () |
Compare and Swap hits |
cas misses cas.misses (double_counter) () |
Compare and Swap misses |
cmd gets cmd.get (double_counter) () |
Number of get commands |
cmd sets cmd.set (double_counter) () |
Number of set commands |
doc actual disk size docs.disk.actual.size (long_gauge) (bytes) |
Couch docs total size on disk |
doc data disk size docs.data.size (long_gauge) (bytes) |
Couch docs data size |
docs disk size docs.disk.size (long_gauge) (bytes) |
Couch docs total size |
doc fragmentation docs.fragmentation (double_gauge) (%) |
Couch docs fragmentation |
spatial data size data.spatial.size (long_gauge) (bytes) |
Size of object data for spatial views |
spatial disk size disk.spatial.size (long_gauge) (bytes) |
Amount of disk space occupied by spatial views |
spatial ops ops.spatial (double_counter) () |
Spatial operations |
total disk size disk.size (long_gauge) (bytes) |
Couch total disk size. |
view data size views.data.size (long_gauge) (bytes) |
Size of object data for views. |
view disk size views.disk.size (long_gauge) (bytes) |
Amount of disk space occupied by views. |
view fragmentation views.fragmentation (double_gauge) (%) |
View fragmentation |
view ops views.ops (double_counter) () |
View operations |
cpu utilization cpu.utilization (double_gauge) (%) |
CPU utilization percentage. |
connections connections.current (long_gauge) () |
Current bucket connections. |
total items items.current.total (long_gauge) () |
Num current items including those not active (replica, dead and pending states) |
memory items items.current (long_gauge) () |
Num items in active vbuckets (temp + live) |
decrement hits decrement.hits (double_counter) () |
Decrement hits |
decrement misses decrement.misses (double_counter) () |
Decrement misses |
delete hits delete.hits (double_counter) () |
Delete hits |
delete misses delete.misses (double_counter) () |
Delete misses |
commits disk.commits.count (double_counter) () |
Disk commits |
updates disk.updates.count (double_counter) () |
Disk updates |
writes disk.write.queue (long_gauge) () |
Disk write queue depth |
reads ep.background.fetched (double_counter) () |
Disk reads |
cache miss rate ep.cache.miss.rate (double_gauge) (%) |
Cache miss rate |
cache miss ratio ep.cache.miss.ratio (double_gauge) (%) |
Cache miss ratio |
DCP fts backoff ep.dcp.fts.backoff (double_counter) () |
Number of backoffs for fts DCP connections |
DCP fts count ep.dcp.fts.count (double_gauge) () |
Number of fts DCP connections |
DCP fts items remaining ep.dcp.fts.items.remaining (double_gauge) () |
Number of fts items remaining to be sent |
DCP fts items sent ep.dcp.fts.items.sent (double_counter) () |
Number of fts items sent |
DCP fts producers ep.dcp.fts.producer.count (double_gauge) () |
Number of fts producers |
DCP indexes total bytes ep.dcp.2i.total.bytes (double_counter) (bytes) |
Number of bytes being sent for indexes DCP connections |
DCP indexes backoff ep.dcp.2i.backoff (double_counter) () |
Number of backoffs for indexes DCP |
DCP indexes count ep.dcp.2i.count (double_gauge) () |
Number of indexes DCP connections |
DCP indexes items remaining ep.dcp.2i.items.remaining (double_gauge) () |
Number of indexes items remaining to be sent |
DCP indexes items sent ep.dcp.2i.items.sent (double_counter) () |
Number of indexes items sent |
DCP indexes producer ep.dcp.2i.producer.count (double_gauge) () |
Number of indexes producers |
DCP other backoff ep.dcp.other.backoff (double_counter) () |
Number of backoffs for other DCP connections |
DCP other count ep.dcp.other.count (double_gauge) () |
Number of other DCP connections |
DCP other items remaining ep.dcp.other.items.remaining (double_gauge) () |
Number of other items remaining to be sent |
DCP other items sent ep.dcp.other.items.sent (double_counter) () |
Number of other items sent |
DCP other producers ep.dcp.other.producer.count (double_gauge) () |
Number of other producers |
DCP other total bytes ep.dcp.other.total.bytes (double_counter) (bytes) |
Number of bytes being sent for other DCP connections |
DCP replica backoff ep.dcp.replica.backoff (double_counter) () |
Number of backoffs for replica DCP connections |
DCP replica count ep.dcp.replica.count (double_gauge) () |
Number of replica DCP connections |
DCP replica items remaining ep.dcp.replica.items.remaining (double_gauge) () |
Number of replica items remaining to be sent |
DCP replica items sent ep.dcp.replica.items.sent (double_counter) () |
Number of replica items sent |
DCP replica producer ep.dcp.replica.producer.count (double_gauge) () |
Number of replica producers |
DCP replica total bytes ep.dcp.replica.bytes.total (double_counter) (bytes) |
Number of bytes being sent for replica DCP connections |
DCP views backoff ep.dcp.views.backoff (double_counter) () |
Number of backoffs for views DCP connections |
DCP views count ep.dcp.views.count (double_gauge) () |
Number of views DCP connections |
DCP views items remaining ep.dcp.views.items.remaining (double_gauge) () |
Number of views items remaining to be sent |
DCP views items sent ep.dcp.views.items.sent (double_counter) () |
Number of views items sent |
DCP views producer ep.dcp.views.producer.count (double_gauge) () |
Number of views producers |
DCP views total bytes ep.dcp.views.bytes.total (double_counter) (bytes) |
Number of bytes being sent for views DCP connections |
DCP XDCR backoff ep.dcp.xdcr.backoff (double_counter) () |
Number of backoffs for XDCR DCP connections |
DCP XDCR count ep.dcp.xdcr.count (double_gauge) () |
Number of XDCR DCP connections |
DCP XDCR items remaining ep.dcp.xdcr.items.remaining (double_gauge) () |
Number of XDCR items remaining to be sent |
DCP XDCR items sent ep.dcp.xdcr.items.sent (double_counter) () |
Number of XDCR items sent |
DCP XDCR producer ep.dcp.xdcr.producer.count (double_gauge) () |
Number of XDCR producers |
DCP XDCR total bytes ep.dcp.xdcr.total.bytes (double_counter) (bytes) |
Number of bytes being sent for XDCR DCP connections |
queue drained ep.diskqueue.drain (double_counter) () |
Total drained items in disk queue |
queued ep.diskqueue.fill (double_counter) () |
Total enqueued items in disk queue |
queue waiting items ep.diskqueue.items (long_gauge) () |
Total number of items waiting to be written to disk |
current flushing items ep.flusher.todo (long_gauge) () |
Number of items currently being written |
failed commits ep.item.commit.failed (double_gauge) () |
Number of times a transaction failed to commit due to storage errors |
kv size ep.kv.size (long_gauge) (bytes) |
Total amount of user data cached in RAM in this bucket |
max size ep.max.size (long_gauge) (bytes) |
The maximum amount of memory this bucket can use |
memory high water mark ep.mem.high.wat (long_gauge) (bytes) |
Memory usage high water mark for auto-evictions |
memory low water mark ep.mem.low.wat (long_gauge) (bytes) |
Memory usage low water mark for auto-evictions |
metadata mem ep.meta.data.memory (long_gauge) (bytes) |
Total amount of item metadata consuming RAM in this bucket |
non-resident items ep.num.non.resident (long_gauge) () |
Number of non-resident items |
ops del meta ep.num.ops.del.meta (double_counter) () |
Number of delete operations for this bucket as the target for XDCR |
ops del ret meta ep.num.ops.del.ret.meta (double_counter) () |
Number of delRetMeta operations for this bucket as the target for XDCR |
ops get meta ep.num.ops.get.meta (double_counter) () |
Number of read operations for this bucket as the target for XDCR |
ops set meta ep.num.ops.set.meta (double_counter) () |
Number of set operations for this bucket as the target for XDCR |
ops set ret meta ep.num.ops.set.ret.meta (double_counter) () |
Number of setRetMeta operations for this bucket as the target for XDCR |
ejects ep.num.value.ejects (double_counter) () |
Number of times item values got ejected from memory to disk |
ooms ep.oom.errors (long_gauge) () |
Number of times unrecoverable OOMs happened while processing operations |
create ops ep.ops.create (double_counter) () |
Create operations |
update ops ep.ops.update (double_counter) () |
Update operations |
overhead ep.overhead (long_gauge) () |
Extra memory used by transient data like persistence queues or checkpoints |
queue size ep.queue.size (long_gauge) () |
Number of items queued for storage |
resident items ep.resident.items.rate (double_gauge) () |
Number of resident items |
drain items ep.tap.replica.queue.drain (double_counter) () |
Total drained items in the replica queue |
drain items ep.tap.total.queue.drain (double_counter) () |
Total drained items in the queue |
queued ep.tap.total.queue.fill (double_gauge) () |
Total enqueued items in the queue |
backlog size ep.tap.total.total.backlog.size (long_gauge) () |
Number of remaining items for replication |
ooms ep.tmp.oom.errors (double_counter) () |
Number of times recoverable OOMs happened while processing operations |
vb total ep.vb.total (long_gauge) () |
Total number of vBuckets for this bucket |
evictions evictions (double_counter) () |
Number of evictions |
get hits get.hits (double_counter) () |
Number of get hits |
get misses get.misses (double_counter) () |
Number of get misses |
hibernated requests hibernated.requests (double_gauge) () |
Number of streaming requests idle |
hibernated waked hibernated.waked (double_counter) () |
Rate of streaming request wakeups |
hit ratio hit.ratio (double_gauge) () |
Hit ratio |
increment hits increment.hits (double_counter) () |
Number of increment hits |
increment misses increment.misses (double_counter) () |
Number of increment misses |
actual free mem.actual.free (long_gauge) (bytes) |
Actual free memory |
actual used mem.actual.used (long_gauge) (bytes) |
Used memory |
free mem.free (long_gauge) (bytes) |
Free memory |
total mem.total (long_gauge) (bytes) |
Total available memory |
used mem.used (long_gauge) (bytes) |
Engine's total memory usage (deprecated) |
used sys mem.used.sys (long_gauge) (bytes) |
System memory usage |
misses misses (double_counter) () |
Total number of misses |
ops ops (double_counter) () |
Total number of operations |
faults page.faults (double_gauge) () |
Number of page faults |
repl docs queue replication.docs.rep.queue (double_gauge) () |
|
repl meta latency aggr replication.meta.latency.aggr (double_gauge) () |
|
rest requests rest.requests (double_counter) (request) |
Number of HTTP requests |
swap total swap.total (long_gauge) (bytes) |
Total amount of swap available |
swap used swap.used (long_gauge) (bytes) |
Amount of swap used |
vb active eject vb.active.eject (double_counter) (items) |
Number of items being ejected to disk from active vBuckets |
vb active item mem vb.active.itm.memory (long_gauge) () |
Amount of active user data cached in RAM in this bucket |
vb active meta mem vb.active.meta.data.memory (long_gauge) () |
Amount of active item metadata consuming RAM in this bucket |
vb active num non resident vb.active.num.non.resident (long_gauge) () |
Number of non resident vBuckets in the active state for this bucket |
vb active num vb.active.num (long_gauge) () |
Number of active items |
vb active ops create vb.active.ops.create (double_counter) (items) |
New items being inserted into active vBuckets in this bucket |
vb active ops update vb.active.ops.update (double_counter) (items) |
Number of items updated on active vBucket for this bucket |
vb active queue age vb.active.queue.age (long_gauge) (ms) |
Sum of disk queue item age |
vb active queue drain vb.active.queue.drain (double_counter) () |
Total drained items in the queue |
vb active queue fill vb.active.queue.fill (double_counter) (items) |
Number of active items being put on the active item disk queue |
vb active queue size vb.active.queue.size (long_gauge) () |
Number of active items in the queue |
vb active resident items ratio vb.active.resident.items.ratio (double_gauge) (%) |
Percentage of active items resident in memory |
vb avg active queue age vb.avg.active.queue.age (double_gauge) (sec) |
Average age in seconds of active items in the active item queue |
vb avg pending queue age vb.avg.pending.queue.age (double_gauge) (sec) |
Average age in seconds of pending items in the pending item queue |
vb avg replica queue age vb.avg.replica.queue.age (double_gauge) (sec) |
Average age in seconds of replica items in the replica item queue |
vb avg total queue age vb.avg.total.queue.age (double_gauge) (sec) |
Average age of items in the queue |
vb pending curr item vb.pending.curr.items (long_gauge) () |
Number of items in pending vBuckets |
vb pending eject vb.pending.eject (double_counter) (items) |
Number of items being ejected to disk from pending vBuckets |
vb pending item mem vb.pending.itm.memory (double_gauge) () |
Amount of pending user data cached in RAM in this bucket |
vb pending meta mem vb.pending.meta.data.memory (double_gauge) () |
Amount of pending item metadata consuming RAM in this bucket |
vb pending num non resident vb.pending.num.non.resident (double_gauge) () |
Number of non resident vBuckets in the pending state for this bucket |
vb pending num vb.pending.num (double_gauge) () |
Number of pending items |
vb pending ops create vb.pending.ops.create (double_counter) () |
Number of pending create operations |
vb pending ops update vb.pending.ops.update (double_counter) (items) |
Number of items updated on pending vBucket for this bucket |
vb pending queue age vb.pending.queue.age (double_gauge) (ms) |
Sum of disk pending queue item age |
vb pending queue drain vb.pending.queue.drain (double_counter) () |
Total drained pending items in the queue |
vb pending queue fill vb.pending.queue.fill (double_counter) () |
Total enqueued pending items in disk queue |
vb pending queue size vb.pending.queue.size (double_gauge) () |
Number of pending items in the queue |
vb pending resident items ratio vb.pending.resident.items.ratio (double_gauge) () |
Percentage of pending items resident in memory |
vb replica curr items vb.replica.curr.items (long_gauge) () |
Number of in memory items |
vb replica eject vb.replica.eject (double_counter) (items) |
Number of items being ejected to disk from replica vBuckets |
vb replica item mem vb.replica.itm.memory (long_gauge) () |
Amount of replica user data cached in RAM in this bucket |
vb replica meta data mem vb.replica.meta.data.memory (long_gauge) (bytes) |
Total metadata memory |
vb replica num non resident vb.replica.num.non.resident (long_gauge) () |
Number of non resident vBuckets in the replica state for this bucket |
vb replica num vb.replica.num (long_gauge) () |
Number of replica vBuckets |
vb replica ops create vb.replica.ops.create (double_counter) () |
Number of replica create operations |
vb replica ops update vb.replica.ops.update (double_counter) (items) |
Number of items updated on replica vBucket for this bucket |
vb replica queue age vb.replica.queue.age (long_gauge) (ms) |
Sum of disk replica queue item age |
vb replica queue drain vb.replica.queue.drain (double_counter) () |
Total drained replica items in the queue |
vb replica queue fill vb.replica.queue.fill (double_counter) () |
Total enqueued replica items in disk queue |
vb replica queue size vb.replica.queue.size (long_gauge) () |
Replica items in disk queue |
vb replica resident items ratio vb.replica.resident.items.ratio (double_gauge) (%) |
Percentage of replica items resident in memory |
vb total queue age vb.queue.age.total (long_gauge) (ms) |
Sum of disk queue item age |
XDCR ops xdc.ops (double_counter) () |
Number of cross-datacenter replication operations |
active items items.active (long_gauge) () |
Number of active items in memory |
total items items.total (long_gauge) () |
Total number of items |
data size docs.size (long_gauge) (bytes) |
Couch docs data size |
data disk size docs.disk.actual.size (long_gauge) (bytes) |
Couch docs total size on disk |
views size views.size (long_gauge) (bytes) |
Couch views data size |
views disk size views.disk.size (long_gauge) (bytes) |
Couch views data size on disk |
memory items items.replica (long_gauge) () |
Number of in memory items |
cores cores (long_gauge) () |
Cores |
gc num gc.num (counter) () |
Number of objects garbage collected |
gc pause percent gc.pause.percent (gauge) (%) |
Garbage collection pause percentage |
gc pause time gc.pause.time (counter) (seconds) |
Garbage collection pause time |
system memory memory.system (long_gauge) (bytes) |
Memory used by the system |
total memory memory.total (long_gauge) (bytes) |
Memory used by Couchbase over the total period of time |
usage memory memory.usage (long_gauge) (bytes) |
Memory currently used by Couchbase |
active requests request.active.count (long_gauge) () |
Number of active requests |
requests completed request.completed.count (counter) () |
Number of requests completed |
request prepared percent request.prepared.percent (gauge) (%) |
Percentage of requests prepared |
request time mean request.time.mean (gauge) (seconds) |
Average request time |
total threads threads.total (long_gauge) () |
Total number of threads |