Many of our clients use AWS EC2. In the context of Elasticsearch consulting or support, one question we often get is: should we use AWS Elasticsearch Service instead of deploying Elasticsearch ourselves? The question is valid whether “self hosted” means in EC2, some other cloud or your own datacenter. As always, the answer is “it depends”, but in this post we’ll show how the advantages of AWS Elasticsearch compared to those of deploying your own Elasticsearch cluster. This way, you’ll be able to decide what fits your use-case and knowledge.
Why AWS Elasticsearch?
- It automatically replaces failed nodes: you don’t need to get paged in the middle of the night, spin a new node and add it to the cluster
- You can add/remove nodes through an API – otherwise you’ll have to make sure you have all the automation in place so that when you spin a node you don’t spend extra time manually installing and configuring Elasticsearch
- You can manage access rights via IAM: this is easier than setting up a reverse proxy or a security addon (cheaper, too, if the addon is paid)
- Daily snapshots to S3 are included. This saves you the time and money to set it up (and the storage cost) for what is a mandatory step in most use-cases
- CloudWatch monitoring included. You will want to monitor your Elasticsearch cluster anyway (whether you build or buy)
Why to install your own Elasticsearch?
- On demand equivalent instances are cheaper by ~29%. The delta differs from instance to instance (we checked m3.2xl and i2.2xl ones). You get even more discount for your own cluster if you use reserved instances
- More instance types and sizes are available. You can use bigger i2 instances than AWS Elasticsearch, and you have access to the latest generation of c4 and m4 instances. This way, you are likely to scale further and get more bang per buck, especially with logs and metrics (more specific hardware recommendations and Elasticsearch tuning here)
- You can change more index settings, beyond analysis and number of shards/replicas. For example, delayed allocation, which is useful when you have a lot of data per node. You can also change the settings of all indices at once by hitting the /_settings endpoint. By properly utilizing various settings Elasticsearch makes available you can better optimize your setup for your particular use case, make better use of underlying resources, and thus drive the cost down further.
- You can change more cluster-wide settings, such as number of shards to rebalance at once
- You get access to all other APIs, such as Hot Threads, which is useful for debugging
- You can use a more comprehensive Elasticsearch monitoring solution. Currently, CloudWatch only collects a few metrics, such as cluster status, number of nodes and documents, heap pressure and disk space. For most use-cases, you’ll need more info, such as the query latency and indexing throughput. And when something goes wrong, you’ll need more insight on JVM pool sizes, cache sizes, Garbage Collection or you may need to profile Elasticsearch
- You can have clusters of more than 20 nodes
— Sematext Group, Inc. (@sematext) November 28, 2016
You may see a pattern emerging from the bullets above: AWS Elasticsearch is easy to set up and comes with a few features on top of Elasticsearch that you’ll likely need. However, it’s limited when it comes to scaling – both in terms of number&size of nodes and Elasticsearch features.
If you already know your way around Elasticsearch, AWS Elasticsearch service will likely only make sense for small clusters. If you’re just getting started, you can go a longer way until it will start to pay off for you to boost your knowledge (e.g. via an Elasticsearch training) and install your own Elasticsearch cluster (maybe with the help of our consulting or support). Or you can delegate the whole scaling part to us by using Logsene, especially if your use-case is about logs or metrics.
Finally, if you think there are too many “if”s in the above paragraph, here’s a flowchart to cover all the options: