Shipping data to AWS Elasticsearch with Logagent

Elasticsearch is already quite popular and its popularity just keeps growing. A look at the Google Trends chart for the last five years shows this nicely:

There are a number of reasons why Elasticsearch is popular: it is very easy to get started with, it's user-friendly, and it has great APIs, among other things. Its growing popularity benefits not only Elasticsearch itself but also the whole community, as the ecosystem around it is growing rapidly as well. There are tools developed by Elastic (the company, not the software), like Logstash and Kibana, and there are tools provided by third-party companies and developers, like Logagent, Search Guard (covered on our blog in Securing Elasticsearch and Kibana with Search Guard), Grafana, and many more.

In addition to Elasticsearch and the ecosystem built around it, there are also commercial offerings. These range from hosted ELK as a service for logs, which is part of Sematext Cloud, to hosted Elasticsearch services like Amazon Elasticsearch Service, which lets you run your own managed clusters.

Amazon Elasticsearch Service or Sematext Logsene

Before we talk about shipping data to Amazon Elasticsearch Service, let's do a quick check: is Amazon Elasticsearch Service really what you want? To help with that decision, consider the following questions:

  • What is my use case for Elasticsearch?  Do I have use cases other than centralized logging?
  • Do I have enough knowledge and experience to support my own Elasticsearch cluster?
  • Do I want to take the full responsibility over Elasticsearch maintenance and scaling to support a growing volume of data and/or queries?
  • Are there enough other people on my team to share the burden or is it all going to fall on my plate?

If some or most of the answers are no, then you may want to stop reading here, check out Logsene, and save yourself both time and money. If, however, your use cases potentially include, but are not limited to, logging, then keep reading!

Here is a diagram that helps people figure out whether they should use AWS Elasticsearch, run their own Elasticsearch, or use a service like Logsene.

If most of the answers to the questions above were yes, then you are likely considering Amazon Elasticsearch Service, and going through the above flow diagram should confirm that. We compared self-hosted Elasticsearch and the Amazon Elasticsearch Service some time ago – you can read about it in the AWS Elasticsearch Service vs. Elasticsearch on EC2 blog post. The gist is that Amazon Elasticsearch Service provides:

  • Automatic failed node replacement
  • Node adding/removal via an API
  • Rights management via IAM
  • Daily S3 snapshots
  • Basic CloudWatch metrics for Elasticsearch

But the downsides are:

  • Increased costs compared to traditional EC2 instances
  • Fewer instance types available
  • Limited cluster-wide changes possible
  • Unavailability of Elasticsearch logs
  • Limited debugging possibilities because of API restrictions

If the pros outweigh the cons for you, and you would like to ship your logs to Amazon Elasticsearch Service, let's see how to actually do that with Logagent, an open-source, Node.js-based log shipper.

Logagent ⇒ Amazon Elasticsearch Service

When using Amazon Elasticsearch Service you gain security as an out-of-the-box feature, but you are also left hanging a bit, as the official Elasticsearch client libraries don't support its request signing. You either have to disable security and allow communication from certain hosts without authentication, or choose not to rely on the official Elasticsearch client libraries. However, if your use case is log/event shipping, things are not that bad – you can use Logstash with an additional plugin, or the newest version of Logagent, which has lower overhead and minimal impact on the system. See Top 5 Logstash Alternatives for more details.

Configuring Logagent to ship data to Amazon Elasticsearch Service is really quite simple. For the purpose of this blog post, I will simply send the contents of a file to Elasticsearch. Let's assume my file is called app.log and that it lives in the /var/log/myapp/ directory. The input part of the Logagent configuration looks as follows:

input:
  stdin: true
  files:
    - /var/log/myapp/app.log
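Conceptually, Logagent's file input works like `tail -f`: it watches the file for newly appended lines and emits each complete line as an event. Here is a minimal Python sketch of that idea (the `read_new_lines` helper and the file paths are illustrations, not Logagent internals):

```python
import os
import tempfile

def read_new_lines(f):
    """Collect complete lines appended to the file since the last call.
    A real tailer like Logagent also buffers partial lines and handles
    log rotation; this sketch skips that."""
    lines = []
    while True:
        line = f.readline()
        if not line:
            break
        lines.append(line.rstrip("\n"))
    return lines

# Simulate a log file growing while we watch it.
path = os.path.join(tempfile.mkdtemp(), "app.log")
with open(path, "w") as writer, open(path) as reader:
    writer.write("first event\n")
    writer.flush()
    print(read_new_lines(reader))   # ['first event']
    writer.write("second event\n")
    writer.flush()
    print(read_new_lines(reader))   # ['second event']
```

Each emitted line would then be parsed and forwarded to the configured outputs.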

Now the data that is read from the input needs to be sent to an output – in our case our AWS Elasticsearch Service instance available at https://search-blog-elasticsearch-service-u3isln3erq3ocb2vkv3v2tmt24.eu-west-1.es.amazonaws.com/ (no, not available anymore – get your own! :)). To do that we will use the output module called output-aws-elasticsearch. The configuration looks as follows:

output:
  aws-es:
    module: output-aws-elasticsearch
    url: https://search-blog-elasticsearch-service-u3isln3erq3ocb2vkv3v2tmt24.eu-west-1.es.amazonaws.com/
    index: myapp_logs
    type: myapp
    awsConfigFile: ./aws-config.json
    log:
      - type: 'stdio'
        levels: []
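For context, log shippers deliver events to Elasticsearch through its bulk API: each event becomes a pair of newline-delimited JSON lines, an action line naming the index and type, followed by the document itself. Here is a minimal Python sketch using the index and type configured above (the `message`/`@timestamp` document shape is illustrative, not Logagent's exact output format):

```python
import json
from datetime import datetime, timezone

def build_bulk_body(lines, index="myapp_logs", doc_type="myapp"):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API."""
    parts = []
    for line in lines:
        # Action line: tells Elasticsearch where to index the next document.
        parts.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Document line: the event itself.
        parts.append(json.dumps({
            "message": line,
            "@timestamp": datetime.now(timezone.utc).isoformat(),
        }))
    # The bulk API requires a trailing newline.
    return "\n".join(parts) + "\n"

print(build_bulk_body(["user logged in", "user logged out"]))
```

Batching events this way is what keeps per-event overhead low compared to indexing documents one HTTP request at a time.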

In the above output definition we mention the awsConfigFile option. This one is important. Logagent uses the AWS SDK libraries and supports all authentication methods provided by the AWS API, including signed HTTP requests. Logagent assumes you will provide the credentials needed to access your AWS environment in the JSON file specified by the awsConfigFile option. The content of aws-config.json looks as follows:

{
  "accessKeyId": "<YOUR_ACCESS_KEY_ID>",
  "secretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
  "region": "eu-west-1"
}

So we need to provide the AWS access key, the AWS secret key, and the region where our Amazon Elasticsearch Service instance was created, and we are good to go. You can get all of these via the AWS Console.
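The signed HTTP requests mentioned earlier are AWS Signature Version 4 signatures, which the AWS SDK derives from exactly these three values. The heart of SigV4 is its signing-key derivation, a chain of HMAC-SHA256 operations over the date, region, and service name. A Python sketch with dummy credentials (the real SDK then uses this key to sign a canonical request string and puts the result in the Authorization header):

```python
import hashlib
import hmac

def sigv4_signing_key(secret_key, date_stamp, region, service="es"):
    """Derive the AWS Signature Version 4 signing key."""
    def sign(key, msg):
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()
    k_date = sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = sign(k_date, region)
    k_service = sign(k_region, service)
    return sign(k_service, "aws4_request")

# Dummy values for illustration only.
key = sigv4_signing_key("EXAMPLE_SECRET", "20170830", "eu-west-1")
print(key.hex())
```

This is why the region in aws-config.json must match the region of your Elasticsearch domain: a key derived for the wrong region produces signatures the service will reject.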

And this is really all you need to do. Logagent makes data shipping easy. It's Apache-licensed and open-sourced on GitHub, completely pluggable, featuring a number of input, output, and processor plugins, and it's very easy to add your own. For more information check out http://sematext.com/docs/logagent. Enjoy!
