Sending AWS CloudWatch/VPC Logs to Logsene

Use-case: you're using AWS VPC and want visibility into the connections to your VPC: which IPs are allowed or denied connections to certain ports, how much traffic goes through each connection, and so on.

Solution: send AWS VPC logs (one type of CloudWatch logs) to a Logsene application. There, you can search these logs, visualize them and set up alerts. This post will show you how to forward VPC logs (any CloudWatch logs, for that matter) to Logsene using an AWS Lambda function.

The main steps for implementing the solution are:

  1. Create a Flow Log for your VPC, if there isn’t one already. This will send your AWS VPC logs to CloudWatch
  2. Create a new Lambda Function, which will parse and forward the CloudWatch/VPC logs
  3. Clone the GitHub repo (shown below) and fill in your Logsene Application Token, create a ZIP file with the contents of the cloned repository, and configure the new Lambda function to use that ZIP file as its code
  4. Decide on the maximum memory to allocate for this function and the timeout for its execution
  5. Explore your logs in Logsene 🙂

Create a Flow Log

To start, log in to your AWS Console, then go to Services -> VPC. There, select your VPC, right-click it and select Create Flow Log:
(screenshot: Create Flow Log menu)

Then you'll need to set up an IAM role that's able to push VPC logs to your CloudWatch account (if you don't have one already) and choose a name for this flow. You'll use that name later on in the Lambda function.
(screenshot: Flow Log settings)
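If you'd rather script this step than click through the console, the same Flow Log can be created with the AWS SDK for Node.js. Here's a minimal sketch; the VPC ID, log group name and role ARN are placeholders you'd replace with your own:

// minimal sketch: create a VPC Flow Log that writes to CloudWatch Logs
var AWS = require('aws-sdk');
var ec2 = new AWS.EC2({region: 'us-east-1'});

ec2.createFlowLogs({
  ResourceIds: ['vpc-12345678'],      // your VPC ID (placeholder)
  ResourceType: 'VPC',
  TrafficType: 'ALL',                 // ACCEPT, REJECT or ALL
  LogGroupName: 'my-vpc-flow-logs',   // CloudWatch Logs group the flow log writes to
  DeliverLogsPermissionArn: 'arn:aws:iam::123456789012:role/flow-logs-role' // the IAM role from above
}, function(err, data) {
  if (err) console.error(err);
  else console.log('Created flow log(s):', data.FlowLogIds);
});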

Create a new AWS Lambda function

Now go to Services -> Lambda and get started with a new function. The first step is to select a blueprint for your function; take cloudwatch-logs-process-data:

(screenshot: blueprint selection)

The next step is to select a source. Here, make sure the source type is CloudWatch Logs and select the flow you just created. You can filter only certain logs, but you'd normally leave the Filter Pattern empty to process all of them. Either way, you need to give this filter a name:

(screenshot: event source configuration)
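Behind the scenes, this step creates a subscription filter that streams the log group to your Lambda function. If you ever need to reproduce it outside the console, a rough equivalent with the Node.js SDK looks like this (log group, filter name and function ARN are placeholders):

// minimal sketch: subscribe a Lambda function to a CloudWatch Logs group
var AWS = require('aws-sdk');
var cwl = new AWS.CloudWatchLogs({region: 'us-east-1'});

cwl.putSubscriptionFilter({
  logGroupName: 'my-vpc-flow-logs',   // the Flow Log's log group
  filterName: 'logsene-vpc-filter',   // the filter name you chose above
  filterPattern: '',                  // empty pattern = forward all log events
  destinationArn: 'arn:aws:lambda:us-east-1:123456789012:function:logsene-cloudwatch'
}, function(err) {
  if (err) console.error(err);
  else console.log('Subscription filter created');
});

Note that the console also grants CloudWatch Logs permission to invoke the function for you; when scripting it yourself, you'd add that separately (for example with Lambda's addPermission call).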

At the next step, you’d configure the function itself. First you give it a name:

(screenshot: function name)

Then you have to specify the code:

Add the code to your Lambda function

First you’d need to clone the GitHub repository:

git clone git@github.com:sematext/logsene-aws-lambda-cloudwatch.git

Then, open index.js and fill in your Logsene application token in the logseneToken variable. To find the Logsene Application Token, go to your Sematext account, select Logsene from the Services menu, and then pick the Logsene application you want to send your logs to. Once you're in that application, click the Integration button and you'll see the application token:
(screenshot: application token)
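To give you an idea of what the function does, here's a stripped-down sketch of such a handler: CloudWatch Logs hands the function a gzipped, base64-encoded payload, which gets decoded and pushed to Logsene through its Elasticsearch-compatible bulk API. This is a simplified illustration rather than the exact code from the repository, and the field names are assumptions:

// simplified sketch of a CloudWatch Logs -> Logsene handler
var zlib = require('zlib');
var https = require('https');

var logseneToken = 'YOUR-LOGSENE-APP-TOKEN';  // fill in your application token here

exports.handler = function(event, context) {
  // CloudWatch Logs delivers a base64-encoded, gzipped payload
  var payload = new Buffer(event.awslogs.data, 'base64');
  zlib.gunzip(payload, function(err, result) {
    if (err) return context.fail(err);
    var data = JSON.parse(result.toString('utf8'));

    // build an Elasticsearch bulk request: one action + one document per log event
    var bulk = '';
    data.logEvents.forEach(function(logEvent) {
      bulk += JSON.stringify({index: {_index: logseneToken, _type: 'cloudwatch'}}) + '\n';
      bulk += JSON.stringify({
        '@timestamp': new Date(logEvent.timestamp).toISOString(),
        message: logEvent.message
      }) + '\n';
    });

    // ship the batch to the Logsene receiver
    var req = https.request({
      host: 'logsene-receiver.sematext.com',
      path: '/_bulk',
      method: 'POST'
    }, function(res) {
      context.succeed('sent ' + data.logEvents.length + ' events (HTTP ' + res.statusCode + ')');
    });
    req.on('error', function(e) { context.fail(e); });
    req.end(bulk);
  });
};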

Now your code is ready, so you need to make a zip file out of it. Note: make sure you zip only the contents of the repository, not the directory containing the repository. For example:

pwd    # /tmp/cloned-repos/logsene-aws-lambda-cloudwatch
zip -r logsene.zip *

Finally, you’d upload the zip to AWS Lambda as the function code:
(screenshot: uploading the function code)
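If you prefer to script the upload, the Node.js SDK can push the same zip for you; a minimal sketch, assuming the function already exists and is named logsene-cloudwatch (a placeholder):

// minimal sketch: upload logsene.zip as the function's code
var AWS = require('aws-sdk');
var fs = require('fs');
var lambda = new AWS.Lambda({region: 'us-east-1'});

lambda.updateFunctionCode({
  FunctionName: 'logsene-cloudwatch',       // placeholder function name
  ZipFile: fs.readFileSync('logsene.zip')   // the zip built in the previous step
}, function(err, data) {
  if (err) console.error(err);
  else console.log('Uploaded code, version:', data.Version);
});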

Finalize the function configuration

After the code, leave the handler at the default index.handler and select a role that allows this function to execute. You can create a new Basic execution role from the drop-down, or select a basic execution role that you've already created:
(screenshot: execution role)
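For reference, a basic execution role boils down to two things: a trust policy that lets Lambda assume the role, and the AWS-managed policy that allows the function to write its own logs to CloudWatch. A rough sketch of creating one programmatically (the role name is a placeholder):

// minimal sketch: create a basic execution role for the Lambda function
var AWS = require('aws-sdk');
var iam = new AWS.IAM();

var trustPolicy = {
  Version: '2012-10-17',
  Statement: [{
    Effect: 'Allow',
    Principal: {Service: 'lambda.amazonaws.com'},  // let Lambda assume this role
    Action: 'sts:AssumeRole'
  }]
};

iam.createRole({
  RoleName: 'logsene-lambda-basic-execution',      // placeholder role name
  AssumeRolePolicyDocument: JSON.stringify(trustPolicy)
}, function(err) {
  if (err) return console.error(err);
  // attach the managed policy that lets the function write its own logs
  iam.attachRolePolicy({
    RoleName: 'logsene-lambda-basic-execution',
    PolicyArn: 'arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'
  }, function(err) {
    if (err) console.error(err);
    else console.log('Execution role ready');
  });
});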

Then, you need to decide how much memory to allow for the function and how long to let it run. This depends on the log throughput (more logs will need more processing resources) and will influence costs, roughly like keeping an equivalent general-purpose instance up for that time. Normally, the runtime is very short, so even generous resources shouldn't generate significant costs. 256MB of memory and a 30-second timeout should be enough for most use-cases:
(screenshot: memory and timeout settings)
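Both settings can be changed later without redeploying the code; for example, with the Node.js SDK (the function name is a placeholder):

// minimal sketch: adjust memory and timeout after the fact
var AWS = require('aws-sdk');
var lambda = new AWS.Lambda({region: 'us-east-1'});

lambda.updateFunctionConfiguration({
  FunctionName: 'logsene-cloudwatch',  // placeholder function name
  MemorySize: 256,                     // in MB; also scales the CPU share you get
  Timeout: 30                          // in seconds
}, function(err, data) {
  if (err) console.error(err);
  else console.log('New config:', data.MemorySize + 'MB,', data.Timeout + 's');
});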

To enable the function to run when new logs come in, you need to enable the event source with your Flow Log name at the last step:
(screenshot: enable event source)

Exploring VPC logs in Logsene

As VPC generates logs, the function should upload their contents to Logsene. You can use Logsene's native UI to explore those logs:

(screenshot: Logsene native UI)

And because VPC logs get parsed out of the box, you can also use Kibana 4 to build visualizations, such as breaking down connections by the number of bytes:

(screenshot: Kibana 4 visualization)

Happy Logsene-ing! 🙂

5 thoughts on “Sending AWS CloudWatch/VPC Logs to Logsene”

  1. Hi: I’m wondering how often the Lambda function actually gets triggered and for how long it runs each invocation? My fear would be that constantly flowing logs would trigger LOTS of Lambda executions, potentially making the cost of using Lambda much more than just setting up a persistent daemon on an EC2 instance.

    1. Hi James,

      The first run can take a few seconds, because it needs to download the GeoIP database, but then it should be sub-second. The Lambda should be triggered whenever a new batch of logs lands in CloudWatch Logs. I'm not sure how often that happens if VPC is constantly logging, though. It should be cheaper than EC2 anyway, because it's like paying for a proportional slice of an m3 instance (based on the memory you use, e.g. 256MB, which should be plenty) for the time it's actually running. So even if it runs all the time (and I think it won't, though I'd have to test to be sure), it would be cheaper than even a t2.nano.

    2. James – if you are already running EC2 instances and you don't mind running a log shipper on them, that should be cheaper than running the EC2 instances and Lambda. However, a log shipper can typically collect only local logs, and not those from S3 or CloudWatch Logs, which is what this post is about. If you are looking for a way to ship local logs, give Logagent a try – https://github.com/sematext/logagent-js .

  2. I have a config file, apache-log-show.conf, to get logs from Apache. It is shown below:

    input {
    file {
    type => "linux-syslog"

    path => [ "/var/log/*.log", "/var/log/messages", "/var/log/syslog"]
    }

    file {
    type => "apache-access"
    path => "/var/log/apache/access.log"
    }

    file {
    type => "apache-error"
    path => "/var/log/apache/error.log"
    }
    }

    start_position => "beginning"
    }
    }

    filter {
    grok { # parses the common bits
    match => [ "message", "[%{TIMESTAMP_ISO8601}][%{DATA}%{SPACE}][%{DATA}%{SPACE}]%{SPACE}[%{DATA}]%{SPACE}[%{DATA}][%{DATA}] took[%{DATA}], took_millis[%{DATA}], types[%{DATA}], stats[%{DATA}], search_type[%{DATA}], total_shards[%{DATA}], source[(?(.|r|n)*], extra_source[)%{DATA}" ]
    }

    mutate {
    gsub => [
    "source_body", "], extra_source[$", ""
    ]
    }
    }

    output {
    file {
    path => "/var/tmp/just_queries"
    codec => "json_lines"
    message_format => "%{source_body}"
    }
    }
    I put the apache-log-show.conf file in the directory /var/log/logstash…
    But I can't get any logs; maybe there's a mistake in there, but I don't know where. I think the problem could be one of these:

    + the apache-log-show.conf file isn't in the right directory?
    + in the config file, maybe I'm missing some settings to get the data, or my config is wrong?
    + I configured the Apache server wrong, so it doesn't produce the log data?

    So I hope you can help me check this file and show me where I went wrong.
    Thanks

    1. Hi,

      I'm a bit confused, because this is a post about AWS CloudWatch and not Logstash. Also, in your config you seem to tail Apache logs (as described here: https://sematext.com/blog/2013/12/19/getting-started-with-logstash/) but have the grok rules for parsing Elasticsearch logs (as described here: https://sematext.com/blog/2015/01/19/grok-elasticsearch-logs-with-logstash/).

      Did you find the posts mixed up, or is there any other issue with our blog? If not, maybe my comment below helps clarify the issue with your config?
