Log Management

Logging Libraries vs Log Shippers

In the context of centralizing logs (say, to Logsene or your own Elasticsearch), we often get the question of whether one should log directly from the application (e.g. via an Elasticsearch or syslog appender) or use a dedicated log shipper.

In this post, we’ll look at the advantages of each approach, so you’ll know when to use which.

Logging Libraries

Most programming languages have libraries to assist you with logging. Most commonly, they support local files or syslog, but more “exotic” destinations are often added to the list, such as Elasticsearch/Logsene. Here’s why you might want to use them:

  • Convenience: you’ll want a logging library anyway, so why not go all the way with it, without having to set up and manage a separate application for shipping? (well, there are some reasons below, but you get the point)
  • Fewer moving parts: logging from the library means you don’t have to manage the communication between the application and the log shipper
  • Lighter: logs serialized by your application can be consumed by Elasticsearch/Logsene directly, instead of having a log shipper in the middle deserialize/parse them and then serialize them again

Log Shippers

Your log shipper can be Logstash or one of its alternatives. A logging library is still needed to get logs out of your application, but you’ll only write locally, either to a file or to a socket. A log shipper then takes that raw log all the way to Elasticsearch/Logsene:

  • Reliability: most log shippers have buffers of some form. Whether they tail a file and remember where they left off, or keep data in memory/on disk, log shippers tend to be more resilient to network issues or slowdowns. Buffering can be implemented by a logging library too, but in practice most either block the thread/application or drop data
  • Performance: buffering also means a shipper can process data and send it to Elasticsearch/Logsene in bulk. This design supports higher throughput. Once again, logging libraries may have this functionality too (only tightly integrated into your app), but most will just process logs one by one
  • Enriching: unlike most logging libraries, log shippers are often capable of additional processing, such as pulling the host name or tagging IPs with Geo information
  • Fanout: logging to multiple destinations (e.g. local file + Logsene) is normally easier with a shipper
  • Flexibility: you can always change your log shipper to one that suits your use-case better. Changing the library you use for logging may be more involved
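
To make the shipper side concrete, here’s a minimal Logstash sketch that tails a local file and forwards each line to Elasticsearch/Logsene (the file path is just an example, and if you ship to Logsene the index value is your app token):

input {
  file {
    path => "/var/log/myapp/app.log"
    start_position => "beginning"
  }
}

output {
  elasticsearch {
    hosts => "logsene-receiver.sematext.com:80"
    index => "LOGSENE_APP_TOKEN"
    manage_template => false
  }
}

Because the file input remembers where it left off (via its sincedb file), a network hiccup or a slow Elasticsearch doesn’t lose data – which is exactly the buffering advantage described above.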

Conclusions

Design-wise, the difference between the two approaches is simply tight vs loose coupling, but the way most libraries and shippers are actually implemented is more likely to influence your decision on sending data to Elasticsearch/Logsene:

  • logging directly from the library might make sense for development: it’s easier to set up, especially if you’re not (yet) familiar with a log shipper
  • in production you’ll likely want to use one of the available log shippers, mostly because of buffers: blocking the application or immediately dropping data is usually not an option in a production deployment

If logging isn’t critical to your environment (i.e. you can tolerate the occasional loss of data), you may want to fire-and-forget your logs to Logsene’s UDP syslog endpoint. This takes reliability out of the equation, meaning you can use a shipper if you need enriching or support for other destinations, or a library if you just want to send the raw logs (which may well be JSON).
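
If you go the fire-and-forget route, even the stock logger utility will do. Here’s a sketch, with the receiver host as a placeholder (check the Logsene documentation for the actual syslog endpoint):

# -d selects UDP (datagram); the payload can be plain text or JSON
logger -d -n LOGSENE_SYSLOG_RECEIVER -P 514 '{"severity":"info","message":"checkout completed"}'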

Shippers or libraries, if you want to send logs with anything that can talk to Elasticsearch or syslog, you can sign up for Logsene here. No credit card or commitment is required, and we offer 30-day trials for all plans, in addition to the free ones.

If, on the other hand, you enjoy working with logs, metrics and/or search engines, come join us: we’re hiring worldwide.

Black Friday log management (with the Elastic Stack) checklist

For this Black Friday, Sematext wishes you:

  • more products sold
  • more traffic and exposure
  • more logs 🙂

Now seriously, applications tend to generate a lot more logs on Black Friday, and they also tend to break down more – making those logs even more precious. If you’re using the Elastic Stack for centralized logging, this post shares some tips and tricks to prepare you for that extra traffic.

If you’re still grepping through your logs via ssh, doing that on Black Friday might be that much more painful, so you have two options:

  • get started with the Elastic Stack now. Here’s a complete ELK howto. It should take you about an hour to get started and you can move on from there. Don’t forget to come back to this post for tips! 🙂
  • use Logsene, which takes care of the E(lasticsearch) and K(ibana) from ELK for you. Most importantly for this season, we take care of scaling Elasticsearch. You can get started in 5 minutes with Logstash or choose another log shipper. Anything that can push data to Elasticsearch via HTTP can work with Logsene, since it exposes the Elasticsearch API. So you can log directly from your app or from a log shipper (here are all the documented options).
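
For example, because Logsene exposes the Elasticsearch API, a plain HTTP indexing call is enough. Here’s a sketch, with the type and field names chosen just for illustration (your Logsene app token acts as the index name):

curl -XPOST 'http://logsene-receiver.sematext.com:80/LOGSENE_APP_TOKEN/example/' -d '{
  "@timestamp": "2016-11-25T10:00:00Z",
  "severity": "info",
  "message": "order checkout completed"
}'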

Either way, let’s move to the tips themselves.

Tips for Logstash and Friends

The big question here is: can the pipeline easily max out Elasticsearch, or will it become the bottleneck itself? If your logs go directly from your servers to Elasticsearch, there’s little to worry about: as you spin up more servers for Black Friday, your pipeline capacity for processing and buffering will grow as well.

You may get into trouble if your logs are funnelled through one (or a few) Logstash instances, though. If you find yourself in that situation, check the following (a combined config sketch follows this list):

  • Bulk size. The ideal size depends on your Elasticsearch hardware, but usually you want to send a few MB at a time. Gigantic batches will put unnecessary strain on Elasticsearch, while tiny ones will add too much overhead. Calculate how many logs (of your average size) make up a few MB and you should be good.
  • Number of threads sending data. When one thread goes through a bulk reply, Elasticsearch shouldn’t be idling – it should get data from another thread. The optimal number of threads depends on whether these threads are doing something else (in Logstash, for example, pipeline threads also take care of parsing, which can be expensive) and on your destination hardware. As a rule of thumb, about 4 threads with few things to do (e.g. no grok or geoip in Logstash) per Elasticsearch data node should be enough to keep them busy. If threads have more processing to do, you may need more of them.
  • The same applies to processing data: many shippers work on logs in batches (recent versions of Logstash included) and can do this processing on multiple threads.
  • Distribute the load between all data nodes. This will prevent any one data node from becoming a hotspot. In Logstash, specify an array of destination hosts. Or you can start using Elasticsearch “client” nodes (with both node.data and node.master set to false in elasticsearch.yml) and point Logstash to two of those (for failover).
  • The same applies to the shipper sending data to the central Logstash servers – the load needs to be balanced between them. For example, in Filebeat you can specify an array of destination Logstash hosts, or you can use Kafka as a central buffer.
  • Make sure there’s enough memory to do the processing (and buffering, if the shipper buffers in memory). For Logstash, the default 1GB of heap may not cope with heavy load – depending on how much processing you do, it may need 2GB or more (monitoring Logstash’s heap usage will tell for sure).
  • If you use grok and have multiple rules, put the rules matching more logs and the cheaper ones earlier in the array. Or use Ingest Nodes to do the grok instead of Logstash.
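
A sketch pulling a few of these knobs together, assuming Logstash 5.x: you could start it with bin/logstash -w 4 -b 2000 (4 pipeline workers and roughly 2000 events per batch – aim for a few MB per bulk given your average log size), and spread the output across all data nodes. The host names below are placeholders:

output {
  elasticsearch {
    # list all data (or client) nodes so no single node becomes a hotspot
    hosts => ["es-node1:9200", "es-node2:9200", "es-node3:9200", "es-node4:9200"]
  }
}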

Tips for Elasticsearch

Let’s just dive into them (a combined settings sketch follows the list):

  • Refresh interval. There’s an older blog post on how refresh interval influences indexing performance. The conclusions from it are still valid today: for Black Friday at least, you might want to relax the real-time-ness of your searches to get more indexing throughput.
  • Async transaction log. By default, Elasticsearch will fsync the transaction log after every operation (2.x) or request (5.x). You can relax this safety guarantee by setting index.translog.durability to async. This way it will fsync every 5s (default value for index.translog.sync_interval) and save you some precious IOPS.
  • Size-based indices. If you’re using strict time-based indices (like one index every day), Black Friday traffic may cause a drop in indexing throughput like this (mainly because of merges):

Indexing throughput graph from SPM Elasticsearch monitor

In order to keep writing at that top speed, you’ll need to rotate indices before they reach that “wall size”, which is usually 5-10GB per shard. The point is to rotate when you reach a certain size, not purely by time, and use an alias to always write to the latest index (in 5.x this is made easier with the Rollover Index API).
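
A rough sketch of how that rollover call looks (the alias name is an example; note that 5.x rollover conditions are age- and document-count based, so you’d approximate your target shard size with a document count):

# write through the "logs-write" alias; when a condition is met,
# a new index is created and the alias is moved to it
curl -XPOST 'localhost:9200/logs-write/_rollover?pretty' -d '{
  "conditions": {
    "max_age": "1d",
    "max_docs": 50000000
  }
}'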

  • Ensure load is balanced across data nodes. Otherwise some nodes will become bottlenecks. This requires your number of shards to be proportional to the number of data nodes. Feel free to twist Elasticsearch’s arm into balancing shards by configuring index.routing.allocation.total_shards_per_node: for example, if you have 4 shards and one replica on a 4-data-node cluster, you’ll want a maximum of 2 shards per node.
  • Overshard so you can scale out if you need to, while keeping your cluster balanced. You’d do this by setting a [reasonable] number of shards that has enough divisors. For example, if you have 4 data nodes then 12 shards and 1 replica per shard might work well. You could scale up to 6, 8, 12 or even 24 nodes and your cluster will still be perfectly balanced.
  • Relax the merge policy. This will slow down your full-text searches a bit (though aggregations would perform about the same), use some more heap and open files in order to allow more indexing throughput. 50 segments_per_tier, 20 max_merge_at_once and 500mb max_merged_segment should give you a good boost.
  • Don’t store what you don’t need. Disable _all and search in specific fields (you can point index.query.default_field to “message” or some other catch-all field so default searches still work). Skip indexing fields not used for full-text search and skip doc values for fields on which you don’t aggregate.
  • Use doc values for aggregations (instead of the in-memory field data) – this is the default for all fields except analyzed strings since 2.0, but you’ll need to be extra careful if you’re still on 1.x. Otherwise you’ll risk running out of heap and crash/slow down your cluster.
  • Use dedicated masters. This is also a stability measure that helps your cluster remain consistent even if load makes your data nodes unresponsive.
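
Here’s a minimal sketch that applies several of the index-level settings above to an existing index (the index name and the 30s refresh interval are just examples; the merge policy values are the ones mentioned in this list):

curl -XPUT 'localhost:9200/logstash-2016.11.25/_settings' -d '{
  "index.refresh_interval": "30s",
  "index.translog.durability": "async",
  "index.routing.allocation.total_shards_per_node": 2,
  "index.merge.policy.segments_per_tier": 50,
  "index.merge.policy.max_merge_at_once": 20,
  "index.merge.policy.max_merged_segment": "500mb"
}'

For new indices, you’d typically put the same settings in an index template so every rotated index picks them up.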

You’ll find even more tips and tricks, as well as more details on implementing the above, in our Velocity 2016 presentation. But the ones described above should give you the most bang for your buck (or rather, for your time, but you know what they say about time) this Black Friday.

Final Words

Tuning & scaling Elasticsearch isn’t rocket science, but it often requires time, money or both. So if you’re not into taking care of all this plumbing, we suggest delegating this task to us by using Logsene, our log analytics SaaS. With Logsene, you’d get:

  • The same Elasticsearch API when it comes to indexing and querying. We have Kibana, too, in addition to our own UI, plus you can use Grafana’s Elasticsearch integration.
  • Free trials for any plan, even the Black Friday-sized ones. You can sign up for them without any commitment or credit card details.
  • No lock-in – because of the Elasticsearch API, you can always go [back] to your own ELK Stack if you really want to manage your own Elasticsearch clusters. We can even help you with that via Elastic Stack consulting, training and production support.
  • A lot of extra goodies on top of Elasticsearch, like role-based authentication, alerting and integration with SPM for your application monitoring. This way you can have your metrics and logs in one place.

If, on the other hand, you are passionate about this stuff and would like to work with it, you might like to hear that we’re hiring worldwide, for a wide range of positions (at the time of this writing there are openings for backend, frontend (UX, UI, ReactJS, Redux…), sales, work on Docker, consulting and training). 🙂

5 Logstash Alternatives

When it comes to centralizing logs to Elasticsearch, the first log shipper that comes to mind is Logstash. People hear about it even if it’s not clear what it does:
– Bob: I’m looking to aggregate logs
– Alice: you mean… like… Logstash?

When you get into it, you realize centralizing logs often implies a bunch of things, and Logstash isn’t the only log shipper that fits the bill:

  • fetching data from a source: a file, a UNIX socket, TCP, UDP…
  • processing it: appending a timestamp, parsing unstructured data, adding Geo information based on IP
  • shipping it to a destination. In this case, Elasticsearch. And because Elasticsearch can be down or struggling, or the network can be down, the shipper would ideally be able to buffer and retry

In this post, we’ll describe Logstash and 5 of its alternatives – Filebeat, Fluentd, rsyslog, syslog-ng and Logagent – so you know which fits which use-case.
Read More

Elastic Stack Import-Export with Logstash & Logsene

In earlier posts, we explained how one can reindex data from one Elasticsearch cluster to another, or within the same Elasticsearch cluster, via tools like Logstash and rsyslog.

The same recipes apply to Logsene, as it exposes the Elasticsearch API. Not only can you push data to Logsene with everything that talks to Elasticsearch (such as Logstash), but you can also use Elasticsearch’s Scroll API to export data from Logsene. All you need to remember is that with Logsene, you need to specify your app token as the index name.

Migrating data from your in-house ELK stack to Logsene

Let’s say you already have an Elastic stack deployed, but you want to migrate existing logs to Logsene. Maybe because you’re spending too much time and money on managing and scaling Elasticsearch, and you’d like to outsource that. Or because you’d like built-in features of Logsene like role-based access control or anomaly detection. Either way, you can migrate your data and keep using Elasticsearch-focused tools:

input {
  elasticsearch {
   hosts => ["localhost:9200"]
   index => "logstash-*"
  }
}

output {
  elasticsearch {
    hosts => "logsene-receiver.sematext.com:80"
    index => "DESTINATION_LOGSENE_APP_TOKEN"
    manage_template => false
  }
}

NOTE: Since Logsene plans are based on ingestion volume and retention, that initial import throughput spike may influence your costs. That shouldn’t be a problem if you just started and have a big enough trial plan. Even if the trial is over and you go over the selected plan, you’ll pay at the same per-GB rate.

Reindexing data from one Logsene app to another

Let’s say you’re prototyping: you’re tweaking your Logstash grok rules, but you’d like to use a custom template. For the new template to apply, you’ll need a new index (i.e. a new Logsene app). So you can go ahead and create it, and then reindex the data from the first app with Logstash. Here’s a sample config (you can also add filters to change data along the way), except now the source is not your in-house Elasticsearch cluster, but a Logsene app that already has the logs you want to reindex:

input {
  elasticsearch {
   hosts => ["logsene-receiver.sematext.com:80"]
   index => "SOURCE_LOGSENE_APP_TOKEN"
  }
}

output {
  elasticsearch {
    hosts => "logsene-receiver.sematext.com:80"
    index => "DESTINATION_LOGSENE_APP_TOKEN"
    manage_template => false
  }
}

NOTE: If you want SSL encryption, just add ssl => true and change the port to 443.
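
For example, the input above would become something like this (the same ssl option works for the elasticsearch output, too):

input {
  elasticsearch {
    hosts => ["logsene-receiver.sematext.com:443"]
    ssl => true
    index => "SOURCE_LOGSENE_APP_TOKEN"
  }
}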

Exporting data from Logsene

Even if Logsene comes with Amazon S3 log archiving, you might need to export your logs somewhere else using – you guessed it! – a similar config:

input {
  elasticsearch {
   hosts => ["logsene-receiver.sematext.com:80"]
   index => "LOGSENE_APP_TOKEN"
  }
}

output {
  file {
    path => "/mnt/big_disk/big_log"
  }
}

See? No lock-in! With Logsene you can also easily go back to self-hosted, if you want to build something custom around your ELK stack, for example. We can actually help you with that, through Elasticsearch and logging training and through logging consulting.

Elasticsearch Ingest Node vs Logstash Performance

Starting from Elasticsearch 5.0, you’ll be able to define pipelines within it that process your data, in the same way you’d normally do it with something like Logstash. We decided to take it for a spin and see how this new functionality (called Ingest) compares with Logstash filters in both performance and functionality.

Specifically, we tested the grok processor on Apache common logs (we love logs here), which can be parsed with a single rule, and on CISCO ASA firewall logs, for which we have 23 rules. This way we could also check how both Ingest and Logstash scale when you start adding more rules.

Baseline performance

To get a baseline, we pushed logs with Filebeat 5.0alpha1 directly to Elasticsearch, without parsing them in any way. We used an AWS c3.large for Filebeat (2 vCPU) and a c3.xlarge for Elasticsearch (4 vCPU). We also installed SPM to monitor Elasticsearch’s performance.

It turned out that the network was the bottleneck, which is why pushing raw logs didn’t saturate the CPU:
raw logs CPU

Still, we got a healthy throughput rate of 12-14K EPS:
raw logs throughput

But raw, unparsed logs are rarely useful. Ideally, you’d log in JSON and push directly to Elasticsearch. Conveniently, Filebeat can parse JSON since 5.0. That said, throughput dropped to about 4K EPS because JSON logs are bigger and saturate the network:
Throughput of JSON logs

CPU dropped as well, but not that much because now Elasticsearch has to do more work (more fields to index):
JSON logs CPU

This 4K EPS throughput/40 percent CPU ratio is the most efficient way to send logs to Elasticsearch – if you can log in JSON. If you can’t, you’ll need to parse them. So we added another c3.xl instance (4 vCPUs) to do the parsing, first with Logstash, then with a separate Elasticsearch dedicated Ingest node.

Logstash

With Logstash 5.0 in place, we pointed Filebeat to it, while tailing the raw Apache logs file. On the Logstash side, we have a beats listener, a grok filter and an Elasticsearch output:

input {
  beats {
    port => 5044
  }
}

filter {
   grok {
     match => ["message", "%{COMMONAPACHELOG}%{GREEDYDATA:additional_fields}"]
   }
}

output {
  elasticsearch {
    hosts => "10.154.238.233:9200"
    workers => 4
  }
}

The default number of 2 pipeline workers seemed enough, but we’ve specified more output workers to make up for the time each of them waits for Elasticsearch to reply. That said, network was again the bottleneck so throughput was capped at 4K EPS like with JSON logs:
Logstash apache logs throughput

Meanwhile, Logstash used just about the same amount of CPU as Elasticsearch, at 40-50%:
Logstash apache logs CPU usage

Then we parsed CISCO ASA logs. The config looks similar, except there were 23 grok rules instead of one. Logstash handled the load surprisingly well – throughput was again capped by the network, slightly lower than before because JSONs were bigger:
Logstash CISCO ASA grok throughput

While CPU usage only increased to 60-70%:
Logstash CISCO ASA CPU usage

This means the throughput-to-CPU ratio only went down by about 1.5x after adding a lot more rules. However, in both cases Logstash proved pretty heavy, using about the same CPU to parse the data as Elasticsearch used for indexing it. Let’s see if the Ingest node can do better.

Ingest node

We used the same c3.xl instance for the Ingest node tests: we set node.master and node.data to false in its elasticsearch.yml, to make sure it only does grok and nothing else. We also set node.ingest to false on the data node, so it can focus on indexing.
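
In elasticsearch.yml terms, the sketch looks like this (assuming a two-node setup like ours, with no dedicated masters):

# on the dedicated Ingest node
node.master: false
node.data: false
node.ingest: true

# on the data node
node.master: true
node.data: true
node.ingest: false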

Next step was to define a pipeline that does the grok processing on the Ingest node:

curl -XPOST localhost:9200/_ingest/pipeline/apache?pretty -d '{
  "description": "grok apache logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "pattern": "%{COMMONAPACHELOG}%{GREEDYDATA:additional_fields}"
      }
    }
  ]
}'

Then, to trigger the pipeline for a certain document/bulk, we added the name of the defined pipeline to the HTTP parameters, like pipeline=apache. We used curl this time for indexing, but you can set such parameters in Filebeat, too.
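
For example, an indexing request that runs the apache pipeline defined above might look like this (the index and type names are placeholders):

curl -XPOST 'localhost:9200/logs/apache?pipeline=apache' -d '{
  "message": "127.0.0.1 - - [25/Nov/2016:10:00:00 +0000] \"GET /index.html HTTP/1.1\" 200 2326"
}'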

With Apache logs, the throughput numbers were nothing short of impressive (12-16K EPS):
ingest node apache logs grok throughput

This used up all the CPU on the data node, while the ingest node was barely breaking a sweat at 15%:
ingest node grok apache logs CPU usage

Because Filebeat only sent raw logs to Elasticsearch (specifically, the dedicated Ingest node), there was less strain on the network. The Ingest node, on the other hand, also acted like a client node, distributing the logs (now parsed) to the appropriate shards, using the node-to-node transport protocol. Overall, the Ingest node provided ~10x better CPU-to-throughput ratio than Logstash.

Things still look better, but not as dramatic, with CISCO ASA logs. We have multiple sub-types of logs here, and therefore multiple grok rules. With Logstash, you can specify an array of match directives:

grok {
  match => [
   "cisco_message", "%{CISCOFW106001}",
   "cisco_message", "%{CISCOFW106006_106007_106010}",
...

There’s no such thing for Ingest node yet, so you need to define one rule, and then use the on_failure block to define another grok rule (effectively saying “if this rule doesn’t match, try that one”) and keep nesting like that until you’re done:

"grok": {
  "field": "cisco_message",
  "pattern": "%{CISCOFW106001}",
  "on_failure": [
    {
      "grok": {
      "field": "cisco_message",
      "pattern": "%{CISCOFW106006_106007_106010}",
      "on_failure": [...

The other problem is performance. Because now there are up to 23 rules to evaluate, throughput goes down to about 10K EPS:
Ingest node CISCO ASA grok throughput

And the CPU bottleneck shifts to the Ingest node:
Ingest node CISCO ASA grok CPU

Overall, the throughput-to-CPU ratio of the Ingest node dropped by a factor of 9 compared to the Apache logs scenario.

Conclusions

  • Logstash is easier to configure, at least for now, and performance didn’t deteriorate as much when adding rules
  • Ingest node is lighter across the board. For a single grok rule, it was about 10x faster than Logstash
  • Ingest nodes can also act as “client” nodes
  • Define the grok rules matching most logs first, because both Ingest and Logstash exit the chain on the first match by default

You’ve made it all the way down here? Bravo! If you need any help with Elasticsearch – don’t forget @sematext does Elasticsearch Consulting, Production Support, as well as Elasticsearch Training.

Sending AWS CloudWatch / VPC Logs to Logsene

Use-case: you’re using AWS VPC and want visibility into connections to your VPC: which IPs are allowed or denied connections to certain ports, how much traffic goes through each connection, and so on.

Solution: send AWS VPC logs (one type of CloudWatch logs) to a Logsene application. There, you can search these logs, visualize them and set up alerts. This post will show you how to forward VPC logs (any CloudWatch logs, for that matter) to Logsene using an AWS Lambda function.

The main steps for implementing the solution are:

  1. Create a Flow Log for your VPC, if there isn’t one already. This will send your AWS VPC logs to CloudWatch
  2. Create a new Lambda Function, which will parse and forward the CloudWatch/VPC logs
  3. Clone this GitHub repo and fill in your Logsene Application Token, create a ZIP file with the contents of the cloned repository, and configure the new Lambda function to use the created ZIP file as code
  4. Decide on the maximum memory to allocate for this function and the timeout for its execution
  5. Explore your logs in Logsene 🙂

Create a Flow Log

To start, log in to your AWS Console, then go to Services -> VPC. There, select your VPC, right-click it and select Create Flow Log:
createflowlog

Then you’ll need to set up an IAM role that’s able to push VPC logs to your CloudWatch account (if you don’t have one already) and then choose a name for this flow. You’ll use the name later on in the Lambda function.
flowlog

Create a new AWS Lambda function

Now go to Services -> Lambda and get started with a new function. Then the first step is to select a blueprint for your function. Take cloudwatch-logs-process-data:

blueprint

The next step is to select a source. Here you’d make sure the source type is CloudWatch Logs and select the flow you just created. You can filter only certain logs, but you’d normally leave the Filter Pattern empty to process all of them. Nevertheless, you need to give this filter a name:

source

At the next step, you’d configure the function itself. First you give it a name:

name

Then you have to specify the code:

Add the code to your Lambda function

First you’d need to clone the GitHub repository:

git clone git@github.com:sematext/logsene-aws-lambda-cloudwatch.git

Then, open index.js and fill in your Logsene application token in the logseneToken variable. To find the Logsene Application Token, go to your Sematext Account, then in the Services menu select Logsene, and then the Logsene application you want to send your logs to. Once you’re in that application, click the Integration button and you’ll see the application token:
token

Now your code is ready, so you need to make a zip file out of it. Note: make sure you zip only the contents of the repository, not the directory containing the repository. Like:

pwd # /tmp/cloned-repos/logsene-aws-lambda-cloudwatch
zip -r logsene.zip *

Finally, you’d upload the zip to AWS Lambda as the function code:
upload

Finalize the function configuration

After the code, leave the handler at the default index.handler and select a role that allows this function to execute. You can create a new Basic execution role to do that (from the drop-down), or select a basic execution role that you’ve already created:
role

Then, you need to decide how much memory to allow for the function and how long to allow it to run. This depends on the log throughput (more logs need more processing resources) and will influence costs (roughly like keeping the equivalent general-purpose instance up for that time). Normally, runtime is very short, so even generous settings shouldn’t generate significant costs. 256MB of memory and a 30-second timeout should be enough for most use-cases:
memory

To enable the function to run when new logs come in, you’d need to enable the source with your Flow Log name at the last step.
enable

Exploring VPC logs with Logsene

As logs get generated by your VPC, the function should upload them to Logsene. You can use the native UI to explore those logs:

native

And because VPC logs get parsed out of the box, you can also use Kibana 4 to generate visualizations. Like breaking down connections by the number of bytes:

Kibana

Happy Logsene-ing! 🙂

Using Filebeat to Send Elasticsearch Logs to Logsene

One of the nice things about our log management and analytics solution Logsene is that you can talk to it using various log shippers.  You can use Logstash, or you can use syslog protocol capable tools like rsyslog, or you can just push your logs using the Elasticsearch API just like you would to send data to a local Elasticsearch cluster. And like any good DevOps team, we like to play with all the tools ourselves.  So we thought the timing was right to make Logsene work as a final destination for data sent using Filebeat.

With that in mind, let’s see how to use Filebeat to send log files to Logsene.  In this post we’ll ship Elasticsearch logs, but Filebeat can tail and ship logs from any log file, of course.

Read More

PagerDuty and Logsene Integration

Great news for those of us who use PagerDuty and manage — or are considering managing — logs with Logsene: PagerDuty and Logsene are now integrated!

This integration is a huge time- and aggravation-saver for DevOps professionals who wouldn’t mind dramatically reducing the frequent “noise” from log-generated monitoring alarms.

In case you’re not familiar, Logsene is an enterprise-class log management solution. Logsene can receive logs from a wide array of log shippers, such as Fluentd, Logstash, and syslog, and supports many logging frameworks for programming languages such as Java, Scala, Go, Node.js, Ruby, Python, .Net, Perl, and more. Among other capabilities, Logsene exposes the Elasticsearch API, works with Kibana and with Grafana (video), and has built-in alerts and anomaly detection. It is available both in the Cloud (SaaS) and On Premises.

Logsene also integrates with SPM Performance Monitoring to correlate metrics, events, and logs in a single UI (check out Integrate PagerDuty with SPM Performance Monitoring for those instructions, which are very similar to what you will see here).

In PagerDuty:

Create a new service:

1) In your account, go to Services and click +Add New Service

2) Enter a name for your new service

3) Start typing “Sematext” for the Integration Type, which will narrow your filtering

PagerDuty_image

4) Select an escalation policy, adjust the incident settings to your liking, and click Add Service.

5) Once the service is created, you’ll be taken to the service page. On this page, you’ll see the Service Integration Key, which you will need when you configure Sematext products to send events to PagerDuty. Copy the Service Integration Key to the clipboard.

PagerDuty_2

In Logsene:

1) Navigate to App Actions of your Logsene App by clicking the App Settings menu item.

PagerDuty_3

2) Navigate to Alerts / PagerDuty

3) Enter the Service Integration Key from PagerDuty in the Service API key field.

4) Press Save

PagerDuty_4

5) To enable PagerDuty notifications, navigate to Alerts / Notification Transports

6) Select PagerDuty

PagerDuty_5

Done. Every alert from your Logsene app will be forwarded to PagerDuty, where you can manage escalation policies and configure notifications to other services like HipChat, Slack, Zapier, Flowdock, and more.

Like what you saw here? To integrate PagerDuty with Logsene just get a free account here!  And drop us an email or hit us on Twitter with suggestions, questions or comments.

How to forward CloudTrail (or other logs from AWS S3) to Logsene

This recipe shows how to send CloudTrail logs (which are .gz logs that AWS puts in a certain S3 bucket) to a Logsene application, but it should apply to any kind of logs that you put into S3. We’ll use AWS Lambda for this, but you don’t have to write the code. We’ve got that covered.

The main steps are:
0. Have some logs in an AWS S3 bucket 🙂
1. Create a new AWS Lambda function
2. Paste the code from this repository and fill in your Logsene Application Token
3. Point the function to your S3 bucket and give it permissions
4. Decide on the maximum memory to allocate for the function and the timeout for its execution
5. Explore your logs in Logsene 🙂

Read More

Sematext Joins Docker’s ETP Program for Logging

Sematext has just been recognized by Docker as an Ecosystem Technology Partner (ETP) for logging. This designation indicates that Logsene has contributed to the logging driver and is available to users and organizations that seek solutions to capture logging data for monitoring their Dockerized distributed applications.

Log Management for Docker

“Sematext brings years of logging and monitoring expertise to the Docker community,” said Nick Stinemates, Head of Business Development and Technical Alliances at Docker.  “As an active participant in the Docker community, Sematext has provided logging solutions like Logsene and SPM for Docker, and contributed valuable user education and resources through informative webinars and blogs.”

Logsene & Docker

Logsene is a centralized logging, alerting and anomaly detection solution, available in the Cloud and On Premises.  Logsene delivers critical operational and business insights from data generated by Docker containers, applications and servers, and other devices.  Some DevOps engineers even think of Logsene as “ELK Stack on steroids.”  Logsene also integrates seamlessly with SPM, a performance monitoring, alerting and anomaly detection tool for Docker and many other platforms used by DevOps teams.

The following screenshot shows expanded views for Docker Events and Alerts (top), Container Logs (middle) and Container Metrics (bottom):

Docker_ETP_Container_CPU_annotation

Sematext SPM, showing Docker Events, Logs and Metrics

If you need more functionality to slice and dice logs, then move to the Logsene UI shown below. The screenshot shows Container Log search (top) and detailed log messages tagged with container information and parsed fields (middle). Both the detail view in the middle and the Fields & Filters on the right side contain buttons to drill down into logs – e.g., to filter for the logs of a specific Docker Image or Docker Container.

Docker_ETP_Logsene_copy

Logsene User Interface – showing Docker log search, filtering options, log messages, & log events sorted by format

1-Minute Deployment in Tutum

One of the benefits of using SPM and Logsene for Docker monitoring, logging, and events is how easily they can be launched on Tutum.  It’s basically one minute: click-click-done!  For Docker users this means a single solution, a single container that captures not just logs or just metrics, but both container metrics and logs, plus Docker events, plus Docker host metrics and its logs.

Docker_ETP_Agent_Tutum_button

Sematext Docker Agent on Docker Hub

The Sematext Docker Agent image is available on Docker Hub, and we shared the Tutum Stackfile for Sematext Docker Agent on Stackfiles.io – but the easiest way is to go via the Sematext UI, which generates the stackfiles for you, including Application Tokens, as demonstrated in the video.

Docker_ETP_Tutum_create_stack

Sematext Docker Agent Stackfile in Tutum Cloud, ready to deploy

Docker’s ETP Program

Docker’s ETP program recognizes ecosystem partners like Sematext that have demonstrated integration with the Docker platform. As part of the program, Docker will highlight a capability area within the application lifecycle, validate integration and communicate the availability of the partner’s solution to the community and the market. The goal of the program is to ensure that logging tools like Logsene have been working with Docker to deliver the highest degree of availability and performance for distributed applications. Like the other partners in this program, Sematext has proven integration with the Docker platform and has demonstrated that Logsene is able to record logging data for dockerized applications.

“Sematext has been on the forefront of Docker monitoring, along with Docker event collection, charting and correlation with metrics,” said Otis Gospodnetić, Sematext’s Founder and CEO.  “So it was a natural next step to incorporate Docker logging via our Logsene log management solution.  The combination of SPM and Logsene not only allows for correlation of Docker metrics and logs, but also metrics and logs of applications running inside containers, along with anomaly detection and alerting. All this makes it much easier to troubleshoot performance and other issues much faster and with a lot less hassle than using more traditional or siloed solutions.”

Not using Logsene yet? Check out the free 30-day trial by registering here (ping us if you’re a startup, a non-profit, or educational institution – we’ve got special pricing for you!).  There’s no commitment and no credit card required.