
Elastic Stack Import-Export with Logstash & Logsene

In earlier posts, we explained how one can reindex data from one Elasticsearch cluster to another, or within the same Elasticsearch cluster, via tools like Logstash and rsyslog.

The same recipes apply to Logsene, as it exposes the Elasticsearch API. Not only can you push data to Logsene with everything that talks to Elasticsearch (such as Logstash), but you can also use Elasticsearch’s Scroll API to export data from Logsene. All you need to remember is that with Logsene, you need to specify your app token as the index name.

Migrating data from your in-house ELK stack to Logsene

Let’s say you already have an Elastic stack deployed, but you want to migrate existing logs to Logsene. Maybe because you’re spending too much time and money on managing and scaling Elasticsearch, and you’d like to outsource that. Or because you’d like built-in features of Logsene like role-based access control or anomaly detection. Either way, you can migrate your data and keep using Elasticsearch-focused tools:

input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-*"
  }
}

output {
  elasticsearch {
    hosts => "logsene-receiver.sematext.com:80"
    index => "DESTINATION_LOGSENE_APP_TOKEN"
    manage_template => false
  }
}

NOTE: Since Logsene plans are based on ingestion volume and retention, the initial import’s throughput spike may influence your costs. That shouldn’t be a problem if you just started and have a big enough trial plan. Even if the trial is over and you go over the selected plan, you’ll pay at the same per-GB rate.

Reindexing data from one Logsene app to another

Let’s say you’re prototyping: you’re still tweaking your Logstash grok rules, but you’d like to use a custom template. For the new template to apply, you’ll need a new index (i.e., a new Logsene app). So you can go ahead and create it, and then reindex the data from the first app with Logstash. Here’s a sample config (you can also add filters to change data along the way, as sketched after it). Except now, the source is not your in-house Elasticsearch cluster, but a Logsene app that already has the logs you want to reindex:

input {
  elasticsearch {
    hosts => ["logsene-receiver.sematext.com:80"]
    index => "SOURCE_LOGSENE_APP_TOKEN"
  }
}

output {
  elasticsearch {
    hosts => "logsene-receiver.sematext.com:80"
    index => "DESTINATION_LOGSENE_APP_TOKEN"
    manage_template => false
  }
}
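
As mentioned above, you can add filters between the input and the output to change documents along the way. Here’s a minimal sketch using Logstash’s standard mutate filter; the field names are hypothetical, so adjust them to your own logs:

filter {
  mutate {
    # hypothetical field names: adapt these to your own log structure
    rename => { "msg" => "message" }
    remove_field => ["tmp_debug_field"]
  }
}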

NOTE: If you want SSL encryption, just add ssl => true and change the port to 443.
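
With those two changes, the elasticsearch output above would look roughly like this (the same change applies to the elasticsearch input):

output {
  elasticsearch {
    hosts => "logsene-receiver.sematext.com:443"
    ssl => true
    index => "DESTINATION_LOGSENE_APP_TOKEN"
    manage_template => false
  }
}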

Exporting data from Logsene

Even though Logsene comes with Amazon S3 log archiving, you might need to export your logs somewhere else, using – you guessed it! – a similar config:

input {
  elasticsearch {
    hosts => ["logsene-receiver.sematext.com:80"]
    index => "LOGSENE_APP_TOKEN"
  }
}

output {
  file {
    path => "/mnt/big_disk/big_log"
  }
}
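
If you’d rather have one JSON document per line in that file (handy if you ever want to re-import it), you could also set a codec on the file output. A small variation, assuming the json_lines codec that ships with Logstash:

output {
  file {
    path => "/mnt/big_disk/big_log"
    codec => "json_lines"
  }
}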

See? No lock-in! With Logsene you can also easily go back to self-hosted, for example if you want to build something custom around your ELK stack. We can actually help you with that, through Elasticsearch and logging training and through logging consulting.


Search-Devops.com – Search DevOps Projects

As the world of software grows, so does the ecosystem of DevOps tools and resources – for monitoring, logging, alerting, continuous integration and deployment, configuration management, and more.  Nothing wrong with having lots of resources and tools, but here at Sematext we like to find answers as quickly and efficiently as possible; after all, we are known (among other things) for Search and Big Data expertise.
To that end, we’re happy to announce search-devops.com (aka “SD”), which aggregates, indexes and makes searchable all content repositories (mailing lists, source code, wikis, issue trackers, etc.) for a number of open source DevOps projects:

Puppet, Chef, Saltstack, Ansible, CFEngine, Docker, CoreOS, Vagrant, Graphite, Collectd, InfluxDB, Prometheus, Sensu, Statsd, Zabbix, Nagios, Icinga, Ganglia, Logstash, Fluentd, Rsyslog, Syslog-ng, Nxlog, Log4J2, log4php, Kibana, and Grafana.

Activity has been growing steadily over the past few months (see the chart below for the Docker example) and there is a ton of usable and helpful content available right now.

Docker activity chart on search-devops.com

The Search-DevOps Suggestion Box is Open!

Got suggestions for new DevOps tools to add? Or new functionality on the site?  Drop us an email or hit us on Twitter.  We are eager to support the DevOps community and improve everyone’s work lives and tool sets.


Open Source Docker Monitoring & Logging

Pets ⇒ Cattle ⇒ Orchestration

Docker is growing by leaps and bounds, and along with it its ecosystem.  Because containers are lightweight, the predominant deployment pattern is to run just a single app or service inside each container.  Most software products and services are made up of at least several such apps/services.  We all want all our apps/services to be highly available and fault tolerant.  Thus, Docker containers in an organization quickly start popping up like mushrooms after the rain.  They multiply faster than rabbits.  While in the beginning we play with them like cute little pets, as their number quickly grows we realize we are dealing with a herd of cattle, implying we’ve become cowboys.  Managing a herd with your two hands, a horse, and a lasso will get you only so far.  You won’t be able to ride after each and every calf that wanders in the wrong direction.  To get back to containers from this zoological analogy – operating so many moving pieces at scale is impossible without orchestration – which is why we’ve seen the rise of Docker Swarm, Kubernetes, Mesos, CoreOS, RancherOS and so on.

 

Containers multiply faster than Gremlins


Pets ⇒ Cattle ⇒ Orchestration + Operational Insights

Container orchestration helps you manage your containers, their placement, their resources, and their whole life cycle.  While containers and the applications in them are running, we also need container monitoring and log management so we can troubleshoot performance or stability issues, debug or tune applications, and so on.  Just like with orchestration, there are a number of open-source container monitoring and logging tools.  It’s great to have choices, but having lots of them means you need to evaluate and compare them to pick the one that best matches your needs.

DevOps Tools Comparison

We’ve open-sourced our Sematext Docker Agent (SDA for short), which works with SPM for monitoring and Logsene for log management (think of it as ELK as a Service), and wanted to provide a high-level comparison of SDA and several popular Docker monitoring and logging tools, like CAdvisor, Logspout, and others.  In the following table we group tools by functionality and include monitoring agents, log collectors and shippers, storage backends, and tools that provide the UI and visualizations.  For each functionality we list in the “Common Tools” column one or more popular open-source tools that provide it.  A “?” in the “Common Tools” column means there are no popular open-source tools that provide that functionality, or at least none we are aware of. If we messed something up, please leave a comment or tweet @sematext.

Functionality | Common Tools | Sematext Tools
Collect Logs from Docker API (including auto-discovery of new containers) | Logspout | Sematext Docker Agent
Log routing | Logspout (routing setup for containers via HTTP API to syslog, redis, kafka, logstash), Docker Logging Drivers (e.g. syslog, journald, fluentd, etc.) | Sematext Docker Agent (routing of logs to different indices based on container labels)
Automatic log tagging (with Docker Compose or Swarm or Kubernetes metadata) | For Kubernetes: fluentd-elasticsearch (assumes Elasticsearch deployed locally) | Sematext Docker Agent
Collect Docker Metrics | CAdvisor | Sematext Docker Agent
Collect Docker Events | ? | Sematext Docker Agent
Logs format detection (most tools need a static setup per logfile/application) | ? | Sematext Docker Agent (out-of-the-box format detection and parsing; the parser and the logagent-js pattern library are open source)
Logs parsing and shipping | Fluentd, Logstash, rsyslog, syslog-ng | Sematext Docker Agent
Logs storage and indexing | Elasticsearch, Solr | Logsene (exposes the Elasticsearch API)
Logs anomaly detection and alerting | ? | Logsene
Log search and analytics | Kibana, Grafana | Logsene (Logsene’s own UI, integrated Kibana, or Grafana connected to Logsene via the Elasticsearch data source)
Metrics storage and aggregation | Graphite, OpenTSDB, KairosDB, Elasticsearch, Influxdb, Prometheus | SPM
Metrics charts and dashboards | Grafana, Kibana | SPM
Metrics anomaly detection and alerting | Influxdb, Prometheus | SPM
Correlation of Metrics, Logs and Events | ? | SPM & Logsene integration

This table shows a few things:

  • Some of the functionality provided by SPM and Logsene is not available in some of the most popular open-source monitoring and logging tools included here
  • Some of the SPM and Logsene functionality is indeed provided by some of the open-source tools; however, none of them seems to encompass all the features, forcing one to mix and match and head down the tech-debt-ridden Franken-monitoring path
  • Try it yourself in the MindMap below – pick a few functionalities and see how many different tools you would have to use

 

Avoid building technical-debt & Franken-monitoring by using a limited number of Docker monitoring & logging tools

Again, if we missed something, please leave a comment or tweet @sematext.
If you want to try Sematext Docker Agent sign up for a free trial.

P.S.: Sematext Docker Agent is available in the RancherOS Community Catalog and shows up with our new mascot “Octi” – only one more pet 🙂 – so if you use RancherOS, search for “sematext” in the RancherOS Catalog and within a few clicks you’ll have the Sematext Docker Agent deployed to your RancherOS clusters!




Monitoring Docker Datacenter Logs & Metrics

Docker Datacenter (DDC) simplifies container orchestration and increases the flexibility and scalability of application deployments.  However, the high level of automation creates new challenges for monitoring and log management. Organizations that introduce Docker Datacenter manage container deployments in various scenarios, e.g., on bare metal, virtual machines, or hybrid clouds. That’s why at Sematext we are seeing a shift from traditional server monitoring to container-centric monitoring. This post is an excerpt from the newly published “Reference Architecture: Monitoring and Logging for Docker Datacenter” and shows how Docker Datacenter can be extended with logging and monitoring services.

Download Reference Architecture Logging & Monitoring for Docker Datacenter

The Docker Universal Control Plane (UCP) management functionalities include real-time monitoring of the cluster state and real-time metrics and logs for each container. However, operating larger infrastructures requires a longer retention time for logs and metrics and the capability to correlate metrics, logs and events on several levels (cluster, nodes, applications and containers).  A comprehensive monitoring and logging solution ought to provide the following operational insights:

Read More


Container Monitoring: Top Docker Metrics to Watch

Monitoring of Docker environments is challenging. Why? Because each container typically runs a single process, has its own environment, utilizes virtual networks, and has various methods of managing storage. Traditional monitoring solutions take metrics from each server and the applications it runs. These servers, and the applications running on them, are typically very static, with very long uptimes. Docker deployments are different: a set of containers may run many applications, all sharing the resources of one or more underlying hosts. It’s not uncommon for Docker servers to run thousands of short-term containers (e.g., for batch jobs) while a set of permanent services runs in parallel. Traditional monitoring tools, not used to such dynamic environments, are not suited for such deployments. On the other hand, some modern monitoring solutions (e.g. SPM from Sematext) were built with such dynamic systems in mind and even have out-of-the-box reporting for Docker monitoring. Moreover, container resource sharing calls for stricter enforcement of resource usage limits, an additional issue you must watch carefully. To make appropriate adjustments for resource quotas you need good visibility into any limits containers have reached or errors they have caused. We recommend setting up alerts according to defined limits; this way you can adjust limits or resource usage even before errors start happening.

Read More

Kafka Consumer Lag Monitoring

SPM is one of the most comprehensive Kafka monitoring solutions, capturing some 200 Kafka metrics, including Kafka Broker, Producer, and Consumer metrics. While lots of those metrics are useful, there is one particular metric everyone wants to monitor – Consumer Lag.

What is Consumer Lag

When people talk about Kafka or about a Kafka cluster, they are typically referring to Kafka Brokers. You can think of a Kafka Broker as a Kafka server. A Broker is what actually stores and serves Kafka messages. Kafka Producers are applications that write messages into Kafka (Brokers). Kafka Consumers are applications that read messages from Kafka (Brokers).

Inside Kafka Brokers, data is stored in one or more Topics, and each Topic consists of one or more Partitions. When writing data, a Broker actually writes it into a specific Partition. As it writes data, it keeps track of the last “write position” in each Partition. This is called the Latest Offset, also known as the Log End Offset. Each Partition has its own independent Latest Offset.

Just like Brokers keep track of their write position in each Partition, each Consumer keeps track of its “read position” in each Partition whose data it is consuming. That is, it keeps track of which data it has read. This is known as the Consumer Offset. The Consumer Offset is periodically persisted (to ZooKeeper or a special Topic in Kafka itself) so it can survive Consumer crashes or unclean shutdowns and avoid re-consuming too much old data.

Kafka Consumer Lag and Read/Write Rates

In our diagram above we can see yellow bars, which represent the rate at which Brokers are writing messages created by Producers.  The orange bars represent the rate at which Consumers are consuming messages from Brokers. The rates look roughly equal – and they need to be, otherwise the Consumers will fall behind.  However, there is always going to be some delay between the moment a message is written and the moment it is consumed. Reads are always going to be lagging behind writes, and that is what we call Consumer Lag. The Consumer Lag is simply the delta between the Latest Offset and the Consumer Offset. For example, if a Partition’s Latest Offset is 1,000 and a Consumer’s Consumer Offset for that Partition is 950, that Consumer is lagging by 50 messages.

Why is Consumer Lag Important

Many applications today are based on being able to process (near) real-time data. Think about a performance monitoring system like SPM or a log management service like Logsene. They continuously process infinite streams of near real-time data. If they were to show you metrics or logs with too much delay – if the Consumer Lag were too big – they’d be nearly useless.  The Consumer Lag tells us how far behind each Consumer (Group) is in each Partition.  The smaller the lag, the more real-time the data consumption.

Monitoring Read and Write Rates

Kafka Consumer Lag and Broker Offset Changes

As we just learned, the delta between the Latest Offset and the Consumer Offset is what gives us the Consumer Lag.  In the above chart from SPM you may have noticed a few other metrics:

  • Broker Write Rate
  • Consume Rate
  • Broker Earliest Offset Changes

The rate metrics are derived metrics.  If you look at Kafka’s metrics you won’t find them there.  Under the hood, SPM collects a few metrics with various offsets from which these rates are computed.  In addition, it charts Broker Earliest Offset Changes, which is the earliest known offset in each Broker’s Partition.  Put another way, this offset is the offset of the oldest message in a Partition.  While this offset alone may not be super useful, knowing how it’s changing could be handy when things go awry.  Data in Kafka has a certain TTL (Time To Live) to allow for easy purging of old data.  This purging is performed by Kafka itself.  Every time such purging kicks in, the offset of the oldest data changes.  SPM’s Broker Earliest Offset Changes metric surfaces this information for your monitoring pleasure.  It gives you an idea of how often purges are happening and how many messages they’ve removed each time they ran.

There are several Kafka monitoring tools out there, like LinkedIn’s Burrow, whose Offset and Consumer Lag monitoring approach is used in SPM.  If you need a good Kafka monitoring solution, give SPM a go.  Ship your Kafka and other logs into Logsene and you’ve got yourself a DevOps solution that will make troubleshooting easy instead of dreadful.

 

5 Minute Recipe: Heroku Log Drain Setup

Since we wrote about how to ship Heroku Logs to ELK we’ve received good feedback from Heroku users and, encouraged by that feedback, deployed a log ingestion service for apps running on Heroku. This makes it super easy to get structured Heroku Logs into Logsene, the hosted ELK logging service.  Let’s see how that’s done in under five minutes (check the current time!):

Step 1 – Create your Logsene App

If you don’t have a Logsene account already simply get a free account and create a Logsene App. This will get you a Logsene Application Token.

Step 2 – Configure Log Drain for your Heroku App

Once you create your Logsene app you’ll see a command to set up the Heroku Log Drain including the Logsene Token.

Simply copy that command and run it in one of two ways:

  1. in the Heroku app directory, like this:

heroku drains:add https://logsene-heroku-receiver.sematext.com/LOGSENE_TOKEN

  2. alternatively, specify your app name in the command instead of calling the command from your Heroku app directory:

heroku drains:add https://logsene-heroku-receiver.sematext.com/LOGSENE_TOKEN -a YOUR_HEROKU_APP_NAME

Step 3 – Watch your Logs in Logsene

If you now access your Heroku App, Heroku should log your HTTP request and a few seconds later the logs will be visible in Logsene.  And not in just any format!  You’ll see PERFECTLY STRUCTURED HEROKU LOGS:


Parsed Heroku Logs in Logsene

 

Check the time!  Under five minutes?  If you like your Heroku app logs in Logsene tweet us your setup time. 🙂

Sematext Elastic Stack Training

Elasticsearch / Elastic Stack Training – NYC June 13-16

Next month, June 13-16, 2016, we will be running three Elastic Stack (aka ELK Stack) classes in New York City:

  1. June 13 & 14: Elasticsearch for Developers Training Workshop
  2. June 15: Elasticsearch Operations Training Workshop
  3. June 16: Elasticsearch for Logging Training Workshop

All classes cover Elasticsearch 2.x as well as Elasticsearch 5.x!

You can see the complete course outlines under Training Overview.  All three classes include lots of valuable hands-on exercises.  Be prepared to learn a lot!

Cost:

  • 2-day course: $1,200 early bird rate (valid through June 1) and $1,500 afterwards.
  • 1-day course: $700 early bird rate (valid through June 1) and $800 afterwards.

There’s also a 50% discount for the purchase of a 2nd seat!

Location:
462 7th Avenue, New York, NY 10018 – see map

If you have any questions please get in touch.