Hiring: Elasticsearch Engineer – Instructor – Consultant

Sematext is hiring! More specifically, we are looking for people with Elasticsearch skills to join our Professional Services team. Our Elasticsearch Professional Services include:

  • Consulting
  • Production Support
  • Training

Most of our consulting work we do remotely, but occasionally a short on-site visit is needed. Our Production Support already benefits from our team being distributed, but we are looking to add more weight in the western hemisphere for better timezone coverage. We run public Elasticsearch classes several times a year and are looking to do them more frequently. We also offer on-site training, which most often takes place in North America and Europe.

We regularly work with some of the world’s biggest companies and are looking for teammates to join us as Elasticsearch support engineers and consultants, as well as instructors (aka trainers).

Because you can’t really be, become, and remain an Elasticsearch expert unless you work with Elasticsearch, you’ll also get to apply and improve your expertise by working on Logsene, a log management solution with Elasticsearch under the hood, Elasticsearch API, and Kibana integrated in the UI, as well as SPM, an application performance monitoring solution whose subsystems also include Elasticsearch, among other things.

We’re small, young, completely self-funded, distributed and diverse, contribute to open-source, write books, speak at conferences, etc. Because we have both the services side and multiple products there is a lot of variety and there are opportunities to contribute in different areas of our business and grow in multiple directions, both up and “sideways”.

If you’d like to join us, get in touch. See our Jobs page for other openings.

docker-swarm-sematext-agent

Docker “Swarm Mode”: Full Cluster Monitoring & Logging with 1 Command

Until recently, automating the deployment of Performance Monitoring agents in Docker Swarm clusters was challenging because monitoring agents had to be deployed to each cluster node and the previous Docker releases (<Docker engine v1.12 / Docker Swarm 1.2.4) had no global service scheduler (Github issue #601).  Scheduling services with via docker-compose and scheduling constraints required manual updates when the number of nodes changed in the swarm cluster – definitely not convenient for dynamic scaling of clusters! In Docker Swarm Monitoring and Logging we shared some Linux shell acrobatics as workaround for this issue.

The good news: All this has changed with Docker Engine v1.12 and new Swarm Mode. The latest release of Docker v1.12 provides many new features for orchestration and the new Swarm mode made it much easier to deploy Swarm clusters.  

 

With Docker v1.12 services can be scheduled globally – similar to Kubernetes DaemonSet, RancherOS global services or CoreOS global fleet services

 

Read More

Elastic Stack Training

Elasticsearch Training, San Francisco & New York, October

If you are using Elasticsearch and are looking for Elasticsearch training to quickly improve your Elastic Stack skills, we’ve running several Elasticsearch classes this October in San Francisco and New York.

All classes are also available virtually. This means you get to participate in the class, see the whiteboard, see and hear the instructor as well as other attendees, and they get to see and hear you….. without you having to travel.

Have two people attend the training from the same company? The second one gets 25% off.

To see the full course details, full outline and information about the instructor, click on the class names below.

solr-elasticsearch-training-fall-2016
San Francisco:

ny-solr-elasticsearch-training-fall-2016
New York City:

All classes include breakfast and lunch. If you have special dietary needs, please let us know ahead of time. If you have any questions about any of the classes you can use our live chat (see bottom-right of the page), email us at training@sematext.com or call us at 1-347-480-1610.

Solr Training

Solr Training, San Francisco & New York, October

If you are using Solr and are looking for Solr training to quickly improve your Solr or SolrCloud skills, we’ve running several Solr classes this October in San Francisco and New York.

Have two people attend the training from the same company? The second one gets 25% off.

To see the full course details, full outline and information about the instructor, click on the class names below.

solr-elasticsearch-training-fall-2016
San Francisco:

ny-solr-elasticsearch-training-fall-2016
New York City:

All classes include breakfast and lunch. If you have special dietary needs, please let us know ahead of time. If you have any questions about any of the classes you can use our live chat (see bottom-right of the page), email us at training@sematext.com or call us at 1-347-480-1610.

logstash-5-alternatives

5 Logstash Alternatives

When it comes to centralizing logs to Elasticsearch, the first log shipper that comes to mind is Logstash. People hear about it even if it’s not clear what it does:
– Bob: I’m looking to aggregate logs
– Alice: you mean… like… Logstash?

When you get into it, you realize centralizing logs often implies a bunch of things, and Logstash isn’t the only log shipper that fits the bill:

  • fetching data from a source: a file, a UNIX socket, TCP, UDP…
  • processing it: appending a timestamp, parsing unstructured data, adding Geo information based on IP
  • shipping it to a destination. In this case, Elasticsearch. And because Elasticsearch can be down or struggling, or the network can be down, the shipper would ideally be able to buffer and retry

In this post, we’ll describe Logstash and its alternatives – 5 “alternative” log shippers (Filebeat, Fluentd, rsyslog, syslog-ng and Logagent), so you know which fits which use-case.
Read More

rancheros-monitoring

RancherOS Monitoring and Logging Support

RancherOS is one of the few “container only” operating systems and it evolved into an excellent orchestration tool for containers, competing e.g. with CoreOS. It supports several types of schedulers such as its own “Cattle” scheduler, as well as Kubernetes, Docker Swarm, and Mesos. A unique feature of RancherOS is its GUI for container orchestration based on templates for application stacks. Rancher Labs maintains a catalog for such templates and has integrated a community catalog, which includes Sematext Docker Agent for the collection of metrics, events and logs from all RancherOS cluster nodes and containers.

Monitoring all RancherOS nodes can be done several different ways, depending on which orchestration tool you use:

  • Deployment via rancher-compose for the whole RancherOS cluster using the Cattle scheduler
  • Deployment via the GUI (rancher server) using the Community Catalog entry (available for the Cattle scheduler)
  • Deployment as Kubernetes DaemonSet via kubectl for the Kubernetes scheduler
  • Deployment as Swarm global service using Swarm scheduler

This post provides the walk-throughs for all these deployment/orchestration options, with the common goal of collecting metrics, logs and events from each RancherOS node and all containers.

rancheros-sematext-docker-agent

Sematext Logging and Monitoring on RancherOS

Setup via Sematext Catalog Entry

When you run the Rancher server user interface, simply search in the community catalog for “sematext”, “monitoring” or “logs” and select “Sematext Docker Agent”.

Rancher-Catalog-monitoring-search

Sematext Docker Agent in RancherOS Community Catalog

Choose “View Details”, and in the “Configuration Options” enter the SPM and Logsene App tokens. You can obtain these from https://apps.sematext.com, where you can sign up and create your SPM and Logsene apps. If your Rancher cluster runs behind a firewall, you might need to specify the proxy URL in the HTTPS_PROXY or HTTP_PROXY environment variable.

Sematext-catalog-options

If you’d like to collect all logs, just press “Launch” without specifying any filter for containers or images.

You can find all details about the template on RancherOS blog New in Rancher Community Catalog: Monitoring and Logging by Sematext.

Setup via rancher-compose command

If you prefer rancher-compose over GUIs and use the Cattle scheduler then rancher-compose is the right tool to deploy Sematext Docker Agent. The following configuration will activate Sematext Docker Agent on every node in the RancherOS cluster. You’ll need to replace the SPM/Logsene App tokens, of course:

# sematext/docker-compose.yml
sematext-docker-agent:  
 image: 'sematext/sematext-agent-docker:latest'
 environment:
   - LOGSENE_TOKEN=YOUR_LOGSENE_TOKEN
   - SPM_TOKEN=YOUR_SPM_TOKEN 
 restart: always
 volumes:
   - /var/run/docker.sock:/var/run/docker.sock
 labels:
   io.rancher.scheduler.global: 'true'

Store the configuration as sematext/docker-compose.yml and activate the service:

rancher-compose up -f sematext/docker-compose.yml -d

Setup via Kubernetes DaemonSet

If you already deployed Kubernetes on RancherOS the Cattle scheduler should not be activated and you should use Kubernetes DaemonSet to deploy Sematext Docker Agent to all cluster nodes:

  1. Create a sematext-agent.yml file with the DaemonSet definition (just replace SPM/Logsene App tokens):
    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
     name: sematext-agent
    spec:
     template:
       metadata:
         labels:
           app: sematext-agent
       spec:
         nodeSelector: {}
         dnsPolicy: "ClusterFirst"
         restartPolicy: "Always"
         containers:
         - name: sematext-agent
           image: sematext/sematext-agent-docker:latest
           imagePullPolicy: "Always"
           env:
           - name: SPM_TOKEN
             value: "YOUR_SPM_TOKEN"
           - name: LOGSENE_TOKEN
             value: "YOUR_LOGSENE_TOKEN"
           - name: KUBERNETES
             value: "1"
           volumeMounts:
             - mountPath: /var/run/docker.sock
               name: docker-sock
             - mountPath: /etc/localtime
               name: localtime
           securityContext:
             privileged: true
         volumes:
           - name: docker-sock
             hostPath:
               path: /var/run/docker.sock
           - name: localtime
             hostPath:
               path: /etc/localtime
    
  1. Activate the DaemonSet in the Kubernetes cluster:
    kubectl create -f sematext-agent.yml

Setup via Swarm global services

Similar to Kubernetes on RancherOS the Cattle scheduler is deactivated for Swarm as well. Thus, you can deploy Sematext Docker Agent as a global service on Swarm (>Docker engine 1.12). Connect you Docker client to RancherOS Swarm API endpoint and run the following global service definition with your SPM/Logsene App tokens. This will add Sematext Docker Agent to each Swarm node as soon it get launched.

docker service create --mode global --name sematext-agent-docker \
--mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
-e SPM_TOKEN=YOUR_SPM_TOKEN -e LOGSENE_TOKEN=YOUR_LOGSENE_TOKEN \
sematext/sematext-agent-docker

Setup via Mesos Marathon

The following configuration will activate Sematext Docker Agent on every node in the Mesos cluster. Please note that you have to specify the number of Mesos nodes (instances), SPM App Token, and Logsene App Token. Example call to the Marathon API:

curl -XPOST -H "Content-type: application/json" http://your_marathon_server:8080/v2/apps -d '
{
"container": {
"type": "DOCKER",
"docker": {
"image": "sematext/sematext-agent-docker",
"privileged": true,
},
"volumes": [
{
"containerPath": "/var/run/docker.sock",
"hostPath": "/var/run/docker.sock",
"mode": "RW"
}
],
"network": "BRIDGE"
},
"env": {
"LOGSENE_TOKEN": "YOUR_LOGSENE_TOKEN",
"SPM_TOKEN": "YOUR_SPM_TOKEN"
},
"id": "sematext-agent-docker",
"instances": 1,
"cpus": 0.5,
"mem": 300,
"constraints": [["hostname","UNIQUE"]]
}

Summary

 

RancherOS provides many options and plays well with the most common orchestration tools.

Users can pick their preferred solution together with Sematext Docker Agent for the collection of metrics, events and logs. We have seen users moving from one orchestration platform to another when they hit some issues and having the flexibility to do so easily is extremely valuable in the rapidly changing container world.

As we hope you can see, RancherOS provides this flexibility, while Sematext Docker Agent support for all leading orchestration platforms ensures continuous visibility into Docker operational intelligence – metrics, events, and logs, regardless of which orchestration platform you use.

octi-sda

SIGN UP – FREE TRIAL

es-logsene-arrows

Elastic Stack Import-Export with Logstash & Logsene

In earlier posts, we explained how one can reindex data from one Elasticsearch cluster to another, or within the same Elasticsearch cluster, via tools like Logstash and rsyslog.

The same recipes apply to Logsene, as it exposes the Elasticsearch API. Not only can you push data to Logsene with everything that talks to Elasticsearch (such as Logstash), but you can also use Elasticsearch’s Scroll API to export data from Logsene. All you need to remember is that with Logsene, you need to specify your app token as the index name.

Migrating data from your in-house ELK stack to Logsene

Let’s say you already have an Elastic stack deployed, but you want to migrate existing logs to Logsene. Maybe because you’re spending too much time and money on managing and scaling Elasticsearch, and you’d like to outsource that. Or because you’d like built-in features of Logsene like role-based access control or anomaly detection. Either way, you can migrate your data and keep using Elasticsearch-focused tools:

input {
  elasticsearch {
   hosts => ["localhost:9200"]
   index => "logstash-*"
  }
}

output {
  elasticsearch {
    hosts => "logsene-receiver.sematext.com:80"
    index => "DESTINATION_LOGSENE_APP_TOKEN"
    manage_template => false
  }
}

NOTE: Since Logsene plans are based on ingestion volume and retention, that initial import throughput spike may influence your costs. That shouldn’t be a problem if you just started and have a big enough trial plan. Even if the trial is over and go over the selected plan, you’ll pay at the same per-GB rate.

Reindexing data from one Logsene app to another

Let’s say you’re prototyping, you’re tweaking your Logstash grok rules, but you’d like to use a custom template. For the new template to apply, you’ll need a new index (i.e. a new Logsene app). So you can go ahead and create it, and then reindex the data from the first app with Logstash. Here’s a sample config (though you can also add filters to change data along the way). Except now, the source is not your in-house Elasticsearch cluster, but a Logsene app that already has logs you want to reindex:

input {
  elasticsearch {
   hosts => ["logsene-receiver.sematext.com:80"]
   index => "SOURCE_LOGSENE_APP_TOKEN"
  }
}

output {
  elasticsearch {
    hosts => "logsene-receiver.sematext.com:80"
    index => "DESTINATION_LOGSENE_APP_TOKEN"
    manage_template => false
  }
}

NOTE: If you want SSL encryption, just add ssl => true and change the port to 443.

Exporting data from Logsene

Even if Logsene comes with Amazon S3 log archiving, you might need to export your logs somewhere else using – you guessed it! – a similar config:

input {
  elasticsearch {
   hosts => ["logsene-receiver.sematext.com:80"]
   index => "LOGSENE_APP_TOKEN"
  }
}

output {
  file {
    path => "/mnt/big_disk/big_log"
  }
}

See? No lock-in! With Logsene you can also easily go back to self-hosted, if you want to build something custom around your ELK stack for example. We can actually help you with that, through Elasticsearch and logging trainings and through logging consulting.

DevOps

Search-Devops.com – Search DevOps Projects

As the world of software is growing, so is the ecosystem of DevOps tools and resources – for monitoring, for logging, for alerting, for continuous integration and deployment, configuration management, etc.  Nothing wrong with having lots of resources and tools, but here at Sematext we like to find answers as quickly and efficiently as possible; after all, we are known (among other things) for Search and Big Data expertise.
To that end, we’re happy to announce search-devops.com (aka “SD”), which aggregates, indexes and makes searchable all content repositories (mailing lists, source code, wikis, issue trackers, etc.) for a number of open source DevOps projects:

Puppet Collectd Logstash
Chef InfluxDB Fluentd
Saltstack Prometheus Rsyslog
Ansible Sensu Syslog-ng
CFEngine Statsd Nxlog
Docker Zabbix Log4J2
CoreOS Nagios log4php
Vagrant Icinga Kibana
Graphite Ganglia Grafana

Activity has been steadily growing the past few months (see chart below for Docker example) and there is a ton of usable and helpful content available right now.

SD_Docker_screen

The Search-DevOps Suggestion Box is Open!

Got suggestions for new DevOps tools to add? Or new functionality on the site?  Drop us an email or hit us on Twitter.  We are eager to support the DevOps community and improve everyone’s work lives and tool sets.

docker_monitoring_oss_cover

Open Source Docker Monitoring & Logging

Pets ⇒ Cattle ⇒ Orchestration

Docker is growing by leaps and bounds, and along with it its ecosystem.  Being light, the predominant container deployment involves running just a single app or service inside each container.  Most software products and services are made up of at least several such apps/services.  We all want all our apps/services to be highly available and fault tolerant.  Thus, Docker containers in an organization quickly start popping up like mushrooms after the rain.  They multiply faster than rabbits. While in the beginning we play with them like cute little pets, as their number quickly grow we realize we are dealing with aherd of cattle, implying we’ve become cowboys.  Managing a herd with your two hands, a horse, and a lasso willget you only so far.  You won’t be able to ride after each and every calf that wonders in the wrong direction.  To get back to containers from this zoological analogy – operating so many moving pieces at scale is impossible without orchestration – this is why we’ve seen the rise of Docker Swarm, Kubernetes, Mesos, CoreOS, RancherOS and so on.

 

Containers multiply faster than Gremlins


Pets ⇒ Cattle ⇒ Orchestration + Operational Insights

Container orchestration helps you manage your containers, their placement, their resources, and their whole life cycle.  While containers and applications in them are running, in addition to the whole life cycle management we need container monitoring and log management so we can troubleshoot performance or stability issues, debug or tune applications, and so on.  Just like with orchestration, there are a number of open-source container monitoring and logging tools.  It’s great to have choices, but having lots of them means you need to evaluate and compare them to pick the one that best matches your needs.

DevOps Tools Comparison

We’ve open-sourced our Sematext Docker Agent (SDA for short) which works with SPM for monitoring and Logsene for log management (think of it as ELK as a Service), and wanted to provide a high level comparison of SDA and several popular Docker monitoring and logging tools, like CAdvisor, Logspout, and others.  In the following table we group tools by functionality and include monitoring agents, log collectors and shippers, storage backends, and tools that provide the UI and visualizations.  For each functionality we list in the “Common Tools” column one or more popular open-source tools that provide that functionality.  An empty “Common Tools” cell means there are no popular open-source tools that provide it, or at least we are not aware of it — if we messed something up, please leave a comment or tweet @sematext.

Functionality Common Tools Sematext Tools
Collect Logs from Docker API
(including auto-discovery of new containers)
Logspout Sematext Docker Agent
Log routing Logspout
Routing setup for containers via HTTP API to syslog, redis, kafka, logstash
Docker Logging Drivers (e.g. syslog, journald, fluentd, etc.)
Sematext Docker Agent
(routing of logs to different indices based on container labels)
Automatic log tagging
(with Docker Compose or Swarm or Kubernetes metadata)
For Kubernetes: fluentd-elasticsearch, assumes Elasticsearch deployed locally Sematext Docker Agent
Collect Docker Metrics CAdvisor Sematext Docker Agent
Collect Docker Events ? Sematext Docker Agent
Logs format detection
(most tools need a static setup per logfile/application)
? Sematext Docker Agent
(out of the box format detection and parsing; the parser and the logagent-js pattern library is open source)
Logs parsing and shipping Fluentd
Logstash
rsyslog
syslog-ng
Sematext Docker Agent
Logs storage and indexing Elasticsearch
Solr
Logsene
(exposes Elasticsearch API)
Logs anomaly detection and alerting ? Logsene
Log search and analytics Kibana
Grafana
Logsene
(Logsene’s own UI or integrated Kibana, or Grafana connected to Logsene via Elasticsearch data source)
Metrics storage and aggregation Graphite
OpenTSDB
KairosDB
Elasticsearch
Influxdb
Prometheus
SPM
Metrics charts and dashboards Grafana
Kibana
SPM
Metrics anomaly detection and alerting Influxdb
Prometheus
SPM
Correlation of Metrics, Logs and Events ? SPM & Logsene integration

This table shows a few things:

  • Some of the functionality provided by SPM and Logsene is not available in some of the most popular open-source monitoring and logging tools included here
  • Some of the SPM and Logsene functionality is indeed provided by some of the open-source tools, however none of them seems to encompass all the features, forcing one to mix and match and head down the tech debt-ridden Franken-monitoring path
  • Try it yourself in the MindMap below – pick a few functionalities and see how many different tools you might have to use?
    docker_monitoring_oss

 

Avoid building technical-debt & Franken-monitoring by using a limited number of Docker monitoring & logging tools

Again, if we missed something, please leave a comment or tweet @sematext.
If you want to try Sematext Docker Agent sign up for a free trial.

P.S.: Sematext Docker Agent is available in the RancherOS Community Catalog and shows up with our new mascot “Octi” only one more pet 🙂 – so if you use RancherOS search for “sematext” in the RancherOS Catalog and within a few clicks you’ll have the Sematext Docker Agent deployed to your RancherOS clusters!

octi-sda

SIGN UP – FREE TRIAL

octi-docker-datacenter

Monitoring Docker Datacenter Logs & Metrics

Docker Datacenter (DDC) simplifies container orchestration and increases the flexibility and scalability of application deployments.  However, the high level of automation create new challenges for monitoring and log management. Organizations that introduce Docker Datacenter manage container deployments in various scenarios e.g., on bare metal, virtual machines, or hybrid clouds. That’s why at Sematext we are seeing a shift from traditional server monitoring to container-centric monitoring. This post is an excerpt from the newly published “Reference Architecture: Monitoring and Logging for Docker Datacenter” and shows how Docker Datacenter could be extended with Logging and Monitoring services.

Download Reference Architecture Logging & Monitoring for Docker Datacenter

The Docker Universal Control Plane (UCP) management functionalities include real-time monitoring of the cluster state, real-time metrics and logs for each container. However, operating larger infrastructures requires a longer retention time for logs and metrics and the capability to correlate metrics, logs and events on several levels (cluster, nodes, applications and containers).  A comprehensive monitoring and logging solution ought to provide the following operational insights:

Read More