Log Management for IBM Bluemix and Cloud Foundry

Enterprises without DevOps teams and culture typically experience long application deployment cycles. Once developers cut a new release, the operations team needs to deploy it to an existing or new server, and getting a new application into production often takes a long time (up to several weeks is not uncommon). The process may include hardware purchases, configuration, and troubleshooting sessions with the development team. The troubleshooting portion in particular can drag on: the typical e-mail ping-pong of commands for the administrators, sysadmins sending log files back to the developers, and so on. Modern digital businesses don't operate that way anymore. Instead, modern DevOps teams often use a platform as a service (PaaS) to boost productivity, without even thinking about server resources, Ansible/Chef/Puppet scripts, and the like. They push their application releases with a few commands to "the cloud" – typically a PaaS like IBM Bluemix! Once deployments are simplified this way, the next logical step is to break the e-mail ping-pong for troubleshooting by introducing centralized log management. Logs are the most important source of information for application troubleshooting, and central log management is crucial for giving relevant teammates real-time access to them.

In this post, we’ll show how to do this by deploying a Node.js app to IBM Bluemix and using Logsene for Log Management.
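To give you a taste, here is a minimal sketch (not from the post itself) of the kind of logging that works well on Cloud Foundry platforms like Bluemix: write one JSON object per line to stdout and let the platform's log collector pick it up. The field names are illustrative, not a required Logsene schema.

```typescript
// Minimal structured logger for a Node.js app on Bluemix / Cloud Foundry.
// Cloud Foundry collects whatever the app writes to stdout/stderr, so emitting
// one JSON object per line keeps logs machine-parseable and easy to ship
// onward to a log management service. Field names here are illustrative.

type Level = "debug" | "info" | "warn" | "error";

const vcap = process.env.VCAP_APPLICATION; // set by Cloud Foundry at runtime
const appName = vcap ? (JSON.parse(vcap).application_name as string) : "local";

function log(level: Level, message: string, extra: Record<string, unknown> = {}): void {
  const entry = {
    "@timestamp": new Date().toISOString(),
    app: appName,
    level,
    message,
    ...extra,
  };
  process.stdout.write(JSON.stringify(entry) + "\n");
}

log("info", "order processed", { orderId: 42, durationMs: 113 });
```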

Read More

Elasticsearch security: Authentication, Encryption, Backup

The recent ransom attacks on public Elasticsearch instances showed that Elasticsearch security is still a hot topic. Elasticsearch was not the only target: tens of thousands of poorly configured MongoDB databases – over 27,000 servers – were compromised over the past week as well, with hackers stealing and then deleting data from unpatched or badly configured systems. The scenario is always the same: insecure instances are "hacked" and the data is replaced with a note telling the owner to send payment to a Bitcoin address and then e-mail the attacker to retrieve the data. Over the last few days, more than 4,000 Elasticsearch instances have been compromised, and the number is still growing, as seen here. The attacks are rather simple: the attacker scans for services on port 9200, fetches the data from any service found, deletes it, and puts the payment instructions as a document in the looted Elasticsearch index. Because so many Elasticsearch instances are unprotected, they make very easy targets.

In this post, we're going to show you not just what you should or shouldn't do with Elasticsearch, but how to actually secure it, by sharing a few simple and free prevention methods.
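As a teaser, here's a small sketch (not from the post itself) of a check worth running: see whether your Elasticsearch instance answers unauthenticated requests on port 9200, the way the attackers' scanners do. The host below is a placeholder; run the check from outside your network to see what an attacker would see.

```typescript
// Quick exposure check: does an Elasticsearch instance answer unauthenticated
// HTTP requests on port 9200? Requires Node 18+ for the global fetch().
// Replace HOST with your own endpoint.

const HOST = "http://your-es-host:9200"; // placeholder

async function checkExposure(): Promise<void> {
  try {
    const res = await fetch(`${HOST}/_cat/indices?format=json`, {
      signal: AbortSignal.timeout(5000),
    });
    if (res.ok) {
      const indices = (await res.json()) as Array<{ index: string }>;
      console.warn(`EXPOSED: ${indices.length} indices readable without auth:`);
      indices.forEach((i) => console.warn(`  ${i.index}`));
    } else {
      console.log(`Got HTTP ${res.status} -- some access control is in place.`);
    }
  } catch {
    console.log("No response -- port 9200 is not reachable from here (good).");
  }
}

checkExposure();
```
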
Read More

Solr Query Segmenter: How to Provide Better Search Experience

One way to create a better search experience is to understand the user's intent. One of the phases in that process is query understanding, and one simple step in that direction is query segmentation. In this post, we'll cover what query segmentation is and when it is useful. We'll also introduce the Solr Query Segmenter, an open-source Solr component we developed to make the search experience better.
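To illustrate the idea (the Solr Query Segmenter itself is a Java Solr component; this is just a toy model of the concept), here's a greedy longest-match segmenter over a dictionary of known multi-word segments:

```typescript
// Toy illustration of query segmentation: given a dictionary of known
// multi-word segments, greedily group query tokens into the longest known
// segment, e.g. "new york cheap hotel" -> ["new york", "cheap", "hotel"].
// A segmented query can then be searched as phrases instead of loose terms.

const SEGMENTS = new Set(["new york", "new york city", "san francisco"]);
const MAX_SEGMENT_WORDS = 3;

function segment(query: string): string[] {
  const tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  const result: string[] = [];
  let i = 0;
  while (i < tokens.length) {
    let matched = false;
    // Try the longest candidate first (greedy longest-match).
    for (let len = Math.min(MAX_SEGMENT_WORDS, tokens.length - i); len >= 2; len--) {
      const candidate = tokens.slice(i, i + len).join(" ");
      if (SEGMENTS.has(candidate)) {
        result.push(candidate);
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      result.push(tokens[i]);
      i++;
    }
  }
  return result;
}

console.log(segment("new york cheap hotel")); // ["new york", "cheap", "hotel"]
```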

Read More

2016 Year in Review: Monitoring and Logging Highlights

2017 is almost here and, like last year, we thought we'd share how 2016 went for us.  We remain committed to being your "one-stop shop" for all things Elasticsearch and Solr: from Consulting, Production Support, and Training, to complementing that with Logsene for all your logs and SPM for all your monitoring needs.

Docker

It's safe to say 2016 was the year of Docker and, by extension, of Kubernetes, Mesos, Docker Swarm, and the like.  They stopped being just early adopters' toys and became production-ready technologies used by many. This year we added excellent support for Docker monitoring with SPM and for Docker logging with Logsene via the open-source Sematext Docker Agent.

Read More

Migrating to SolrCloud from Solr Master-Slave

Nowadays, more and more organizations are searching for fault-tolerant and highly available solutions for various parts of their infrastructure, including search, which has evolved from a merely "nice to have" feature into a first-class citizen and a "must have" element.

Apache Solr is a mature search solution that has been available for over a decade.  Its traditional master-slave deployment model has been around since 2006, while the fully distributed deployment known as SolrCloud has been available for only a few years. Thus, naturally, many organizations are in the process of migrating from Solr master-slave to SolrCloud, or are at least thinking about the move. In this article, we will give you an overview of what needs to be done for the migration to SolrCloud to be as smooth as possible.

Read More

Making Elasticsearch in Docker Swarm Elastic

Running Elasticsearch on Docker sounds like a natural fit – both technologies promise elasticity. However, running a truly elastic Elasticsearch cluster on Docker Swarm became somewhat difficult with Docker 1.12 in Swarm mode. Why? Since Elasticsearch gave up on multicast discovery (by moving multicast node discovery into a plugin and not including it by default), one has to specify the IP addresses of all master nodes to join the cluster.  Unfortunately, this creates a chicken-and-egg problem: those IP addresses are not actually known in advance when you start Elasticsearch as a Swarm service!  It would be easy if we could use the shared Docker bridge or host network and simply specify the Docker host IP addresses, as we are used to doing with the "docker run" command. However, "docker service create" rejects the use of the bridge or host network. Thus, the question remains: how can we deploy Elasticsearch in a Docker Swarm cluster?
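One known workaround, sketched below under the assumption that your service is named elasticsearch: inside a Swarm overlay network, the built-in DNS name tasks.<service-name> resolves to the IPs of the service's running tasks, so an entrypoint script can resolve it and feed the result to Elasticsearch's unicast host list.

```typescript
// Sketch of the "tasks.<service>" workaround: inside a Swarm overlay network,
// the built-in DNS name tasks.<service-name> resolves to the IPs of all of the
// service's running tasks. An entrypoint script can resolve it and pass the
// result to Elasticsearch as discovery.zen.ping.unicast.hosts.
// The service name "elasticsearch" is an assumption -- use your own.

import { promises as dns } from "node:dns";

async function unicastHosts(service = "elasticsearch"): Promise<string> {
  const ips = await dns.resolve4(`tasks.${service}`);
  return ips.join(",");
}

unicastHosts().then((hosts) => {
  // e.g. pass along as: -E discovery.zen.ping.unicast.hosts=<hosts>
  console.log(`discovery.zen.ping.unicast.hosts=${hosts}`);
});
```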

Read More

Introducing Sematable – ReactJS & Redux Table

Back in 2011 – more than half a decade ago(!) – we reviewed Top JavaScript Dynamic Table Libraries.  Clearly, a lot has changed since then.  Earlier this year, we started reworking the SPM & Logsene front-ends, building them on top of ReactJS, Redux, and ES6.  In the past we used DataTables, but it turns out DataTables doesn't play well with React.  We set off looking for a modern alternative that works well with React but, to our disappointment, we could not find anything that fit our needs. We needed something that:

  • Can filter and search data by text or by column values
  • Can paginate data
  • Can sort data
  • Plays well with React and Redux, so we can easily store filter state in Redux or display data in some custom way (a sketch of this pattern follows the list)
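
Here's a runnable mini-sketch of that last point (not Sematable's actual API – check the project's README for that): table state such as filter text, sort key, and page lives in a Redux-style reducer, and a selector derives the visible rows, so any component can read or change it.

```typescript
// Redux-style table state: one reducer owns filter/sort/page, one selector
// derives the visible rows. This is the pattern, not Sematable's real API.

interface Row { host: string; cpu: number; }
interface TableState { filter: string; sortKey: keyof Row; page: number; pageSize: number; }

type Action =
  | { type: "FILTER"; filter: string }
  | { type: "SORT"; sortKey: keyof Row }
  | { type: "PAGE"; page: number };

const initial: TableState = { filter: "", sortKey: "host", page: 0, pageSize: 10 };

function reducer(state: TableState = initial, action: Action): TableState {
  switch (action.type) {
    case "FILTER": return { ...state, filter: action.filter, page: 0 };
    case "SORT": return { ...state, sortKey: action.sortKey };
    case "PAGE": return { ...state, page: action.page };
    default: return state;
  }
}

// Selector: filter, sort, then paginate -- what the table component renders.
// String comparison keeps the sketch simple.
function visibleRows(rows: Row[], s: TableState): Row[] {
  return rows
    .filter((r) => r.host.includes(s.filter))
    .sort((a, b) => String(a[s.sortKey]).localeCompare(String(b[s.sortKey])))
    .slice(s.page * s.pageSize, (s.page + 1) * s.pageSize);
}

const rows: Row[] = [{ host: "es-node-2", cpu: 17 }, { host: "es-node-1", cpu: 42 }];
const state = reducer(initial, { type: "FILTER", filter: "es-" });
console.log(visibleRows(rows, state)); // both rows, sorted by host
```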

Read More

Kubernetes Containers: Logging and Monitoring Support

In this post we will:

  • Introduce Kubernetes concepts and motivation for Kubernetes-aware monitoring and logging tooling
  • Show how to deploy the Sematext Docker Agent to each Kubernetes node with a DaemonSet
  • Point out key Kubernetes metrics and log elements to help you troubleshoot and tune Docker and Kubernetes

Managing microservices in containers is typically done with cluster managers and orchestration tools such as Google Kubernetes, Apache Mesos, Docker Swarm, Docker Cloud, Amazon ECS, or HashiCorp Nomad, to mention just a few. However, each platform has slightly different options for deploying containers or scheduling tasks on cluster nodes. This is why we started a series of blog posts with Docker Swarm Monitoring, and continue today with a quick tutorial for Container Monitoring and Log Collection on Kubernetes.
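For a taste of the DaemonSet approach, here's a sketch that builds such a manifest as a plain object and prints it for kubectl apply. The image name, env var names, and apiVersion are assumptions – check the agent's documentation for the exact values for your Kubernetes version.

```typescript
// Sketch: build a DaemonSet manifest for the Sematext Docker Agent as a plain
// object and print it as JSON for `kubectl apply -f -`. A DaemonSet runs one
// copy of the pod on every node, which is exactly what a node-level
// monitoring/logging agent needs. Token values are placeholders.

const daemonSet = {
  apiVersion: "apps/v1", // assumption -- older clusters used extensions/v1beta1
  kind: "DaemonSet",
  metadata: { name: "sematext-agent" },
  spec: {
    selector: { matchLabels: { app: "sematext-agent" } },
    template: {
      metadata: { labels: { app: "sematext-agent" } },
      spec: {
        containers: [
          {
            name: "sematext-agent",
            image: "sematext/sematext-agent-docker:latest", // assumption
            env: [
              { name: "SPM_TOKEN", value: "YOUR_SPM_TOKEN" },         // placeholder
              { name: "LOGSENE_TOKEN", value: "YOUR_LOGSENE_TOKEN" }, // placeholder
            ],
            // The agent reads container metrics/logs via the Docker socket.
            volumeMounts: [{ name: "docker-sock", mountPath: "/var/run/docker.sock" }],
          },
        ],
        volumes: [{ name: "docker-sock", hostPath: { path: "/var/run/docker.sock" } }],
      },
    },
  },
};

// Pipe into kubectl: node manifest.js | kubectl apply -f -
console.log(JSON.stringify(daemonSet, null, 2));
```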

Read More


Running Solr in Docker: How & Why

Docker is all the rage these days, but one doesn’t hear about running Solr on Docker very much.

Last month, we gave a talk on running containerized Solr at the Lucene Revolution conference in Boston, the biggest open-source conference dedicated to Apache Lucene/Solr. The 40-minute talk included a live demo showing how to actually do it, and addressed a number of important bits you'll want to know if you run Solr on Docker in production.

Curious to check the presentation? You may find the slides below.

Or, interested in listening to the 40-minute talk? The video is below as well.

Indeed, a rapidly growing number of organizations are using Solr and Docker in production. If you also run Solr in Docker, be sure to check out Docker + Solr How-to: Monitoring the Official Solr Docker Image.

Needless to say, monitoring Solr is essential in production, and Docker is disruptive in many ways – quite a few things work slightly differently in containers and are worth understanding. For instance, one can create, deploy, and run applications using containers, which can give a significant performance boost and reduce the size of the applications.

Logging Libraries vs Log Shippers


In the context of centralizing logs (say, to Logsene or your own Elasticsearch), we often get the question of whether one should log directly from the application (e.g. via an Elasticsearch or syslog appender) or use a dedicated log shipper.

In this post, we’ll look at the advantages of each approach, so you’ll know when to use which.

Logging Libraries

Most programming languages have libraries to assist you with logging. Most commonly, they support local files or syslog, but more "exotic" destinations are often added to the list, such as Elasticsearch/Logsene. Here's why you might want to use them (a minimal sketch of this approach follows the list):

  • Convenience: you’ll want a logging library anyway, so why not go with it all the way, without having to set up and manage a separate application for shipping? (well, there are some reasons below, but you get the point)
  • Fewer moving parts: logging from the library means you don’t have to manage the communication between the application and the log shipper
  • Lighter: logs serialized by your application can be consumed by Elasticsearch/Logsene directly, instead of having a log shipper in the middle deserialize/parse them and then serialize them again
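
Here's roughly what that direct approach looks like – a minimal sketch that POSTs a serialized event to an Elasticsearch-compatible endpoint (Logsene exposes one; the URL below is a placeholder). Note what's missing: no buffering and no retries, which is exactly the trade-off the next section is about.

```typescript
// Minimal sketch of logging straight from the app to an Elasticsearch-compatible
// endpoint. The URL is a placeholder -- see your log service's docs for the real
// one. There is no buffering and no retry here; delivery failures are only logged.

const ENDPOINT = "https://logs.example.com/my-index/logs"; // placeholder

async function logEvent(level: string, message: string): Promise<void> {
  const event = { "@timestamp": new Date().toISOString(), level, message };
  await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  }).catch((err) => console.error("log delivery failed:", err)); // fire-and-forget
}

logEvent("info", "user signed up");
```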

Log Shippers

Your log shipper can be Logstash or one of its alternatives. A logging library is still needed to get logs out of your application, but you'll only write locally, either to a file or to a socket. A log shipper will take care of getting that raw log all the way to Elasticsearch/Logsene (a toy shipper illustrating the points below follows the list):

  • Reliability: most log shippers have buffers of some form. Whether it tails a file and remembers where it left off, or keeps data in memory/disk, a log shipper would be more resilient to network issues or slowdowns. Buffering can be implemented by a logging library too, but in reality most either block the thread/application or drop data
  • Performance: buffering also means a shipper can process data and send it to Elasticsearch/Logsene in bulks. This design will support higher throughput. Once again, logging libraries may have this functionality too (only tightly integrated into your app), but most will just process logs one by one
  • Enriching: unlike most logging libraries, log shippers are often capable of doing additional processing, such as pulling in the host name or tagging IPs with geo information
  • Fanout: logging to multiple destinations (e.g. local file + Logsene) is normally easier with a shipper
  • Flexibility: you can always change your log shipper to one that suits your use-case better. Changing the library you use for logging may be more involved
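
To make the buffering and bulk points concrete, here's a toy shipper: it watches a log file, buffers new lines in memory, and flushes them periodically to the Elasticsearch _bulk API. Real shippers add disk buffers, proper retries, parsing, and back-pressure; the paths and URLs below are placeholders.

```typescript
// Toy log shipper: tail a file, buffer new lines in memory, flush in bulk.
// This is just the skeleton of what Logstash & friends do far more robustly.

import { createReadStream, watchFile } from "node:fs";
import * as readline from "node:readline";

const FILE = "/var/log/app.log";                   // placeholder
const BULK_URL = "https://logs.example.com/_bulk"; // placeholder
const buffer: string[] = [];
let offset = 0;

// Remember where we left off and read only the newly appended bytes.
watchFile(FILE, { interval: 1000 }, (curr) => {
  if (curr.size <= offset) return;
  const rl = readline.createInterface({
    input: createReadStream(FILE, { start: offset }),
  });
  offset = curr.size;
  rl.on("line", (line) => buffer.push(line));
});

// Flush the buffer in bulk every 5 seconds -- one request for many events.
setInterval(async () => {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length);
  const body =
    batch
      .map((line) => `{"index":{"_index":"logs"}}\n${JSON.stringify({ message: line })}`)
      .join("\n") + "\n";
  await fetch(BULK_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-ndjson" },
    body,
  }).catch(() => buffer.unshift(...batch)); // naive retry: put the batch back
}, 5000);
```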

Conclusions

Design-wise, the difference between the two approaches is simply tight vs. loose coupling, but the way most libraries and shippers are actually implemented is more likely to influence your decision on sending data to Elasticsearch/Logsene:

  • logging directly from the library might make sense for development: it’s easier to set up, especially if you’re not (yet) familiar with a log shipper
  • in production you’ll likely want to use one of the available log shippers, mostly because of buffers: blocking the application or dropping data (immediately) are often non-options in a production deployment

If logging isn’t critical to your environment (i.e. you can tolerate the occasional loss of data), you may want to fire-and-forget your logs to Logsene’s UDP syslog endpoint. This takes reliability out of the equation, meaning you can use a shipper if you need enriching or support for other destinations, or a library if you just want to send the raw logs (which may well be JSON).
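
For completeness, here's what fire-and-forget UDP syslog can look like in Node.js – the host and port below are placeholders (Logsene's actual syslog endpoint is in its documentation):

```typescript
// Fire-and-forget sketch: send one syslog datagram over UDP. UDP gives no
// delivery guarantee, which is exactly the trade-off described above.

import * as dgram from "node:dgram";

const HOST = "syslog-receiver.example.com"; // placeholder
const PORT = 514;

function sendSyslog(message: string): void {
  // <134> = facility local0 (16 * 8) + severity info (6), per RFC 3164.
  const line = `<134>${new Date().toISOString()} myhost myapp: ${message}`;
  const socket = dgram.createSocket("udp4");
  const payload = Buffer.from(line);
  socket.send(payload, 0, payload.length, PORT, HOST, () => socket.close());
}

sendSyslog(JSON.stringify({ event: "checkout", status: "ok" }));
```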

Shippers or libraries, if you want to send logs with anything that can talk to Elasticsearch or syslog, you can sign up for Logsene here. No credit card or commitment is required, and we offer 30-day trials for all plans, in addition to the free ones.

If, on the other hand, you enjoy working with logs, metrics and/or search engines, come join us: we’re hiring worldwide.