2015 in Review

Another year is behind us, and it’s been another good year for us at Sematext.  Here are the highlights in the chronological order.  If you prefer looking non-chronological overview, look further below.


We started the year by doing a ton of publishing on the blog – about Solr-Redis, about SPM and Slack, about Solr vs. Elasticsearch – always a popular topic, Spark, Kafka, Cassandra, Solr, etc.  Logsene being ELK as a Service means we made sure users have the freedom and flexibility to create custom Elasticsearch Index Templates in Logsene.


We added Account Sharing to all our products, thus making it easier to share SPM, Logsene, and Site Search Analytics apps by teams.  We made a big contribution to Kafka 0.8.2 by reworking pretty much all Kafka metrics and making them much more useful and consumable by Kafka monitoring agents.  We also added support for HAProxy monitoring to SPM.


We announced Node.js / io.js monitoring.  This was a release of our first Node.js-based monitoring monitoring agent – spm-agent-nodejs, and our first open-source agent.  The development of this agent resulted in creation of spm-agent an extensible framework for Node.js-based monitoring agents.  HBase is one of those systems with tons of metrics and with metrics that change a lot from release to release, so we updated our HBase monitoring support for HBase 0.98.


The SPM REST API was announced in April, and a couple of weeks later the spm-metrics-js npm module for sending custom metrics from Node.js apps to SPM was released on Github.


A number of us from several different countries gathered in Krakow in May.  The excuse was to give a talk about Tuning Elasticsearch Indexing Pipeline for Logs at GeeCon and give away our eBook – Log Management & Analytics – A Quick Guide to Logging Basics while sponsoring GeeCon, but in reality it was really more about Żubrówka and Vișinată, it turned out.  Sematext grew a little in May with 3 engineers from 3 countries joining us in a span of a couple of weeks.  We were only dozen people before that, so this was decent growth for us.

Right after Krakow some of us went to Berlin to give another talk: Solr and Elasticsearch – Side by Side with Elasticsearch and Solr: Performance and Scalability.  While in Berlin we held our first public Elasticsearch training and, following that, quickly hopped over to Hamburg to give a talk at a local search meetup.


In June we gave a talk on the other side of the Atlantic – in NYC – Beyond POC: Processing Metrics, Logs and Traces … at Scale.  We were conference sponsors there as well and took part in the panel about microservices.  We published our second eBook – Elasticsearch Monitoring Essentials eBook.  The two most important June happenings were the announcement of Docker monitoring – SPM for Docker – our solution for monitoring Docker containers, as well as complete, seamless integration of Kibana 4 into Logsene.  We’ve added Servers View to SPM and Logsene got much needed Alerting and Anomaly Detection, as well as Saved Searches and Scheduled Reporting.


In July we announced public Solr and Elasticsearch trainings, both in New York City, scheduled for October.  We built and open-sourced Logsene Command Line Interfacelogsene-cli – and we added Tomcat monitoring integration to SPM.


At Sematext we use Akka, among other things, and in August we introduced Akka monitoring integration for SPM and open-sourced the Kamon backend for SPM.  We also worked on and announced Transaction Tracing that lets you easily find slow transactions and bottlenecks that caused their slowness, along with AppMaps, which are a wonderful way to visualize all your infrastructure along applications running on it and see, in real-time, which apps and servers are communicating, how much, how often there are errors in each app, and so on.


In September we held our first 2 webinars on Docker Monitoring and Docker Logging.  You can watch them both in Sematext’s YouTube channel.


We presented From zero to production hero: Log Analysis with Elasticsearch at O’Reilly’s Velocity conference in New York and then Large Scale Log Analytics with Solr at Lucene/Solr Revolution in Austin.  After Texas we came back to New York for our Solr and Elasticsearch trainings.


Logsene users got Live Tail in November, while SPM users welcomed the new Top Database Operations report.  Live Tail comes in very handy when you want to semi-passively watch out for errors (or other types of logs) without having to constantly search for them.  While most SPM users have SPM monitoring agents on their various backend components, Top Database Operations gives them the ability to gain more insight in performance between the front-end/web applications and backend servers like Solr, Elasticsearch, or other databases by putting the monitoring agents on applications that act as clients for those backend services.  We worked with O’Reilly and produced a 3-hour Working with Elasticsearch Training Video.


We finished off 2015 by adding MongoDB monitoring to SPM, joining Docker’s ETP Program for Logging, further integrating monitoring and logging, ensuring Logsene works with Grafana, writing about monitoring Solr on Docker, publishing the popular Top 10 Node.js Metrics to Watch, as well as a SPM vs. New Relic APM comparison.  

Pivoting the above and grouping it by our products and services:


  • Live Tail
  • Alerting
  • Anomaly Detection
  • logsene-cli + logsene.js + logagent-js
  • Saved Searches
  • Scheduled Email Reporting
  • Integrated Kibana
  • Compatibility with Grafana
  • Search AutoComplete
  • Powerful click-and-filter
  • Native charting of numerical fields
  • Account Sharing



  • Transaction Tracing
  • SPM Tracing API
  • AppMap
  • NetMap
  • On Demand Profiling
  • Integration with Logsene
  • Expanded monitoring for Elasticsearch, Solr, HBase, and Kafka
  • Added monitoring for Docker, Node.js, Akka, MongoDB, HAProxy, and Tomcat
  • Birds Eye Servers View
  • Account Sharing



  • Docker Monitoring
  • Docker Logging



  • Elasticsearch training in Berlin
  • Solr and Elasticsearch trainings in New York



  • Elasticsearch Monitoring Essentials
  • Log Management & Analytics – A Quick Guide to Logging Basics


Talks / Presentations / Conferences:

  • Lucene/Solr Revolution, Austin, TX – Large Scale Log Analytics with Solr
  • Velocity Conference, NYC, NY – Log Analysis with Elasticsearch
  • Berlin Buzzwords, Berlin, Germany – Side by Side with Elasticsearch and Solr: Performance and Scalability
  • GeeCon, Krakow, Poland – Tuning Elasticsearch Indexing Pipeline for Logs
  • DevOps Days, Warsaw, Poland – Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
  • DevOps Expo, NYC, NY – Process Metrics, Logs, and Traces at Scale



All numbers are up  – our SPM and Logsene signups are up, product revenue is up a few hundred percent from last year, we’ve nearly doubled our blogging volume, our site traffic is up,we’ve made several UI-level facelifts for both apps.sematext.com and www.sematext.com, our team has grown, we’ve increased the number of our Solr and Elasticsearch Production Support customers, and we’ve added Solr and Elasticsearch Training to the list of our professional services.


Sematext Year 2011 in Review

2011 was a good year for Sematext. Here are some highlights.


In 2011, we’ve released several new versions of our popular AutoComplete, Key Phrase Extractor, and DYM ReSearcher products and have witnessed a number of organizations adopting them.


After months of hard work, we’ve opened up our Search Analytics and Performance Monitoring services to public. Anyone can sign up for an account and use either or both of these services for free.  Yes, both services are completely free now and can be used without any restrictions.

Services – Tech Support

In addition to our standard consulting services we’ve successfully started offering commercial tech support for Lucene and Solr.  These annual subscriptions come in several different packages and are made for those running Lucene or Solr in production and wanting immediate access to Lucene and Solr experts when things go awry.

Services – Consulting Packages

For those who need long-term access to Lucene or Solr experts we started offering several levels of consulting support packages.  What makes these packages attractive is that one gets immediate help to Lucene or Solr expert consultants while paying a lower rate in exchange for an annual commitment.


We attended and presented at a number of conferences (slides and videos) in 2011 – Lucene Revolution in San Francisco in May, Berlin Buzzwords in June, Lucene Eurocon in Barcelona in October, and Enterprise Search Summit Fall in Washington, DC in November.

Open Source

During our work on Search Analytics and Performance Monitoring and specifically the parts of them that use HBase, we’ve forked a couple of open-source projects that we put up on Github.  We are looking at open-sourcing a few other things in 2012.  We’ve also contributed patches to Flume, Solr, HBase, and while in Berlin in June we took part in our first HBase Hackathon.

Team Growth

Our team has roughly doubled in size.  Our people are now in 3 different time zones and 6 countries and we are looking to expand this further.  We are actively hiring and have a number of open positions, from mobile development and design to system administration, sales and marketing.  Of course, we are always looking for search and big data experts.  The team didn’t grow only in number, but also in knowledge and expertise – we now have 2 Cloudera Certified Hadoop developers on staff, 2 published book authors, a recent Machine Learning and Artificial Intelligence Stanford online class “graduate”, etc.

Collaboration with Academia

We have partnered with a university lab on the other side of the Atlantic and have been collaborating on some interesting projects results of which we’ll share in 2012.


Overall, 2011 was a good year.  But we are in 2012 now and it’s time to look ahead.  Looking at our crystal ball, I see more fun work and more opportunities.  We’ll do our best to make the most of them.

Hiring Search and Data Analytics Engineers

We are growing and looking for smart people to join us either in an “elastic”, on-demand, per-project, or more permanent role:

Lucene/Solr expert who…

  • Has built non-trivial applications with Lucene or Solr or Elastic Search, knows how to tune them, and can design systems for large volume of data and queries
  • Is familiar with (some of the) internals of Lucene or Solr or Elastic Search, at least on the high level (yeah, a bit of an oxymoron)
  • Has a systems/ops bent or knows how to use performance-related UNIX and JVM tools for analyzing disk IO, CPU, GC, etc.

Data Analytics expert who…

  • Has used or built tools to process and analyze large volumes of data
  • Has experience using HDFS and MapReduce, and have ideally also worked with HBase, or Pig, or Hive, or Cassandra, or Voldemort, or Cascading or…
  • Has experience using Mahout or other similar tools
  • Has interest or background in Statistics, or Machine Learning, or Data Mining, or Text Analytics or…
  • Has interest in growing into a Lead role for the Data Analytics team

We like to dream that we can find a person who gets both Search and Data Analytics, and ideally wants or knows how to marry them.

Ideal candidates also have the ability to:

  • Write articles on interesting technical topics (that may or may not relate to Lucene/Solr) on Sematext Blog or elsewhere
  • Create and give technical talks/presentations (at conferences, local user groups, etc.)

Additional personal and professional traits we really like:

  • Proactive and analytical: takes initiative, doesn’t wait to be asked or told what to do and how to do it
  • Self-improving and motivated: acquires new knowledge and skills, reads books, follows relevant projects, keeps up with changes in the industry…
  • Self-managing and organized: knows how to parcel work into digestible tasks, organizes them into Sprints, updates and closes them, keeps team members in the loop…
  • Realistic: good estimator of time and effort (i.e. knows how to multiply by 2)
  • Active in OSS projects: participates in open source community (e.g. mailing list participation, patch contribution…) or at least keeps up with relevant projects via mailing list or some other means
  • Follows good development practices: from code style to code design to architecture
  • Productive, gets stuff done: minimal philosophizing and over-designing

Here are some of the Search things we do (i.e. that you will do if you join us):

  • Work with external clients on their Lucene/Solr projects.  This may involve anything from performance troubleshooting to development of custom components, to designing highly scalable, high performance, fault-tolerant architectures.  See our services page for common requests.
  • Provide Lucene/Solr technical support to our tech support customers
  • Work on search-related products and services

A few words about us:

We work with search and big data (Lucene, Solr, Nutch, Hadoop, MapReduce, HBase, etc.) on a daily basis.  Our projects with external clients range from 1 week to several months.  Some clients are small startups, some are large international organizations.  Some are top secret.  New customers knock on our door regularly and this keeps us busy at pretty much all times.  When we are not busy with clients we work on our products.  We run search-lucene.com and search-hadoop.com.  We participate in open-source projects and publish monthly Digest posts that cover Lucene, Solr, Nutch, Mahout, Hadoop, and HBase.  We don’t write huge spec docs, we work in sprints, we multitask, and try our best to be agile. We send people to conferences, trainings (Hadoop, HBase, Cassandra), and certifications (2 of our team members are Cloudera Certified Hadoop Developers).

We are a small and mostly office-free, highly distributed team that communicates via email, Skype voice/IM, BaseCamp.  Some of our developers are in Eastern Europe, so we are especially open to new team members being in that area, but we are also interested in good people world-wide, from South America to Far East.

Interested? Please send your resume to jobs @ sematext.com.

Hiring Lucene, Solr, Nutch, Hadoop, NLP People

Hear, hear!

We are looking for people passionate about search/information retrieval, natural language processing, machine learning, text analytics, recommendation engines, and related topics.  Please see the Sematext Jobs page for a bit more information.  If you enjoy working with Lucene, Solr, Nutch, Hadoop, HBase, or any of the other technologies listed on the Sematext Jobs page or, more generally, you enjoy working in any of the related fields, please get in touch.

We are a small, private company based in New York City, with people on multiple continents and clients from all around the globe.

Sematext Blog Introduction

Here it is – Sematext’s new and shinny blog.

We’ll be writing about topics that are dear and important to us – search (both web search and enterprise search), text analytics, natural language processing (sentiment detection, named entity recognition…), machine learning, information gathering (e.g. web crawling), information extraction, e-discovery, recommendation engines, etc.  There will be a lot of talk about tools we use regularly – Lucene, Solr, Nutch, Mahout and Taste, Hadoop, HBase and friends, and more.

To subscribe, use the orange feed icon or just go to http://feeds.feedburner.com/SematextBlog.  If you are a Twitter user, you can follow @sematext on Twitter, too.