Nutch Digest, May 2010

With May being almost over, it’s time for our regular monthly Nutch Digest. As you know, Nutch became a top-level project and first related changes are already visible/applied. The Nutch site was moved to its new home at nutch.apache.org and […]

Nutch Digest, April 2010

In the first part of this Nutch Digest we’ll go through new and useful features of the upcoming Nutch 1.1 release, while in the second part we’ll focus on developments and plans for next big Nutch milestone, Nutch 2.0. But, […]

Nutch Digest, March 2010

This is the first post in the Nutch Digest series and a little introduction to Nutch seems in order. Nutch is a multi-threaded and, more importantly, a distributed Web crawler with distributed content processing (parsing, filtering), full text indexer and […]

Hiring Lucene, Solr, Nutch, Hadoop, NLP People

Hear, hear! We are looking for people passionate about search/information retrieval, natural language processing, machine learning, text analytics, recommendation engines, and related topics.  Please see the Sematext Jobs page for a bit more information.  If you enjoy working with Lucene, […]