Hadoop Digest, March 2010

Main news first: Hadoop 0.20.2 was released! The list of changes may be found in the release notes here. Related news: Maven artifacts have been pushed to repository.apache.org. This version has entered the Debian unstable repository. Cloudera officially announced the CDH2 release […]

Nutch Digest, March 2010

This is the first post in the Nutch Digest series, so a little introduction to Nutch seems in order. Nutch is a multi-threaded and, more importantly, distributed Web crawler with distributed content processing (parsing, filtering), a full-text indexer and […]

Lucene Digest, March 2010

Welcome to another edition of the Lucene monthly Digest post. As reported by @lucene, Lucene and Solr have merged. This pretty big change didn’t happen overnight. As a matter of fact, the Lucene/Solr developers went through a pretty intense […]

Solr Digest, February 2010

This second installment of Solr Digest (see Solr January Digest) will cover 8 topics, some of which are quite new and some with a very long history (and a still uncertain future). So, here we go: 1. solr.ISOLatin1AccentFilterFactory is a commonly used filter […]

Introducing Cloud MapReduce

The following post is the introduction to Cloud MapReduce (CMR) written by Huan Liu, CMR’s main author and the Research Manager at Accenture Technology Labs. MapReduce is a programming model (borrowed from functional programming languages) and its associated implementation, and […]