Registration is open - Live, Instructor-led Online Classes - Elasticsearch in March - Solr in April - OpenSearch in May. See all classes


Nutch Digest, May 2010

With May being almost over, it’s time for our regular monthly Nutch Digest. As you know, Nutch became a top-level project and first related changes are already visible/applied. The Nutch site was moved to its new home at nutch.apache.org and the mailing lists (user- and dev-) have been moved to new addresses: user@nutch.apache.org and dev@nutch.apache.org.  Old subscriptions were automatically moved to the new lists, so subscribers don’t have to do anything (apart from changing mail filters, perhaps).  Even though Nutch is not a Lucene sub-project any more, we will continue to index its content (mailing lists, JIRA, Wiki, web site) and searchable over on Search-Lucene.com.  We’ve helped enough organizations with their Nutch clusters that it makes sense for at least us at Sematext to have a handy place to find all things Nutch.

There is a vote on updated Release Candidate (RC3) for the Apache Nutch 1.1 release.  The major differences between this release and RC2 are several bug fixes: NUTCH-732, NUTCH-815, NUTCH-814, NUTCH-812 and one improvement in NUTCH-816.  Nutch 1.1 is expected shortly!

From relevant changes related to Nuch develompent during May it’s important to note than Nutch will be using Ivy in Nutch builds and that there is one change regarding Nutch’s language identification: code for Japanese changed from “jp” to “ja”.  The former is Japan’s country code and the latter is the language code for Japanese.

There’s been a lot of talk on the Nutch mailing list about the first book on Nutch which Dennis Kubes started writing.  We look forward to reading it, Dennis!

Nutch developers were busy with the TLP-related changes and with preparations for the Nutch 1.1 release this month, so this Digest is a bit thinner than usual.  Follow @sematext on Twitter.

Start Free Trial