[NUTCH-2597] NPE in updatehostdb - Nutch - [issue]
...I get an NPE on updatehostdb. I start with a clean crawlDB & hostDB. After an inject, I do an updatehostdb with -checkAll and get the following stacktrace:2018-06-13 10:45:21,958 WARN ho...    Author: Jurian Broertjes , 2018-06-21, 15:29
[NUTCH-2565] MergeDB incorrectly handles unfetched CrawlDatums - Nutch - [issue]
...I ran into this issue when merging a crawlDB originating from sitemaps into our normal crawlDB. CrawlDatums are merged based on output of AbstractFetchSchedule::calculateLastFetchTime(). Whe...    Author: Jurian Broertjes , 2018-06-21, 15:29
[NUTCH-2382] indexer-hbase Nutch 1.x branch - Nutch - [issue]
...I've ported the indexer-hbase for Nutch 2.x to 1.x. Did some basic tests. Patch is attached....    Author: Jurian Broertjes , 2018-06-12, 19:41
[MASSMAIL]Re: Preparing to release Nutch 1.15 ? - Nutch - [mail # user]
...+1 Nice work all! On 11-06-18 23:44, BlackIce wrote: > +1 > > stoopid question, but I can't find any info on it... can we now parse Open > Graph metatags? > > Greetz > > On Mo...    Author: Jurian Broertjes , 2018-06-12, 10:13
[NUTCH-2380] indexer-elastic version upgrade to 5.3.0 - Nutch - [issue]
...The current version of the indexer-elastic plugin is not compatible with ES 5.x. The patch bumps the ES lib version to 5.3 but also requires a Nutch classloader fix (NUTCH-2378) due to runti...    Author: Jurian Broertjes , 2018-06-01, 19:04
[NUTCH-2543] readdb & readlinkdb to implement AbstractChecker - Nutch - [issue]
...Implement AbstractChecker in LinkDbReader & CrawlDbReader classes, so we can expose them via TCP....    Author: Jurian Broertjes , 2018-03-27, 14:54
[NUTCH-2197] Add solr5 solrcloud indexer support - Nutch - [issue]
...Nutch cannot index to Solr5. Also proper SolrCloud support is missing....    Author: Jurian Broertjes , 2018-02-13, 19:24
[NUTCH-2477] Refactor *Checker classes to use base class for common code - Nutch - [issue]
...The various Checker class implementations have quite a bit of duplicated code in them. This should be refactored for cleanliness and maintainability....    Author: Jurian Broertjes , 2017-12-18, 16:53
[NUTCH-2431] URLFilterchecker to implement Tool-interface - Nutch - [issue]
...The current implementation of the URLFilterChecker does not allow for commandline config overrides. It needs to implement the Tool interface for this. Please see the attached patch...    Author: Jurian Broertjes , 2017-12-18, 13:05
[NUTCH-2242] lastModified not always set - Nutch - [issue]
...I observed two issues: When using the DefaultFetchSchedule, CrawlDatum's modifiedTime field is not updated on the first successful fetch. When a document modification is detected (protocol- ...    Author: Jurian Broertjes , 2017-11-04, 17:16