Results 1 to 10 of 15.
[NUTCH-2501] Take into account $NUTCH_HEAPSIZE when crawling using crawl script - Nutch - [issue]    Author: Moreno Feltscher , 2018-05-10, 12:15
[NUTCH-2508] Misleading documentation about http.proxy.exception.list - Nutch - [issue]
...The description about http.proxy.exception.list states that domains as well as URLs can be configured to be excluded from being routed through a pre-configured proxy. This is misleading sinc...    Author: Moreno Feltscher , 2018-02-01, 00:52
[NUTCH-2503] Add option to run tests for a single plugin - Nutch - [issue]
...Sometimes it makes sense to just run tests for a single plugin instead of building all plugins and running all tests at once....    Author: Moreno Feltscher , 2018-01-23, 21:59
[NUTCH-2502] Any23 Plugin: Add Content-Type filtering - Nutch - [issue]
...It should be possible to filter based on a document's Content-Type when using Any23 extractors....    Author: Moreno Feltscher , 2018-01-23, 18:57
[NUTCH-2499] Elastic REST Indexer: Duplicate values - Nutch - [issue]
...Due to a change in the Elastic REST indexer does not work with HashSets ...    Author: Moreno Feltscher , 2018-01-23, 18:57
[NUTCH-2495] Use -deleteGone instead of clean job in crawler script while indexing - Nutch - [issue]
...Instead of running bin/nutch clean after indexing the documents, run bin/nutch index with the -deleteGone flag, which instead of just deleting gone and duplicated documents also deletes redire...    Author: Moreno Feltscher , 2018-01-23, 17:55
[NUTCH-2497] Elastic REST Indexer: Allow multiple hosts - Nutch - [issue]
...Allow specifying a list of Elasticsearch hosts to index documents to. This would be especially helpful when working with an Elasticsearch cluster which consists of multiple nodes....    Author: Moreno Feltscher , 2018-01-18, 19:04
[NUTCH-2496] Speed up link inversion step in crawling script - Nutch - [issue]
...While working on a project where I have to index a huge number of URLs I encountered an issue with the link inversion step of the crawling script. A while ago Ian Lopata stumbled upon the sa...    Author: Moreno Feltscher , 2018-01-18, 00:34
[NUTCH-2493] Add configuration parameter for sitemap processing to crawler script - Nutch - [issue]
...While using the crawler script with the sitemap processing feature introduced in NUTCH-2491 I encountered some performance issues when working with large sitemaps. Therefore one should be abl...    Author: Moreno Feltscher , 2018-01-10, 16:53
[NUTCH-2491] Integrate sitemap processing and HostDB into crawl script - Nutch - [issue]
...Add three new steps to the crawl bash script: 1. Generate HostDB from CrawlDB. 2. Inject URLs from sitemaps URLs found in hosts from HostDb. 3. If given, inject sitemap URLs specified in a config...    Author: Moreno Feltscher , 2018-01-09, 00:23