[NUTCH-1063] OutlinkExtractor test generates an exception but does not fail - Nutch - [issue]
...Testsuite: org.apache.nutch.parse.TestOutlinkExtractor Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.043 sec ------------- Standard Output --------------- 2011-07-19 15:06:36,073 ERROR ...
   Author: Julien Nioche , 2018-10-13, 10:21
index-replace: variable substitution? - Nutch - [mail # user]
...Hi Ryan, From looking at the code of index-replace, it uses Java's Matcher.replaceAll, so $1 (for example) should work. Yossi. > -----Original Message----- > From...
   Author: Yossi Tamari , Ryan Suarez , ... , 2018-10-13, 00:13
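The behaviour mentioned in the snippet above can be sketched in plain Java. This is a minimal illustration of how `Matcher.replaceAll` treats `$1` in a replacement string, not the index-replace plugin's own code; the class name, method name, and sample URL are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; it only demonstrates java.util.regex behaviour.
final class ReplaceAllSketch {
    // Rewrites every match of `regex` in `input`. Inside `replacement`,
    // "$1" stands for the text captured by the first group of the match.
    static String substitute(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Sample value invented for illustration, not taken from the thread.
        System.out.println(
            substitute("http://example.com/a", "^http://(.*)$", "https://$1"));
        // prints "https://example.com/a"
    }
}
```

If a literal `$1` is wanted in the output instead of the captured text, the replacement can be passed through `Matcher.quoteReplacement` first.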
RE: Apache Nutch commercial support - Nutch - [mail # user]
...I will add that, as with all open source projects, most of the top committers either work for a company that actively uses it at large scale and/or support it as contractors. But if y...
   Author: Semyon Semyonov , Markus Jelsma , ... , 2018-10-12, 16:36
[NUTCH-2650] -addBinaryContent -base64 flags are causing "String length must be a multiple of four" error in IndexingJob - Nutch - [issue]
...I am running Nutch crawl command as follows, in distributed mode: runtime/deploy/bin/crawl -i -D solr.server.url=http://my-solr:8983/solr/my-collection -D solr.server.type=cloud -D solr.zook...
   Author: asmita , 2018-10-12, 10:06
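As background for the error message in the issue title: standard padded Base64 encodes every 3 input bytes as 4 characters and pads the last group with `=`, so a well-formed padded string always has a length divisible by four. The sketch below shows this with `java.util.Base64`. It is an illustration under that assumption only and does not reproduce the IndexingJob code path; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; it only demonstrates the length rule of padded Base64.
final class Base64LengthSketch {
    // Encodes raw bytes with the standard (padded) Base64 alphabet.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Input lengths 1..4 produce 4, 4, 4 and 8 characters respectively.
        for (int n = 1; n <= 4; n++) {
            String encoded = encode(new byte[n]);
            System.out.println(n + " byte(s) -> " + encoded.length() + " chars");
        }
        // A one-byte input is padded with two '=' characters.
        System.out.println(encode("A".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

A decoder that enforces this rule will reject input whose length is not a multiple of four, for example text that was never Base64-encoded or that was truncated.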
Unable to get regex-urlfilter working - Nutch - [mail # user]
...Thanks Lewis, It was a very basic mistake on my part. The default crawl script launches generateJob with the -noFilter switch, which I failed to notice. The rest of the configurations and job file wer...
   Author: Gajanan Watkar , lewis john mcgibbney , ... , 2018-10-12, 07:00
Nutch 1.15: Solr indexing issue - Nutch - [mail # user]
...Thank you so much. They changed it dramatically. It no longer accepts solr.server.url or even the old Solr mapping XML file. Everything is now under IndexWriter.xml. Kind regards, Hany Shehata...
   Author: hany.nasr@... , Yossi Tamari , ... , 2018-10-11, 10:49
[NUTCH-2186] -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob - Nutch - [issue]
...When using the following indexing command ./runtime/local/bin/nutch index -crawldb /usr/local/trunk_new1/esdswg_crawl/crawldb/ -linkdb /usr/local/trunk_new1/esdswg_crawl/linkdb/ -segmentDir /...
   Author: Lewis John McGibbney , 2018-10-11, 10:43
[NUTCH-2648] Make configurable whether TLS/SSL certificates are checked by protocol plugins - Nutch - [issue]
...(see discussion in NUTCH-2647) It should be possible to enable/disable TLS/SSL certificate validation centrally for all http/https protocol plugins by a single configuration property. Some use...
   Author: Sebastian Nagel , 2018-10-09, 13:56
[NUTCH-2649] Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit - Nutch - [issue]
...NUTCH-2648 adds a property to enable/disable the TLS/SSL certificate validation for protocol-http, protocol-httpclient and protocol-okhttp. It should also be supported by the remaining protocol ...
   Author: Sebastian Nagel , 2018-10-09, 08:28
[NUTCH-2630] Fetcher to log skipped records by robots.txt - Nutch - [issue]
...To analyze problems it would be helpful if fetcher logs URLs which are disallowed in the robots.txt - see discussion on user mailing list....
   Author: Markus Jelsma , 2018-10-08, 12:53