Results 1 to 6 of 6.
[NUTCH-1993] Nutch does not use backup parsers - Nutch - [issue]
...From reading the code it is clear that it is designed to allow using several parsers to parse a document in a sequence, until it is successfully parsed. In practice, this does not work becau...
http://issues.apache.org/jira/browse/NUTCH-1993    Author: Arkadi Kosmynin, 2018-07-19, 13:57
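The sequence NUTCH-1993 describes (try several parsers on a document, one after another, until one succeeds) can be sketched as below. This is a minimal illustration only: `SimpleParser`, `FallbackChain`, and `parseWithFallback` are hypothetical names, not Nutch's `Parser` or `ParseUtil` API, and "success" is assumed to mean "returned non-null text without throwing".

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical "try each parser in order" chain. SimpleParser and
 * FallbackChain are illustrative names, not Nutch classes.
 */
class FallbackChain {

    /** A parser either returns extracted text or throws. */
    interface SimpleParser {
        String parse(byte[] content) throws Exception;
    }

    /** Returns the first successful result, or empty if every parser fails. */
    static Optional<String> parseWithFallback(List<SimpleParser> parsers, byte[] content) {
        for (SimpleParser parser : parsers) {
            try {
                String text = parser.parse(content);
                if (text != null) {
                    return Optional.of(text); // first success ends the chain
                }
            } catch (Exception e) {
                // One parser failing must not abort the chain;
                // fall through to the next (backup) parser.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SimpleParser primary = content -> { throw new IllegalStateException("primary failed"); };
        SimpleParser backup = content -> new String(content, StandardCharsets.UTF_8);
        byte[] doc = "hello".getBytes(StandardCharsets.UTF_8);
        // prints Optional[hello]: the primary throws, the backup succeeds
        System.out.println(parseWithFallback(List.of(primary, backup), doc));
    }
}
```

The key design point is that the exception is caught inside the loop, so a failure moves on to the next candidate instead of ending the attempt.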
[NUTCH-2071] A parser failure on a single document may fail crawling job if parser.timeout=-1 - Nutch - [issue]
...java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357) at org.apache.nutch.parse.ParseSegmen...
http://issues.apache.org/jira/browse/NUTCH-2071    Author: Arkadi Kosmynin, 2018-07-17, 12:09
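The NUTCH-2071 title says a failure on one document can fail the whole job when `parser.timeout=-1`. A minimal sketch of per-document isolation, assuming the goal is that an exception loses only that document whether or not a timeout is configured, might look like this. `GuardedParse` and `parseDocument` are hypothetical names, and a negative `timeoutSeconds` merely mimics "no timeout"; this is not Nutch's actual parse code.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical sketch: parse one document so that an exception (or a
 * timeout) skips that document instead of propagating and failing the job.
 * timeoutSeconds < 0 stands in for "no timeout" (like parser.timeout=-1).
 */
class GuardedParse {

    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // worker threads must not keep the JVM alive
        return t;
    });

    static Optional<String> parseDocument(Callable<String> task, long timeoutSeconds) {
        try {
            if (timeoutSeconds < 0) {
                // No timeout: call directly, but still inside the try/catch.
                return Optional.ofNullable(task.call());
            }
            Future<String> future = POOL.submit(task);
            try {
                return Optional.ofNullable(future.get(timeoutSeconds, TimeUnit.SECONDS));
            } catch (TimeoutException e) {
                future.cancel(true); // give up on this document only
                return Optional.empty();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Optional.empty();
        } catch (Exception e) {
            // Exceptions from task.call() and ExecutionException from
            // future.get() both land here: skip the document, keep the job.
            return Optional.empty();
        }
    }
}
```

Both branches share the same outer `catch`, so the no-timeout path gets the same protection as the timeout path.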
[NUTCH-2605] The Feed plugin causes a NumberFormatException - Nutch - [issue]
...The Feed plugin seems to have a major problem. Line 102 in FeedIndexingFilter.java generated a NumberFormatException (which caused the failure of the entire crawling process!) because i...
http://issues.apache.org/jira/browse/NUTCH-2605    Author: Arkadi Kosmynin, 2018-06-28, 09:48
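The snippet for NUTCH-2605 is cut off before it says which value line 102 parses, so the following is only a generic sketch of the defensive pattern: parse a numeric field and fall back to a default on malformed input, so one bad feed entry cannot throw out of an indexing filter. `SafeNumbers` and `parseLongOrDefault` are hypothetical helpers, not part of the Feed plugin.

```java
/**
 * Hypothetical helper: parse a numeric metadata value without letting a
 * malformed value throw NumberFormatException out of an indexing filter.
 */
class SafeNumbers {

    static long parseLongOrDefault(String raw, long fallback) {
        if (raw == null) {
            return fallback; // missing value
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            // Malformed value (e.g. a date string where a number was
            // expected): use the fallback instead of failing the job.
            return fallback;
        }
    }
}
```

For example, `parseLongOrDefault(" 7 ", -1L)` yields `7`, while a non-numeric string yields the fallback `-1`.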
[NUTCH-2603] Bring back legacy pre-Tika parsers and use them as backup parsers - Nutch - [issue]
...There are cases when legacy parsers successfully parse documents on which Tika fails. I am attaching a list of examples of such documents. Nutch allows use of more than one parser on a docum...
http://issues.apache.org/jira/browse/NUTCH-2603    Author: Arkadi Kosmynin, 2018-06-20, 15:06
[NUTCH-2604] The lines defining catch-all (*) parser in parse-plugins.xml are ignored - Nutch - [issue]
...The lines defining the catch-all (*) plugin in parse-plugins.xml are not effective because they are ignored as long as there is at least one plugin claiming * in its plugin.xml file. In some...
http://issues.apache.org/jira/browse/NUTCH-2604    Author: Arkadi Kosmynin, 2018-06-20, 06:32
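One reading of NUTCH-2604 is that explicit `*` entries in parse-plugins.xml should take precedence over plugins that merely claim `*` in their own plugin.xml. The sketch below encodes that assumed precedence: configured entries win when present, and self-declared catch-all plugins are used only otherwise. `CatchAllResolver` and `resolveCatchAll` are hypothetical names, and this is not how Nutch currently resolves the mapping.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical precedence rule for the "*" (catch-all) MIME type:
 * explicit configuration overrides plugins that self-declare "*".
 */
class CatchAllResolver {

    /**
     * @param configured   MIME type -> plugin ids, as read from the config file
     * @param selfDeclared plugin ids whose own descriptor claims "*"
     */
    static List<String> resolveCatchAll(Map<String, List<String>> configured,
                                        List<String> selfDeclared) {
        List<String> explicit = configured.get("*");
        if (explicit != null && !explicit.isEmpty()) {
            return List.copyOf(explicit); // configuration wins
        }
        return List.copyOf(selfDeclared); // no explicit entry: fall back
    }
}
```

An alternative design would append the self-declared plugins after the configured ones; the override shown here is simply the most literal way to make the configuration lines "effective".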
[NUTCH-1251] SolrDedup to use proper Lucene catch-all query - Nutch - [issue]
...Deletion of duplicates fails. This happens because the "get all" query used to get Solr index size is "id:[* TO *]", which is a range query. Lucene is trying to expand it to a Boolea...
http://issues.apache.org/jira/browse/NUTCH-1251    Author: Arkadi Kosmynin, 2013-05-22, 03:53
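NUTCH-1251 contrasts the range query `id:[* TO *]` with a proper Lucene catch-all query; in Solr's standard query syntax the match-all query is `*:*`. The sketch below only builds a "count all documents" request URL with `q=*:*` and `rows=0`, so the response carries `numFound` without returning any documents. The core URL is a hypothetical placeholder, and `wt=json` is an assumed response format; this is not the code SolrDedup uses.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Sketch: build a Solr request that counts all documents using the
 * match-all query *:* instead of the range query id:[* TO *].
 * The core URL passed in is a hypothetical placeholder.
 */
class SolrCountQuery {

    static String countAllUrl(String coreUrl) {
        // URLEncoder leaves '*' as-is and encodes ':' as %3A -> "*%3A*"
        String q = URLEncoder.encode("*:*", StandardCharsets.UTF_8);
        // rows=0: only numFound is needed, not the documents themselves
        return coreUrl + "/select?q=" + q + "&rows=0&wt=json";
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/solr/nutch/select?q=*%3A*&rows=0&wt=json
        System.out.println(countAllUrl("http://localhost:8983/solr/nutch"));
    }
}
```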