Results 1 to 10 of 2581 (0.0s).
[NUTCH-2673] EOFException protocol-http - Nutch - [issue]
...Got an EOFException for some URL: 2018-11-07 12:23:18,463 INFO  indexer.IndexingFiltersChecker - fetching: https://www.misdaadjournalist.nl/2018/11/politie-kraakt-server-van-blackbox-265...
http://issues.apache.org/jira/browse/NUTCH-2673    Author: Markus Jelsma , 2018-11-08, 14:42
Getting Nutch To Crawl Sharepoint Online - Nutch - [mail # user]
...Hello Ashish, You might want to check out Apache ManifoldCF. Regards. Markus http://manifoldcf.apache.org/ -----Original message----- > From: Ashish Saini > Sent: Monday 29th Octo...
   Author: Markus Jelsma , 2018-10-29, 21:16
[NUTCH-2665] Upgrade to Apache Tika 1.19.1 - Nutch - [issue]
...Borrowing from Sebastian Nagel's efforts on NUTCH-2651, 2.x can be upgraded to Apache Tika 1.19.1 as well....
http://issues.apache.org/jira/browse/NUTCH-2665    Author: Markus Jelsma , 2018-10-24, 12:44
[NUTCH-2630] Fetcher to log skipped records by robots.txt - Nutch - [issue]
...To analyze problems it would be helpful if fetcher logs URLs which are disallowed in the robots.txt - see discussion on user mailing list....
http://issues.apache.org/jira/browse/NUTCH-2630    Author: Markus Jelsma , 2018-10-24, 04:46
[NUTCH-1377] Add option to index via CloudSolrServer instead - Nutch - [issue]
...Nutch indexes to a specific Solr server. With SolrCloud on its way we can still use the current indexer and point to any server. However, the SolrCloudServer can connect to ZooKeeper instead...
http://issues.apache.org/jira/browse/NUTCH-1377    Author: Markus Jelsma , 2018-10-15, 12:41
[NUTCH-1021] Migrate OutlinkExtractor from Apache ORO to java.util.regex - Nutch - [issue]
...Migrate from deprecated ORO to Java util regex....
http://issues.apache.org/jira/browse/NUTCH-1021    Author: Markus Jelsma , 2018-10-13, 16:53
[NUTCH-1014] Migrate from Apache ORO to java.util.regex - Nutch - [issue]
...A separate issue tracking migration of all components from Apache ORO to java.util.regex. Components involved are: RegexURLNormalizer, OutlinkExtractor, JSParseFilter, MoreIndexingFilter, BasicU...
http://issues.apache.org/jira/browse/NUTCH-1014    Author: Markus Jelsma , 2018-10-13, 16:53
[NUTCH-2192] Get rid of oro - Nutch - [issue]
...Couple of classes still rely on oro, we should get rid of it....
http://issues.apache.org/jira/browse/NUTCH-2192    Author: Markus Jelsma , 2018-10-13, 16:53
Apache Nutch commercial support - Nutch - [mail # user]
...Hello Hany, There are a few, mine included, mentioned on the Nutch support wiki page [1]. Regards, Markus [1] https://wiki.apache.org/nutch/Support -----Original message----- > From...
   Author: Markus Jelsma , 2018-10-12, 07:48
Regex to block some patterns - Nutch - [mail # user]
...Hi Amarnatha, -^.+(?:modal|exit).*\.html Will work for all examples given. You can test regexes really well online [1]. If each input has true for lookingAt, Nutch's regexfilter will filter the U...
   Author: Markus Jelsma , 2018-10-03, 19:23