[NUTCH-1377] Add option to index via CloudSolrServer instead - Nutch - [issue]
...Nutch indexes to a specific Solr server. With SolrCloud on its way we can still use the current indexer and point to any server. However, the CloudSolrServer can connect to ZooKeeper instead...    Author: Markus Jelsma , 2018-10-15, 12:41
[NUTCH-1021] Migrate OutlinkExtractor from Apache ORO to java.util.regex - Nutch - [issue]
...Migrate from the deprecated ORO to java.util.regex....    Author: Markus Jelsma , 2018-10-13, 16:53
[NUTCH-1014] Migrate from Apache ORO to java.util.regex - Nutch - [issue]
...A separate issue tracking migration of all components from Apache ORO to java.util.regex. Components involved are: RegexURLNormalizer, OutlinkExtractor, JSParseFilter, MoreIndexingFilter, BasicU...    Author: Markus Jelsma , 2018-10-13, 16:53
[NUTCH-2192] Get rid of oro - Nutch - [issue]
...A couple of classes still rely on ORO; we should get rid of it....    Author: Markus Jelsma , 2018-10-13, 16:53
Apache Nutch commercial support - Nutch - [mail # user]
...Hello Hany, There are a few, mine included, mentioned on the Nutch support wiki page [1]. Regards, Markus [1] -----Original message----- > From...
   Author: Markus Jelsma , 2018-10-12, 07:48
[NUTCH-2630] Fetcher to log skipped records by robots.txt - Nutch - [issue]
...To analyze problems, it would be helpful if the fetcher logged URLs that are disallowed by robots.txt; see the discussion on the user mailing list....    Author: Markus Jelsma , 2018-10-08, 12:53
Regex to block some patterns - Nutch - [mail # user]
...Hi Amarnatha, -^.+(?:modal|exit).*\.html will work for all examples given. You can test regexes really well online [1]. If each input has true for lookingAt, Nutch's regexfilter will filter the U...
   Author: Markus Jelsma , 2018-10-03, 19:23
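The regex in the reply above can be tried out locally. What follows is a minimal sketch in Python, not Nutch's own code: it assumes that Python's `re.match` is a fair stand-in for the `lookingAt` check mentioned in the reply (both anchor at the start of the input without requiring the whole input to match). The sample URLs and the helper name `is_blocked` are hypothetical, since the original examples from the thread are not shown here.

```python
import re

# Pattern taken from the reply, without the leading "-". In a Nutch
# regex-urlfilter rule the "-" prefix marks the rule as an exclude rule;
# it is not part of the regular expression itself.
PATTERN = re.compile(r"^.+(?:modal|exit).*\.html")


def is_blocked(url: str) -> bool:
    """Return True if the exclude pattern matches from the start of the URL.

    re.match anchors at the beginning but does not require a full match,
    which is assumed here to approximate Java's Matcher.lookingAt().
    """
    return PATTERN.match(url) is not None


# Hypothetical sample URLs, for illustration only.
SAMPLES = [
    "https://example.com/page/modal-login.html",
    "https://example.com/exit/confirm.html",
    "https://example.com/products/index.html",
    "https://example.com/modal.htm",
]

for url in SAMPLES:
    print(url, "blocked" if is_blocked(url) else "kept")
```

Under these assumptions, only URLs that contain `modal` or `exit` somewhere before a later `.html` are reported as blocked; the other samples are kept.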
Nutch 2.x HBase alternatives - Nutch - [mail # user]
...Hi Benjamin, If you do not specifically require Nutch 2.x, I would strongly suggest going with Nutch 1.x. It doesn't have the added hassle of a DB and DB layer, is much more mature and gets the...
   Author: Markus Jelsma , 2018-10-03, 19:17
[NUTCH-2647] Skip TLS certificate checks in protocol-http plugin - Nutch - [issue]
...Taken from protocol-httpclient, protocol-http now has support for ignoring certificate checks....    Author: Markus Jelsma , 2018-09-28, 11:56
Nutch Maven support for plugins - Nutch - [mail # user]
...Hello Rustam, You can use urlnormalizer-slash for this task. Regards, Markus -----Original message----- > From: Rustam > Sent: Wednesday 29th August 2018 10:30 > To: [EMAIL PRO...
   Author: Markus Jelsma , 2018-08-29, 13:14