[NUTCH-2667] Update Tika and Commons Collections 4 - Nutch - [issue]
...Tika and Commons Collections 4 need to be updated. This issue needs to address them....    Author: Lewis John McGibbney , 2018-10-24, 17:11
[NUTCH-2199] Documentation for Nutch 2.X REST API - Nutch - [issue]
...The work done on NUTCH-1800 needs to be ported to 2.X branch. This is trivial, I thought I had already done it but obviously not....    Author: Lewis John McGibbney , 2018-10-24, 09:10
webapp for Nutch deploy mode - Nutch - [mail # user]
...Hi Gahanna, Response inline On 2018/10/12 07:40:50, Gajanan Watkar wrote: > Hi all, > I am using Nutch 2.3.1 with Hbase-1.2.3 as storage backend on top of > Hadoop-2.5.2 cluster i...
   Author: Lewis John McGibbney , 2018-10-18, 19:24
[NUTCH-1121] JUnit test for parse-js - Nutch - [issue]
...This issue is part of the larger attempt to provide a JUnit test case for every Nutch plugin....    Author: Lewis John McGibbney , 2018-10-13, 16:53
Unable to get regex-urlfilter working - Nutch - [mail # user]
...Hi Gajanan, Seeing as you are using 2.x, are you making sure that the project has been built with the correct regex-urlfilter.txt being present on ClassPath and included in the job jar y...
   Author: lewis john mcgibbney , 2018-10-11, 16:09
[NUTCH-2186] -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob - Nutch - [issue]
...When using the following indexing command ./runtime/local/bin/nutch index -crawldb /usr/local/trunk_new1/esdswg_crawl/crawldb/ -linkdb /usr/local/trunk_new1/esdswg_crawl/linkdb/ -segmentDir /...    Author: Lewis John McGibbney , 2018-10-11, 10:43
Uneven HBase region sizes WAS Re: Nodemanager crashing repeatedly - Nutch - [mail # user]
...Hi Gajanan, CC dev@gora, this is something we may wish to implement within HBase. If anything I've provided below is incorrect, then please correct the record. BTW, I found the following article...
   Author: lewis john mcgibbney , 2018-09-19, 17:36
Nodemanager crashing repeatedly - Nutch - [mail # user]
...Hi Gajanan, Which OS are you running this on? I would also suggest that if you want to use the 2.x codebase, you should use the most recent from SCM e.g. check out master and change to 2.x bran...
   Author: lewis john mcgibbney , 2018-09-06, 17:01
redirect bin/crawl log output to some other file - Nutch - [mail # user]
...Hi Amarnatha, There are a couple of options which I can think of. 1. Why don't you just set up a simple daemon to watch hadoop.log and generate a subsequent stream writing it to /tmp/myurls.log...
   Author: lewis john mcgibbney , 2018-09-06, 16:08
IndexWriter interface in 1.15 - Nutch - [mail # user]
...Hi Yossi, REASON: Upgrade of MapReduce API from legacy to 'new'. This was a breaking change for sure and a HUGE patch. We did not however factor in the non-breaking aspects of the upgrade... so ...
   Author: lewis john mcgibbney , 2018-09-06, 15:58