Apache Nutch CleaningJob failed - Nutch - [mail # user]
...Hi Anna, Sebastian knows the schedule. I don't. I'll let Sebastian respond. In the bug report I linked, Sebastian noted that there was a 1.14 snapshot release that you could build and run. I goo...
   Author: Sol Lederman, 2017-12-08, 17:58
General question on dealing with file types - Nutch - [mail # user]
...Like most of you, I imagine, I want to capture and index file types from a particular set of types. I want to index HTML but I may or may not want to index cgi-bin or PDFs. It seems that there ...
   Author: Sol Lederman, 2017-11-25, 16:56
Can't get any regex to work in regex-urlfilters.txt - Nutch - [mail # user]
...Sebastian, Thanks for the engagement and for the quick reply. I still can't get it to work. Here's something I don't understand. I assume that the dot in "+." means to match any character so it...
   Author: Sol Lederman, 2017-11-21, 20:09
Why do I only get 28 records when I crawl the tutorial example of nutch.apache.org? - Nutch - [mail # user]
...Thanks. Including https didn't make a difference. Anyway, I've moved on to other sites where I am getting lots more hits. Sol...
   Author: Sol Lederman, 2017-11-17, 22:07
Is there a broken Nutch 1.13 binary release? - Nutch - [mail # user]
...Hi, I retrieved the pre-built nutch code from http://www.trieuvan.com/apache/nutch/1.13/apache-nutch-1.13-bin.tar.gz. When I try to run a clean I get errors: % bin/nutch solrclean crawl/crawldb ...
   Author: Sol Lederman, 2017-11-13, 00:11
different regex-urlfilter.txt files for different sets of URLs? - Nutch - [mail # user]
...Hi Rushikesh, I'm very new to Nutch. I'll let Sebastian and the other experts guide you. I suspect that success in removing the header and footer will be very dependent on the HTML files you're...
   Author: Sol Lederman, 2017-11-09, 18:16
Tagging records by seed list - Nutch - [mail # user]
...Thanks, Sebastian. Ah, I got the wrong information from an old API Javadoc page. I fixed the plugin name, updated the nutch and Solr schemas to have my new field, dropped and readded the nutch ...
   Author: Sol Lederman, 2017-11-06, 18:44
index fails: java.io.IOException: Job failed! - Nutch - [mail # user]
...Ok, I've figured it out. It didn't make sense that nutch was trying to use the elasticsearch indexer instead of the solr indexer. I thought it was enough to have indexer-solr listed before ind...
   Author: Sol Lederman, 2017-10-07, 22:55
Unable to create core [nutch] Caused by: enablePositionIncrements is not a valid option as of Lucene 5.0 - Nutch - [mail # user]
...Yeah, that worked. Thanks. I guess it's an issue with the tutorial for that nutch/solr pair. I've run into a couple of other issues with the tutorial. One I was able to work around. The cleaner...
   Author: Sol Lederman, 2017-09-28, 19:14
[EXT] Another issue with the nutch tutorial - plugin init failure ... fieldType: text_general - Nutch - [mail # user]
...Hi Sebastian, When will Nutch 1.14 be released? Thanks. Sol On Mon, Sep 18, 2017 at 8:21 AM, Sebastian Nagel <[EMAIL PROTECTED]> wrote: > Hi, >> > The combination Nutch 1.13/Solr...
   Author: Sol Lederman, 2017-09-19, 16:28