A couple of basic questions re scheduled crawls. - Nutch - [mail # user]
...I have a couple of very basic questions about scheduled crawls. every crawled page has a scheduled fetch date (?).  How do I know that nutch actually went out and crawled it?  How do...
   Author: Fred Zimmerman , 2018-07-26, 15:44
how do fetch wait times work? - Nutch - [mail # user]
...When I run bin/crawl once and it generates a segment list with a bunch of fetch dates in the future, does nutch proactively run those fetches on those future dates, or do I have to do somethin...
   Author: Fred Zimmerman , 2018-04-09, 19:14
OutOfMemoryError when indexing into Solr - Nutch - [mail # user]
...I'm having the exact same problem. I am trying to isolate whether it is a Solr problem or a Nutch+Solr problem.  On Wed, Oct 26, 2011 at 11:54 PM,  wrote:  > Hi, > > ...
   Author: Fred Zimmerman , 2011-10-27, 12:20
1) success 2) how to tell Nutch "index everything" - Nutch - [mail # user]
...1) I resolved the issues with solrindex. It turned out to be a matter of adding all the nutch schema-specific fields to solr's schema.xml.  there was one gotcha which is that the latest...
   Author: Fred Zimmerman , 2011-10-26, 14:37
solrindexer parameters -- input path does not exist: crawl_fetch, parse_data, etc. - Nutch - [mail # user]
...will do.  Of course I have already googled these terms without much luck.  Fred  On Wed, Oct 26, 2011 at 9:34 AM, lewis john mcgibbney  wrote:  > Hi Fred, > &g...
   Author: Fred Zimmerman , 2011-10-26, 13:38
advice, config files for crawling private wikipedia mirror - Nutch - [mail # user]
...so let me make sure I understand.  what this guy did is that he made an XML file from his local backup of wikipedia but he didn't crawl it? maybe I don't need to crawl it, either, since ...
   Author: Fred Zimmerman , 2011-10-10, 14:41
when and how to delete old crawls? - Nutch - [mail # user]
...I mean the directories like this:  crawl-20110920160208 crawl-20110920211805 etc ...     On Wed, Oct 5, 2011 at 11:08, Markus Jelsma wrote:  > "crawls" or segment dire...
   Author: Fred Zimmerman , 2011-10-05, 15:14
Interpreting Nutch results - Nutch - [mail # user]
...thanks for the tip about filtering  ----------------------------------------------------- Subscribe to the Nimble Books Mailing List  http://eepurl.com/czS- for monthly updates &nb...
   Author: Fred Zimmerman , 2011-09-30, 15:29
Understanding Nutch workflow - Nutch - [mail # user]
...this is helpful -- can someone also explain whether there is mechanism to extract full text of pages from where they are stored in mapreduce?   On Tue, Sep 27, 2011 at 11:24, Bai Shen &...
   Author: Fred Zimmerman , 2011-09-27, 15:42
Can't retrieve Tika Parser for mime-type - Nutch - [mail # user]
...Basic question:  I have Nutch crawling and sending documents to Solr for indexing.  Now when I get the Solr answer set, I want to go get all the documents at once and append them i...
   Author: Fred Zimmerman , 2011-09-26, 17:25