Results 1 to 10 of 13.
robots.txt Disallow not respected - Nutch - [mail # user]
...Hi, Check that robots.txt is acquired and parsed correctly. Try to change the protocol to protocol-httpclient. Z On 2017-12-10 23:54:14, Sebastian Nagel wrote: Hi, I've tried to reproduce i...
   Author: Zoltán Zvara, 2017-12-11, 06:16
Serious OOM while using PhantomJS on Nutch 1.13 - Nutch - [mail # user]
...Dear Community, We are experiencing troubling PhantomJS 1.9.8 memory leaks, in which neighbor services, for example a DataNode is not able to execute even a "df" command due to OOM errors on ...
   Author: Zoltán Zvara, 2017-11-20, 15:17
db.fetch.schedule.adaptive.min_interval not respected by Nutch 1.13 - Nutch - [mail # user]
...We got the problem. Looking into the code of `AdaptiveFetchSchedule`, a `defaultInterval` will be used for the first time for each record, which is evaluated from configuration parameter "db...
   Author: Zoltán Zvara, 2017-11-18, 17:25
Struggling with adaptive recrawl - Nutch - [mail # user]
...Dear Nutch Community, Thanks for the help you provided so far, more recently we have been struggling with adaptive recrawl. My goal is to run a large-scale crawl job adaptively, starting off wi...
   Author: Zoltán Zvara, 2017-08-25, 15:23
After Parse extension point - Nutch - [mail # user]
...Hi Yossi and Jorge, Thanks for your detailed answer and guidance! I will look into the materials immediately. We started to use Nutch 1.X extensively, and we would definitely contribute improv...
   Author: Zoltán Zvara, 2017-07-27, 12:39
Build Nutch for Hadoop 2.8.0 - Nutch - [mail # user]
...Hi Community, Nutch 1.13 build depends on Hadoop 2.7.2 and some 2.2.0 hits the `lib` directory as well under the `build`. However, we use Hadoop 2.8.0 currently. I've found no traces in the a...
   Author: Zoltán Zvara, 2017-07-19, 19:34
Configuration is not found by Nutch when running Inject remotely - Nutch - [mail # user]
...Hi Sebastian, Thanks for your tips. I have switched on debugging for YARN, and kept "launch_container.sh" for a few minutes to be able to examine. HADOOP AND NUTCH CONF + HOME directories wer...
   Author: Zoltán Zvara, 2017-07-19, 16:40
Nutch 1.X with alternative storage - Nutch - [mail # user]
...Thanks Markus, As you have mentioned, I only care about the segment data, and to avoid having the raw content and parsed content not to be present there, only in my external data storage. I'v...
   Author: Zoltán Zvara, 2017-06-16, 19:52