RE: Apache Nutch commercial support - Nutch - [mail # user]
...I will add that, as with all open source projects, most of the top committers either work for a company that actively uses it at large scale and/or support it as contractors. But if y...
   Author: Semyon Semyonov , 2018-10-12, 16:36
[NUTCH-2510] Crawl script modification. HostDb : generate, optional usage and description - Nutch - [issue]
...The crawl script now includes a hostdb update as part of the crawling cycle, but: 1) There is no hostdb parameter for generate. 2) Generation of hostdb is not optional, therefore hostdb is generated e...
   Author: Semyon Semyonov , 2018-07-02, 14:54
[MASSMAIL][ANNOUNCE] New Nutch committer and PMC - - Nutch - [mail # user]
...Hi Roannel, Congratulations and good luck! Semyon. Sent: Wednesday, June 27, 2018 at 3:42 AM From: "Roannel Fernández Hernández" To: [EMAIL PROTECTED] Subject: Re: [MASSMAIL][ANNOUNCE] New Nutc...
   Author: Semyon Semyonov , 2018-06-27, 07:17
[NUTCH-2539] Not correct naming of db.url.filters and db.url.normalizers in nutch-default.xml - Nutch - [issue]
...There is a mismatch between config and code. In code, in CrawlDbFilter line 41:43 > public static final String URL_FILTERING = "crawldb.url.filters"; > public static final String UR...
   Author: Semyon Semyonov , 2018-04-11, 00:01
RunNutchInEclipse - Nutch - [mail # dev]
...I have a feeling nobody uses Eclipse for development. 1) Several months ago I fixed a bug myself that prevented Eclipse from working, I reported another one, and now it again doesn't work because...
   Author: Semyon Semyonov , 2018-03-23, 08:13
Internal links appear to be external in Parse. Improvement of the crawling quality - Nutch - [mail # user]
...I found out that there is no direct way to do it; the problem was solved by calling the regex transformation one more time in IndexerMapReduce, before the Indexer gets the Doc for wr...
   Author: Semyon Semyonov , 2018-03-20, 15:17
Config issues with URL filters and normalizers in UpdateCrawlDb - Nutch - [mail # dev]
...Hi Sebastian, No problems. Here it is, Monday, March 19, 2018 at 2:02 PM From: "Sebastian Nagel" To: [EMAIL PROTECTED] Subject: Re: Co...
   Author: Semyon Semyonov , 2018-03-19, 13:24
[NUTCH-2538] Refactoring of Regex Url Normalizer and Bidirectional Url ExemptionFilter - Nutch - [issue]
...NUTCH-2522 uses the same regex logic as RegexURLNormalizer. These plugins can be refactored to share the same base class....
   Author: Semyon Semyonov , 2018-03-16, 18:05
[NUTCH-2522] Bidirectional URL exemption filter - Nutch - [issue]
...The current Nutch URL exemption plugin exempts based on toUrl only; the new plugin uses both fromUrl and toUrl and, after the regex transformation, exempts based on the condition regex(fromUrl) =...
   Author: Semyon Semyonov , 2018-03-16, 18:05
[NUTCH-2537] Logical OR instead of AND in UrlExemptionFilters - Nutch - [issue]
...With NUTCH-2522 another urlexemptionfilter is added, so we can now combine the filters. We should use a more reasonable combination of ExemptionFilters based on OR instead of AND. The fo...
   Author: Semyon Semyonov , 2018-03-16, 18:05