Thank you, Alex. I managed to get this working with URLClassifyProcessorFactory. If anyone is interested, it can be done with the following in solrconfig.xml:

<updateRequestProcessorChain name="urlProcessor">
  <processor class="org.apache.solr.update.processor.URLClassifyProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="inputField">SolrId</str>
    <str name="domainOutputField">hostname</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">urlProcessor</str>
  </lst>
</requestHandler>
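
For anyone wondering what to expect in the hostname field: as far as I can tell, the processor writes the host part of the URL found in the input field. Below is a rough standalone sketch of that idea using only java.net.URI. It is not the Solr code; the class name HostnameSketch, the method hostOf, and the sample URL are made up for illustration.

```java
import java.net.URI;

// Illustration only: mimics the kind of value a "domain" output field
// might hold. This is NOT the URLClassifyProcessorFactory implementation.
public class HostnameSketch {

    // Returns the host component of a URL string, or null if it has none.
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        // Sample document id that happens to be a URL (hypothetical value).
        String solrId = "https://lucene.apache.org/solr/guide/index.html";
        System.out.println(hostOf(solrId));
    }
}
```

With the chain above, a document whose SolrId looks like the sample would presumably be indexed with the host in the hostname field, provided that field exists in the schema.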

I will look into how to submit a patch for the Javadoc.


-----Original Message-----
From: Alexandre Rafalovitch [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 13, 2018 12:13 AM
To: solr-user <[EMAIL PROTECTED]>
Subject: [EXT] Re: Extracting top level URL when indexing document

Try URLClassifyProcessorFactory in the processing chain instead, configured in solrconfig.xml.

There is very little documentation for it, so check the source for exact params. Or search for the blog post introducing it several years ago.

Documentation patches would be welcome.


On Wed, Jun 13, 2018, 01:02 Hanjan, Harinder, <[EMAIL PROTECTED]>