
Poll: Handling lucene-dev Merge

By sematext

The Lucene and Solr projects merged recently, as we mentioned in Solr Digest and Lucene Digest for March 2010.  Today, their -dev mailing lists finally merged as well.  Since Sematext runs the service that makes these lists (and more) searchable, we need to decide how to handle this relatively drastic change.

Short version: Please tell us how you would like us to handle the lucene-dev merge by selecting your choice in our Handling lucene-dev merge Poll.  The 2 choices are described below.

We’ve identified 2 options, and we need your input to help us decide what the right option is:

  • We can add a new lucene-dev list and start indexing it.  This would contain only the new lucene-dev content (for both Lucene and Solr development from today on).  The downside is that if you wanted to include old lucene-dev messages or old solr-dev messages in your search, you would have to explicitly select those lists.  We could rename them to lucene-dev-old and solr-dev-old, for example, so the UI would show lucene-dev, lucene-dev-old, and solr-dev-old.  You’d have total control over what you want searched, but you would have to make your choices explicitly, which also means people would have to understand what those -old lists are about and why there is no solr-dev.
  • We could merge the old solr-dev and old lucene-dev, and have a single lucene-dev that has both of those lists’ old messages (up to today), as well as all the new messages from the merged lucene-dev list from here on.  In effect, it would look as if Lucene and Solr had always had a single lucene-dev list, since all of the old lucene-dev and solr-dev content would be in this new lucene-dev.  If we go this route, there would be no lucene-dev-old or solr-dev-old in the UI, just one lucene-dev choice.  But there also wouldn’t be a solr-dev choice in the UI, since that list no longer exists, which may be confusing.  Thus, when you choose to search Solr, you wouldn’t see a solr-dev facet in the UI, but the lucene-dev list’s content would be searched, so you wouldn’t actually miss any matches.
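
To make the difference between the two options concrete, here is a minimal Python sketch of how each one might assign a list facet to an indexed message. It is only an illustration, not how our search service is actually implemented: the `Message` class, the `after_merge` flag, and the function names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    source_list: str   # list the message was originally posted to (hypothetical field)
    after_merge: bool  # True if it was posted to the merged lucene-dev list


def facet_option_1(msg: Message) -> str:
    """Option 1: new lucene-dev stays separate; old archives get an -old suffix."""
    if msg.after_merge:
        return "lucene-dev"
    return f"{msg.source_list}-old"


def facet_option_2(msg: Message) -> str:
    """Option 2: old solr-dev, old lucene-dev, and new mail all share one facet."""
    return "lucene-dev"


def facet_counts(messages, assign):
    """Count how many messages land under each facet value."""
    return Counter(assign(m) for m in messages)


# Made-up sample: two old lucene-dev messages, one old solr-dev, one post-merge.
sample = [
    Message("lucene-dev", False),
    Message("lucene-dev", False),
    Message("solr-dev", False),
    Message("lucene-dev", True),
]

print(facet_counts(sample, facet_option_1))
print(facet_counts(sample, facet_option_2))
```

With option 1 the sample splits across three facet values (lucene-dev, lucene-dev-old, solr-dev-old), so a searcher has to select the -old values to reach older messages. With option 2 every message falls under lucene-dev, so nothing is missed, but the original list is no longer visible as a facet.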

If there is a 3rd or 4th option that we missed, please let us know via comments!

Please tell us which option you would prefer as a user by selecting your choice in our Handling lucene-dev merge Poll.  Thank you.

9 thoughts on “Poll: Handling lucene-dev Merge”

  1. Voted for Option 2.

    If you keep the two dev lists separate for now, at some point in the future the old (solr) list will be redundant. Why not do it today?

    A crude way to get results from previous solr list is to simply add the keyword “solr” to search terms. Results may not be completely accurate, but it will save the hassle of having another filter in UI.

    1. Shashikant, when you say “redundant” what do you mean? Are you saying the content in old (solr) list will become a lot less valuable because in the near future it will become outdated? Because it’s a -dev list where a discussion from 2008 may no longer be important? Thanks.

      1. I didn’t think much when I made the statement. Now that you have pointed it out, I think, I overstated it.

        The point I am trying to make is that over a period of time, the value of the old solr list will reduce, though not necessarily become a “lot less”, as some of the information will be outdated.

        1. I’d tend to agree. Questions and answers appear over and over, and old answers to old questions stop being correct after a while. Perhaps keeping only the last N months would be the thing to do, esp. for -dev lists?

  2. 4th option?

    I’d like to see a new combined solrlucene-dev list. All the old links to the old lists would work and there wouldn’t be the confusion of mismatched names.

    While the copy-pasted quoting would be a hassle, that’s a one-time thing, but going forward, it would make more sense.


    1. Sorry, I don’t follow, Avi. 🙁
      * It sounds like you are referring to our option 2? (I’m a bit hesitant to include “solr” in the name – what if something else gets included under Lucene?) Aha, I see, you are suggesting our option 2, BUT with “solr” in the name, so that one can tell from the name that this list includes both Lucene AND Solr?

      * Which copy-pasting quoting are you referring to?


  3. What about introducing a “crawler” property to the old posts, so you could add a filter criteria UI element (include/exclude old lucene, include/exclude old solr), but index them all as one. That way, you have the big search over everything (which should be fine in most cases) or the option to filter to include/exclude for those that need it.

    1. Are you suggesting something like this in the UI:
      [] lucene-dev (12000)
      – [] lucene-dev-old (7000)
      – [] solr-dev-old (5000)

      Where each of the 3 options (facet values) can be selected for subsequent search, and where the parent lucene-dev, if selected, includes both of the -old lists?
      The numbers in parentheses are just made-up facet counts for those facet values.

      And, in case the original searcher selected just “Solr” or just “Lucene” or both “Solr” and “Lucene” facets, the lucene-dev would be searched, which means that essentially all old and all new messages would be included in search, as if lucene-dev was selected?


      1. That could work, but maybe add a third facet:

        [] lucene-dev-all (12000)
        – [] lucene-dev-new (1000)
        – [] lucene-dev-old (6000)
        – [] solr-dev-old (5000)
