Subject: SolrClient#updateByQuery?


On 1/26/2018 9:55 AM, Clemens Wyss DEV wrote:

The deleteByQuery functionality is known to have some issues getting
along with other things happening at the same time.

For best performance and compatibility with concurrent operations, I
would strongly recommend that you change all deleteByQuery calls into
two steps:  Do a standard query with fl=id (or whatever your uniqueKey
field is), gather up the ID values (possibly with start/rows pagination
or cursorMark), and then proceed to do one or more deleteById calls with
those ID values.  Both the query and the ID-based delete can coexist
with other concurrent operations very well.
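
A rough sketch of that two-step flow in Java, in case it helps.  The
IdPage, IdPageFetcher and IdDeleter types are made up for this example
and are not part of SolrJ.  I am assuming the fetcher would wrap a
SolrJ query with fl=id, a sort on the uniqueKey field and the
cursorMark parameter, and that the deleter would wrap
SolrClient.deleteById(List<String>).  The loop relies on Solr's
documented cursor behavior: the first cursorMark is "*", and the end of
the results is reached when the returned nextCursorMark equals the one
that was sent.

```java
import java.util.ArrayList;
import java.util.List;

/** One page of uniqueKey values plus the cursor for the next page (hypothetical type). */
final class IdPage {
    final List<String> ids;
    final String nextCursor;

    IdPage(List<String> ids, String nextCursor) {
        this.ids = ids;
        this.nextCursor = nextCursor;
    }
}

/** Hypothetical hook: runs the fl=id query for one cursor position. */
interface IdPageFetcher {
    IdPage fetch(String cursor) throws Exception;
}

/** Hypothetical hook: deletes one batch of IDs, e.g. via SolrClient.deleteById. */
interface IdDeleter {
    void deleteByIds(List<String> ids) throws Exception;
}

final class QueryThenDeleteById {
    /** Initial cursorMark value that Solr expects on the first request. */
    static final String CURSOR_START = "*";

    /**
     * Step 1: page through the query and gather every matching ID.
     * Step 2: delete the gathered IDs in batches of at most batchSize.
     * Returns the number of IDs sent for deletion.
     */
    static int run(IdPageFetcher fetcher, IdDeleter deleter, int batchSize) throws Exception {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }

        List<String> allIds = new ArrayList<>();
        String cursor = CURSOR_START;
        while (true) {
            IdPage page = fetcher.fetch(cursor);
            allIds.addAll(page.ids);
            // An unchanged cursor means there are no more results.
            if (page.nextCursor.equals(cursor)) {
                break;
            }
            cursor = page.nextCursor;
        }

        for (int i = 0; i < allIds.size(); i += batchSize) {
            int end = Math.min(i + batchSize, allIds.size());
            // Copy the slice so the deleter gets an independent list.
            deleter.deleteByIds(new ArrayList<>(allIds.subList(i, end)));
        }
        return allIds.size();
    }
}
```

Gathering all the IDs before deleting anything keeps the query phase
and the delete phase separate, which matches the description above.  If
the ID list could be too large to hold in memory, deleting each page as
it arrives would be a reasonable variation.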

I would expect that doing atomic updates to a "deleted" field in your
documents is going to be slower than the query/deleteById approach.  I
cannot be sure this is the case, but I think it would be.  Even so, the
atomic update approach should be a lot more friendly to NRT operation
than deleteByQuery.

As Walter said, expungeDeletes will result in Solr doing a lot more work
than it should, slowing things down even more.  It also won't change
search results at all:  once a commit finishes and opens a new searcher,
Solr will not include deleted documents in search results, with or
without expungeDeletes.  The expungeDeletes parameter can make commits
take a VERY long time.

I have no idea whether the issues surrounding deleteByQuery can be fixed
or not.

Thanks,
Shawn