On 1/26/2018 9:55 AM, Clemens Wyss DEV wrote:
The deleteByQuery functionality is known to interact poorly with
other operations happening at the same time.
For best performance and compatibility with concurrent operations, I
would strongly recommend that you change all deleteByQuery calls into
two steps: Do a standard query with fl=id (or whatever your uniqueKey
field is), gather up the ID values (possibly with start/rows pagination
or cursorMark), and then proceed to do one or more deleteById calls with
those ID values. Both the query and the ID-based delete can coexist
with other concurrent operations very well.
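
Here is a rough sketch of that two-step approach in Python, using only the standard library. The base URL, the collection name, the batch sizes, and the helper names (collect_ids, delete_by_ids) are all made up for illustration, and the uniqueKey is assumed to be "id". It pages with cursorMark, which needs a sort that includes the uniqueKey field, and sends the deletes as JSON to the /update handler.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; adjust host, port, and collection name.
SOLR_URL = "http://localhost:8983/solr/mycollection"


def http_get_json(url, params):
    """GET url with query parameters and parse the JSON response."""
    full_url = url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(full_url) as resp:
        return json.load(resp)


def http_post_json(url, payload):
    """POST a JSON payload and parse the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def collect_ids(query, unique_key="id", rows=500,
                get=http_get_json, base_url=SOLR_URL):
    """Step 1: run a normal query with fl=<uniqueKey> and gather all IDs."""
    ids = []
    cursor = "*"  # initial cursorMark value
    while True:
        params = {
            "q": query,
            "fl": unique_key,
            "rows": rows,
            "sort": unique_key + " asc",  # cursorMark requires a uniqueKey sort
            "cursorMark": cursor,
            "wt": "json",
        }
        data = get(base_url + "/select", params)
        ids.extend(doc[unique_key] for doc in data["response"]["docs"])
        next_cursor = data.get("nextCursorMark", cursor)
        if next_cursor == cursor:  # unchanged cursor means no more results
            return ids
        cursor = next_cursor


def delete_by_ids(ids, batch_size=1000,
                  post=http_post_json, base_url=SOLR_URL):
    """Step 2: send the gathered IDs as one or more ID-based deletes."""
    for i in range(0, len(ids), batch_size):
        post(base_url + "/update", {"delete": ids[i:i + batch_size]})
```

Usage would be something like delete_by_ids(collect_ids("category:obsolete")), where the query string is just an example. The sketch does not issue a commit; the deletes only stop showing up in results once a commit opens a new searcher.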
I would expect that doing atomic updates to a deleted field in your
documents is going to be slower than the query/deleteById approach. I
cannot be sure this is the case, but I think it would be. It should be
a lot more friendly to NRT operation than deleteByQuery.
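
For comparison, the atomic-update idea might look roughly like this. The helper names are hypothetical, the field name "deleted" and the uniqueKey "id" are assumptions, and atomic updates only work if the schema lets Solr reconstruct the rest of the document (fields stored or with docValues). Each document gets a "set" operation on the flag field, and queries then need a filter to hide flagged documents.

```python
def soft_delete_payload(ids, flag_field="deleted", unique_key="id"):
    """Build an atomic-update body that sets the flag field to true."""
    return [{unique_key: doc_id, flag_field: {"set": True}} for doc_id in ids]


def visible_only_params(query, flag_field="deleted"):
    """Build query parameters that filter out documents flagged as deleted."""
    return {"q": query, "fq": "-" + flag_field + ":true"}
```

The list from soft_delete_payload would be sent as the JSON body of a POST to the /update handler, and every search would have to carry the extra fq.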
As Walter said, expungeDeletes will result in Solr doing a lot more work
than it should, slowing things down even more. It also won't affect
search results at all. Once the commit finishes and opens a new
searcher, Solr will not include deleted documents in search results.
The expungeDeletes parameter can make commits take a VERY long time.
I have no idea whether the issues surrounding deleteByQuery can be fixed