I pretty much agree with your business side.
The rough size of a docValues field is one value per doc. So
say you have an int field: the size is near maxDoc * 4 bytes. This is not
totally accurate (there is some int packing done, for instance), but
it'll do. If you really want an accurate count, look at the
before/after size of your *.dvd, *.dvm segment files in your index.
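For what it's worth, here is a rough Python sketch of both approaches: the back-of-the-envelope maxDoc * 4 estimate, and summing the *.dvd and *.dvm files actually on disk. The function names and the index directory argument are made up for illustration, and the 4 bytes per value only fits a single-valued int field. Also note that segments written as compound files (*.cfs) may not show separate *.dvd/*.dvm files, so treat the on-disk number as approximate too.

```python
import os


def estimate_docvalues_bytes(max_doc, bytes_per_value=4):
    """Rough upper bound: one fixed-width value per document.

    Ignores the int packing Lucene does, so the real size is often smaller.
    """
    return max_doc * bytes_per_value


def docvalues_bytes_on_disk(index_dir):
    """Sum the sizes of all *.dvd and *.dvm files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            if name.endswith((".dvd", ".dvm")):
                total += os.path.getsize(os.path.join(root, name))
    return total
```

Run docvalues_bytes_on_disk against the index directory before and after enabling docValues on a field (and reindexing) and compare the two numbers.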
However, it's "pay me now or pay me later". The critical operations
are faceting, grouping and sorting. If you do any of those operations
on a field that is _not_ docValues=true, it will be uninverted on the
_java heap_, where it will consume GC cycles, put pressure on all your
other operations, etc. This process will be done _every_ time you open
a new searcher and use these fields.
If the field _does_ have docValues=true, that will be held in the OS's
memory space, _not_ the JVM's heap, due to using MMapDirectory (see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html).
Among other virtues, it can be swapped out (although you don't want it
to be, it's still better than OOMing). Plus loading it is just reading
it off disk rather than the expensive uninversion process.
And if you don't do any of those operations (grouping, sorting and
faceting), then the bits just sit there on disk doing nothing.
So say you carefully define what fields will be used for any of the
three operations and enable docValues. Then 3 months later the
business side comes back with "oh, we need to facet on another field".
Your choices are:
1> live with the increased heap usage and other resource contention.
Perhaps along the way panicking because your processes OOM and prod
2> reindex from scratch, starting with a totally new collection.
And note the fragility here. Your application can be humming along
just fine for months. Then one fine day someone innocently submits a
query that sorts on a new field that has docValues=false and B-OOM.
If (and only if) you can _guarantee_ that fieldX will never be used
for any of the three operations, then turning off docValues for that
field will save you some disk space. But that's the only advantage.
Well, alright. If you have to do a full index replication that'll
happen a bit faster too.
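If you want to see which fields are exposed to this, a small audit of the schema helps. Below is a minimal Python sketch that lists the fields whose effective docValues setting is not "true". It assumes a schema where docValues is set as an attribute on <field> or inherited from the <fieldType>, and it treats "not specified anywhere" as false. That last part is an assumption: the real default depends on your schema version and field type, so check the results against your own setup. The function name and the sample schema are made up for illustration, and dynamic fields are not covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment, only for illustration.
SAMPLE_SCHEMA = """
<schema name="example" version="1.6">
  <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
  <fieldType name="string" class="solr.StrField"/>
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="price" type="pint" indexed="true" stored="true"/>
  <field name="color" type="string" indexed="true" docValues="true"/>
  <field name="size" type="pint" docValues="false"/>
</schema>
"""


def fields_without_docvalues(schema_xml):
    """Return names of <field> entries whose effective docValues is not 'true'."""
    root = ET.fromstring(schema_xml)
    # docValues declared on the field type acts as the default for its fields.
    type_defaults = {
        ft.get("name"): ft.get("docValues") for ft in root.iter("fieldType")
    }
    missing = []
    for field in root.iter("field"):
        setting = field.get("docValues")
        if setting is None:
            setting = type_defaults.get(field.get("type"))
        # Assumption: unspecified everywhere is treated as false.
        if setting != "true":
            missing.append(field.get("name"))
    return missing


if __name__ == "__main__":
    print(fields_without_docvalues(SAMPLE_SCHEMA))
```

Any field this reports is one that would be uninverted on the heap the first time someone sorts, groups or facets on it.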
So I prefer to err on the side of caution. I recommend making fields
docValues=true unless I can absolutely guarantee (and business _also_
1> that fieldX will never be used for sorting, grouping or faceting,
2> if they can't promise that, they guarantee to give me time to
On Wed, Jun 13, 2018 at 4:30 PM, root23 <[EMAIL PROTECTED]> wrote: