Pratik may have jumped right to the difference. We'd have gotten there
eventually by looking at file extensions, but just checking his
recommendation would be the first thing to do!

bq:  what would be the right scenarios to use docvalues='true'?

Whenever you want to facet, group or sort on the field. This _will_
increase the index size on disk, but it's almost always a good
tradeoff. Here's why:

To facet, group or sort you need to "uninvert" the field. If you have
docValues=false, this uninversion is done at run-time into Java's heap.
If you have docValues=true, the uninversion is done at _index_ time
and the result stored on disk. Now when it's required, it can be
loaded in from disk efficiently (essentially de-serialized) and is
held in OS memory rather than the Java heap, thanks to the magic of
MMapDirectory, see:
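
To make "uninverting" concrete, here's a toy Python sketch. It is not
Lucene code; the data and the uninvert() helper are made up for
illustration, and it assumes a single-valued field. The point is only
that searching wants term -> docs, while sorting, faceting and grouping
want doc -> value.

```python
# Toy inverted index for one field: term -> ids of the docs containing it.
# (Hypothetical data; real Lucene structures are far more compact.)
inverted = {
    "blue": [0, 3],
    "green": [1],
    "red": [2, 4],
}

def uninvert(inverted_index, num_docs):
    """Build the doc -> value column that sort/facet/group needs."""
    column = [None] * num_docs
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            column[doc_id] = term
    return column

column = uninvert(inverted, 5)
print(column)  # ['blue', 'green', 'red', 'blue', 'red']

# Sorting docs by field value is now one direct lookup per doc.
print(sorted(range(5), key=lambda d: column[d]))  # [0, 3, 1, 2, 4]
```

With docValues=false that loop runs at query time and the column lives
on the heap. With docValues=true the equivalent column was already
written to disk at index time.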

bq:  In what situation would it make sense to have indexed=false and

When you want to return _only_ fields that have docValues=true. If you
return fields with stored=true and docValues=false, Solr/Lucene has to:
1> read the stored values from disk (minimum 16K block)
2> decompress it
3> extract the field
3> extract the field
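
As a rough sketch of why the stored-field path costs more, here's a toy
Python version of those steps. Everything in it is an assumption for
illustration: the JSON block layout, the read_stored_field() helper and
the use of zlib as a stand-in compressor are not Lucene's actual format.

```python
import json
import zlib

# Hypothetical block of stored documents, compressed together.
docs = [{"id": str(i), "title": "doc %d" % i, "price": i * 10}
        for i in range(100)]
compressed_block = zlib.compress(json.dumps(docs).encode("utf-8"))

def read_stored_field(block, doc_index, field):
    """Fetch one field of one doc from a compressed block of stored docs."""
    raw = zlib.decompress(block)       # decompress the whole block
    all_docs = json.loads(raw)         # parse every doc in the block
    return all_docs[doc_index][field]  # extract the one field we wanted

print(read_stored_field(compressed_block, 7, "price"))  # 70
```

Note that the whole block is decompressed and parsed even though only
one value from one document is needed.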

With docValues, since they're only simple field types, all that you
have to do is read the value from the docValues structure, which is
much more efficient. HOWEVER, there are two caveats:
1> The entire docValues field will be MMapped, so there's a time/space tradeoff.
2> docValues are stored in a sorted_set. This is relevant for
multiValued fields because:
2a> values are returned in sorted order, not the order they were in the document
2b> identical values are collapsed.

So if the input values for a particular doc were 4, 3, 6, 4, 5, 2, 6,
5, 6, 5, 4, 3, 2 you'd get back 2, 3, 4, 5, 6
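
The same effect in a couple of lines of Python, purely as an
illustration of sorted-set semantics (as_sorted_set() is a made-up
helper, not anything in Solr):

```python
def as_sorted_set(values):
    """Mimic sorted-set semantics: drop duplicates, then sort."""
    return sorted(set(values))

values_in_doc = [4, 3, 6, 4, 5, 2, 6, 5, 6, 5, 4, 3, 2]
print(as_sorted_set(values_in_doc))  # [2, 3, 4, 5, 6]
```

Both the original order and the repeat counts are gone, so if either
matters to your application you still need stored=true for that field.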

If you can live with those caveats, then returning field values would
involve much less work (both I/O and CPU), especially in
high-throughput situations. NOTE: there are a couple of JIRAs IIRC
that have to do with not storing the <uniqueKey> though.


On Wed, Feb 14, 2018 at 7:01 AM, Pratik Patel <[EMAIL PROTECTED]> wrote: