That's what it looked like to me, too. I wonder if it would be worth
improving the estimate for some very common Collections classes? I see
this comment eg in BaseIndexFileFormatTestCase:

      // we have no way to estimate the size of these things in codecs although
      // something like a Collections.newSetFromMap(new HashMap<>()) uses quite
      // some memory... So for now the test ignores the overhead of such
      // collections but can we do better?

This is in testRamBytesUsed, and there is a kind of fudge factor in
there for handling mismeasurement errors of the sort we are talking
about. Actually the test seems to be more about validating
RamUsageTester than about validating the accounting in SegmentReader!
There are lots of other usages in tests, but I suppose they don't
require very precise handling of Collections classes (since they
pass)? Anyway, it is certainly possible to improve the estimate quite
a bit, and pretty easily for HashMap, by simply counting the size of
the Node that is used for each entry. Given the dynamic nature of
these data structures, though (HashMap can sometimes use TreeNodes
depending on data distribution), it would be almost impossible to be
100% accurate.
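To make the idea concrete, here is a rough sketch (not Lucene's actual accounting, and not using RamUsageTester itself) of what counting one Node per entry plus the backing table could look like. The byte constants assume a 64-bit JVM with compressed oops and 8-byte alignment, and the class name is made up for illustration:

```java
// Illustrative only: estimate HashMap overhead as (table array) + (one Node per entry).
// Constants are assumptions for a 64-bit JVM with compressed oops, not measured values.
public class HashMapOverheadEstimate {
    static final long OBJECT_HEADER = 16; // assumed object header size
    static final long REF = 4;            // assumed compressed reference size
    static final long INT = 4;

    // 8-byte object alignment, as on typical HotSpot configurations.
    static long align(long bytes) {
        return (bytes + 7) & ~7L;
    }

    // Each HashMap.Node holds: int hash, plus key, value and next references.
    static long perNodeBytes() {
        return align(OBJECT_HEADER + INT + 3 * REF);
    }

    // Table capacity is the next power of two >= size / 0.75 (default load factor).
    static long tableCapacity(int size) {
        int needed = (int) Math.ceil(size / 0.75);
        int cap = 1;
        while (cap < needed) cap <<= 1;
        return cap;
    }

    // Estimated bytes for the map's internal structure, excluding keys/values themselves.
    static long estimate(int size) {
        long table = align(OBJECT_HEADER + INT /* array length field */
                           + tableCapacity(size) * REF);
        return table + size * perNodeBytes();
    }

    public static void main(String[] args) {
        System.out.println(estimate(100));
    }
}
```

As the post says, this can only ever be an approximation: once a bucket is treeified the entries become the larger TreeNode objects, and resizing history can leave the table bigger than this formula predicts.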
On Thu, Dec 6, 2018 at 7:14 AM Dawid Weiss <[EMAIL PROTECTED]> wrote: