We (Facebook) are closely monitoring the OS page cache hit ratio in our
production environments. My experience is that if your data access pattern
is very random, the OS page cache won't help you much, even when data
locality is very high. On the other hand, if the requests are always
against recent data points, the page cache hit ratio could be much higher.

Actually, there are lots of optimizations that could be done in HDFS. For
example, we are working on using fadvise to drop the 2nd/3rd replicas of
the data from the OS page cache, which could potentially improve the
effective capacity of your OS page cache by 3X. Also, by taking advantage
of tier-based compaction plus fadvise in HDFS, the region server could keep
more hot data in the OS page cache based on the read access pattern.
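
To make the fadvise idea concrete, here is a minimal Python sketch of the
underlying system call. This is not the HDFS change itself, only an
illustration of posix_fadvise with POSIX_FADV_DONTNEED on Linux. The helper
name write_without_caching is hypothetical, and the call is only a hint, so
the kernel is free to keep the pages anyway.

```python
import os


def write_without_caching(path, data):
    """Write data to path, then hint the kernel to drop it from the page cache.

    Hypothetical helper for illustration; requires a Unix system that
    provides posix_fadvise (for example Linux).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Dirty pages cannot be evicted, so flush them to disk first.
        os.fsync(fd)
        # offset=0 and len=0 together mean "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return len(data)
```

A datanode holding a 2nd or 3rd replica could, in principle, apply the same
hint after writing a block, leaving page cache room for the replica that
actually serves reads.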

Another separate point is that we probably should NOT rely on the
memstore/block cache to keep hot data. 1) The more data in the memstore,
the more data the region server needs to recover after a server failure,
so the tradeoff is the recovery time. 2) The blocks in the block cache are
naturally invalidated soon after compactions. So the region server probably
won't benefit from a large JVM heap size at all.

Thanks a lot

On Sat, Mar 23, 2013 at 6:13 PM, Ted Yu <[EMAIL PROTECTED]> wrote: