Subject: RE: Cacheblocksonwrite not working during compaction?


Thank you for the feedback!

Our cache size *is* larger than our data size, at least for our heavily accessed tables. Memory may be prohibitively expensive for keeping large tables in an in-memory cache, but storage is cheap, so hosting a 1 TB BucketCache on the local disk of each of our region servers is feasible; that is what we are trying to accomplish.
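For context, the BucketCache configuration we are talking about looks roughly like the sketch below. The path and size are illustrative placeholders, not our exact values, and these properties normally live in hbase-site.xml on each region server; they are set programmatically here only to keep the example self-contained:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class BucketCacheConfigSketch {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            // File-backed BucketCache on local disk. Normally set in
            // hbase-site.xml on each region server, not in client code.
            conf.set("hbase.bucketcache.ioengine",
                    "file:/mnt/hbase-cache/bucketcache.data"); // placeholder path
            conf.set("hbase.bucketcache.size", "1048576");     // in MB, i.e. a 1 TB cache
            System.out.println("ioengine = " + conf.get("hbase.bucketcache.ioengine"));
        }
    }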

I'm not sure I understand the complexity of populating a cache that is supposed to represent the data in on-disk files while writing out one of those files during compaction. In fact, that's what I understood hbase.rs.cacheblocksonwrite to do (based on nothing more than the description of the setting in the online HBase book; I don't see very good documentation for this feature elsewhere). If that setting doesn't do that, then what does it do exactly? What about the hbase.rs.evictblocksonclose setting? Could that be evicting all of the blocks that are put in the cache at the end of compaction? What are the implications if we set that to "false"?
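To make sure we're talking about the same knobs, here is a minimal sketch of the two settings in question, with my (possibly wrong) reading of their semantics in the comments:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class CacheOnWriteQuestion {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            // My reading: cache each data block as it is written out, which I
            // assumed included the blocks of new files produced by compaction.
            conf.setBoolean("hbase.rs.cacheblocksonwrite", true);
            // My reading: when a store file is closed, evict its blocks from
            // the cache. Compaction closes files, so could this be undoing the
            // cache-on-write work at the end of a compaction?
            conf.setBoolean("hbase.rs.evictblocksonclose", false);
            System.out.println("cacheblocksonwrite = "
                    + conf.getBoolean("hbase.rs.cacheblocksonwrite", false));
        }
    }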

Prefetching is also OK for us to do on some tables because we are using the on-disk cache (I understand this also means opening a region after a split or move will take longer). But I don't understand why prefetching appeared to be happening when the region hadn't been opened recently. I don't expect prefetching to help us with compactions, but seeing the thread blocked after a compaction raised a red flag that I'm not understanding what is going on.
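For the tables where prefetch does make sense, my understanding is that it can be enabled per column family. A minimal sketch of what I have in mind (the table and family names are hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PrefetchPerTableSketch {
        public static void main(String[] args) throws Exception {
            TableName table = TableName.valueOf("hot_table"); // hypothetical table
            byte[] family = Bytes.toBytes("d");               // hypothetical family
            try (Connection conn = ConnectionFactory
                        .createConnection(HBaseConfiguration.create());
                 Admin admin = conn.getAdmin()) {
                // Start from the existing descriptor so other settings are kept.
                ColumnFamilyDescriptor existing =
                        admin.getDescriptor(table).getColumnFamily(family);
                ColumnFamilyDescriptor updated = ColumnFamilyDescriptorBuilder
                        .newBuilder(existing)
                        .setPrefetchBlocksOnOpen(true)
                        .build();
                admin.modifyColumnFamily(table, updated);
            }
        }
    }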

I understand that some latency during compaction is expected, but what we are seeing is fairly extreme. The instances take thread dumps every 15 minutes, and we saw threads still in a BLOCKED state on the same input stream object an hour later, after the 3.0 GB compaction had already finished. If prefetching was happening, something seems wrong if it takes an hour to populate 3.0 GB worth of data into a local disk cache from S3.

I appreciate the help on this!

--Jacob LeBlanc

-----Original Message-----
From: Vladimir Rodionov [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 20, 2019 6:41 PM
To: [EMAIL PROTECTED]
Subject: Re: Cacheblocksonwrite not working during compaction?

>> - Why is the hbase.rs.cacheblocksonwrite not seeming to work? Does it only work for flushing and not for compaction? I can see from the logs that the file is renamed after being written. Does that have something to do with why cacheblocksonwrite isn't working?

Generally, it is a very bad idea to enable caching on read/write during compaction unless your cache size is larger than your data size (which is not a common case).
Cache invalidation during compaction is almost inevitable, due to the complexity of the potential optimizations involved. There are some papers on the Internet dedicated to smarter cache invalidation algorithms for LSM-derived storage engines, but engineers, as usual, are much more conservative than academic researchers and are not eager to implement novel (not battle-tested) algorithms.
Latency spikes during compaction are normal and inevitable, at least for HBase, and especially when one deals with S3 or any other cloud storage. S3 read latency can sometimes reach seconds, and the only possible mitigation for these huge latency spikes is a very-smart-cache-invalidation-during-compaction algorithm (which does not exist yet).

For your case, I would recommend the following settings:

*CACHE_BLOOM_BLOCKS_ON_WRITE_KEY = true*

*CACHE_INDEX_BLOCKS_ON_WRITE_KEY = true*

*CACHE_DATA_BLOCKS_ON_WRITE_KEY = false (bad idea to set it to true)*

*PREFETCH_BLOCKS_ON_OPEN = false*, unless your table is small and your application does this once on startup
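In hbase-site.xml terms, those constants should map to the keys in the sketch below; please verify the exact property names against the CacheConfig class in your HBase version:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class RecommendedCacheSettings {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            conf.setBoolean("hfile.block.bloom.cacheonwrite", true); // bloom blocks on write
            conf.setBoolean("hfile.block.index.cacheonwrite", true); // index blocks on write
            conf.setBoolean("hbase.rs.cacheblocksonwrite", false);   // data blocks on write
            conf.setBoolean("hbase.rs.prefetchblocksonopen", false); // prefetch on open
        }
    }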
-Vlad

On Fri, Sep 20, 2019 at 12:51 PM Jacob LeBlanc <[EMAIL PROTECTED]>
wrote:

> Hi HBase Community!
>
> I have some questions on block caches around how the prefetch and
> cacheblocksonwrite settings work.
>
> In our production environments we've been having some performance