Subject: HBase Scan consumes high cpu

  Solvannan R M 2019-09-10, 16:06
  Josh Elser 2019-09-10, 17:12
  Anoop John 2019-09-13, 04:54
  ramkrishna vasudevan 2019-09-13, 15:54
Hi Anoop,

    We have executed the query with the qualifier set as you advised.
However, we don't get the results for the whole range; only the cell for
the specified qualifier is returned.

Query & Result:

hbase(main):008:0> get 'mytable', 'MY_ROW',
{COLUMN=>["pcf:\x00\x16\xDFx"],
FILTER=>ColumnRangeFilter.new(Bytes.toBytes(1499000.to_java(:int)),
true, Bytes.toBytes(1499010.to_java(:int)), false)}
COLUMN                             CELL
 pcf:\x00\x16\xDFx                 timestamp=1568380663616, value=\x00\x16\xDFx
1 row(s) in 0.0080 seconds
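
For reference, the qualifier printed as \x00\x16\xDFx is the 4-byte big-endian form of 1499000, which is what Bytes.toBytes produces for a Java int. A minimal plain-Ruby sketch of that encoding follows; it uses no HBase classes, and `int_qualifier` and `shell_escape` are hypothetical helper names introduced only for illustration.

```ruby
# Encode an integer as a 4-byte big-endian string, mirroring the byte
# layout of a Java int. `int_qualifier` is a hypothetical helper, not
# part of the HBase shell.
def int_qualifier(n)
  [n].pack('N')
end

# Render bytes roughly the way the shell prints them: printable ASCII
# as-is, everything else as \xNN. (Approximation for illustration only.)
def shell_escape(bytes)
  bytes.each_byte.map { |b| (32..126).cover?(b) ? b.chr : format('\\x%02X', b) }.join
end

puts shell_escape(int_qualifier(1_499_000))  # lower bound of the range
puts shell_escape(int_qualifier(1_499_009))  # last qualifier inside the range
```

The two printed strings correspond to the first and last qualifiers that the family-only query in the quoted message returns.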

hbase(main):009:0>
Is there any other way to get around this?
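
The result above is consistent with the explicit column and the filter being applied together, so a cell comes back only if its qualifier is both in the COLUMN list and inside the filter range. The following is a rough plain-Ruby model of that behaviour. It is an assumption inferred from the shell output, not taken from the HBase source, and `select_cells` is a hypothetical name.

```ruby
# Model: a cell is returned only if it passes BOTH the explicit column
# list (when one is given) and the column range [min, max).
# This is an assumption inferred from the observed output.
def select_cells(qualifiers, explicit:, min:, max:)
  qualifiers.select do |q|
    (explicit.nil? || explicit.include?(q)) && q >= min && q < max
  end
end

# Qualifiers still live after the bulk delete (per the query output,
# 1499000 itself is still present).
live = (1_499_000..1_500_000).to_a

# Family only: every live qualifier in the range is returned.
p select_cells(live, explicit: nil, min: 1_499_000, max: 1_499_010).size

# Family plus one qualifier: only that qualifier is returned.
p select_cells(live, explicit: [1_499_000], min: 1_499_000, max: 1_499_010)
```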
Regards,

Solvannan R M
On 2019/09/13 04:53:45, Anoop John wrote:
 > Hi
 > When you did a put with a lower qualifier int (put 'mytable',
 > 'MY_ROW', "pcf:\x0A", "\x00"), the scan flow gets a valid cell at the
 > 1st step itself, and that cell gets passed to the Filter. The Filter
 > does a seek, which avoids processing all the in-between deletes and
 > puts. In the 1st case the Filter won't get into action at all until
 > the scan flow sees a valid cell. The delete processing happens as the
 > 1st step, before the filter processing step.
 >
 > In this case I am wondering why you cannot add the specific 1st
 > qualifier in the get part itself, along with the column range filter.
 > I mean
 >
 > get 'mytable', 'MY_ROW', {COLUMN=>['pcf:1499000'],
 > FILTER=>ColumnRangeFilter.new(Bytes.toBytes(1499000.to_java(:int)),
 > true, Bytes.toBytes(1499010.to_java(:int)), false)}
 >
 > Pardon the syntax, it might not be proper for the shell. Can this be
 > done? This will make the scan seek to the given qualifier at the 1st
 > step itself.
 >
 > Anoop
 >
 > On Thu, Sep 12, 2019 at 10:18 PM Udai Bhan Kashyap (BLOOMBERG/
 > PRINCETON) <[EMAIL PROTECTED]> wrote:
 >
 > > Are you keeping the deleted cells? Check 'VERSIONS' for the column
 > > family and set it to 1 if you don't want to keep the deleted cells.
 > >
 > > From: [EMAIL PROTECTED] At: 09/12/19 12:40:01 To:
 > > [EMAIL PROTECTED]
 > > Subject: Re: HBase Scan consumes high cpu
 > >
 > > Hi,
 > >
 > > As said earlier, we have populated the rowkey "MY_ROW" with integers
 > > from 0 to 1500000 as column qualifiers. Then we have deleted the
 > > qualifiers from 0 to 1499000.
 > >
 > > We executed the following query. It took 15.3750 seconds to execute.
 > >
 > > hbase(main):057:0> get 'mytable', 'MY_ROW', {COLUMN=>['pcf'],
 > > FILTER=>ColumnRangeFilter.new(Bytes.toBytes(1499000.to_java(:int)),
 > > true, Bytes.toBytes(1499010.to_java(:int)), false)}
 > > COLUMN                CELL
 > >  pcf:\x00\x16\xDFx    timestamp=1568123881899, value=\x00\x16\xDFx
 > >  pcf:\x00\x16\xDFy    timestamp=1568123881899, value=\x00\x16\xDFy
 > >  pcf:\x00\x16\xDFz    timestamp=1568123881899, value=\x00\x16\xDFz
 > >  pcf:\x00\x16\xDF{    timestamp=1568123881899, value=\x00\x16\xDF{
 > >  pcf:\x00\x16\xDF|    timestamp=1568123881899, value=\x00\x16\xDF|
 > >  pcf:\x00\x16\xDF}    timestamp=1568123881899, value=\x00\x16\xDF}
 > >  pcf:\x00\x16\xDF~    timestamp=1568123881899, value=\x00\x16\xDF~
 > >  pcf:\x00\x16\xDF\x7F timestamp=1568123881899, value=\x00\x16\xDF\x7F
 > >  pcf:\x00\x16\xDF\x80 timestamp=1568123881899, value=\x00\x16\xDF\x80
 > >  pcf:\x00\x16\xDF\x81 timestamp=1568123881899, value=\x00\x16\xDF\x81
 > > 1 row(s) in 15.3750 seconds
 > >
 > >
 > > Now we inserted a new column with qualifier 10 (\x0A), such that it
 > > comes earlier in lexicographical order. Now we executed the same
 > > query. It only took 0.0240 seconds.
 > >
 > > hbase(main):058:0> put 'mytable', 'MY_ROW', "pcf:\x0A", "\x00"
 > > 0 row(s) in 0.0150 seconds
 > > hbase(main):059:0> get 'mytable', 'MY_ROW', {COLUMN=>['pcf'],
 > > FILTER=>ColumnRangeFilter.new(Bytes.toBytes(1499000.to_java(:int)),
 > > true, Bytes.toBytes(1499010.to_java(:int)), false)}
 > > COLUMN                CELL
 > >  pcf:\x00\x16\xDFx    timestamp=1568123881899, value=\x00\x16\xDFx
 > >  pcf:\x00\x16\xDFy    timestamp=1568123881899, value=\x00\x16\xDFy
 > >  pcf:\x00\x16\xDFz    timestamp=1568123881899, value=\x00\x16\xDFz
 > >  pcf:\x00\x16\xDF{    timestamp=1568123881899, value=\x00\x16\xDF{
 > >  pcf:\x00\x16\xDF|    timestamp=1568123881899, value=\x00\x16\xDF|
 > >  pcf:\x00\x16\xDF}    timestamp=1568123881899, value=\x00\x16\xDF}
 > >  pcf:\x00\x16\xDF~    timestamp=1568123881899, value=\x00\x16\xDF~
 > >  pcf:\x00\x16\xDF\x7F timestamp=1568123881899, value=\x00\x16\xDF\x7F
 > >  pcf:\x00\x16\xDF\x80 timestamp=1568123881899, value=\x00\x16\xDF\x80
 > >  pcf:\x00\x16\xDF\x81 timestamp=1568123881899, value=\x00\x16\xDF\x81
 > > 1 row(s) in 0.0240 seconds
 > > hbase(main):060:0>
 > >
 > >
 > > We were able to reproduce the same result consistently, the pattern
 > > being a bulk insert followed by a bulk delete of most of the earlier
 > > columns.
 > >
 > >
 > > We observed the following behaviour while debugging the StoreScanner
 > > (regionserver).
 > >
 > > Case 1:
 > >
 > > 1. When StoreScanner.next() is called, it starts to iterate over the
 > > cells from the start of the rowkey.
 > >
 > > 2. As all the cells are deleted (from 0 to 1499000), we could see
 > > alternating Delete and Put type cells. Now, the
 > > NormalUserScanQueryMatcher.match() returns
 > > ScanQueryMatcher.MatchCode.SKIP and
 > > ScanQueryMatcher.MatchCode.SEEK_NEXT_COL for Delete and Put type
 > > cells respectively. This iteration happens throughout the range of 0
 > > to 1499000.
 > >
 > > 3. This happens until a valid Put type cell is encountered, where the
 > > matcher applies the ColumnRangeFilter to the cell, which in turn
 > > returns ScanQueryMatcher.MatchCode.SEEK_NEXT_USING_HINT. In the next
 > > iteration it seeks directly to the desired column.
 > >
 > >
 > > Case 2:
 > >
 > > 1. When StoreScanner.next() is called, it starts to iterate over the
 > > cells from the start of the rowkey.
 > >
 >
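
The scan flow described in the quoted messages can be sketched as a toy model. This is not HBase code: cells are simple [qualifier, type] pairs kept in sort order, `scan` is a hypothetical stand-in for the StoreScanner and matcher loop, the match codes appear only as comments, and the data set is scaled down from the 1,499,000 deleted qualifiers in the thread. The second run simply assumes one live cell sorts ahead of the deleted run, as the quoted messages describe.

```ruby
# Toy model of the scan loop. Returns [matching qualifiers, cells examined].
def scan(cells, min, max)
  examined = 0
  results = []
  deleted = nil                      # qualifier covered by the last delete marker
  i = 0
  while i < cells.length
    qual, type = cells[i]
    examined += 1
    if type == :delete               # delete marker: SKIP
      deleted = qual
      i += 1
    elsif qual == deleted            # put masked by the delete: SEEK_NEXT_COL
      i += 1
    elsif qual < min                 # live cell reaches the filter: SEEK_NEXT_USING_HINT
      i = cells.index { |q, _| q >= min } || cells.length
    elsif qual < max                 # inside the range: include the cell
      results << qual
      i += 1
    else                             # past the range: stop
      break
    end
  end
  [results, examined]
end

deleted_upto = 10_000                # scaled-down stand-in for 1,499,000
cells = []
(0...deleted_upto).each { |q| cells << [q, :delete] << [q, :put] }
(deleted_upto...deleted_upto + 100).each { |q| cells << [q, :put] }
min = deleted_upto
max = deleted_upto + 10

# Case 1: nothing live ahead of the deleted run.
r1, n1 = scan(cells, min, max)
# Case 2: one live cell sorts ahead of the deleted run.
r2, n2 = scan([[-1, :put]] + cells, min, max)

puts "case 1: #{r1.size} results, #{n1} cells examined"
puts "case 2: #{r2.size} results, #{n2} cells examined"
```

In this model both runs return the same ten qualifiers, but the first walks every delete/put pair before the filter is consulted, while the second jumps to the range after examining a single live cell.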
  Solvannan R M 2019-09-12, 16:39
  Udai Bhan Kashyap 2019-09-12, 16:48
  Solvannan R M 2019-09-16, 17:24
  ramkrishna vasudevan 2019-09-17, 05:21
  Manjeet Singh 2019-09-18, 17:06
  Solvannan R M 2019-09-18, 16:52