David –


Were both data corpora reloaded in their entirety every night? What did the users do while the data was dropped and reloaded? What happened if the job failed in the middle of the night? Couldn’t you identify the incremental updates to the two sources and load only the new data into the combined target?


This brute-force implementation is applicable only to a few use cases with lax SLAs.
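
As a rough illustration of the incremental load asked about above, here is a minimal sketch. It is not a description of how the system under discussion worked. It assumes each source record has a stable primary key and that last night's snapshot is still available for comparison. The names `Snapshot`, `diff_snapshots`, and `apply_delta` are hypothetical, and plain Python dictionaries stand in for the real source and target stores.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]        # hypothetical: one row, field name -> value
Snapshot = Dict[str, Record]   # hypothetical: primary key -> row


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[List[str], Snapshot]:
    """Return (keys to delete, rows to insert or update)."""
    # Keys present last night but missing tonight must be removed.
    deletes = sorted(k for k in old if k not in new)
    # Rows that are new, or whose contents changed, must be written.
    upserts = {k: row for k, row in new.items() if old.get(k) != row}
    return deletes, upserts


def apply_delta(target: Snapshot, deletes: List[str], upserts: Snapshot) -> None:
    """Apply only the changes, so unchanged rows are never dropped."""
    for key in deletes:
        target.pop(key, None)
    target.update(upserts)


if __name__ == "__main__":
    last_night = {"a": {"name": "x"}, "b": {"name": "y"}, "c": {"name": "z"}}
    tonight = {"a": {"name": "x"}, "b": {"name": "y2"}, "d": {"name": "w"}}
    target = dict(last_night)  # stand-in for the combined target table
    deletes, upserts = diff_snapshots(last_night, tonight)
    apply_delta(target, deletes, upserts)
    print(deletes, sorted(upserts))
```

If a source can report change timestamps or a change log, the diff step could probably be skipped and only the reported changes applied. If a source offers neither stable keys nor change tracking, a full reload may be the only option.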



Oracle <http://www.oracle.com/>

Innovative technologies enabling the world's best intelligence

Chuck Adams 

Vice President Technical Leadership Team
National Security Group
1910 Oracle Way
Reston, VA 20190

Cell:     301.529.9396  




From: David Medinets [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 10, 2014 2:56 PM
To: accumulo-user
Subject: Re: How does Accumulo compare to HBase


Last year, I used Accumulo's rapid ingest ability to join two data silos into one dataset. Every field was fully indexed. Having all of the data in one place allowed cross-referencing queries to be executed. For various reasons, this kind of query was not possible using the existing technology. The rapid ingest was important because a new copy of the data silos was pulled every night.


On Thu, Jul 10, 2014 at 1:55 PM, Sean Busbey <[EMAIL PROTECTED]> wrote:




It would help the community and my own benchmarking efforts if you could describe how you think a benchmark might incorporate representations of real-world bottlenecks.


Do you think YCSB sufficiently covers the kind of testing you'd prefer?





Similarly, it would help if you could describe the use case(s) behind your statement of interest.





On Thu, Jul 10, 2014 at 12:11 PM, Marc Parisi <[EMAIL PROTECTED]> wrote:

I care 


On Thu, Jul 10, 2014 at 11:33 AM, Chuck Adams <[EMAIL PROTECTED]> wrote:

Dr. Kepner,

Who cares how fast you can load data into a non-indexed HBase or Accumulo database? What is the strategy to handle user queries against this corpus? Run some real-world tests that include simultaneous queries, index maintenance, and information life cycle management during the initial and incremental data loads.

The tests you are running do not appear to have any of the real-world bottlenecks that occur in production systems that users rely on for their business.

Chuck Adams 

Vice President Technical Leadership Team
Oracle National Security Group
1910 Oracle Way
Reston, VA 20190

Cell:     301.529.9396