Great question. Kiji is more strongly typed than systems like MongoDB.
While your schema can evolve (using Avro evolution) without structurally
updating existing data, you still need to specify your Avro schemas in a
data dictionary. It's challenging to author systems in Java (as is typical
of HBase/HDFS/MapReduce-facing applications) without some strong typing in
the persistence layer. You wind up reading a lot of other people's code to
figure out what types were written -- assuming you can find the code (or
the HBase columns) in the first place.
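
To make the "evolve without rewriting existing data" point concrete, here is a rough sketch of the Avro resolution rule in plain Python. It does not use the Avro library or any Kiji API; the resolve_record helper, the USER_V2 schema, and the field names are all made up for illustration. The rule it mimics is Avro's: a field the reader schema adds is filled from its default, and a field the reader schema no longer lists is ignored.

```python
# Simplified illustration of Avro-style schema resolution.
# This is NOT the Avro library; resolve_record is a hypothetical helper.

def resolve_record(datum, reader_schema):
    """Project a record written with an older schema onto reader_schema."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in datum:
            # Field exists in the stored data: keep the stored value.
            resolved[name] = datum[name]
        elif "default" in field:
            # Field was added later: fall back to the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    # Fields present only in the stored data are silently dropped.
    return resolved


# Hypothetical reader schema: v2 adds an optional "email" field.
USER_V2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

old_row = {"name": "ada", "legacy_flag": True}  # written before v2 existed
new_view = resolve_record(old_row, USER_V2)
```

The stored row is never touched; only the view handed to the reader changes, which is why the schemas still have to be declared up front.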

You can create table schemas either "manually" by filling out a JSON /
Avro-based table layout specification, or you can use the DDL shell which
lets you CREATE TABLE, ALTER TABLE, etc. in a pretty quick way. Once the
table's set up, then you can write to it. I think the DDL shell included
with the bento box makes this a reasonably low-overhead process.

We don't currently have any Pig integration. We've made some initial
proof-of-concept progress on a StorageHandler that lets Hive query Kiji,
but it's not in a ready state yet. Someone (you? :) could write a Pig
integration; Pig already supports Avro, I think. And you could even make it
analyze the first output tuple and use that to infer types/column names to
set up a result table with the appropriate table schema by invoking the DDL
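
As a rough idea of what "analyze the first output tuple" could look like, here is a small Python sketch that maps the values of one tuple to Avro primitive type names and builds an Avro record schema from them. Nothing here is Pig or Kiji API: avro_type, infer_record_schema, the record name, and the column names are hypothetical, and turning the resulting schema into an actual table layout or DDL statement is left out.

```python
import json


def avro_type(value):
    """Map a Python value to an Avro primitive type name."""
    if value is None:
        return "null"
    # bool must be tested before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, bytes):
        return "bytes"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"no Avro mapping for {type(value).__name__}")


def infer_record_schema(record_name, column_names, first_tuple):
    """Build an Avro record schema from column names and one sample tuple."""
    if len(column_names) != len(first_tuple):
        raise ValueError("column names and tuple length differ")
    return {
        "type": "record",
        "name": record_name,
        "fields": [
            {"name": col, "type": avro_type(val)}
            for col, val in zip(column_names, first_tuple)
        ],
    }


# Hypothetical first output tuple from a job.
schema = infer_record_schema(
    "PigResult", ["user", "clicks", "score"], ("ada", 42, 0.5)
)
print(json.dumps(schema, indent=2))
```

One caveat with inferring from a single tuple: a null in the first row tells you nothing about the column's real type, so a real integration would probably want to look at more than one row or let the user override the guess.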

Sorry I don't have a "magic wand" answer for you -- for the use cases we
target, these sorts of setup costs often pay off in the long run, so that's
the case we've optimized the design around. Let me know if there's anything
else I can help with.

On Wed, Jan 30, 2013 at 5:48 PM, Russell Jurney <[EMAIL PROTECTED]> wrote: