Have you thought about HBase?

If you're using Hive or Pig, I would suggest taking these files and putting the JSON records into a sequence file, or a set of sequence files. (Then look at HBase to help index them.) 200KB is small for a Hadoop input file.
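
To make the packing idea concrete, here is a rough sketch in plain Python. It is not Hadoop's SequenceFile format, just the same shape: many small JSON files appended to one container as key/value records, with the file name as the key. The names pack_json_files and read_records and the length-prefixed layout are made up for illustration.

```python
import json
import struct
from pathlib import Path


def pack_json_files(src_dir: Path, container: Path) -> int:
    """Append every *.json file in src_dir to one container file.

    Hypothetical layout per record: 4-byte key length, key bytes,
    4-byte value length, value bytes (big-endian lengths).
    """
    count = 0
    with container.open("wb") as out:
        for path in sorted(src_dir.glob("*.json")):
            key = path.name.encode("utf-8")
            # Parse and re-serialize so each value is compact, valid JSON.
            parsed = json.loads(path.read_text(encoding="utf-8"))
            value = json.dumps(parsed, separators=(",", ":")).encode("utf-8")
            out.write(struct.pack(">I", len(key)) + key)
            out.write(struct.pack(">I", len(value)) + value)
            count += 1
    return count


def read_records(container: Path):
    """Yield (file name, parsed JSON) pairs back out of the container."""
    with container.open("rb") as src:
        while True:
            header = src.read(4)
            if not header:
                break  # clean end of file
            (key_len,) = struct.unpack(">I", header)
            key = src.read(key_len).decode("utf-8")
            (value_len,) = struct.unpack(">I", src.read(4))
            yield key, json.loads(src.read(value_len).decode("utf-8"))
```

On a real cluster you would write an actual sequence file through the Hadoop API instead; the point is only that one large file of records replaces thousands of small ones.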

That would be the same for either Pig or Hive.

In terms of SerDes, I've worked with Pig and ElephantBird; it's pretty nice. And yes, you get each record as a row, but you can always flatten them as needed.
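
As a rough illustration of what flattening means here, in plain Python rather than Pig: one JSON record holding a nested list becomes one row per list element, with the other fields repeated. The field names (user, events) and the flatten helper are invented for the example and are not ElephantBird's API.

```python
import json


def flatten(record: dict, list_field: str) -> list:
    """Return one row per element of record[list_field].

    All other top-level fields are copied onto every row. A record with a
    missing or empty list yields no rows.
    """
    parent = {k: v for k, v in record.items() if k != list_field}
    items = record.get(list_field) or []
    return [{**parent, list_field: item} for item in items]


# One JSON record per line, as a loader would hand it over.
line = '{"user": "alice", "events": [{"type": "login"}, {"type": "logout"}]}'
rows = flatten(json.loads(line), "events")
for row in rows:
    print(row)
```

So a single record with two events comes out as two rows, both carrying the same user.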

I haven't worked with the latest SerDe, but maybe Dean Wampler or Edward Capriolo could give you a better answer.
Going from memory, I don't know of a good Hive SerDe that would write JSON; the ones I remember only read it.

IMHO Pig/ElephantBird is the best so far, but then again my information may be dated and I may be biased.

I think you're on the right track or at least train of thought.


On Jun 12, 2013, at 7:57 PM, Tecno Brain <[EMAIL PROTECTED]> wrote: