Parquet is faster, especially if you are only looking for a subset of the JSON objects. Every JSON key or array is treated as a column.
With that said, creating Parquet from JSON is not bulletproof if you have really complex JSON with NULL values or many optional keys. Drill can't figure out what data type a NULL JSON value is, and it has trouble merging optional keys after sampling the first 20,000(?) records.
If you are creating Parquet, you should use the Parquet libraries to define a consistent schema first. I've pretty much given up trying to create Parquet from JSON with Drill, which always ends in index-out-of-bounds errors (server crashing) when I try to query the resulting Parquet.
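To illustrate what "define a consistent schema first" could look like, here is a minimal sketch in plain Python. It is not Drill's or any Parquet library's API: the SCHEMA table, its field names, and the normalize helper are all hypothetical. The idea is only that every record is forced to the same keys and types before it is handed to whatever Parquet writer you use, so NULLs and optional keys never leave a column's type open to guessing.

```python
import json

# Hypothetical schema: field name -> (Python type, default when missing or null).
SCHEMA = {
    "id": (int, None),
    "name": (str, None),
    "score": (float, None),
    "tags": (list, []),
}

def normalize(record, schema=SCHEMA):
    """Return a dict with exactly the schema's keys, coerced to the declared types."""
    row = {}
    for field, (ftype, default) in schema.items():
        value = record.get(field)
        if value is None:
            # Missing key or JSON null: fall back to the declared default,
            # copying list defaults so rows never share one list object.
            row[field] = list(default) if isinstance(default, list) else default
        elif ftype is list:
            # Wrap a lone scalar so the column is always a list.
            row[field] = value if isinstance(value, list) else [value]
        else:
            row[field] = ftype(value)
    return row

# Sample input (made up): one JSON document per line, with nulls,
# optional keys, a mistyped id, and a key the schema does not know.
lines = [
    '{"id": 1, "name": "a", "score": 2}',
    '{"id": "2", "name": null, "tags": ["x"]}',
    '{"id": 3, "extra": true}',
]
rows = [normalize(json.loads(line)) for line in lines]
```

Keys that are not in the schema (like "extra" above) are dropped, and the rows come out with identical keys in identical order, which is the shape a typed Parquet schema expects.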
From: Ted Dunning [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 11, 2018 4:47 AM
To: user <[EMAIL PROTECTED]>
Subject: Re: Which perform better JSON or convert JSON to parquet format ?
Yes. Drill is good at JSON.
But Parquet will be faster during a scan.
Faster may be better. Or other things may be more important.
You have to decide what is important to you. The great virtue of Drill is that you have the choice.
On Mon, Jun 11, 2018 at 11:06 AM Divya Gehlot <[EMAIL PROTECTED]>