Parquet is faster, especially if you only need a subset of the fields in your JSON objects. Every JSON key or array is stored as its own column, so a query reads only the columns it references.

That said, creating Parquet from JSON is not bulletproof if you have really complex JSON with NULL values or many optional keys. Drill can't figure out what data type a NULL JSON value is, and it has trouble merging optional keys after sampling the first 20,000(?) records.

If you are creating Parquet, you should use the Parquet libraries to define a consistent schema first. I've pretty much given up trying to create Parquet from JSON, which always ends in index-out-of-bounds errors (and the server crashing) when I try to query the resulting Parquet.
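To make the "consistent schema first" idea concrete, here is a minimal sketch in plain Python of one way to force every JSON record into the same shape before handing it to a Parquet writer. The field names and the `SCHEMA` mapping are hypothetical, and this is not what Drill does internally. It only illustrates filling in missing optional keys and giving NULLs a known type slot.

```python
import json

# Hypothetical schema: field name -> Python type used for coercion.
# In a real pipeline this would mirror the schema given to the Parquet writer.
SCHEMA = {"id": int, "name": str, "score": float, "tags": list}


def normalize(record, schema=SCHEMA):
    """Return a dict with exactly the schema's keys, in schema order."""
    row = {}
    for key, typ in schema.items():
        value = record.get(key)
        if value is None:
            # Missing keys and explicit JSON nulls both become None,
            # but the column's type is still known from the schema.
            row[key] = None
        elif typ is list:
            # Wrap a scalar so the column is always a list.
            row[key] = value if isinstance(value, list) else [value]
        else:
            row[key] = typ(value)
    return row


def normalize_lines(lines, schema=SCHEMA):
    """Parse newline-delimited JSON and normalize each record."""
    return [normalize(json.loads(line), schema) for line in lines if line.strip()]


# Sample input with a null value, missing keys, a mistyped id, and an extra key.
raw = [
    '{"id": 1, "name": "a", "score": 1.5, "tags": ["x"]}',
    '{"id": "2", "name": null}',
    '{"id": 3, "score": 4, "extra": true}',
]
rows = normalize_lines(raw)
for row in rows:
    print(row)
```

Every row now has the same keys in the same order, and keys outside the schema are dropped. Rows like these can then be written with a Parquet library using a matching explicit schema, so nothing has to be inferred from a sample of the data.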

Yes. Drill is good at JSON.

But Parquet will be faster during a scan.

Faster may be better. Or other things may be more important.

You have to decide what is important to you. The great virtue of Drill is that you have the choice.

