A slightly less "hackish" way to do this without joins is to write a
custom UDF that takes data.BLOCK__OFFSET__INSIDE__FILE as its input
parameter and returns the corresponding data from the small file. If you
mark it as deterministic using @UDFType(deterministic = true), the
performance should be quite good.
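
A rough sketch of the lookup such a UDF could wrap, in case it helps. It
is plain Java so it compiles without Hive on the classpath. The class name
OffsetLookup, the fromFile helper and the tab-separated file layout are
all assumptions made for illustration. In a real UDF the class would
extend org.apache.hadoop.hive.ql.exec.UDF and carry the @UDFType
annotation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not a drop-in Hive UDF: a real one would extend
// org.apache.hadoop.hive.ql.exec.UDF and be annotated with
// @UDFType(deterministic = true).
final class OffsetLookup {
    // Maps a block offset in the large file to the matching small-file value.
    private final Map<Long, String> byOffset;

    OffsetLookup(Map<Long, String> byOffset) {
        this.byOffset = new HashMap<>(byOffset);
    }

    // Assumed layout of the small file: one "offset<TAB>value" pair per line.
    static OffsetLookup fromFile(Path smallFile) throws IOException {
        Map<Long, String> map = new HashMap<>();
        for (String line : Files.readAllLines(smallFile)) {
            String[] parts = line.split("\t", 2);
            if (parts.length < 2) {
                continue; // skip blank or malformed lines
            }
            map.put(Long.parseLong(parts[0].trim()), parts[1]);
        }
        return new OffsetLookup(map);
    }

    // Same input always gives the same output, which is what the
    // deterministic flag promises. Unknown or null offsets yield null.
    String evaluate(Long blockOffset) {
        return blockOffset == null ? null : byOffset.get(blockOffset);
    }
}
```

Once packaged as a proper UDF in a jar, it would be registered with
ADD JAR and CREATE TEMPORARY FUNCTION and then called with
data.BLOCK__OFFSET__INSIDE__FILE as its argument.
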
To avoid the full table scan, partitioning is IMHO the best way to speed
On Thu, Jun 27, 2013 at 11:18 AM, Peter Marron <
[EMAIL PROTECTED]> wrote: