A slightly less "hackish" way to do this without joins is to write a custom
UDF that takes data.BLOCK__OFFSET__INSIDE__FILE as its input parameter and
returns the corresponding data from the small file. If you mark it as
deterministic using @UDFType(deterministic = true), the performance
should be quite good.
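
A rough sketch of the lookup part of such a UDF is below. It is kept free of Hive dependencies so it compiles on its own. The class name OffsetLookup, the fromLines helper and the "offset<TAB>value" layout of the small file are all assumptions for illustration; adapt them to however your small file is actually keyed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lookup core for the UDF described above. Plain Java, so it
// compiles without Hive on the classpath. In a real UDF this class would
// extend org.apache.hadoop.hive.ql.exec.UDF and carry
// @UDFType(deterministic = true), and the map would be filled once from the
// small file rather than from an in-memory list.
final class OffsetLookup {
    private final Map<Long, String> valuesByOffset = new HashMap<>();

    // Assumed small-file layout: one "offset<TAB>value" pair per line.
    static OffsetLookup fromLines(List<String> lines) {
        OffsetLookup lookup = new OffsetLookup();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip blank lines and lines without a tab
            }
            long offset = Long.parseLong(line.substring(0, tab).trim());
            lookup.valuesByOffset.put(offset, line.substring(tab + 1));
        }
        return lookup;
    }

    // Mirrors the evaluate(...) method Hive would call once per row,
    // with BLOCK__OFFSET__INSIDE__FILE passed in as the argument.
    String evaluate(Long blockOffset) {
        if (blockOffset == null) {
            return null;
        }
        return valuesByOffset.get(blockOffset); // null when no entry exists
    }
}
```

Once wrapped as a real UDF and registered with CREATE TEMPORARY FUNCTION, it would be called with data.BLOCK__OFFSET__INSIDE__FILE as its only argument. Loading the map once and keeping it in memory is what makes this reasonable only for a genuinely small file.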

To avoid the full table scan, partitioning is IMHO the best way to speed
things up.

