Subject: Completing a bulk load from HFiles stored in S3


Short answer: no, it will not work; you need to copy the HFiles to HDFS
first.

IIRC, the bulk load code is ultimately calling a filesystem rename from
the path you provided to the proper location in the hbase.rootdir's
filesystem. I don't believe an `fs.rename` is going to work across
filesystems, because a cross-filesystem move can't be done atomically,
and atomicity is exactly what HDFS guarantees for the rename method [1].
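
If it helps, the workaround is straightforward: stage the HFiles on the
hbase.rootdir filesystem before loading. Below is a minimal sketch,
assuming HBase 2.x's BulkLoadHFiles tool, the s3a connector on the
classpath, and hypothetical bucket/staging paths and table name:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.tool.BulkLoadHFiles;

    public class S3BulkLoadExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Hypothetical paths -- substitute your own bucket and
        // staging dir.
        Path src = new Path("s3a://my-bucket/hfiles");
        Path dst = new Path("hdfs:///tmp/bulkload-staging");

        // Copy the HFiles onto the same filesystem as hbase.rootdir,
        // since the server-side rename cannot cross filesystems.
        FileUtil.copy(src.getFileSystem(conf), src,
                      dst.getFileSystem(conf), dst,
                      false /* deleteSource */, conf);

        // With the files on HDFS, the bulk load's rename is a
        // same-filesystem move.
        BulkLoadHFiles.create(conf).bulkLoad(
            TableName.valueOf("my_table"), dst);
      }
    }

For a large number of files you would probably reach for distcp rather
than a single-process copy, but the end state is the same: the HFiles
live on HDFS before the bulk load runs.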

Additionally, for Kerberos-secured clusters, the server-side bulk load
logic currently expects the filesystem hosting your HFiles to be HDFS
(in order to read the files with the appropriate authentication). This
fails right now, but it is something our PeterS is looking at.

[1]
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29

On 10/31/19 6:55 AM, Wellington Chevreuil wrote: