Yes, that's correct. I've never tried bulk loading from S3 on 2.x.
> Thanks for the info, Austin. I'm guessing that's how 1.x works since
> you mention EMR?
>
> I think this code has changed in 2.x with the SecureBulkLoad stuff
> moving into "core" (instead of external as a coproc endpoint).
>
> On 11/12/19 10:39 AM, Austin Heyne wrote:
>> Sorry for the late reply. You should be able to bulk load files from
>> S3 as it will detect that they're not the same filesystem and have
>> the regionservers copy the files locally and then up to HDFS. This is
>> related to a problem I reported a while ago when using HBase on S3
>> with EMR.
>>
>>
>> https://issues.apache.org/jira/browse/HBASE-20774
>>
>> -Austin
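
I haven't tried it myself, but if the bulk load really does handle the copy for you when the source filesystem differs, I'd expect pointing the stock completebulkload entry point (LoadIncrementalHFiles; in 2.x I believe it lives under org.apache.hadoop.hbase.tool) at an s3a staging directory to look roughly like the sketch below. The bucket, path and table name are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles;
import org.apache.hadoop.util.ToolRunner;

public class BulkLoadFromS3 {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Staging directory written by HFileOutputFormat2 (one subdirectory
    // per column family). Bucket, path and table name are placeholders.
    String hfileDir = "s3a://my-bucket/hfile-staging";
    String tableName = "my_table";

    // Same arguments the completebulkload CLI takes: <hfile dir> <table>.
    int rc = ToolRunner.run(conf, new LoadIncrementalHFiles(conf),
        new String[] { hfileDir, tableName });
    System.exit(rc);
  }
}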
>>
>> On 11/1/19 8:04 AM, Wellington Chevreuil wrote:
>>> Ah yeah, didn't realise it would assume the same FS internally. Indeed,
>>> there's no way to have rename working between different FSes.
>>>
>>> On Thu, Oct 31, 2019 at 16:25, Josh Elser <[EMAIL PROTECTED]> wrote:
>>>
>>>> Short answer: no, it will not work and you need to copy it to HDFS
>>>> first.
>>>>
>>>> IIRC, the bulk load code is ultimately calling a filesystem rename
>>>> from the path you provided to the proper location in the
>>>> hbase.rootdir's filesystem. I don't believe an `fs.rename` is going
>>>> to work across filesystems because it can't be done atomically, and
>>>> atomicity is something HDFS guarantees for the rename method [1].
>>>>
>>>> Additionally, for Kerberos-secured clusters, the server-side bulk
>>>> load logic expects that the filesystem hosting your hfiles is HDFS
>>>> (in order to read the files with the appropriate authentication).
>>>> This fails right now, but is something our PeterS is looking at.
>>>>
>>>> [1]
>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29
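
So the safe route for now seems to be copying the HFiles into HDFS yourself and bulk loading from there. Something along these lines is what I had in mind (bucket, paths and table name are placeholders, it assumes fs.defaultFS points at HDFS, and for any real volume of data hadoop distcp would be a better copy mechanism than a single-process copy like this):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles;

public class CopyThenBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Placeholder locations: HFiles staged in S3 are copied into HDFS
    // first so the bulk load's rename stays within a single filesystem.
    Path src = new Path("s3a://my-bucket/hfile-staging");
    Path dst = new Path("/tmp/hfile-staging");

    FileSystem srcFs = src.getFileSystem(conf);
    FileSystem dstFs = FileSystem.get(conf); // assumes fs.defaultFS is HDFS
    FileUtil.copy(srcFs, src, dstFs, dst, false, conf);

    TableName tableName = TableName.valueOf("my_table");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin();
         Table table = conn.getTable(tableName);
         RegionLocator locator = conn.getRegionLocator(tableName)) {
      // Both the staging dir and hbase.rootdir now live on HDFS.
      new LoadIncrementalHFiles(conf).doBulkLoad(dst, admin, table, locator);
    }
  }
}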
>>>>
>>>> On 10/31/19 6:55 AM, Wellington Chevreuil wrote:
>>>>> I believe you can specify your S3 path for the hfiles directly, as
>>>>> the Hadoop FileSystem API does support the s3a scheme, but you would
>>>>> need to add your S3 access and secret keys to your completebulkload
>>>>> configuration.
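
For the record, if the s3a route does end up working, I'd expect the credentials to be wired in roughly as below. fs.s3a.access.key and fs.s3a.secret.key are the standard s3a properties; the bucket, staging path and table name are placeholders, and per Josh's point above the cross-filesystem rename may still be the blocker:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles;
import org.apache.hadoop.util.ToolRunner;

public class BulkLoadWithS3aCredentials {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Standard s3a credential properties; the values are placeholders.
    // (An instance profile or credential provider would normally be
    // preferable to putting raw keys in the configuration.)
    conf.set("fs.s3a.access.key", "MY_ACCESS_KEY");
    conf.set("fs.s3a.secret.key", "MY_SECRET_KEY");

    // Placeholder staging dir and table, same invocation as completebulkload.
    int rc = ToolRunner.run(conf, new LoadIncrementalHFiles(conf),
        new String[] { "s3a://my-bucket/hfile-staging", "my_table" });
    System.exit(rc);
  }
}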
>>>>>
>>>>> On Wed, Oct 30, 2019 at 19:43, Gautham Acharya <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> If I have HFiles stored in S3, can I run CompleteBulkLoad and
>>>>>> provide an S3 endpoint as a single command, or do I need to copy
>>>>>> the S3 HFiles to HDFS first? The documentation is not very clear.
>>>>>>