I think the NullPointerException is caused by an issue in my new
writer (which uses my implementation of the ByteBuffer writable
interface). Let me narrow it down first.

The basic code, which does not use my writer implementation, seems to
work. That is the code currently on GitHub. I have not pushed the new
writer implementation yet.


On 20 Dec 2017 14:51, "Animesh Trivedi" <[EMAIL PROTECTED]> wrote:

Wes, Emilio, Siddharth - many thanks for the helpful replies and comments!

I managed to upgrade the code to the 0.8 API. I have to say that the 0.8
API is much more intuitive ;)  I will summarize my code example with some
documentation in a blog post soon (and post it here too).

- Is there first-class support for reading and writing Arrow files on HDFS?
I ask because FSData[Output/Input]Stream from HDFS do not implement the
[Read/Writeable]ByteChannel interfaces required to instantiate ArrowFile
readers and writers. I already implemented something for myself that works,
but I am wondering whether it would make sense to have these facilities as
utilities in the Arrow code.
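
To make the question concrete, here is a minimal sketch of the kind of adapter meant above. It is not the implementation from my repository and not an Arrow or Hadoop API. On the write side, the JDK's java.nio.channels.Channels.newChannel(OutputStream) already wraps any OutputStream as a WritableByteChannel. The read side is harder, assuming the file reader needs a seekable channel. PositionedSource is a hypothetical stand-in for a stream that supports positioned reads; PositionedSourceChannel and ByteArraySource are likewise made-up names for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.NonWritableChannelException;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical stand-in for a positioned-read stream (not a Hadoop or Arrow type). */
interface PositionedSource {
    /** Reads up to length bytes at the absolute position; returns -1 at end of data. */
    int read(long position, byte[] buffer, int offset, int length) throws IOException;

    /** Total number of bytes available. */
    long length() throws IOException;
}

/** Read-only SeekableByteChannel backed by a PositionedSource. */
final class PositionedSourceChannel implements SeekableByteChannel {
    private static final int MAX_CHUNK = 64 * 1024;

    private final PositionedSource source;
    private long position = 0L; // a long, so offsets beyond 2 GiB stay valid
    private boolean open = true;

    PositionedSourceChannel(PositionedSource source) {
        this.source = source;
    }

    @Override
    public int read(ByteBuffer dst) throws IOException {
        ensureOpen();
        if (!dst.hasRemaining()) {
            return 0;
        }
        // Copy through a bounded temporary array so direct buffers also work.
        byte[] chunk = new byte[Math.min(dst.remaining(), MAX_CHUNK)];
        int n = source.read(position, chunk, 0, chunk.length);
        if (n < 0) {
            return -1; // end of data
        }
        dst.put(chunk, 0, n);
        position += n;
        return n;
    }

    @Override
    public int write(ByteBuffer src) {
        throw new NonWritableChannelException();
    }

    @Override
    public long position() throws IOException {
        ensureOpen();
        return position;
    }

    @Override
    public SeekableByteChannel position(long newPosition) throws IOException {
        ensureOpen();
        if (newPosition < 0) {
            throw new IllegalArgumentException("negative position");
        }
        position = newPosition;
        return this;
    }

    @Override
    public long size() throws IOException {
        ensureOpen();
        return source.length();
    }

    @Override
    public SeekableByteChannel truncate(long size) {
        throw new NonWritableChannelException();
    }

    @Override
    public boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;
    }

    private void ensureOpen() throws ClosedChannelException {
        if (!open) {
            throw new ClosedChannelException();
        }
    }
}

/** In-memory source, used only to exercise the adapter. */
final class ByteArraySource implements PositionedSource {
    private final byte[] data;

    ByteArraySource(byte[] data) {
        this.data = data;
    }

    @Override
    public int read(long position, byte[] buffer, int offset, int length) {
        if (position >= data.length) {
            return -1;
        }
        int n = (int) Math.min(length, data.length - position);
        System.arraycopy(data, (int) position, buffer, offset, n);
        return n;
    }

    @Override
    public long length() {
        return data.length;
    }
}
```

A real version would implement PositionedSource on top of the HDFS input stream instead of a byte array. The position is kept as a long on purpose, since files such as the 3965838890-byte one below do not fit in an int offset.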

My example code runs fine on a small example of 10 rows with multiple
batches, but it fails to read anything larger. I have not verified whether
it worked with version 0.7, or at what row count it starts to fail. The
writes are fine as far as I can tell. For example, I am writing and then
reading TPC-DS data (the store_sales table, with int, long, and double
columns) and I get:

Reading the arrow file : ./store_sales.arrow
File size : 3965838890 schema is Schema<ss_sold_date_sk: Int(32, true),
ss_sold_time_sk: Int(32, true), ss_item_sk: Int(32, true), ss_customer_sk:
Int(32, true), ss_cdemo_sk: Int(32, true), ss_hdemo_sk: Int(32, true),
ss_addr_sk: Int(32, true), ss_store_sk: Int(32, true), ss_promo_sk: Int(32,
true), ss_ticket_number: Int(64, true), ss_quantity: Int(32, true),
ss_wholesale_cost: FloatingPoint(DOUBLE), ss_list_price:
FloatingPoint(DOUBLE), ss_sales_price: FloatingPoint(DOUBLE),
ss_ext_discount_amt: FloatingPoint(DOUBLE), ss_ext_sales_price:
FloatingPoint(DOUBLE), ss_ext_wholesale_cost: FloatingPoint(DOUBLE),
ss_ext_list_price: FloatingPoint(DOUBLE), ss_ext_tax:
FloatingPoint(DOUBLE), ss_coupon_amt: FloatingPoint(DOUBLE), ss_net_paid:
FloatingPoint(DOUBLE), ss_net_paid_inc_tax: FloatingPoint(DOUBLE),
ss_net_profit: FloatingPoint(DOUBLE)>
Number of arrow blocks are 19
        at org.apache.arrow.vector.ipc.message.MessageSerializer.
        at org.apache.arrow.vector.ipc.message.MessageSerializer.
        at org.apache.arrow.vector.ipc.ArrowFileReader.readRecordBatch(
        at org.apache.arrow.vector.ipc.ArrowFileReader.loadNextBatch(
        at org.apache.arrow.vector.ipc.ArrowFileReader.loadRecordBatch(
        at com.github.animeshtrivedi.arrowexample.ArrowRead.
        at com.github.animeshtrivedi.arrowexample.ArrowRead.main(

For context, the file size is 3965838890 bytes and the schema read from
the file is correct. The failing code does something like this:

        System.out.println("File size : " + arrowFile.length()
                + " schema is " + root.getSchema().toString());
        // Each ArrowBlock locates one record batch inside the file.
        List<ArrowBlock> arrowBlocks = arrowFileReader.getRecordBlocks();
        System.out.println("Number of arrow blocks are " + arrowBlocks.size());
        for (int i = 0; i < arrowBlocks.size(); i++) {
            ArrowBlock rbBlock = arrowBlocks.get(i);
            if (!arrowFileReader.loadRecordBatch(rbBlock)) {
                throw new IOException("Expected to read record batch");
            }
        }

The stack trace comes from here: https://github.com/animeshtrivedi/ArrowExample/

Any idea what might be happening?


On Tue, Dec 19, 2017 at 7:03 PM, Siddharth Teotia <[EMAIL PROTECTED]>