Firstly, I would suggest switching over to the Arrow 0.8 release as soon as
possible, since you are writing Java programs and the API usage has changed
drastically. The new APIs are much simpler and come with good, detailed javadocs.
If you are writing a stop-gap implementation, it is probably fine to
continue with the old version, but for the long term the new APIs are recommended.
- Create an instance of the vector. Note that this doesn't allocate any
memory for the elements in the vector.
- Grab the corresponding mutator and accessor objects by calls to
- Allocate memory:
- *allocateNew()* - allocates memory for the default number of
elements in the vector. This applies to both fixed width and variable
width vectors.
- *allocateNew(valueCount)* - for fixed width vectors. Use this
method if you already know the number of elements to store in the vector.
- *allocateNew(bytes, valueCount)* - for variable width vectors. Use
this method if you already know the total size (in bytes) of all the
variable width elements you will be storing in the vector. For example, if
you are going to store 1024 elements in the vector and the total size
across all variable width elements is under 1MB, you can call
allocateNew(1024 * 1024, 1024).
- Populate the vector:
- Use the *set()* or *setSafe()* APIs in the mutator interface. From
Arrow 0.8 onwards, you can call these APIs directly on the vector instance;
the mutator and accessor interfaces have been removed.
- The difference between set() and the corresponding setSafe() API is
that the latter internally takes care of expanding the vector's buffer(s)
when they are too small to store the new data.
- Each set() API has a corresponding setSafe() API.
- Call setValueCount() with the number of elements you populated in the vector.
- Retrieve elements from the vector:
- Use the get() and getObject() APIs in the accessor interface. Again,
from Arrow 0.8 onwards you can call these APIs directly on the vector.
- Regarding setInitialCapacity():
- Say your application always calls allocateNew(). This is likely to
over-allocate memory, because allocateNew() assumes a default value count
to begin with.
- In this case, if you call setInitialCapacity() before allocateNew(),
the latter does not do the default memory allocation. Instead, it allocates
exactly for the value capacity you specified in setInitialCapacity().
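The create / allocate / populate / setValueCount / retrieve steps above can be sketched for a fixed width vector with the Arrow 0.8+ Java API, where set(), get() and getObject() are called directly on the vector. This is a minimal, non-authoritative sketch assuming arrow-vector and arrow-memory are on the classpath; the class FixedWidthExample and its method populateAndSum() are hypothetical names for illustration, and the RootAllocator limit is an arbitrary choice.

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;

public class FixedWidthExample {

    // Hypothetical helper (not part of Arrow): writes 0..n-1 into an IntVector
    // and sums the values read back with get().
    static long populateAndSum(int n) {
        // try-with-resources closes the vector first, then the allocator,
        // so the allocator does not see any outstanding buffers when it closes.
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             // Creating the vector does not allocate memory for its elements.
             IntVector vector = new IntVector("ints", allocator)) {

            // The element count is known up front, so size the buffers for n values.
            vector.allocateNew(n);

            // set() does not grow the buffers; it relies on allocateNew(n) above.
            for (int i = 0; i < n; i++) {
                vector.set(i, i);
            }
            // Record how many slots were actually populated.
            vector.setValueCount(n);

            long sum = 0;
            for (int i = 0; i < vector.getValueCount(); i++) {
                // get() returns a primitive int; getObject(i) would return an Integer.
                sum += vector.get(i);
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        System.out.println(populateAndSum(10)); // prints 45
    }
}
```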
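For a variable width vector, allocateNew(bytes, valueCount) and setSafe() might be combined as follows. This is again a sketch assuming Arrow 0.8+ Java; VariableWidthExample and roundTrip() are hypothetical names, and the 1MB / 1024 sizing simply mirrors the allocation example above. Writing more than 1024 values relies on setSafe() growing the buffers. set() does not do this growth, so using it past the allocated capacity is not safe.

```java
import java.nio.charset.StandardCharsets;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarCharVector;

public class VariableWidthExample {

    // Hypothetical helper (not part of Arrow): stores "value-0" .. "value-(count-1)"
    // and returns the last element as a String. Assumes count >= 1.
    static String roundTrip(int count) {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             VarCharVector vector = new VarCharVector("strings", allocator)) {

            // Pre-size for about 1MB of character data spread over 1024 values.
            vector.allocateNew(1024 * 1024, 1024);

            for (int i = 0; i < count; i++) {
                byte[] utf8 = ("value-" + i).getBytes(StandardCharsets.UTF_8);
                // setSafe() grows the buffers when the index or the data size
                // exceeds the current allocation.
                vector.setSafe(i, utf8);
            }
            vector.setValueCount(count);

            // get() returns the raw bytes of one element.
            return new String(vector.get(count - 1), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // 2000 values exceed the 1024 pre-allocated slots, so setSafe() must grow the vector.
        System.out.println(roundTrip(2000)); // prints value-1999
    }
}
```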
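One way to observe the effect of setInitialCapacity() is to compare getValueCapacity() after a plain allocateNew() with the capacity after setInitialCapacity() followed by allocateNew(). This sketch assumes Arrow 0.8+ Java; InitialCapacityExample and its methods are hypothetical. The reported capacity may be somewhat larger than the requested count because buffer sizes can be rounded up, so it is only expected to be at least the requested count and well below the default.

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;

public class InitialCapacityExample {

    // Hypothetical helper: value capacity after a plain allocateNew(),
    // which sizes the buffers for a default value count.
    static int defaultCapacity() {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             IntVector vector = new IntVector("ints", allocator)) {
            vector.allocateNew();
            return vector.getValueCapacity();
        }
    }

    // Hypothetical helper: value capacity when setInitialCapacity() is called first.
    static int capacityWithHint(int expectedValues) {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             IntVector vector = new IntVector("ints", allocator)) {
            vector.setInitialCapacity(expectedValues);
            // allocateNew() now sizes the buffers from the hint instead of the default.
            vector.allocateNew();
            return vector.getValueCapacity();
        }
    }

    public static void main(String[] args) {
        System.out.println("default:   " + defaultCapacity());
        System.out.println("with hint: " + capacityWithHint(100));
    }
}
```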
I would highly recommend taking a look at https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
It has lots of examples of populating the vector, retrieving from the
vector, using setInitialCapacity(), using the set() and setSafe() methods, and
combinations of them that show when things can go wrong.
Hopefully this helps. Meanwhile, we will try to add an internal README on
the usage of vectors.
On Tue, Dec 19, 2017 at 8:55 AM, Emilio Lahr-Vivaz <[EMAIL PROTECTED]>