Erlang implementation of the Apache Arrow in-memory columnar format.
serde_arrow currently provides only serialization (write) of Erlang data
structures into Arrow. Support for deserialization (read) will be added soon.
We provide support for the Apache Arrow Columnar Format and the Apache Arrow IPC Format. Support for Flight RPC and Flight SQL, as well as conversion of Arrow into other formats like Apache Parquet, Apache Avro, CSV and JSON, is out of scope for the project.
In addition to an Erlang installation, you will need a Rust installation with
cargo. You can then add the following to your rebar.config:
{serde_arrow, {git, "https://github.com/Benjamin-Philip/serde_arrow.git"}}
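For context, the tuple above belongs inside the `deps` list of rebar.config. A minimal sketch of a complete file follows; the `erl_opts` line is only an illustrative assumption and not something serde_arrow requires.

```erlang
%% rebar.config (minimal sketch)

%% Illustrative compiler option only; not required by serde_arrow.
{erl_opts, [debug_info]}.

{deps, [
    %% Unpinned git dependency, exactly as given above. For reproducible
    %% builds you may want to pin it with rebar3's {branch, _}, {tag, _}
    %% or {ref, _} forms.
    {serde_arrow, {git, "https://github.com/Benjamin-Philip/serde_arrow.git"}}
]}.
```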
And compile!
$ rebar3 compile
This implementation is still a work in progress. As mentioned earlier, only write functionality is available for now; read is not yet implemented.
We support the following primitive data types:
- Int 8/16/32/64
- UInt 8/16/32/64
- Float 32/64
- Fixed Size Binary
- Binary
- Large Binary
and the following nested data types:
- Fixed Size List
- List
- Large List
Support for the other data types (both primitive and nested) will be added soon.
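To illustrate how List and Large List differ in the Arrow columnar format, here is a small sketch of how an offsets buffer can be derived from nested Erlang lists. The module and function names are hypothetical and not part of serde_arrow's API, and null handling (validity bitmaps) is deliberately left out.

```erlang
-module(list_offsets_sketch).
-export([offsets/1, offsets_buffer/2]).

%% Hypothetical helper, not serde_arrow's API.
%% Cumulative offsets for a variable-size list layout: slot I spans
%% [Offset(I), Offset(I + 1)) in the flattened child (values) array.
-spec offsets([list()]) -> [non_neg_integer()].
offsets(Lists) ->
    Acc = lists:foldl(
        fun(L, [Last | _] = Seen) -> [Last + length(L) | Seen] end,
        [0],
        Lists
    ),
    lists:reverse(Acc).

%% List uses 32-bit signed offsets, Large List uses 64-bit signed offsets.
%% Both are written little-endian here.
-spec offsets_buffer(list | large_list, [non_neg_integer()]) -> binary().
offsets_buffer(list, Offsets) ->
    << <<O:32/little-signed>> || O <- Offsets >>;
offsets_buffer(large_list, Offsets) ->
    << <<O:64/little-signed>> || O <- Offsets >>.
```

For example, `[[1, 2], [], [3]]` has a flattened values array `[1, 2, 3]` and offsets `[0, 2, 2, 3]`. A Fixed Size List needs no offsets buffer at all, because every slot has the same length.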
We currently support all three IPC "formats":
- Encapsulated Message Format
- Stream Format
- File Format
and the following message types:
- Schema
- RecordBatch
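As background on how the three formats relate, the sketch below frames a single encapsulated message as described in the Arrow IPC specification: a continuation marker, a little-endian metadata length, the Flatbuffers metadata padded to 8 bytes, then the body. The module and function names are hypothetical and are not serde_arrow's API. The metadata is taken as an opaque binary, and padding the body as a whole is a simplification.

```erlang
-module(ipc_framing_sketch).
-export([encapsulate/2, end_of_stream/0]).

%% Hypothetical helper, not serde_arrow's API.
%% Frames already-serialized Flatbuffers metadata and a message body
%% as one encapsulated IPC message.
-spec encapsulate(binary(), binary()) -> binary().
encapsulate(Metadata, Body) when is_binary(Metadata), is_binary(Body) ->
    PaddedMetadata = pad8(Metadata),
    PaddedBody = pad8(Body),
    %% The length prefix counts the metadata including its padding.
    MetadataSize = byte_size(PaddedMetadata),
    <<16#FFFFFFFF:32,                  % continuation marker
      MetadataSize:32/little-signed,   % metadata length
      PaddedMetadata/binary,
      PaddedBody/binary>>.

%% Optional end-of-stream marker: continuation marker plus zero length.
-spec end_of_stream() -> binary().
end_of_stream() ->
    <<16#FFFFFFFF:32, 0:32>>.

%% Zero-pad a binary to the next multiple of 8 bytes.
pad8(Bin) ->
    PadLen = (8 - byte_size(Bin) rem 8) rem 8,
    <<Bin/binary, 0:(PadLen * 8)>>.
```

The stream format is a sequence of such messages (a Schema first, then RecordBatches), optionally closed by the end-of-stream marker. The file format wraps that same sequence between `ARROW1` magic bytes and adds a footer for random access.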
Support for the following will be added shortly:
- Buffer compression
- Endianness conversion
- Custom schema metadata
Support for the following will be added post v0.1.0:
- Dictionaries
- Replacement dictionaries
- Delta dictionaries
- Tensors
- Sparse Tensors