A JSON to Arrow IPC converter and Pulsar publishing tool.
- Bolson receives newline-separated JSONs over a TCP connection.
- The JSONs are converted to Arrow RecordBatches.
- The Arrow RecordBatches are serialized to Arrow IPC messages.
- The IPC messages are published to a Pulsar broker.
- The implementation aims to achieve high throughput and low latency.
- The implementation can use FPGA accelerators for additional performance.
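
As an illustration of the input format in the first step, here is a minimal Python sketch that writes newline-separated JSON to an already-connected TCP socket. It is not part of Bolson: the function names `encode_ndjson` and `send_ndjson` are hypothetical, and the sketch makes no assumption about which side opens the connection or which host and port are used.

```python
import json
import socket
from typing import Iterable, Mapping


def encode_ndjson(records: Iterable[Mapping]) -> bytes:
    """Serialize records as newline-separated JSON, one compact object per line."""
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    lines = (json.dumps(record, separators=(",", ":")) for record in records)
    return "".join(line + "\n" for line in lines).encode("utf-8")


def send_ndjson(sock: socket.socket, records: Iterable[Mapping]) -> int:
    """Write the records to an already-connected socket; return the bytes sent."""
    payload = encode_ndjson(records)
    sock.sendall(payload)
    return len(payload)
```

Each record ends with a single `\n`, so the receiver can split the byte stream into JSON documents without any further framing.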
To build Bolson, make sure your system meets the following requirements:
- Toolchain:
  - CMake 3.14+
  - A C++17 compiler
- Dependencies:
  - Arrow 3.0.0
    - When building from source, run cmake with -DARROW_JSON=ON.
  - Pulsar 2.7.0
Build Bolson as follows:
git clone https://github.com/teratide/bolson.git
cd bolson
mkdir build
cd build
cmake ..
make
There are two subcommands, stream and bench.
More detailed options can be found by running:
bolson --help <subcommand>
To enable FPGA-accelerated parsing, continue to read here.
- Why is it called Bolson?
- The name is inspired by the "Bolson Pupfish", which sounds a bit like "JSON publish". It's a working title.