Benchmark to measure ingest throughput of a realtime database.
A real-time database is one that can sustain a high write rate of new incoming data, while at the same time allow applications to make decisions based on fresh data. There could be a time lag between when the data is written to the database and when it is visible in a query. This is called the data latency, or end-to-end latency, of the database. The data latency is different from a query latency, which is what is typically used to measure the latency of querying a database.
Data latency is one of the distinguishing factors that differentiates one real-time database from another. It is an important measure for developers of low-latency applications, like real-time personalization, IoT automation and security analytics, where speed is critical.
RockBench measures the data latency of any real-time database. It is designed to continuously stream documents in batches of fixed size to a database and also calculate and report the data latency by querying the database at fixed intervals.
The goal for rockbench: have a representative way to generate real life event data with the following characteristics:
- Streaming Writes: records are written in streaming fashion
- Nested Objects: records have nested objects with arrays of nested objects inside it
- Mutable records: support both updates and inserts
You can run this directly, or through Docker container.
- Clone the repository
git clone https://github.com/rockset/rockbench.git
- To run directly
# Build
go build
# Send data to Rockset and report data latency
ROCKSET_API_KEY=xxxx ROCKSET_COLLECTION=yyyy WPS=1 BATCH_SIZE=50 DESTINATION=Rockset TRACK_LATENCY=true ./rockbench
# Send data to ElasticSearch and report data latency
ELASTIC_AUTH="ApiKey xxx" ELASTIC_URL=https://... ELASTIC_INDEX=index_name WPS=1 BATCH_SIZE=50 DESTINATION=Elastic TRACK_LATENCY=true ./rockbench
# Send data to Snowflake and report data latency
SNOWFLAKE_ACCOUNT=xxxx SNOWFLAKE_USER=xxxx SNOWFLAKE_PASSWORD=xxxx SNOWFLAKE_WAREHOUSE=xxxx SNOWFLAKE_DATABASE=xxxx SNOWFLAKE_STAGES3BUCKETNAME=xxxx AWS_REGION=xxxx WPS=1 BATCH_SIZE=50 TRACK_LATENCY=true DESTINATION=Snowflake ./rockbench
- To run with Docker container
docker build -t rockset/write_generator .
docker run -e [env variable as above] rockset/write_generator
RockBench can also measure the speed of patches.
mode | operation |
---|---|
add | Perform strictly inserts (using either id scheme) |
patch | Perform patches on id range specified from [0, NUM_DOCS) |
add_then_patch | Perform add mode then patch mode |
Setting NUM_DOCS
to a non-negative value will limit the number of writes made and then perform patches against that
document set.
Patch mode must be explicitly enabled via MODE=patch
or MODE=add_then_patch
and the patches per second is controlled
via PPS
.
PPS
== WPS
unless PPS
is specified.
BATCH_SIZE
is used for both patching and inserting.
Each patch will update a timestamp field for latency detection and also one other field/array in the document.
Patches can take on various forms, currently
- replace: replaces random fields with roughly equivalent type and similar size
- add: Adds new top level fields and prepends entries into the top level tags array
Specify PATCH_MODE
as either 'replace' or 'add'. Default will be 'replace'.
You can also specify the _id
scheme for Rockset destination to be either uuid
or sequential
(increasing sequential
numbers) using ID_MODE
Implement the Destination interface and provide the appropriate configs required. Check Rockset and Elastic for reference. The interface has two methods:
SendDocument
: Method to send batch of documents to the destinationGetLatestTimestamp
: Fetch the latest timestamp from the database
Once the new source is implemented, handle it in main.go.