
Time-series Data Compressor

This repository contains a prototype time-series compressor, loosely based on ideas described in the paper Gorilla: A Fast, Scalable, In-Memory Time Series Database (a copy is included in this repository).

The main motivation was to investigate the effect of:

  • double delta compression
  • XOR compression for floating-point values

The current implementation is a research prototype: it compresses data using double deltas and XOR binary compression for floating-point values.
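Both ideas can be illustrated with a short, self-contained sketch. The snippet below only demonstrates the two techniques and is not the prototype's actual code: the real encoder additionally packs deltas and XOR results at the bit level with variable-length codes, as described in the Gorilla paper, and all names here are made up for the example.

import struct

def double_deltas(timestamps):
    # Delta-of-deltas: regularly spaced timestamps collapse to runs of
    # zeros, which a bit-level encoder can store in a single bit each.
    # The first entry is the raw first delta.
    out = []
    prev_ts, prev_delta = timestamps[0], 0
    for ts in timestamps[1:]:
        delta = ts - prev_ts
        out.append(delta - prev_delta)
        prev_ts, prev_delta = ts, delta
    return out

def float_bits(x):
    # Reinterpret a Python float as its 64-bit IEEE-754 bit pattern.
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_deltas(values):
    # XOR each value's bit pattern with the previous one. Slowly changing
    # series produce XOR words with long runs of leading and trailing
    # zeros, so only the few "meaningful" bits in between need storing.
    prev = float_bits(values[0])
    out = []
    for v in values[1:]:
        bits = float_bits(v)
        x = prev ^ bits
        leading = 64 - x.bit_length()                   # zeros before the first set bit
        trailing = (x & -x).bit_length() - 1 if x else 64
        out.append((x, leading, trailing))
        prev = bits
    return out

if __name__ == "__main__":
    print(double_deltas([1600000000, 1600000060, 1600000120, 1600000181]))
    # -> [60, 0, 1]
    for x, lead, trail in xor_deltas([12.0, 12.0, 12.5]):
        print(f"xor={x:016x} leading={lead} trailing={trail}")

In the Gorilla scheme a zero XOR is written as a single '0' bit, and a non-zero XOR reuses the previous leading/trailing-zero window when it still fits, which is where most of the savings for slowly changing values come from.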

Installation

python3 -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt

Testing

pytest
python3 compress.py -i data/stock_data.json -o data/stock_data.bin

The output shows compression statistics similar to the following (counts and sizes of key and delta records, plus the memory overhead of the string cache and schema):

string cache : 120.0 bytes
schema block : 3144.0 bytes
key record : 1682.125 bytes
delta record : 26319.375 bytes
total 31265.5 bytes
string cache : 7 block(s), avg size = 17.142857142857142 bytes/block
schema block : 2 block(s), avg size = 1572.0 bytes/block
key record : 30 block(s), avg size = 56.07083333333333 bytes/block
delta record : 677 block(s), avg size = 38.876477104874446 bytes/block

One block roughly corresponds to one line of the original data in the input file. The average size of a JSON line is 305 bytes.
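As a rough sanity check, assuming each key and delta record corresponds to about one input line (roughly 707 lines for the run above), these numbers imply close to a 7x size reduction:

raw = (30 + 677) * 305                    # ~707 input lines at ~305 bytes of JSON each
compressed = 31265.5                      # total from the statistics above
print(f"ratio ~ {raw / compressed:.1f}x") # ~6.9x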