
CoreCache

A distributed key-value store

MVP

  1. Leader election and coordination via ZooKeeper
  2. Reads and writes go through the leader
  3. Data is stored in a Log-Structured Merge (LSM) Tree
  4. GET, PUT, and DELETE APIs
  5. Not available as part of the MVP:
    1. Replicas
    2. Write-ahead logs
  6. Data is read from and written to Memtables (see the sketch after this list)
  7. Once flushed to an SSTable, Memtables are emptied
  8. DELETE marks data to be collected during compaction
  9. The role of compaction in the MVP is to take care of updates and deletes, effectively rewriting the index and data files
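
To picture the write path described above, here is a minimal sketch of a Memtable with tombstone deletes and a flush-to-SSTable step. The class name, flush threshold, and file layout are illustrative assumptions, not CoreCache's actual implementation.

```python
import json
from pathlib import Path

# Illustrative sketch of the MVP write path; names and file layout are
# assumptions, not CoreCache's actual implementation.
class Memtable:
    def __init__(self, flush_threshold=1024, data_dir=Path("data")):
        self.entries = {}                  # key -> (value, deleted)
        self.flush_threshold = flush_threshold
        self.data_dir = data_dir
        self.sstable_seq = 0

    def put(self, key, value):
        self.entries[key] = (value, False)
        if len(self.entries) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        # DELETE only writes a tombstone; compaction reclaims the space later.
        self.entries[key] = (None, True)

    def get(self, key):
        # This sketch covers only the Memtable path, not SSTable lookups.
        value, deleted = self.entries.get(key, (None, False))
        return None if deleted else value

    def flush(self):
        # Persist sorted entries as an SSTable, then empty the Memtable.
        self.data_dir.mkdir(exist_ok=True)
        path = self.data_dir / f"sstable-{self.sstable_seq}.json"
        path.write_text(json.dumps(sorted(self.entries.items())))
        self.sstable_seq += 1
        self.entries.clear()
```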

Dependencies

  1. Kazoo - Python client library for working with ZooKeeper
  2. dynaconf - for managing configuration settings
  3. Colima - not a direct dependency, but recommended for local development instead of Docker Desktop
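
For reference, leader election with Kazoo's built-in Election recipe looks roughly like this. The connection string, election path, and node identifier are assumptions for illustration; check the project's configuration for the real values.

```python
from kazoo.client import KazooClient

# Minimal leader-election sketch using Kazoo's Election recipe; the
# connection string and election path are illustrative assumptions.
zk = KazooClient(hosts="localhost:2181")
zk.start()

def lead():
    # Only the elected leader reaches this function; CoreCache routes
    # reads and writes through the leader (see MVP above).
    print("This node is now the leader")

election = zk.Election("/corecache/election", identifier="node-1")
election.run(lead)   # blocks until this node wins the election, then calls lead()
```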

Running this locally

  1. Run ZooKeeper; feel free to use the containerized version:

```
docker run --name some-zookeeper -p 2181:2181 --restart always -d zookeeper
```

  2. To connect to ZooKeeper running in a container:

```
docker run -it --rm --link some-zookeeper:zookeeper zookeeper zkCli.sh -server zookeeper
```

Deploying CoreCache

  1. Make sure Python and pip are installed

  2. Download the release tarball:

```
curl -L -o corecache-0.11.tar.gz https://github.com/gtinside/distributed-key-value-store/archive/refs/tags/0.11.tar.gz
```

  3. Extract it:

```
tar -xvzf corecache-0.11.tar.gz
```

  4. Run ZooKeeper (see above)

  5. Change into the scripts directory:

```
cd distributed-key-value-store-0.11/scripts
```

  6. Start the server:

```
./start_server.sh --zooKeeperHost localhost --zooKeeperPort 2181
```
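
Once the server is up, you can exercise the GET and PUT APIs. The transport details are not documented here, so the port and endpoint shapes below are purely hypothetical placeholders; check the scripts and configuration for the actual values.

```python
import urllib.request

# Hypothetical smoke test; host, port, and endpoint shape are assumptions,
# not CoreCache's documented API. Adjust to the actual server settings.
BASE = "http://localhost:8000"

req = urllib.request.Request(f"{BASE}/put?key=foo&value=bar", method="PUT")
with urllib.request.urlopen(req) as resp:
    print("PUT status:", resp.status)

with urllib.request.urlopen(f"{BASE}/get?key=foo") as resp:
    print("GET returned:", resp.read().decode())
```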

Limitations

  • Race condition if data is being inserted into the cache just as the cache qualifies for a flush to SSTable (one possible locking approach is sketched after this list).
  • Configuration items such as the data directory, port range, and flush condition should all come from a properties file.
  • When data is fetched from an SSTable, only the searched key is loaded back into the in-memory cache.
  • Only the leader can insert data into the cache for now.
  • MemTable flush is a stop-the-world process: when the scheduler kicks in, the application stops accepting new PUT requests.
  • If the MemTable (cache) is empty, every index file has to be read to figure out which data file contains the key. This works, but there may be a better way to reduce the number of index files that need to be scanned.
  • The timestamp on 'Data' is incorrect; it should be the time the key/value pair was first inserted into the cache.
  • Migrate to Poetry for better dependency management.
  • Source all properties from a properties file.
  • Compaction loads data from all files into memory, then merges, updates, deletes, and creates a new SSTable file.
  • There can be a race condition when a flush to SSTable and compaction start at the same time.
  • There can also be a race condition when SSTables are being read for a GET while compaction is archiving them.
  • Proper error handling is needed across all APIs.
  • All file manipulations should be migrated to pathlib.
  • If a key is marked as deleted = true and the node crashes before the data is flushed to disk, the key will never be deleted. This can be solved with a write-ahead log (WAL).
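
Several of the races above (insert vs. flush, flush vs. compaction, GET vs. archival) have the same shape: two actors touching the Memtable or the SSTable set without coordination. One possible direction, sketched below with illustrative names, is a shared lock that makes the hand-off atomic while keeping slow I/O outside the critical section. This is a suggestion, not code from the project.

```python
import threading

# Illustrative sketch of one way to serialize the flush/compaction/read
# races listed above; names and structure are assumptions, not CoreCache code.
class StoreCoordinator:
    def __init__(self):
        self._lock = threading.Lock()

    def flush(self, memtable, write_sstable):
        # Swap out the Memtable's entries under the lock so new PUTs land
        # in a fresh dict instead of stopping the world during the flush.
        with self._lock:
            frozen, memtable.entries = memtable.entries, {}
        write_sstable(frozen)  # slow disk I/O happens outside the lock

    def compact(self, sstables, merge):
        # Snapshot the SSTable list under the lock; readers keep using the
        # old files until the merged replacement is published.
        with self._lock:
            snapshot = list(sstables)
        merge(snapshot)
```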