A distributed key-value store
- Leader election and coordination via ZooKeeper
- Reads and writes go through the leader
- Data is stored in a Log-Structured Merge (LSM) tree
- GET, PUT and DELETE APIs
- Not available as part of MVP:
  - Replicas
  - Write-ahead logs
- Data is read from and written to MemTables
- Once flushed to an SSTable, the MemTable is emptied
- DELETE marks the data for collection during compaction
- The role of compaction in the MVP is to take care of updates and deletes, effectively rewriting the index and data files
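
The MemTable, flush, DELETE and compaction behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: the `Store` class, the `TOMBSTONE` marker and the `flush_threshold` parameter are hypothetical names, and SSTables are kept as in-memory dicts instead of index and data files on disk.

```python
# Hypothetical marker written by DELETE; compaction removes keys carrying it.
TOMBSTONE = object()


class Store:
    """Toy LSM store: one MemTable plus a list of immutable 'SSTables'."""

    def __init__(self, flush_threshold=2):
        self.memtable = {}
        self.sstables = []  # oldest first, newest last
        self.flush_threshold = flush_threshold  # assumed flush condition

    def put(self, key, value):
        self.memtable[key] = value
        self._maybe_flush()

    def delete(self, key):
        # DELETE only marks the key; the data is collected during compaction.
        self.memtable[key] = TOMBSTONE
        self._maybe_flush()

    def get(self, key):
        # Check the MemTable first, then SSTables from newest to oldest.
        for table in [self.memtable, *reversed(self.sstables)]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def _maybe_flush(self):
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the MemTable out as a sorted table, then empty it.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def compact(self):
        # Merge oldest to newest so newer values win, then drop tombstones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        live = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.sstables = [live] if live else []
```

After `put("a", 1)`, `put("b", 2)`, `put("a", 10)` and `delete("b")` with the default threshold, the store holds two SSTables; `compact()` rewrites them into a single table containing only the latest value of `a`.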
- Kazoo - lib for dealing with ZooKeeper
- dynaconf - for managing application configuration (settings)
- Colima - not a direct dependency, but recommended for local development instead of Docker Desktop
- Run ZooKeeper; feel free to use the containerized version:
  `docker run --name some-zookeeper -p 2181:2181 --restart always -d zookeeper`
- To connect to ZooKeeper running in a container:
  `docker run -it --rm --link some-zookeeper:zookeeper zookeeper zkCli.sh -server zookeeper`
- Make sure Python and pip are installed
- `curl -L -o corecache-0.11.tar.gz https://github.com/gtinside/distributed-key-value-store/archive/refs/tags/0.11.tar.gz`
- `tar -xvzf corecache-0.11.tar.gz`
- Run ZooKeeper
- `cd distributed-key-value-store-0.11/scripts`
- `start_server.sh --zooKeeperHost localhost --zooKeeperPort 2181`
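
With ZooKeeper listening on `localhost:2181` as in the steps above, leader election through Kazoo might look roughly like the sketch below. It uses Kazoo's `KazooClient` and its built-in `Election` recipe; the helper names `zk_hosts` and `campaign`, and the znode path `/corecache/election`, are assumptions made for illustration and are not taken from this project's code.

```python
from typing import Callable


def zk_hosts(host: str, port: int) -> str:
    """Build the 'host:port' connection string that KazooClient expects."""
    return f"{host}:{port}"


def campaign(host: str, port: int, node_id: str,
             on_elected: Callable[[], None]) -> None:
    """Connect to ZooKeeper and block until this node wins the election."""
    # Imported lazily so zk_hosts() stays usable without Kazoo installed.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts=zk_hosts(host, port))
    zk.start(timeout=10)  # raises if ZooKeeper cannot be reached in time
    try:
        # "/corecache/election" is a hypothetical znode path.
        election = zk.Election("/corecache/election", identifier=node_id)
        # Blocks until leadership is won, then calls on_elected();
        # leadership is released when on_elected() returns.
        election.run(on_elected)
    finally:
        zk.stop()
        zk.close()
```

Calling `campaign("localhost", 2181, "node-1", serve_as_leader)`, where `serve_as_leader` is whatever function should run while the node holds leadership, blocks until the node is elected.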
- Race condition: data may be inserted into the cache at the moment the cache qualifies for a flush to SSTable.
- Configuration items such as data directory, port range and flush condition should all come from a properties file.
- When data is read from an SSTable, only the searched key is loaded into the MemTable (cache)
- Only the leader can insert data into the cache for now.
- MemTable flush is a stop-the-world process: when the scheduler kicks in, the application stops accepting new PUT requests
- If the MemTable (cache) is empty, every index file has to be read to figure out which data file contains the data. This is not bad, but there might be a better way to reduce the number of index files that need to be scanned
- The timestamp on 'Data' is incorrect; it should be the time when the key-value pair was first inserted into the cache
- Migrate to Poetry for better dependency management
- Source all properties from a property file
- Compaction loads data from all the files into memory, then merges them, applies updates and deletes, and creates a new SSTable file
- There can be a race condition when a flush to SSTable and compaction start at the same time
- There can also be a race condition when SSTables are being read for a GET and archived by compaction at the same time
- Proper error handling across all APIs
- All file manipulations to be migrated to pathlib
- If a key is marked as `deleted = true` and the node crashes before the data is flushed to disk, the key will not be deleted. This can be solved by having a WAL.
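
As a rough illustration of how a WAL could close this gap, the sketch below appends every PUT and DELETE to a log file before the MemTable is touched, and replays that file on startup. The `WriteAheadLog` class, the JSON-lines record format and the `deleted` flag layout are all assumptions for the example. The project does not ship a WAL in the MVP.

```python
import json
import os
from pathlib import Path


class WriteAheadLog:
    """Append-only log of operations, replayed to rebuild the MemTable."""

    def __init__(self, path):
        self.path = Path(path)

    def append(self, op, key, value=None):
        # Persist the operation before the in-memory MemTable is updated.
        record = {"op": op, "key": key, "value": value}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to disk

    def replay(self):
        # Rebuild MemTable contents after a crash; later records win.
        memtable = {}
        if not self.path.exists():
            return memtable
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                deleted = record["op"] == "DELETE"
                memtable[record["key"]] = {
                    "value": None if deleted else record["value"],
                    "deleted": deleted,
                }
        return memtable

    def truncate(self):
        # Called once the MemTable has been flushed to an SSTable.
        self.path.write_text("", encoding="utf-8")
```

With this in place, a DELETE that was logged but not yet flushed survives a restart: `replay()` returns the key with `deleted` set to `True`, so the next flush and compaction can still remove it.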