Base implementation of RocksDB support#1416
Conversation
| def __delitem__(self, key: bytes) -> None: | ||
| v = self.db.get(key) | ||
| if v is None: | ||
| raise KeyError(key) |
There was a problem hiding this comment.
Without this the leveldb backend isn't fully compliant with the BaseDB api.
| def test_set_and_get(memory_db, level_db): | ||
| level_db.set(b'1', b'1') | ||
| memory_db.set(b'1', b'1') | ||
| assert level_db.get(b'1') == memory_db.get(b'1') |
There was a problem hiding this comment.
These were removed as it appeared all of these tests are covered by test_base_db_api.py
| def run_database_process(trinity_config: TrinityConfig) -> None: | ||
| with trinity_config.process_id_file('database'): | ||
| base_db = db_class(db_path=trinity_config.database_dir) | ||
| base_db = trinity_config.db_class(db_path=trinity_config.database_dir) |
There was a problem hiding this comment.
I think rocks and level use names that prevent filepath collisions, but worth double checking.
There was a problem hiding this comment.
I hadn't gotten this far but yes, we should be very sure that we cannot re-use a database directory that was previously used with another backend.
I thought about fixing this by namespacing the directory based on the backend but... then we could get into a situation where the user switches the backend midstream and effectively has two versions of the same chain.
So maybe just a small file we place somewhere to be able to check what database is being used.
There was a problem hiding this comment.
TODO: make sure leveldb and rocksdb databases cannot use the same root directory
| sudo apt-get install -y liblz4-dev libsnappy-dev libgflags-dev zlib1g-dev libbz2-dev libzstd-dev | ||
| if [ ! -f "/root/project/rocksdb/librocksdb.a" ]; then | ||
| git clone https://github.com/facebook/rocksdb | ||
| cd rocksdb/ && git checkout v5.8.8 && make install-shared INSTALL_PATH=/usr |
There was a problem hiding this comment.
Haha, where are those bash aliases when you need them?
|
|
||
|
|
||
| def test_database_api_missing_key_for_deletion(db): | ||
| db.delete(b'does-not-exist') |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
|
(After CircleCI kicks off and goes green, of course, plus some commit squashing) |
cc8a176 to
0b61dd2
Compare
| def run_database_process(trinity_config: TrinityConfig) -> None: | ||
| with trinity_config.process_id_file('database'): | ||
| base_db = db_class(db_path=trinity_config.database_dir) | ||
| base_db = trinity_config.db_class(db_path=trinity_config.database_dir) |
There was a problem hiding this comment.
TODO: make sure leveldb and rocksdb databases cannot use the same root directory
|
Alright, last thing is to ensure that the rocksdb build is properly cached (since it takes about 8-9 minutes to build and we don't want to do that in every ci run). |
767c1a9 to
57d2164
Compare
|
I CURRENTLY HATE EVERYTHING. The |
1c18f54 to
0b1a67b
Compare
carver
left a comment
There was a problem hiding this comment.
Mostly just nitpicks/questions about naming, yagni, docs. GTG
The tox -r change is the one I'd have to think about more if you want to keep as is in the PR.
| @@ -1,4 +1,4 @@ | |||
| version: 2.0 | |||
| version: 2.1 | |||
There was a problem hiding this comment.
Just curious, what required this upgrade?
There was a problem hiding this comment.
It's superfluous. I should remove it.
| geth version | ||
| - run: | ||
| name: run tox | ||
| command: ~/.local/bin/tox -r |
There was a problem hiding this comment.
I think this was helping us quickly catch when a live change in a dependency was causing problems with py-evm. Was the change for performance? How much shorter was the test?
There was a problem hiding this comment.
Another thing that didn't get reverted. will remove.
| #!/usr/bin/env bash | ||
|
|
||
| set -e | ||
| set -u |
There was a problem hiding this comment.
I find the long-form nicer for readability, like:
set -o errexit
set -o nounset
set -o pipefailThere was a problem hiding this comment.
TIL there is a longform.
|
|
||
| .. code:: sh | ||
|
|
||
| apt-get install liblz4-dev lib-rocksdb5.8 |
There was a problem hiding this comment.
Did this doc get stale? I suppose there's a reason you had to switch circle to use:
sudo apt-get install -y liblz4-dev libsnappy-dev libgflags-dev zlib1g-dev libbz2-dev libzstd-devThere was a problem hiding this comment.
Not stale. Requirements for rocksdb installation, will double check but I believe these are directly from their readme.
On a related note, I think our installation story is about to get ungood and we should look into debian packages and/or brew. We're gonna have to do it eventually...
| self.db.delete(key) | ||
|
|
||
| @contextmanager | ||
| def atomic_batch(self) -> Generator['RocksDBWriteBatch', None, None]: |
There was a problem hiding this comment.
Iterator['RocksDBWriteBatch']?
There was a problem hiding this comment.
likely copy/paste from LevelDB. Will change.
|
|
||
| @property | ||
| def is_db_backend_rocksdb(self) -> bool: | ||
| return self.db_backend == DB_ROCKS |
There was a problem hiding this comment.
I want to say YAGNI about these is_db_b... properties, and go back to just running these lines in the if test internally. Do we lose much by dropping the methods?
There was a problem hiding this comment.
I wrote these because I don't want to have to import the DB_ROCKS and DB_LEVEL constants in the places that these checks happen. I ended up doing the same with is_full_sync and is_fast_sync. Unless you feel strongly I'd like to leave them.
|
|
||
| # Verify that the database engine that trinity is configured to use matches | ||
| # the existing on-disk engine. | ||
| if trinity_config.db_backend != trinity_config.on_disk_database_engine: |
There was a problem hiding this comment.
Since both db_backend and database_engine are storing those constants in {'rocks', 'level'}, can we unify the names a bit? I think I like engine better, because backend implies to me that I'm going to get a LevelDB or LevelDB() back instead of 'level'.
So maybe config.db_engine and config.on_disk_db_engine?
| ) | ||
|
|
||
|
|
||
| DATABASE_ENGINE_LOCK_NAME = '.trinity.db_engine.lock' |
There was a problem hiding this comment.
I see what you're saying with lock, but also I'm used to lock files being things that should go away after shutdown.
Maybe?
.trinity.db_engine.freeze.trinity.db_engine.frozen.trinity.db_engine.initialized.trinity.db_engine.selected
(and all the other places that use lock for this choice)
This reverts commit 61a9a37.
This reverts commit 61a9a37.
What was wrong?
LevelDB doesn't support having multiple processes reading from the same database. We worked around this by having a dedicated database process which has a direct handle on the database and all other processes interact with it over multiprocessing pipes.
This is not ideal. It incurs extra overhead both with respect to performance and code complexity.
How was it fixed?
Implemented a RocksDB backend which uses an alternate underlying database called rocksdb -> https://rocksdb.org/
This PR establishes the database backend class and adds a CLI argument to allow opt-in use of the new database backend. A subsequent PR will tackle removal of the database process entirely.
Cute Animal Picture