This repository was archived by the owner on Sep 8, 2025. It is now read-only.

Base implementation of RocksDB support #1416

Merged
pipermerriam merged 6 commits into ethereum:master from pipermerriam:piper/rocksdb
Oct 24, 2018

@pipermerriam
Member

What was wrong?

LevelDB doesn't support having multiple processes reading from the same database. We worked around this by having a dedicated database process which has a direct handle on the database and all other processes interact with it over multiprocessing pipes.

This is not ideal. It incurs extra overhead both with respect to performance and code complexity.

How was it fixed?

Implemented a new database backend built on RocksDB (https://rocksdb.org/) as an alternative to LevelDB.

This PR establishes the database backend class and adds a CLI argument to allow opt-in use of the new database backend. A subsequent PR will tackle removal of the database process entirely.
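As a rough illustration of what an opt-in backend switch can look like, here is a minimal sketch. The flag name `--db-backend` and the two stand-in backend classes are assumptions made for this example; only the `'level'`/`'rocks'` values and the `DB_LEVEL`/`DB_ROCKS` constant names come from the discussion in this PR.

```python
import argparse

# Constant names follow the review discussion; the values are the ones mentioned there.
DB_LEVEL = 'level'
DB_ROCKS = 'rocks'


class FakeLevelDB:
    """Stand-in for the LevelDB backend class (hypothetical)."""


class FakeRocksDB:
    """Stand-in for the RocksDB backend class (hypothetical)."""


BACKENDS = {DB_LEVEL: FakeLevelDB, DB_ROCKS: FakeRocksDB}


def parse_db_backend(argv):
    """Return the backend class selected on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--db-backend',  # assumed flag name, not necessarily the one in this PR
        choices=sorted(BACKENDS),
        default=DB_LEVEL,  # LevelDB stays the default; RocksDB is opt-in
    )
    args = parser.parse_args(argv)
    return BACKENDS[args.db_backend]
```

Calling `parse_db_backend(['--db-backend', 'rocks'])` selects the RocksDB stand-in, while an empty argument list keeps the LevelDB default.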

Cute Animal Picture

put a cute animal picture link inside the parentheses

Comment thread eth/db/backends/level.py
    def __delitem__(self, key: bytes) -> None:
        v = self.db.get(key)
        if v is None:
            raise KeyError(key)
Member Author

Without this the leveldb backend isn't fully compliant with the BaseDB api.
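To make the compliance point concrete, here is a small self-contained sketch. `SilentStore` and `StrictDB` are hypothetical names; `SilentStore` only imitates an engine whose `delete()` ignores missing keys and is not the real LevelDB binding.

```python
class SilentStore:
    """Imitates an engine whose delete() silently ignores missing keys."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None when the key is absent

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)  # no error for a missing key


class StrictDB:
    """Mapping-style wrapper that raises KeyError when deleting a missing key."""

    def __init__(self, store):
        self.db = store

    def __setitem__(self, key: bytes, value: bytes) -> None:
        self.db.put(key, value)

    def __getitem__(self, key: bytes) -> bytes:
        v = self.db.get(key)
        if v is None:
            raise KeyError(key)
        return v

    def __delitem__(self, key: bytes) -> None:
        # Check first, because the underlying delete() would not complain.
        if self.db.get(key) is None:
            raise KeyError(key)
        self.db.delete(key)
```

With the explicit check, deleting a missing key behaves the same way as it does on a plain `dict`, which is what a shared API test suite would expect.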

def test_set_and_get(memory_db, level_db):
    level_db.set(b'1', b'1')
    memory_db.set(b'1', b'1')
    assert level_db.get(b'1') == memory_db.get(b'1')
Member Author

These were removed as it appeared all of these tests are covered by test_base_db_api.py

Comment thread trinity/main.py
 def run_database_process(trinity_config: TrinityConfig) -> None:
     with trinity_config.process_id_file('database'):
-        base_db = db_class(db_path=trinity_config.database_dir)
+        base_db = trinity_config.db_class(db_path=trinity_config.database_dir)
Contributor

I think rocks and level use names that prevent filepath collisions, but worth double checking.

Member Author

I hadn't gotten this far but yes, we should be very sure that we cannot re-use a database directory that was previously used with another backend.

I thought about fixing this by namespacing the directory based on the backend but... then we could get into a situation where the user switches the backend midstream and effectively has two versions of the same chain.

So maybe just a small file we place somewhere to be able to check what database is being used.
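A minimal sketch of that idea, assuming a plain text file in the data directory that records the engine name. The file name `.db_engine` and the function name are hypothetical, chosen only for this example.

```python
from pathlib import Path

MARKER_NAME = '.db_engine'  # hypothetical file name


def check_or_write_engine_marker(data_dir: Path, engine: str) -> None:
    """Record the engine on first use; refuse a different engine afterwards."""
    marker = data_dir / MARKER_NAME
    if marker.exists():
        on_disk = marker.read_text().strip()
        if on_disk != engine:
            raise ValueError(
                f"data dir was initialized with {on_disk!r}, not {engine!r}"
            )
    else:
        # First run against this directory: remember which engine created it.
        marker.write_text(engine)
```

Because the marker lives next to the data rather than in a per-backend directory, switching backends midstream fails loudly instead of quietly starting a second copy of the chain.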

Member Author

TODO: make sure leveldb and rocksdb databases cannot use the same root directory

Comment thread .circleci/config.yml Outdated
sudo apt-get install -y liblz4-dev libsnappy-dev libgflags-dev zlib1g-dev libbz2-dev libzstd-dev
if [ ! -f "/root/project/rocksdb/librocksdb.a" ]; then
  git clone https://github.com/facebook/rocksdb
  cd rocksdb/ && git checkout v5.8.8 && make install-shared INSTALL_PATH=/usr
Contributor

Haha, where are those bash aliases when you need them?



def test_database_api_missing_key_for_deletion(db):
    db.delete(b'does-not-exist')

This comment was marked as resolved.

@carver
Contributor

carver commented Oct 22, 2018

(After CircleCI kicks off and goes green, of course, plus some commit squashing)

@pipermerriam pipermerriam force-pushed the piper/rocksdb branch 2 times, most recently from cc8a176 to 0b61dd2 Compare October 22, 2018 21:41
Member Author

@pipermerriam pipermerriam left a comment

Still need to ensure we don't have RocksDB-based and LevelDB-based databases in the same path.

@pipermerriam
Member Author

Alright, last thing is to ensure that the rocksdb build is properly cached (since it takes about 8-9 minutes to build and we don't want to do that in every CI run).

@pipermerriam
Member Author

I CURRENTLY HATE EVERYTHING. The lightchain_integration tests are failing in ways that they shouldn't..... grumble.

Contributor

@carver carver left a comment

Mostly just nitpicks/questions about naming, yagni, docs. GTG

The tox -r change is the one I'd have to think about more if you want to keep as is in the PR.

Comment thread .circleci/config.yml Outdated
@@ -1,4 +1,4 @@
-version: 2.0
+version: 2.1
Contributor

Just curious, what required this upgrade?

Member Author

It's superfluous. I should remove it.

Comment thread .circleci/config.yml
geth version
- run:
    name: run tox
    command: ~/.local/bin/tox -r
Contributor

I think this was helping us quickly catch when a live change in a dependency was causing problems with py-evm. Was the change for performance? How much shorter was the test?

Member Author

Another thing that didn't get reverted. Will remove.

Comment thread .circleci/install_rocksdb.sh Outdated
#!/usr/bin/env bash

set -e
set -u
Contributor

I find the long-form nicer for readability, like:

set -o errexit
set -o nounset
set -o pipefail

Member Author

TIL there is a longform.


.. code:: sh

    apt-get install liblz4-dev lib-rocksdb5.8
Contributor

Did this doc get stale? I suppose there's a reason you had to switch circle to use:

sudo apt-get install -y liblz4-dev libsnappy-dev libgflags-dev zlib1g-dev libbz2-dev libzstd-dev

Member Author

Not stale. Requirements for rocksdb installation, will double check but I believe these are directly from their readme.

On a related note, I think our installation story is about to get ungood and we should look into debian packages and/or brew. We're gonna have to do it eventually...

Comment thread eth/db/backends/rocks.py Outdated
        self.db.delete(key)

    @contextmanager
    def atomic_batch(self) -> Generator['RocksDBWriteBatch', None, None]:
Contributor

Iterator['RocksDBWriteBatch']?

Member Author

Likely copy/paste from LevelDB. Will change.
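For reference, both annotations are accepted by type checkers for a `@contextmanager` function; `Iterator[T]` is simply the shorter spelling when nothing is sent into or returned from the generator. The sketch below uses hypothetical `WriteBatch` and `ToyDB` classes and is not the RocksDB backend from this PR.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class WriteBatch:
    """Collects pending writes; stand-in for a real write batch (hypothetical)."""

    def __init__(self) -> None:
        self.pending: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self.pending[key] = value


class ToyDB:
    """In-memory store used only to demonstrate the annotation."""

    def __init__(self) -> None:
        self.data: Dict[bytes, bytes] = {}

    @contextmanager
    def atomic_batch(self) -> Iterator[WriteBatch]:
        batch = WriteBatch()
        yield batch
        # Reached only if the with-body finished without raising,
        # so a failed batch leaves self.data untouched.
        self.data.update(batch.pending)
```

An exception raised inside the `with` block is re-raised at the `yield`, so the update after it is skipped and none of the batched writes are applied.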

Comment thread trinity/config.py Outdated

    @property
    def is_db_backend_rocksdb(self) -> bool:
        return self.db_backend == DB_ROCKS
Contributor

I want to say YAGNI about these is_db_b... properties, and go back to just running these lines in the if test internally. Do we lose much by dropping the methods?

Member Author

I wrote these because I don't want to have to import the DB_ROCKS and DB_LEVEL constants in the places that these checks happen. I ended up doing the same with is_full_sync and is_fast_sync. Unless you feel strongly I'd like to leave them.
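The trade-off can be sketched as follows. `Config` here is a toy class rather than the real `TrinityConfig`, and `is_db_backend_leveldb` is an assumed name for the LevelDB counterpart of the property shown in the diff.

```python
DB_LEVEL = 'level'
DB_ROCKS = 'rocks'


class Config:
    """Toy config object; not the real TrinityConfig."""

    def __init__(self, db_backend: str) -> None:
        self.db_backend = db_backend

    @property
    def is_db_backend_rocksdb(self) -> bool:
        return self.db_backend == DB_ROCKS

    @property
    def is_db_backend_leveldb(self) -> bool:  # assumed name
        return self.db_backend == DB_LEVEL


def describe(config: Config) -> str:
    # Call sites only need the config object, not the DB_* constants.
    if config.is_db_backend_rocksdb:
        return 'using rocksdb'
    return 'using leveldb'
```

The properties cost a few lines on the config class, and in exchange modules that branch on the backend do not have to import the constants.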

Comment thread trinity/main.py Outdated

# Verify that the database engine that trinity is configured to use matches
# the existing on-disk engine.
if trinity_config.db_backend != trinity_config.on_disk_database_engine:
Contributor

Since both db_backend and database_engine are storing those constants in {'rocks', 'level'}, can we unify the names a bit? I think I like engine better, because backend implies to me that I'm going to get a LevelDB or LevelDB() back instead of 'level'.

So maybe config.db_engine and config.on_disk_db_engine?

Comment thread trinity/utils/chains.py Outdated
)


DATABASE_ENGINE_LOCK_NAME = '.trinity.db_engine.lock'
Contributor

I see what you're saying with lock, but also I'm used to lock files being things that should go away after shutdown.

Maybe?

  • .trinity.db_engine.freeze
  • .trinity.db_engine.frozen
  • .trinity.db_engine.initialized
  • .trinity.db_engine.selected

(and all the other places that use lock for this choice)

Member Author

Went with marker.

@pipermerriam pipermerriam merged commit 61a9a37 into ethereum:master Oct 24, 2018
@pipermerriam pipermerriam deleted the piper/rocksdb branch October 24, 2018 19:22
cburgdorf added a commit to cburgdorf/py-evm that referenced this pull request Oct 25, 2018
cburgdorf added a commit that referenced this pull request Oct 25, 2018