
Mitigate the overhead of building the hash of file locations #9504

Closed
ltamasi wants to merge 35 commits

Conversation

@ltamasi (Contributor) commented on Feb 4, 2022

Summary:
The patch builds on the refactoring done in #9494
and improves the performance of building the hash of file
locations in VersionStorageInfo in two ways. First, the hash
building is moved from AddFile (which is called under the DB mutex)
to a separate post-processing step done as part of PrepareForVersionAppend
(during which the mutex is not held). Second, the space necessary
for the hash is preallocated to prevent costly reallocation/rehashing
operations. These changes mitigate the overhead of the file location hash,
which can be significant with certain workloads where the baseline CPU usage
is low (see #9351, which involves a workload where keys are sorted, the WAL is
turned off, the vector memtable implementation is used, and there are many
small SST files).
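
To illustrate the approach, here is a minimal standalone C++ sketch of the two ideas (recording files under the mutex without touching the hash, then building the preallocated hash in one pass afterwards). The class, member, and method names below are simplified stand-ins chosen for this sketch, not the actual RocksDB code:

// Minimal standalone sketch (simplified, assumed names; not the actual
// RocksDB implementation): defer the file-location hash build out of
// AddFile and reserve its capacity up front so insertions never rehash.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct FileMetaData {
  uint64_t file_number = 0;
};

struct FileLocation {
  FileLocation(int level, std::size_t position)
      : level(level), position(position) {}
  int level;
  std::size_t position;
};

class VersionStorageInfoSketch {
 public:
  explicit VersionStorageInfoSketch(int num_levels) : files_(num_levels) {}

  // Called under the DB mutex: only record the file, do not touch the hash.
  void AddFile(int level, FileMetaData* f) { files_[level].push_back(f); }

  // Called as a post-processing step without the mutex held (in the real
  // patch this happens during PrepareForVersionAppend): build the entire
  // file-number -> location index in one pass.
  void GenerateFileLocationIndex() {
    std::size_t num_files = 0;
    for (const auto& level_files : files_) {
      num_files += level_files.size();
    }

    file_locations_.clear();
    file_locations_.reserve(num_files);  // preallocate to avoid rehashing

    for (int level = 0; level < static_cast<int>(files_.size()); ++level) {
      for (std::size_t pos = 0; pos < files_[level].size(); ++pos) {
        const FileMetaData* f = files_[level][pos];
        assert(f);
        file_locations_.emplace(f->file_number, FileLocation(level, pos));
      }
    }
  }

 private:
  std::vector<std::vector<FileMetaData*>> files_;
  std::unordered_map<uint64_t, FileLocation> file_locations_;
};

In this sketch, reserve() sizes the hash table for the final element count, so none of the subsequent insertions trigger a costly rehash, and the whole index build happens outside the mutex-protected AddFile path.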

Fixes #9351

Test Plan:
make check

numactl --interleave=all ./db_bench --benchmarks=fillseq --allow_concurrent_memtable_write=false --level0_file_num_compaction_trigger=4 --level0_slowdown_writes_trigger=20 --level0_stop_writes_trigger=30 --max_background_jobs=8 --max_write_buffer_number=8 --db=/data/ltamasi-dbbench --wal_dir=/data/ltamasi-dbbench --num=800000000 --num_levels=8 --key_size=20 --value_size=400 --block_size=8192 --cache_size=51539607552 --cache_numshardbits=6 --compression_max_dict_bytes=0 --compression_ratio=0.5 --compression_type=lz4 --bytes_per_sync=8388608 --cache_index_and_filter_blocks=1 --cache_high_pri_pool_ratio=0.5 --benchmark_write_rate_limit=0 --write_buffer_size=16777216 --target_file_size_base=16777216 --max_bytes_for_level_base=67108864 --verify_checksum=1 --delete_obsolete_files_period_micros=62914560 --max_bytes_for_level_multiplier=8 --statistics=0 --stats_per_interval=1 --stats_interval_seconds=20 --histogram=1 --bloom_bits=10 --open_files=-1 --subcompactions=1 --compaction_style=0 --level_compaction_dynamic_level_bytes=true --pin_l0_filter_and_index_blocks_in_cache=1 --soft_pending_compaction_bytes_limit=167503724544 --hard_pending_compaction_bytes_limit=335007449088 --min_level_to_compress=0 --use_existing_db=0 --sync=0 --threads=1 --memtablerep=vector --disable_wal=1 --seed=<some_seed>

Final statistics before this patch:

Cumulative writes: 0 writes, 697M keys, 0 commit groups, 0.0 writes per commit group, ingest: 283.25 GB, 241.08 MB/s
Interval writes: 0 writes, 1264K keys, 0 commit groups, 0.0 writes per commit group, ingest: 525.69 MB, 176.67 MB/s

With the patch:

Cumulative writes: 0 writes, 759M keys, 0 commit groups, 0.0 writes per commit group, ingest: 308.57 GB, 262.63 MB/s
Interval writes: 0 writes, 1555K keys, 0 commit groups, 0.0 writes per commit group, ingest: 646.61 MB, 215.11 MB/s

@facebook-github-bot (Contributor):

@ltamasi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Comment on lines 5198 to 5206
for (size_t i = 0; i < new_last_level.size(); ++i) {
  const FileMetaData* const meta = new_last_level[i];
  assert(meta);

  const uint64_t file_number = meta->fd.GetNumber();

  vstorage->file_locations_[file_number] =
      VersionStorageInfo::FileLocation(new_levels - 1, i);
}
Contributor:

We do not update the index after ReduceNumberOfLevels?

ltamasi (author):
It will be done by PrepareForVersionAppend, which is done as part of the LogAndApply call below.
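
(For illustration, a rough and hypothetical sketch of the sequence being described, reusing the simplified names from the sketch in the summary above; this is not the actual RocksDB call chain.)

// Hypothetical illustration: the explicit index update in the quoted snippet
// becomes unnecessary because the version-append path rebuilds the index
// from the per-level file lists anyway, without holding the DB mutex.
void PrepareForVersionAppendSketch(VersionStorageInfoSketch* vstorage) {
  vstorage->GenerateFileLocationIndex();
  // ... other derived state would be recomputed here before the new
  // Version is installed ...
}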

Contributor:
Oh I see. Thanks for the clarification.

Contributor:

Maybe a comment noting that the index is not consistent at this point would be informative.

ltamasi (author):
Looked into this a bit more and realized that this method is only called by ldb reduce_levels, and the DB is not even open during the call (which I suppose is why manipulating the current version directly is "safe"). Still, I tend to think it would be nice to clear the index here and let LogAndApply and friends rebuild it from scratch.

ltamasi (author):

Actually, on second (third? fourth?) thought, LogAndApply creates a new Version by "applying" an empty VersionEdit, so the contents of the index in the original Version do not matter at all. So we might as well leave this as-is for consistency.

@facebook-github-bot (Contributor):

@ltamasi has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot (Contributor):

@ltamasi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Successfully merging this pull request may close these issues.

fillseq throughput is 13% slower from PR 6862