
Mitigate the overhead of building the hash of file locations #9504

Closed
ltamasi wants to merge 35 commits

Conversation

@ltamasi (Contributor) commented on Feb 4, 2022

Summary:
The patch builds on the refactoring done in #9494
and improves the performance of building the hash of file
locations in VersionStorageInfo in two ways. First, the hash
building is moved from AddFile (which is called under the DB mutex)
to a separate post-processing step done as part of PrepareForVersionAppend
(during which the mutex is not held). Second, the space necessary
for the hash is preallocated to prevent costly reallocation/rehashing
operations. These changes mitigate the overhead of the file location hash,
which can be significant with certain workloads where the baseline CPU usage
is low (see #9351, which involves a workload where keys are sorted, the WAL is
turned off, the vector memtable implementation is used, and there are many
small SST files).
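
To illustrate the approach, here is a minimal standalone C++ sketch of the two ideas (recording files under the mutex without touching the hash, then building the preallocated hash in one pass afterwards). The class, member, and method names below are simplified stand-ins chosen for this sketch, not the actual RocksDB code:

// Minimal standalone sketch (simplified, assumed names; not the actual
// RocksDB implementation): defer the file-location hash build out of
// AddFile and reserve its capacity up front so insertions never rehash.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct FileMetaData {
  uint64_t file_number = 0;
};

struct FileLocation {
  FileLocation(int level, std::size_t position)
      : level(level), position(position) {}
  int level;
  std::size_t position;
};

class VersionStorageInfoSketch {
 public:
  explicit VersionStorageInfoSketch(int num_levels) : files_(num_levels) {}

  // Called under the DB mutex: only record the file, do not touch the hash.
  void AddFile(int level, FileMetaData* f) { files_[level].push_back(f); }

  // Called as a post-processing step without the mutex held (in the real
  // patch this happens during PrepareForVersionAppend): build the entire
  // file-number -> location index in one pass.
  void GenerateFileLocationIndex() {
    std::size_t num_files = 0;
    for (const auto& level_files : files_) {
      num_files += level_files.size();
    }

    file_locations_.clear();
    file_locations_.reserve(num_files);  // preallocate to avoid rehashing

    for (int level = 0; level < static_cast<int>(files_.size()); ++level) {
      for (std::size_t pos = 0; pos < files_[level].size(); ++pos) {
        const FileMetaData* f = files_[level][pos];
        assert(f);
        file_locations_.emplace(f->file_number, FileLocation(level, pos));
      }
    }
  }

 private:
  std::vector<std::vector<FileMetaData*>> files_;
  std::unordered_map<uint64_t, FileLocation> file_locations_;
};

In this sketch, reserve() sizes the hash table for the final element count, so none of the subsequent insertions trigger a costly rehash, and the whole index build happens outside the mutex-protected AddFile path.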

Fixes #9351

Test Plan:
make check

numactl --interleave=all ./db_bench --benchmarks=fillseq --allow_concurrent_memtable_write=false --level0_file_num_compaction_trigger=4 --level0_slowdown_writes_trigger=20 --level0_stop_writes_trigger=30 --max_background_jobs=8 --max_write_buffer_number=8 --db=/data/ltamasi-dbbench --wal_dir=/data/ltamasi-dbbench --num=800000000 --num_levels=8 --key_size=20 --value_size=400 --block_size=8192 --cache_size=51539607552 --cache_numshardbits=6 --compression_max_dict_bytes=0 --compression_ratio=0.5 --compression_type=lz4 --bytes_per_sync=8388608 --cache_index_and_filter_blocks=1 --cache_high_pri_pool_ratio=0.5 --benchmark_write_rate_limit=0 --write_buffer_size=16777216 --target_file_size_base=16777216 --max_bytes_for_level_base=67108864 --verify_checksum=1 --delete_obsolete_files_period_micros=62914560 --max_bytes_for_level_multiplier=8 --statistics=0 --stats_per_interval=1 --stats_interval_seconds=20 --histogram=1 --bloom_bits=10 --open_files=-1 --subcompactions=1 --compaction_style=0 --level_compaction_dynamic_level_bytes=true --pin_l0_filter_and_index_blocks_in_cache=1 --soft_pending_compaction_bytes_limit=167503724544 --hard_pending_compaction_bytes_limit=335007449088 --min_level_to_compress=0 --use_existing_db=0 --sync=0 --threads=1 --memtablerep=vector --disable_wal=1 --seed=<some_seed>

Final statistics before this patch:

Cumulative writes: 0 writes, 697M keys, 0 commit groups, 0.0 writes per commit group, ingest: 283.25 GB, 241.08 MB/s
Interval writes: 0 writes, 1264K keys, 0 commit groups, 0.0 writes per commit group, ingest: 525.69 MB, 176.67 MB/s

With the patch:

Cumulative writes: 0 writes, 759M keys, 0 commit groups, 0.0 writes per commit group, ingest: 308.57 GB, 262.63 MB/s
Interval writes: 0 writes, 1555K keys, 0 commit groups, 0.0 writes per commit group, ingest: 646.61 MB, 215.11 MB/s

@facebook-github-bot (Contributor):

@ltamasi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Comment on lines 5198 to 5206
for (size_t i = 0; i < new_last_level.size(); ++i) {
  const FileMetaData* const meta = new_last_level[i];
  assert(meta);

  const uint64_t file_number = meta->fd.GetNumber();

  vstorage->file_locations_[file_number] =
      VersionStorageInfo::FileLocation(new_levels - 1, i);
}
Contributor:

We do not update the index after ReduceNumberOfLevels?

ltamasi (author):
It will be done by PrepareForVersionAppend, which is done as part of the LogAndApply call below.
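
(For illustration, a rough and hypothetical sketch of the sequence being described, reusing the simplified names from the sketch in the summary above; this is not the actual RocksDB call chain.)

// Hypothetical illustration: the explicit index update in the quoted snippet
// becomes unnecessary because the version-append path rebuilds the index
// from the per-level file lists anyway, without holding the DB mutex.
void PrepareForVersionAppendSketch(VersionStorageInfoSketch* vstorage) {
  vstorage->GenerateFileLocationIndex();
  // ... other derived state would be recomputed here before the new
  // Version is installed ...
}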

Contributor:
Oh I see. Thanks for the clarification.

Contributor:

Maybe a comment noting that the index is not consistent at this point would be informative.

ltamasi (author):
Looked into this a bit more and realized that this method is only called by ldb reduce_levels, and the DB is not even open during the call (which I suppose is why manipulating the current version directly is "safe"). Still, I tend to think it would be nice to clear the index here and let LogAndApply and friends rebuild it from scratch.

ltamasi (author):

Actually, on second (third? fourth?) thought, LogAndApply creates a new Version by "applying" an empty VersionEdit, so the contents of the index in the original Version do not matter at all. So we might as well leave this as-is for consistency.

@facebook-github-bot (Contributor):

@ltamasi has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot (Contributor):

@ltamasi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Successfully merging this pull request may close these issues.

fillseq throughput is 13% slower from PR 6862