Refactor iterate_meta_bits #1181

wks · 2024-07-29T13:37:54Z

This PR refactors the mechanism for visiting (reading and/or updating) side metadata in bulk, specifically the function SideMetadataSpec::iterate_meta_bits.

The function now uses a single FnMut callback instead of two Fn callbacks. It uses the enum type BitByteRange to distinguish whole byte ranges from bit ranges in a byte. This allows the user to capture variables mutably in the callback. This also removes the Cell used in find_prev_non_zero_value_fast.

The function is made non-recursive to improve the performance. Some test cases are added to test for corner cases.

The function is moved to a dedicated ranges module and renamed to break_bit_range, for several reasons:

It was a method of SideMetadataSpec, but it does not access any member of SideMetadataSpec.
It needs a non-trivial amount of testing to get corner cases correct, especially after refactoring into a non-recursive function.
Related types and functions can be added to the ranges module in the future.
- Breaking a range of bytes into a range of aligned words and unaligned bytes in the beginning and the end. It will be used by finding VO bits from internal pointers and finding all VO bits in a region (for heap traversal).
- Breaking a range of bytes at chunk boundaries. It will be used by bulk_update_metadata.

wks · 2024-07-31T09:10:44Z

Here are some benchmark results.

size	data bytes	meta bytes	master	master no recursion	this PR	memset
line	256	4	6.1574 ns	4.1684 ns	4.0134 ns	1.1416 ns
block	32768	512	6.5757 ns	4.5108 ns	4.5009 ns	1.7800 ns

The benchmarks can be found in benches/bulk_meta/bzero_bset.rs, and can be invoked by

taskset -c 0 cargo bench bzero_bset --features bench

Note: You need taskset only if your CPU has heterogeneous cores (so-called "performance-cores" and "efficient-cores").

This PR makes iterate_meta_bits faster than the master branch. That's partly because the existing iterate_meta_bits function is recursive, although it is guaranteed to terminate at the second level of recursion. This PR replaced the recursion with base cases and reduced the cost. I modified the master branch (see 24326a2) to make iterate_meta_bits non-recursive using the same algorithm as this PR, and it's almost as fast as this PR (as seen in the "master no recursion" column in the table above). There may be some subtle difference in the generated code after changing two closures into one closure, but the difference between "master no recusrion" and "this PR" is small.

But neither master nor this PR is as fast as calling memset directly to set the meta bits, and the overhead is huge.

I'll try to rebase #1174 on top of this PR and mark both PRs for review if the refactored code is pleasant for implementing heap traversal with.

Let's add it back when implementing VO bit scanning.

qinsoon

The refactoring looks good to me. I have some comments about exposing private functions for testing/benchmarking.

qinsoon · 2024-08-06T07:28:11Z

Cargo.toml


+# This feature is only used for benchmarks.
+# It will expose some private functions for benchmarking.
+bench = []


I suggest renaming this to something like test_private or test_private_items. This would be clear for both the function definitions in mmtk-core, and for the bench/test modules outside mmtk-core.

I found the following code confusing: we are in the bench crate, why do we have the bench feature for some tests but not other tests? A better feature name would make it more clear.

#[cfg(feature = "bench")] mod bulk_meta; #[cfg(feature = "mock_test")] mod mock_bench;

It is also posssible to have mock_test = ["test_private_item"]. In that case, you don't have to change bench_main and you can simply add your benchmark.

I am OK with renaming it to test_private.

The reason for not requiring the "bench" feature for some test cases is because the current mock tests don't require any of the items exposed from the "bench" feature. But it will probably make things clearer to unconditionally require the "bench" or "test_private" feature when doing benchmarks.

qinsoon · 2024-08-06T07:40:05Z

benches/main.rs

+    // Some benchmarks rely on the "bench" feature to expose some private functions.
+    // Run them with `cargo bench --features bench`.
+    #[cfg(feature = "bench")]
+    bulk_meta::bzero_bset::bench(_c);


See my comments about the features. However, if you do want to separate mock_test and bench, you should create a pattern here so others can create new benchmarks with the bench feature easily.

Criterion benchmarks usually don't use the match statement to select benchmarks. Instead it is similar to running tests. It can be selected on the command line. For example,

cargo bench --features test_private -- bzero

It will select all benchmarks that have "bzero" in the name.

The mock tests are special. Because they rely on a global singleton MockVM, we can only run one test at a time. Previously, we rely on the environment variable MMTK_BENCH to ensure only one mock test benchmark is selected at a time.

At the moment, I just isolated the match statement into the bench_mock function, and leave everything else in bench_main. To install more benchmarks, we simply call c.bench_function, or call functions in other modules (such as bulk_meta::bzero_bset::bench) to install more benchmarks. In the future, the bench_main function will simply call more functions, like these lines: https://github.com/mmtk/mmtk-core/pull/1174/files#diff-9ad1cee13898f4e5ac553eb567cd47a8de7d8852bf292c3d8d5991e23677db32R50-R51

In the future, we can refactor the mock_bench so that we can use the command line to select benchmarks, and as long as we select only one benchmark at a time, they will still be able to run. But if that's impossible, we can still rely on the environment variable.

I have now completely isolated mock benches from other benchmarks. Now regular benchmarks are in a separate module regular_bench, and each module inside it has its fn bench method for registering benchmarks. When others add more regular benchmarks, they change the fn bench in the right module; when adding a mock benchmark, they edit mock_bench::bench which now holds the match statement.

qinsoon · 2024-08-06T08:00:23Z

src/util/metadata/side_metadata/global.rs

-    pub(super) fn zero_meta_bits(
+    /// Expose `zero_meta_bits` when running `cargo bench`.
+    #[cfg(feature = "bench")]
+    pub fn bench_zero_meta_bits(


We may move these functions to a different module, like util::test_util, so we don't need to worry about duplicate function names. Also they won't appear as a disruption when we read the code.

I think it's better to just add a top-level module crate::testing because they are not really "utilities for testing". They are just the functions to be tested.

I am slightly concerned that it will appear at the top-level of our docs. But maybe it is okay.

I moved it to util::test_private. It does not appear on the first page of the cargo docs.

qinsoon

LGTM

wks added 2 commits July 29, 2024 21:34

Refactor iterate_meta_bits

a837987

Benchmarks

acaf942

wks added 8 commits August 1, 2024 16:41

Remove ByteWordRange

f4a08b3

Let's add it back when implementing VO bit scanning.

Fix clippy warnings

a51cdc4

Formatting

b1486e3

Move iterate_meta_bits to ranges::break_bit_range

ab28a79

Add tests and fix bugs

e2bafd5

Just use u8 for the type of bit offset.

c3c3dfa

Comments

a73061c

Merge branch 'master' into fix/bulk-metadata-visit-simple

d994f6e

wks mentioned this pull request Aug 6, 2024

Refactor bulk metadata visiting #1180

Closed

wks added 2 commits August 6, 2024 13:46

Revert criterion version update

323d39e

Fix comments

95cd6df

wks marked this pull request as ready for review August 6, 2024 06:20

wks requested a review from qinsoon August 6, 2024 06:21

qinsoon reviewed Aug 6, 2024

View reviewed changes

wks added 5 commits August 6, 2024 16:19

Rename the "bench" feature to "test_private"

be28f28

Move bench_* into test_private

6994e09

Inline test_private functions

6a9320b

Isolate mock_bench from other benchmarks.

74cf0d8

Minor changes

4f71855

qinsoon approved these changes Aug 6, 2024

View reviewed changes

wks added this pull request to the merge queue Aug 6, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 6, 2024

wks added this pull request to the merge queue Aug 6, 2024

Merged via the queue into mmtk:master with commit 8623b77 Aug 6, 2024

wks deleted the fix/bulk-metadata-visit-simple branch August 6, 2024 15:12

wks mentioned this pull request Aug 8, 2024

Heap traversal #1174

Merged

wks mentioned this pull request Oct 23, 2025

Optimize bulk metadata operations for Region #1407

Open

Refactor iterate_meta_bits #1181

Refactor iterate_meta_bits #1181

Uh oh!

Conversation

wks commented Jul 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wks commented Jul 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

qinsoon left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

qinsoon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

wks commented Jul 29, 2024 •

edited

Loading

wks commented Jul 31, 2024 •

edited

Loading