
storage: fixes for book-keeping of dirty and closed bytes #25076

Open · wants to merge 2 commits into base: dev
Conversation

WillemKauf (Contributor)
This PR fixes a number of bugs in the disk_log_impl book-keeping of dirty and closed bytes:

  1. Some calls to subtract_dirty_segment_bytes did not check whether the segment in question was already cleanly compacted. This would lead to over-deduction of dirty bytes (see the sketch after this list).
  2. In do_compact_adjacent_segments(), it is possible to merge a cleanly compacted segment with a non-cleanly compacted segment. Later, when the replacement segment is finally marked cleanly compacted, the sum of its bytes (including those already deducted when the original cleanly compacted segment was marked) would be deducted again. We must add those bytes back and consider them "dirty" to avoid this over-deduction.
  3. Also in do_compact_adjacent_segments(), a call to remove_segment_permanently() for one of the segments being merged would remove its dirty and closed bytes. These bytes are still present in the replacement segment and should not be removed.
  4. Finally, in rewrite_segment_with_offset_map(), we would again over-deduct dirty bytes for a segment that was already marked as cleanly compacted.
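
To make fix 1 concrete, here is a minimal standalone sketch of the guarded deduction pattern. segment_stub and byte_bookkeeping are hypothetical stand-ins invented for illustration; only the guard itself mirrors the fix, while has_clean_compact_timestamp echoes the real accessor named above:

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for the real segment and counter state in
// disk_log_impl; only the guard below mirrors the actual fix.
struct segment_stub {
    size_t size_bytes = 0;
    bool clean = false; // set once the segment is fully deduplicated

    bool has_clean_compact_timestamp() const { return clean; }
};

struct byte_bookkeeping {
    uint64_t dirty_segment_bytes = 0;
    uint64_t closed_segment_bytes = 0;

    void remove_segment(const segment_stub& s) {
        // The fix from item 1: a cleanly compacted segment's bytes were
        // already deducted from the dirty counter when it was marked clean,
        // so deducting them again here would over-deduct.
        if (!s.has_clean_compact_timestamp()) {
            dirty_segment_bytes -= s.size_bytes;
        }
        closed_segment_bytes -= s.size_bytes;
    }
};
```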

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

No functional changes in this PR, just a new accessor function.
There were a number of bugs in the book-keeping that are fixed by this commit: the same four fixes enumerated in the PR description above.
@vbotbuildovich (Collaborator)

CI test results

test results on build#61806
| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP | ducktape | https://buildkite.com/redpanda/redpanda/builds/61806#0194f760-a7a1-4466-b888-e765d2454312 | FLAKY | 1/2 |
| rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=True.cloud_storage_type=CloudStorageType.ABS | ducktape | https://buildkite.com/redpanda/redpanda/builds/61806#0194f760-a7a1-4466-b888-e765d2454312 | FLAKY | 1/2 |
| rptest.transactions.producers_api_test.ProducersAdminAPITest.test_producers_state_api_during_load | ducktape | https://buildkite.com/redpanda/redpanda/builds/61806#0194f773-2fb8-4b10-be0c-7c7dedf56454 | FLAKY | 1/2 |

@WillemKauf (Contributor, Author)

/ci-repeat 1
skip-build
skip-units
dt-repeat=50
tests/rptest/tests/log_compaction_test.py


```cpp
add_dirty_segment_bytes(dirty_bytes_to_readd);
const auto removed_bytes = ret.size_before - ret.size_after;
if (!target->has_clean_compact_timestamp()) {
```
@nvartolomei (Contributor) commented on Feb 14, 2025:

Disclaimer: I'm still confused about dirty/clean segments but I'm trying to make sense of this as a learning exercise too.

I'm confused by this if statement. To me it looks like a copy-paste mistake. target here is the newly created segment; is has_clean_compact_timestamp the property on that new segment?

Why are we subtracting removed_bytes based on the "clean state" of the new segment? What does this mean?

Let's say we started with 2 segments: a) clean, 0 dirty bytes; b) non-clean, 100 dirty bytes.

The dirty bytes counter is at 100, and dirty_bytes_to_readd is at 100 too.

The newly compacted segment is, for example, 75 bytes; therefore removed_bytes = 25.

Is target in this case considered non-clean? If so:

  • we add 100 to dirty bytes (the counter is at 200)
  • we subtract 25 * 2
  • the dirty counter is at 150?

Questions:

  • Is the math right?
  • Are you sure you want to check target here rather than segments.back()?

On second look, the math looks all right.

Last question: would it be non-trivial to add a test for these "metrics"? The logic doesn't seem very simple, so maybe we can add a test to make sure we don't break it in the future?

@WillemKauf (Contributor, Author) replied on Feb 14, 2025:
> I'm still confused about dirty/clean segments but I'm trying to make sense of this as a learning exercise too.

A "clean" segment is a closed segment that has had all of its keys fully deduplicated within its prefix of the log. Clean segments can only be produced by being fully indexed during the sliding window compaction process (using the key-offset map to eliminate all duplicates in earlier segments of the log).

A "dirty" segment is a closed segment that has not yet fulfilled the above requirement.

> Why are we subtracting removed_bytes based on the "clean state" of the new segment? What does this mean?

The new segment, target, is only designated as "clean" if both of the segments merged together were designated as "clean". As an aside, if segment one had n_1 bytes and segment two had n_2 bytes, target will have exactly n_t = n_1 + n_2 bytes.

In the case that target is not clean (i.e. dirty), after the self-compaction of target we must subtract the removed bytes from the dirty segment bytes. Otherwise, we would be miscounting bytes.

Take the following example, in which we have only two closed segments in the log: Segment 1 (150 bytes, clean) and Segment 2 (100 bytes, dirty). Therefore, _dirty_segment_bytes = 100 and _closed_segment_bytes = 250.

  1. S1 and S2 are selected for adjacent compaction.
  2. dirty_bytes_to_readd is therefore 150 (the size of the clean segment S1).
  3. target is created by merging S1 and S2; its size is 250 bytes, and it is designated as "dirty" because S2 was dirty.
  4. Self-compaction of target removes 50 bytes, so the size of target is now 200 bytes.
  5. We add back dirty_bytes_to_readd; _dirty_segment_bytes is now 250.
  6. We subtract the removed bytes; _dirty_segment_bytes is now 200, and _closed_segment_bytes is now 200.

This makes sense, as we only have one closed segment in the log, and it is dirty. When it is eventually marked as clean by sliding window compaction, the entire 200 bytes will be subtracted from _dirty_segment_bytes, leaving it at 0, while _closed_segment_bytes remains 200.
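
The steps above reduce to the following plain-integer walkthrough (a sketch of the arithmetic only, not the real code path; the variable names mirror the counters in disk_log_impl):

```cpp
#include <cassert>
#include <cstdint>

int main() {
    uint64_t dirty_segment_bytes = 100;  // S2 (100 bytes) is the only dirty segment
    uint64_t closed_segment_bytes = 250; // S1 (150 bytes) + S2 (100 bytes)

    // Steps 1-3: S1 and S2 are merged into `target` (250 bytes, dirty).
    const uint64_t dirty_bytes_to_readd = 150; // S1 was clean; its bytes must come back

    // Step 4: self-compaction of `target` removes 50 bytes.
    const uint64_t removed_bytes = 50;

    // Steps 5-6: re-add the clean segment's bytes, then deduct the removed
    // bytes from both counters.
    dirty_segment_bytes += dirty_bytes_to_readd; // 250
    dirty_segment_bytes -= removed_bytes;        // 200
    closed_segment_bytes -= removed_bytes;       // 200

    // One closed, dirty segment of 200 bytes remains, as described above.
    assert(dirty_segment_bytes == 200);
    assert(closed_segment_bytes == 200);
    return 0;
}
```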

I hope this explains it a bit better, please let me know if there's anything else unclear.

> Last question: would it be non-trivial to add a test for these "metrics"? The logic doesn't seem very simple, so maybe we can add a test to make sure we don't break it in the future?

There are currently checks in log_compaction_test.py, though they are not very in-depth. There are also fixture tests in this PR.

The ideal test would be a fixture test in which producing, compaction, and other processes are all happening, and in which we could loop over the segments in a log, manually compute dirty_segment_bytes and closed_segment_bytes, and compare them against the book-kept values (a sketch of that check follows below).

I'd be happy to look at adding it to make sure the book-keeping is now bug-free.
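
For illustration, the invariant such a test could assert might look like this (segment_view and check_bookkeeping are hypothetical stand-ins, not types from the codebase; the real fixture would walk a disk_log_impl's segment set):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative view of one closed segment.
struct segment_view {
    uint64_t size_bytes;
    bool clean; // i.e. has_clean_compact_timestamp()
};

// Recompute both counters from scratch and compare against the
// book-kept values maintained incrementally by the log.
void check_bookkeeping(
  const std::vector<segment_view>& closed_segments,
  uint64_t bookkept_dirty_bytes,
  uint64_t bookkept_closed_bytes) {
    uint64_t expected_dirty = 0;
    uint64_t expected_closed = 0;
    for (const auto& s : closed_segments) {
        expected_closed += s.size_bytes;
        if (!s.clean) {
            expected_dirty += s.size_bytes;
        }
    }
    assert(expected_dirty == bookkept_dirty_bytes);
    assert(expected_closed == bookkept_closed_bytes);
}
```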

@nvartolomei (Contributor) replied:

I was thinking of something very simple, more like a unit test: create a few segments of precise size, compact, and check that the metrics equal the expected sizes. I haven't worked on this part of the codebase, so I don't know what's available in terms of testing. I'll defer to you on the decision of how and when.

@nvartolomei (Contributor) replied:

And thanks for the explanation! 🙏

@nvartolomei (Contributor) commented:

Re the if statement: it is basically the same as saying !all_segments_cleanly_compacted, or bytes_to_readd > 0, but in an indirect way? If so, can we remove the if statement completely? If target is cleanly compacted, then the bytes to re-add will be 0 and the calls to subtract will have no effect.

Less cognitive load for the reader.

```cpp
auto removed_segment_bytes = s->size_bytes();
if (!s->index().has_clean_compact_timestamp()) {
    subtract_dirty_segment_bytes(removed_segment_bytes);
    if (deduct_bytes) {
```
@nvartolomei (Contributor) commented:

I believe this would be cleaner if we didn't have the bool flag. Instead, in the caller, when we create the new segment, we can increment the dirty bytes to account for what is going to be subtracted. You already added a special case there, so it can be extended to cover this case as well.

We're replacing [a, b] with [c] and removing [b]. [c] is all dirty. If b was dirty, then increment dirty bytes by [c]'s size, else by [c] - [a]. Wdyt?

@WillemKauf (Contributor, Author) replied on Feb 14, 2025:

Sure, I can change this to over-add to dirty bytes, expecting the subtraction later.

```cpp
}

add_dirty_segment_bytes(dirty_bytes_to_readd);
const auto removed_bytes = ret.size_before - ret.size_after;
```
@nvartolomei (Contributor) commented:

I think removed_bytes can be negative 🤔 subtract_dirty_segment_bytes accepts a uint.

We do recompress batches during compaction. If batches were produced to us with the highest compression ratio but we recompress with a lower compression ratio then, I imagine, the size could become larger.
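
A minimal demonstration of the hazard (made-up sizes; this is standard C++ unsigned arithmetic, not code from the PR):

```cpp
#include <cstdint>
#include <iostream>

// If self-compaction can ever grow a segment (e.g. recompression with a
// worse ratio), an unsigned `size_before - size_after` wraps around to a
// huge value instead of going negative, and the subsequent subtraction
// would deduct that huge value from the counter.
int main() {
    uint64_t size_before = 100;
    uint64_t size_after = 120; // segment grew after recompression
    uint64_t removed_bytes = size_before - size_after;
    std::cout << removed_bytes << '\n'; // prints 18446744073709551596, not -20
    return 0;
}
```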

@WillemKauf (Contributor, Author) replied:

🤔 I was aware we recompress, but I didn't think this would ever lead to a situation where the recompressed batch could be larger than the original compressed batch...

@nvartolomei (Contributor) replied:

Similar behaviour can be caused by a compression library upgrade. There is no guarantee that compression output is the same across different versions of a compressor. Or just settings changes, of course, like block sizes, etc.
