
storage: fixes for book-keeping of dirty and closed bytes #25076

Open · wants to merge 2 commits into base: dev
Conversation

WillemKauf (Contributor)
This PR fixes a number of bugs in the disk_log_impl book-keeping of dirty and closed bytes:

  1. Some calls to subtract_dirty_segment_bytes did not check whether the segment in question was already cleanly compacted. This would lead to over-deduction of dirty bytes (see the sketch after this list).
  2. In do_compact_adjacent_segments(), it is possible to merge a cleanly compacted segment with a non-cleanly compacted segment. Later, when the replacement segment is finally marked cleanly compacted, the sum of its bytes (including those already deducted when the original cleanly compacted segment was marked) would be deducted again. We must add those bytes back and consider them "dirty" to avoid this over-deduction.
  3. Also in do_compact_adjacent_segments(), a call to remove_segment_permanently() for one of the segments being merged would remove its dirty and closed bytes. These bytes are still present in the replacement segment and should not be removed.
  4. Finally, in rewrite_segment_with_offset_map(), we would again over-deduct dirty bytes for a segment that was already marked as cleanly compacted.
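
To make fix 1 concrete, here is a minimal standalone sketch of the guarded deduction pattern. segment_stub and byte_bookkeeping are hypothetical stand-ins invented for illustration; only the guard itself mirrors the fix, while has_clean_compact_timestamp echoes the real accessor named above:

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for the real segment and counter state in
// disk_log_impl; only the guard below mirrors the actual fix.
struct segment_stub {
    size_t size_bytes = 0;
    bool clean = false; // set once the segment is fully deduplicated

    bool has_clean_compact_timestamp() const { return clean; }
};

struct byte_bookkeeping {
    uint64_t dirty_segment_bytes = 0;
    uint64_t closed_segment_bytes = 0;

    void remove_segment(const segment_stub& s) {
        // The fix from item 1: a cleanly compacted segment's bytes were
        // already deducted from the dirty counter when it was marked clean,
        // so deducting them again here would over-deduct.
        if (!s.has_clean_compact_timestamp()) {
            dirty_segment_bytes -= s.size_bytes;
        }
        closed_segment_bytes -= s.size_bytes;
    }
};
```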

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

No functional changes in this PR, just a new accessor function.
There were a number of bugs in the book-keeping that are fixed by this commit: the same four fixes enumerated in the PR description above.
@vbotbuildovich (Collaborator)

CI test results

test results on build#61806
| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP | ducktape | https://buildkite.com/redpanda/redpanda/builds/61806#0194f760-a7a1-4466-b888-e765d2454312 | FLAKY | 1/2 |
| rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=True.cloud_storage_type=CloudStorageType.ABS | ducktape | https://buildkite.com/redpanda/redpanda/builds/61806#0194f760-a7a1-4466-b888-e765d2454312 | FLAKY | 1/2 |
| rptest.transactions.producers_api_test.ProducersAdminAPITest.test_producers_state_api_during_load | ducktape | https://buildkite.com/redpanda/redpanda/builds/61806#0194f773-2fb8-4b10-be0c-7c7dedf56454 | FLAKY | 1/2 |

@WillemKauf (Contributor, Author)

/ci-repeat 1
skip-build
skip-units
dt-repeat=50
tests/rptest/tests/log_compaction_test.py


```cpp
add_dirty_segment_bytes(dirty_bytes_to_readd);
const auto removed_bytes = ret.size_before - ret.size_after;
if (!target->has_clean_compact_timestamp()) {
```
@nvartolomei (Contributor) commented on Feb 14, 2025:

Disclaimer: I'm still confused about dirty/clean segments but I'm trying to make sense of this as a learning exercise too.

I'm confused by this if statement. To me it looks like a copy-paste mistake. target here is the newly created segment; is has_clean_compact_timestamp the property on that new segment?

Why are we subtracting removed_bytes based on the "clean state" of the new segment? What does this mean?

Let's say we started with 2 segments: a) clean, 0 dirty bytes; b) non-clean, 100 dirty bytes.

The dirty bytes counter is at 100, and dirty_bytes_to_readd is at 100 too.

The newly compacted segment is, for example, 75 bytes; therefore removed_bytes = 25.

Is target in this case considered non-clean? If so:

  • we add 100 to dirty bytes (the counter is at 200)
  • we subtract 25 * 2
  • the dirty counter is at 150?

Questions:

  • Is the math right?
  • Are you sure you want to check target here rather than segments.back()?

On second look, the math looks all right.

Last question: would it be non-trivial to add a test for these "metrics"? The logic doesn't seem very simple, so maybe we can add a test to make sure we don't break it in the future?

@WillemKauf (Contributor, Author) replied on Feb 14, 2025:
> I'm still confused about dirty/clean segments but I'm trying to make sense of this as a learning exercise too.

A "clean" segment is a closed segment that has had all of its keys fully deduplicated within its prefix of the log. Clean segments can only be produced by being fully indexed during the sliding window compaction process (using the key-offset map to eliminate all duplicates in earlier segments of the log).

A "dirty" segment is a closed segment that has not yet fulfilled the above requirement.

> Why are we subtracting removed_bytes based on the "clean state" of the new segment? What does this mean?

The new segment, target, is only designated as "clean" if both of the segments merged together were designated as "clean". As an aside, if segment one had n_1 bytes and segment two had n_2 bytes, target will have exactly n_t = n_1 + n_2 bytes.

In the case that target is not clean (i.e. dirty), after the self-compaction of target we must subtract the removed bytes from the dirty segment bytes. Otherwise, we would be miscounting bytes.

Take the following example, in which we have only two closed segments in the log: Segment 1 (150 bytes, clean) and Segment 2 (100 bytes, dirty). Therefore, _dirty_segment_bytes = 100 and _closed_segment_bytes = 250.

  1. S1 and S2 are selected for adjacent compaction.
  2. dirty_bytes_to_readd is therefore 150 (the size of the clean segment S1).
  3. target is created by merging S1 and S2; its size is 250 bytes, and it is designated as "dirty" because S2 was dirty.
  4. Self-compaction of target removes 50 bytes, so the size of target is now 200 bytes.
  5. We add back dirty_bytes_to_readd; _dirty_segment_bytes is now 250.
  6. We subtract the removed bytes; _dirty_segment_bytes is now 200, and _closed_segment_bytes is now 200.

This makes sense, as we only have one closed segment in the log, and it is dirty. When it is eventually marked as clean by sliding window compaction, the entire 200 bytes will be subtracted from _dirty_segment_bytes, leaving it at 0, while _closed_segment_bytes remains 200.
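
The steps above reduce to the following plain-integer walkthrough (a sketch of the arithmetic only, not the real code path; the variable names mirror the counters in disk_log_impl):

```cpp
#include <cassert>
#include <cstdint>

int main() {
    uint64_t dirty_segment_bytes = 100;  // S2 (100 bytes) is the only dirty segment
    uint64_t closed_segment_bytes = 250; // S1 (150 bytes) + S2 (100 bytes)

    // Steps 1-3: S1 and S2 are merged into `target` (250 bytes, dirty).
    const uint64_t dirty_bytes_to_readd = 150; // S1 was clean; its bytes must come back

    // Step 4: self-compaction of `target` removes 50 bytes.
    const uint64_t removed_bytes = 50;

    // Steps 5-6: re-add the clean segment's bytes, then deduct the removed
    // bytes from both counters.
    dirty_segment_bytes += dirty_bytes_to_readd; // 250
    dirty_segment_bytes -= removed_bytes;        // 200
    closed_segment_bytes -= removed_bytes;       // 200

    // One closed, dirty segment of 200 bytes remains, as described above.
    assert(dirty_segment_bytes == 200);
    assert(closed_segment_bytes == 200);
    return 0;
}
```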

I hope this explains it a bit better, please let me know if there's anything else unclear.

> Last question: would it be non-trivial to add a test for these "metrics"? The logic doesn't seem very simple, so maybe we can add a test to make sure we don't break it in the future?

There are currently checks in log_compaction_test.py, though they are not very in-depth. There are also fixture tests in this PR.

The ideal test would be a fixture test in which producing, compaction, and other processes are all happening, and in which we could loop over the segments in a log, manually compute dirty_segment_bytes and closed_segment_bytes, and compare them against the book-kept values (a sketch of that check follows below).

I'd be happy to look at adding it to make sure the book-keeping is now bug-free.
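
For illustration, the invariant such a test could assert might look like this (segment_view and check_bookkeeping are hypothetical stand-ins, not types from the codebase; the real fixture would walk a disk_log_impl's segment set):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative view of one closed segment.
struct segment_view {
    uint64_t size_bytes;
    bool clean; // i.e. has_clean_compact_timestamp()
};

// Recompute both counters from scratch and compare against the
// book-kept values maintained incrementally by the log.
void check_bookkeeping(
  const std::vector<segment_view>& closed_segments,
  uint64_t bookkept_dirty_bytes,
  uint64_t bookkept_closed_bytes) {
    uint64_t expected_dirty = 0;
    uint64_t expected_closed = 0;
    for (const auto& s : closed_segments) {
        expected_closed += s.size_bytes;
        if (!s.clean) {
            expected_dirty += s.size_bytes;
        }
    }
    assert(expected_dirty == bookkept_dirty_bytes);
    assert(expected_closed == bookkept_closed_bytes);
}
```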

@nvartolomei (Contributor) replied:

I was thinking of something very simple, more like a unit test: create a few segments of precise size, compact, and check that the metrics equal the expected sizes. I haven't worked on this part of the codebase, so I don't know what's available in terms of testing. I'll defer to you on the decision of how and when.

@nvartolomei (Contributor) replied:

And thanks for the explanation! 🙏

@nvartolomei (Contributor) commented:

Re the if statement: it is basically the same as saying !all_segments_cleanly_compacted, or bytes_to_readd > 0, but in an indirect way? If so, can we remove the if statement completely? If target is cleanly compacted, then the bytes to re-add will be 0 and the calls to subtract will have no effect.

Less cognitive load for the reader.

```cpp
auto removed_segment_bytes = s->size_bytes();
if (!s->index().has_clean_compact_timestamp()) {
    subtract_dirty_segment_bytes(removed_segment_bytes);
    if (deduct_bytes) {
```
@nvartolomei (Contributor) commented:

I believe this would be cleaner if we didn't have the bool flag. Instead, in the caller, when we create the new segment, we can increment the dirty bytes to account for what is going to be subtracted. You already added a special case there, so it can be extended to cover this case as well.

We're replacing [a, b] with [c] and removing [b]. [c] is all dirty. If b was dirty, then increment dirty bytes by [c]'s size, else by [c] - [a]. Wdyt?

@WillemKauf (Contributor, Author) replied on Feb 14, 2025:

Sure, I can change this to over-add to dirty bytes, expecting the subtraction later.

```cpp
}

add_dirty_segment_bytes(dirty_bytes_to_readd);
const auto removed_bytes = ret.size_before - ret.size_after;
```
@nvartolomei (Contributor) commented:

I think removed_bytes can be negative 🤔 subtract_dirty_segment_bytes accepts a uint.

We do recompress batches during compaction. If batches were produced to us with the highest compression ratio but we recompress with a lower compression ratio then, I imagine, the size could become larger.
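
A minimal demonstration of the hazard (made-up sizes; this is standard C++ unsigned arithmetic, not code from the PR):

```cpp
#include <cstdint>
#include <iostream>

// If self-compaction can ever grow a segment (e.g. recompression with a
// worse ratio), an unsigned `size_before - size_after` wraps around to a
// huge value instead of going negative, and the subsequent subtraction
// would deduct that huge value from the counter.
int main() {
    uint64_t size_before = 100;
    uint64_t size_after = 120; // segment grew after recompression
    uint64_t removed_bytes = size_before - size_after;
    std::cout << removed_bytes << '\n'; // prints 18446744073709551596, not -20
    return 0;
}
```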

@WillemKauf (Contributor, Author) replied:

🤔 I was aware we recompress, but I didn't think this would ever lead to a situation where the recompressed batch could be larger than the original compressed batch...

@nvartolomei (Contributor) replied:

Similar behaviour can be caused by a compression library upgrade. There is no guarantee that compression output is the same across different versions of a compressor. Or just settings changes, of course, like block sizes, etc.
