Skip to content

AK: Search CircularBuffer matches from the end of memory#26527

Merged
timschumi merged 2 commits intoSerenityOS:masterfrom
timschumi:circular-buffer-consistent-order
Jan 11, 2026
Merged

AK: Search CircularBuffer matches from the end of memory#26527
timschumi merged 2 commits intoSerenityOS:masterfrom
timschumi:circular-buffer-consistent-order

Conversation

@timschumi
Copy link
Member

@timschumi timschumi commented Dec 31, 2025

This is particularly useful for compression, where we want to search through the lookback buffer for the smallest possible distance [towards the end].

Do note that the new AK::memmem_reverse has somewhat unexpected semantics around what an "offset" is, matching the very last character of the memory range with a needle of size 1 would yield the offset 1. Please check the test files for more examples of this. Would love to hear some opinions about that, but I can't figure out how to solve it any better way.

@github-actions github-actions bot added the 👀 pr-needs-review PR needs review from a maintainer or community member label Dec 31, 2025
Copy link
Contributor

@nico nico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any observable changes in the filesizes produced by tar -c --lzma with this?

@timschumi
Copy link
Member Author

Any observable changes in the filesizes produced by tar -c --lzma with this?

tar -c --lzma Documents Source Tests yields a bit-exact identical archive before and after this change (after about 30 minutes of compressing). I assume that the length restriction to 2 <= x < 3 for pure memmem hits makes this have a small enough impact that this case actually rarely happens in practice with more than one possible match.

@Hendiadyoin1
Copy link
Contributor

Hendiadyoin1 commented Jan 1, 2026

2 <= x < 3

So 2?

@timschumi
Copy link
Member Author

2 <= x < 3

So 2?

Sorry, yeah, it's effectively x = 2. I earlier had it noted down as x <= 3, but the x = 3 case is a validation fallback since the hash-based lookup can handle that already (and therefore it would get its preference from the hash lookup).

@timschumi
Copy link
Member Author

Found a bunch of other issues in other places that may affect this PR, will draft until I fixed those.

@timschumi timschumi marked this pull request as draft January 2, 2026 02:08
@github-actions github-actions bot removed the 👀 pr-needs-review PR needs review from a maintainer or community member label Jan 2, 2026
@timschumi timschumi force-pushed the circular-buffer-consistent-order branch from 47f3ea9 to 554e426 Compare January 6, 2026 00:57
@timschumi
Copy link
Member Author

Applying the forward/reverse preference to the whole size range (instead of just the ones covered by non-hash search) indeed creates a noticeable size difference:

-rw-r--r--   1 anon    users    298.6 KiB   2026-01-06 01:29:04  ST_fwd.tar.lzma
-rw-r--r--   1 anon    users    281.2 KiB   2026-01-06 01:18:34  ST_rev.tar.lzma

Do note that in all cases (that I remember) we still pick the optimal match length-wise, so this size difference should be created purely by said distance encoding.

I'll clean up a few other issues beforehand separately and then open this PR back up for review.

This is particularly useful for compression, where we want to search
through the lookback buffer for the smallest possible distance [towards
the end].
This is generally advantageous for compression algorithms.
@timschumi timschumi force-pushed the circular-buffer-consistent-order branch from 554e426 to 6c29868 Compare January 7, 2026 22:29
@timschumi timschumi marked this pull request as ready for review January 7, 2026 22:29
@github-actions github-actions bot added the 👀 pr-needs-review PR needs review from a maintainer or community member label Jan 7, 2026
@timschumi timschumi merged commit 13c720c into SerenityOS:master Jan 11, 2026
14 checks passed
@timschumi timschumi deleted the circular-buffer-consistent-order branch January 11, 2026 21:42
@github-actions github-actions bot removed the 👀 pr-needs-review PR needs review from a maintainer or community member label Jan 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments