Skip to content

[IMPROVED] Publish permissions cache pruning#6674

Merged
neilalexander merged 1 commit intomainfrom
prune_pub_perms_cache
Mar 14, 2025
Merged

[IMPROVED] Publish permissions cache pruning#6674
neilalexander merged 1 commit intomainfrom
prune_pub_perms_cache

Conversation

@kozlovic
Copy link
Copy Markdown
Member

The test TestLeafNodePubAllowedPruning would sometimes report (after all go routines calling c.pubAllowed() being done) that the cache size was greater than the max allowed (128).

@MauriceVanVeen did fix the test with a correct explanation of why this test flapped, but I think we can improve a tiny bit the code so that the cache is much likely to be pruned enough.

In this PR, I make prunePubPermsCache() attempts up to 5 times to prune enough the cache, locking/releasing at each attempt to improve the odds that another routine takes a shot at it.

In my experiments, I am seeing only up to 2 attempts to be below the max size (and less than 10ms execution time). That being said, with artificial sleep before the release of the lock, it could take way more attempts. So for very slow machines, although I have reverted the changes to the test done in PR#6636, I did add a little protection to call one more time c.pubAllowed() if we first detect that we are over the limit. If we are still after that the test will fail and would need another look.

Signed-off-by: Ivan Kozlovic ivan@synadia.com

The test `TestLeafNodePubAllowedPruning` would sometimes report
(after all go routines calling c.pubAllowed() being done) that
the cache size was greater than the max allowed (128).

@MauriceVanVeen did fix the test with a correct explanation of
why this test flapped, but I think we can improve a tiny bit the
code so that the cache is much likely to be pruned enough.

In this PR, I make `prunePubPermsCache()` attempts up to 5 times
to prune enough the cache, locking/releasing at each attempt to
improve the odds that another routine takes a shot at it.

In my experiments, I am seeing only up to 2 attempts to be below
the max size (and less than 10ms execution time). That being said,
with artificial sleep before the release of the lock, it could
take way more attempts. So for very slow machines, although I have
reverted the changes to the test done in PR#6636, I did add a
little protection to call one more time `c.pubAllowed()` if we
first detect that we are over the limit. If we are still after that
the test will fail and would need another look.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
@kozlovic kozlovic requested a review from a team as a code owner March 13, 2025 23:20
@kozlovic kozlovic requested a review from MauriceVanVeen March 13, 2025 23:24
@neilalexander neilalexander merged commit 92c99aa into main Mar 14, 2025
5 checks passed
@neilalexander neilalexander deleted the prune_pub_perms_cache branch March 14, 2025 09:45
neilalexander added a commit that referenced this pull request Apr 17, 2025
Includes the following (already cherry-picked) PRs:

- #6587
- #6607
- #6612
- #6609
- #6620
- #6668
- #6674
- #6647
- #6684
- #6691
- #6697
- #6705
- #6706
- #6704
- #6714
- #6720
- #6727
- #6730
- #6726
- #6732
- #6759
- #6753
- #6685
- #6769
- #6777
- #6785
- #6786
- #6778
- #6790
- #6791
- #6798
- #6794
- #6801

Signed-off-by: Neil Twigg <neil@nats.io>

Signed-off-by: Neil Twigg <neil@nats.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants