Skip to content

Hold certain retryable transactions until next slot#6864

Merged
apfitzge merged 3 commits intoanza-xyz:masterfrom
apfitzge:skip_transactions_until_next_slot
Sep 22, 2025
Merged

Hold certain retryable transactions until next slot#6864
apfitzge merged 3 commits intoanza-xyz:masterfrom
apfitzge:skip_transactions_until_next_slot

Conversation

@apfitzge
Copy link
Copy Markdown

@apfitzge apfitzge commented Jul 7, 2025

Problem

  • Transactions that are going over CU limits should not be retried immediately, but currently are.

Summary of Changes

  • Add mechanism to hold transactions until the next slot transition is detected.

Fixes #

@apfitzge apfitzge marked this pull request as ready for review July 7, 2025 18:05
Comment thread core/src/banking_stage/consumer.rs
Copy link
Copy Markdown

@tao-stones tao-stones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat solution, just few minor comments.

Comment thread core/src/banking_stage/consumer.rs
|(index, processing_result)| {
processing_result.was_processed().then_some(RetryableIndex {
index,
immediately_retryable: true, // recording errors are always immediately retryable
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit picking: instead of mapping different retryable scenarios to immediately_retryable: bool in various parts of the code, maybe consider constructing RetryableIndex with the specific reason (say enum { NotScheduled, WouldExceedCuLimit, RecordErrror, ..}) and centralizing the immediate retry decision in one place (perhaps as a member function of RetryableIndex). This could help improve clarity and maintainability.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your suggestion is the more correct one, however I'd extend it. retryable_indexes should not be a thing at all...we should simply return the transaction results, and scheduler should make a decision from there.

That seems like a larger change, but I'll give it a shot.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking the change is pretty involved, because retryable_indexes.len() is used in a ton of places in metrics accumulation. I think if we remove retryable indexes it should be done separately, wdyt?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that returning transaction results to scheduler is a better solution, and it should be done in separately.

Comment thread core/src/banking_stage/transaction_scheduler/scheduler_common.rs
) -> usize;

/// Hold the tarnsaction until the next flush (next slot).
fn hold_transaction(&mut self, priority_id: TransactionPriorityId);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: hold_transaction_for_next_slot(...)?

apfitzge added a commit to apfitzge/agave that referenced this pull request Jul 28, 2025
apfitzge added a commit to apfitzge/agave that referenced this pull request Aug 19, 2025
apfitzge added a commit to apfitzge/agave that referenced this pull request Aug 19, 2025
apfitzge added a commit to apfitzge/agave that referenced this pull request Aug 26, 2025
@apfitzge apfitzge force-pushed the skip_transactions_until_next_slot branch from 0b87c3a to 10d791a Compare September 18, 2025 19:05
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 85.82090% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.9%. Comparing base (5453ef6) to head (10d791a).

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #6864    +/-   ##
========================================
  Coverage    82.9%    82.9%            
========================================
  Files         823      823            
  Lines      360873   360972    +99     
========================================
+ Hits       299346   299462   +116     
+ Misses      61527    61510    -17     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@apfitzge apfitzge mentioned this pull request Sep 19, 2025
@apfitzge apfitzge requested a review from tao-stones September 22, 2025 13:16
Copy link
Copy Markdown

@tao-stones tao-stones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR is good as-is for the stated issue.

|(index, processing_result)| {
processing_result.was_processed().then_some(RetryableIndex {
index,
immediately_retryable: true, // recording errors are always immediately retryable
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that returning transaction results to scheduler is a better solution, and it should be done in separately.

@apfitzge apfitzge merged commit 8b734e5 into anza-xyz:master Sep 22, 2025
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants