
metrics: measure block assembly re-evaluation failures #6526

Merged
algorandskiy merged 15 commits into algorand:master from cce:transaction-pool-reevaluation-metrics on Jan 28, 2026

cce commented Jan 14, 2026

This PR adds a new TagCounter metric to track transaction groups that were successfully added to the pool, but later failed during block assembly re-evaluation:

  • algod_tx_pool_reeval_{TAG} where TAG is one of: fee, txn_dead, too_large, groupid, txid_eval, lease_eval, eval

This change also refactors the error classification logic to be shared between txHandler (initial remember) and transactionPool (re-evaluation):

  • Moved tag constants to pools.TxPoolErrTag* for shared use
  • Added pools.ClassifyTxPoolError() to classify BlockEvaluator errors
  • Both txHandler and transactionPool now use the same classification

The reeval metrics are distinct from the existing txHandler metrics (algod_transaction_messages_txpool_remember_err_{TAG}) which track initial admission failures.

cce force-pushed the transaction-pool-reevaluation-metrics branch from 348c0b7 to 4dfd630 January 14, 2026 16:09
codecov bot commented Jan 14, 2026

Codecov Report

❌ Patch coverage is 76.54321% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.71%. Comparing base (b264d42) to head (3035a68).
⚠️ Report is 1 commits behind head on master.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ledger/ledgercore/error.go | 0.00% | 8 Missing ⚠️ |
| ledger/eval/eval.go | 44.44% | 5 Missing ⚠️ |
| ledger/apply/asset.go | 0.00% | 3 Missing ⚠️ |
| data/pools/errors.go | 96.29% | 1 Missing and 1 partial ⚠️ |
| data/pools/transactionPool.go | 75.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6526      +/-   ##
==========================================
- Coverage   47.83%   47.71%   -0.12%     
==========================================
  Files         662      655       -7     
  Lines       87991    87906      -85     
==========================================
- Hits        42089    41945     -144     
- Misses      43126    43193      +67     
+ Partials     2776     2768       -8     


cce force-pushed the transaction-pool-reevaluation-metrics branch from 4dfd630 to 229b273 January 14, 2026 16:51
Add always-enabled Prometheus metrics to track transaction groups that
were successfully added to the pool but later failed during block
assembly re-evaluation.

New reeval metric (TagCounter):
- algod_tx_pool_reeval_{TAG} where TAG is one of: fee, txn_dead,
  too_large, groupid, txid_eval, lease_eval, eval

This change also refactors the error classification logic to be shared
between txHandler (initial remember) and transactionPool (re-evaluation):
- Moved tag constants to pools.TxPoolErrTag* for shared use
- Added pools.ClassifyTxPoolError() to classify BlockEvaluator errors
- Both txHandler and transactionPool now use the same classification

The reeval metrics are distinct from the existing txHandler metrics
(algod_transaction_messages_txpool_remember_err_{TAG}) which track
initial admission failures.

These metrics are always enabled (no config flag required), unlike the
existing telemetry events, which require EnableAssembleStats.
cce force-pushed the transaction-pool-reevaluation-metrics branch from 229b273 to 711d840 January 14, 2026 17:23
algorandskiy previously approved these changes Jan 14, 2026
Comment thread data/txHandler.go Outdated
Extend the error classification in txPoolErrors.go to distinguish
between different types of evaluation failures that were previously
all lumped into the generic "eval" bucket.

New error tags:
- not_well: TxnNotWellFormedError (malformed transaction)
- teal_err: logic.EvalError (TEAL runtime error)
- teal_reject: TEAL approval returned false
- min_balance: Account balance below minimum requirement
- overspend: Insufficient Algo funds
- asset_bal: Insufficient asset balance

These tags help operators understand the nature of transaction
failures on their relays:
- teal_reject indicates round-check MEV contracts failing at execution
- overspend indicates spray-and-pray attacks with unfunded accounts
- asset_bal distinguishes asset balance failures from Algo overspend

The classification uses type assertions for typed errors (EvalError,
TxnNotWellFormedError) and string matching for fmt.Errorf patterns
("rejected by ApprovalProgram", "balance below min", etc.).
cce added 10 commits January 23, 2026 00:14
- Document why cap/pending_eval excluded from reeval tags
- Make "insufficient" pattern more specific ("insufficient balance")
- Add TestClassifyByErrorMessage for message-based classification

Add metrics to track positive outcomes during re-evaluation:
- algod_tx_pool_reeval_success: txn groups successfully re-evaluated
- algod_tx_pool_reeval_committed: txn groups removed (already in block)

This provides parity with txHandler's transactionMessagesRemember counter.

Replace type switch with errors.As calls in classifyUnwrappedError.
This properly traverses wrapped errors (e.g., from Remember's
fmt.Errorf("TransactionPool.Remember: %w", err)) to find the
underlying typed error.

Use TxPoolErrTags for both txPoolReevalCounter and
transactionMessageTxPoolCheckCounter to ensure all tags that
ClassifyTxPoolError can return are predeclared. This prevents
schema drift where a counter receives an unregistered tag.

Removed the separate TxPoolReevalErrTags since all counters now
use the same complete set.
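
Predeclaring one tag list and sharing it between counters can be sketched as below. This is not the go-algorand `TagCounter`: the `tagCounter` type, `newTagCounter`, the second prefix, and the choice to reject undeclared tags are all assumptions made for illustration, and `declaredTags` is only a subset of the tags named in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// declaredTags is an illustrative subset of the tags named in this PR.
var declaredTags = []string{"fee", "txn_dead", "too_large", "eval"}

// tagCounter is a hypothetical counter keyed by predeclared tags.
type tagCounter struct {
	prefix string
	counts map[string]uint64
}

// newTagCounter registers every tag at zero so that the full set of
// metric names is known before any event is counted.
func newTagCounter(prefix string, tags []string) *tagCounter {
	tc := &tagCounter{prefix: prefix, counts: make(map[string]uint64, len(tags))}
	for _, t := range tags {
		tc.counts[t] = 0
	}
	return tc
}

// add increments a declared tag and reports whether the tag was declared.
// Undeclared tags are rejected here so the set of metric names cannot grow.
func (tc *tagCounter) add(tag string) bool {
	if _, ok := tc.counts[tag]; !ok {
		return false
	}
	tc.counts[tag]++
	return true
}

// metricNames returns the full metric names in sorted order.
func (tc *tagCounter) metricNames() []string {
	names := make([]string, 0, len(tc.counts))
	for t := range tc.counts {
		names = append(names, tc.prefix+t)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Both counters are built from the same declared tag list.
	reeval := newTagCounter("algod_tx_pool_reeval_", declaredTags)
	other := newTagCounter("example_other_counter_", declaredTags) // hypothetical prefix

	fmt.Println(reeval.add("fee"))          // true
	fmt.Println(reeval.add("not_declared")) // false
	fmt.Println(len(reeval.metricNames()) == len(other.metricNames())) // true
}
```

With a single shared list, adding a new tag to the classifier means updating one place, and every counter picks it up.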

Move ClassifyTxPoolError, error tags, and metrics counters from
txPoolErrors.go into errors.go alongside the error type declarations.
Rename txPoolErrors_test.go to errors_test.go.
algorandskiy previously approved these changes Jan 28, 2026
cce added 2 commits January 28, 2026 13:39
Exercise real evaluator/pool errors through RememberOne and
re-evaluation, then verify ClassifyTxPoolError returns the correct
tag. Covers fee escalation, lease re-eval, asset balance,
TEAL reject, and TEAL runtime error paths. Consolidates
min-balance classification into the existing
TestSenderGoesBelowMinBalance test.
…reevaluation-metrics-coverage

# Conflicts:
#	data/pools/transactionPool_test.go
#	data/txHandler.go
cce force-pushed the transaction-pool-reevaluation-metrics branch from bcb70c7 to 7a0b140 January 28, 2026 18:43
cce requested a review from algorandskiy January 28, 2026 18:46
algorandskiy requested a review from jannotti January 28, 2026 18:47
algorandskiy merged commit 172f19d into algorand:master Jan 28, 2026
39 checks passed
cce deleted the transaction-pool-reevaluation-metrics branch February 24, 2026 19:58