Cache per-decision look-1 to skip non-viable speculative paths #5
Closes #4. The fast recognizer used to fan out every alternative at every multi-alt state, then dedupe outcomes; on the published Kotlin snippets that walked tens of millions of ATN edges. Cache one look-1 set per outgoing transition at decision states, prune non-consuming transitions whose closure cannot accept the current lookahead, and share recovery contexts and FIRST sets through `Rc` to drop the per-visit `BTreeSet` clone churn. Consuming transitions still go through the existing `matches`+recovery path so error diagnostics, single-token recovery, and the no-prefilter retry stay byte-identical with the no-cache baseline. Kotlin parity dumps remain byte-identical to `antlr4-python3-runtime`, all 357 cases of the runtime testsuite pass, and the Kotlin parse-only timings on the issue's snippets land at min 9.5 ms / avg 10.1 ms for 01-nested-types.kt and min 18.1 ms / avg 19.1 ms for 02-dataframe.kt (was 150 ms+ and 730 ms+ full-process before).
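The pruning rule described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `LookSet`, `DecisionLook`, and `viable_transitions` are hypothetical names, and treating a nullable closure as never prunable is an assumption made for the sketch.

```rust
use std::collections::BTreeSet;
use std::rc::Rc;

/// Hypothetical look-1 summary for one outgoing transition of a decision state.
#[derive(Debug, Clone)]
pub struct LookSet {
    /// Token types that can start a match through this transition's closure.
    pub symbols: BTreeSet<i32>,
    /// Assumption: true when the closure can complete without consuming a token.
    pub nullable: bool,
}

/// Hypothetical cache entry: one look set per outgoing transition, shared via `Rc`
/// so a cache hit costs a refcount bump instead of a set clone.
#[derive(Debug, Clone)]
pub struct DecisionLook {
    pub transitions: Rc<Vec<LookSet>>,
}

/// Returns the indices of the transitions still worth exploring for `lookahead`.
/// `consuming[i]` marks transitions that match a token directly; those are never
/// pruned here, so the existing match-and-recover path stays reachable.
pub fn viable_transitions(look: &DecisionLook, consuming: &[bool], lookahead: i32) -> Vec<usize> {
    look.transitions
        .iter()
        .enumerate()
        .filter(|(i, set)| {
            consuming.get(*i).copied().unwrap_or(true)
                || set.nullable
                || set.symbols.contains(&lookahead)
        })
        .map(|(i, _)| i)
        .collect()
}
```

With four transitions whose look sets are `{1, 2}`, `{3}`, nullable, and a consuming transition, a lookahead of `3` keeps only the second, third, and fourth.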
Walkthrough
Replaces BTree-based memoization with FxHasher-backed FxHashMap/FxHashSet, interns FIRST and recovery symbol sets as shared
Changes
Fast Recognizer Memoization & Recovery Caching
Copy/Paste Detection
Found 8 duplication(s) across 3 changed Rust file(s) (threshold: 100 tokens).
Found a 22 line (116 tokens) duplication in the following files:
recovery_symbols: Rc::clone(&recovery_symbols),
recovery_state,
};
if let Some(outcomes) = memo.get(&key) {
return outcomes.clone();
}
let visit_key = key.clone();
if !visiting.insert(visit_key.clone()) {
return Vec::new();
}
let Some(state) = atn.state(state_number) else {
visiting.remove(&visit_key);
return Vec::new();
};
let next_decision_start_index = if starts_prediction_decision(state) {
Some(index)
} else {
decision_start_index
};
let (epsilon_recovery_symbols, epsilon_recovery_state) =
Found a 15 line (114 tokens) duplication in the following files:
fn parser_matches_token_and_reports_mismatch() {
let source = Source {
tokens: vec![
CommonToken::new(1).with_text("x"),
CommonToken::eof("parser-test", 1, 1, 1),
],
index: 0,
};
let data = RecognizerData::new(
"Mini.g4",
Vocabulary::new([None, Some("'x'")], [None, Some("X")], [None::<&str>, None]),
);
let mut parser = BaseParser::new(CommonTokenStream::new(source), data);
assert_eq!(
parser.match_token(1).expect("token 1 should match").text(),
Found a 32 line (113 tokens) duplication in the following files:
outcomes.extend(
self.recognize_state(
atn,
RecognizeRequest {
state_number: *target,
stop_state,
index,
rule_start_index,
decision_start_index: next_decision_start_index,
init_action_rules,
predicates,
rule_args,
member_actions,
return_actions,
local_int_arg,
member_values: member_values.clone(),
return_values: return_values.clone(),
rule_alt_number: next_alt_number,
track_alt_numbers,
consumed_eof,
precedence,
depth: depth + 1,
recovery_symbols: epsilon_recovery_symbols.clone(),
recovery_state: epsilon_recovery_state,
},
visiting,
memo,
expected,
)
.into_iter()
.map(|mut outcome| {
prepend_decision(&mut outcome, decision);
Found a 14 line (110 tokens) duplication in the following files:
fn outcome_ties_keep_later_non_recursive_alternative() {
let first = RecognizeOutcome {
index: 1,
consumed_eof: false,
alt_number: 0,
member_values: BTreeMap::new(),
return_values: BTreeMap::new(),
diagnostics: Vec::new(),
decisions: Vec::new(),
actions: vec![ParserAction::new(1, 0, 0, None)],
nodes: vec![RecognizedNode::Token { index: 0 }],
};
let second = RecognizeOutcome {
actions: vec![ParserAction::new(2, 0, 0, None)],
Found a 13 line (109 tokens) duplication in the following files:
fn parser_matches_token_and_reports_mismatch() {
let source = Source {
tokens: vec![
CommonToken::new(1).with_text("x"),
CommonToken::eof("parser-test", 1, 1, 1),
],
index: 0,
};
let data = RecognizerData::new(
"Mini.g4",
Vocabulary::new([None, Some("'x'")], [None, Some("X")], [None::<&str>, None]),
);
let mut parser = BaseParser::new(CommonTokenStream::new(source), data);
Found a 25 line (108 tokens) duplication in the following files:
| Transition::Action { target, .. } => {
let boundary = left_recursive_boundary(atn, state, *target);
outcomes.extend(
self.recognize_state_fast(
atn,
FastRecognizeRequest {
state_number: *target,
stop_state,
index,
rule_start_index,
decision_start_index: next_decision_start_index,
precedence,
depth: depth + 1,
recovery_symbols: Rc::clone(&epsilon_recovery_symbols),
recovery_state: epsilon_recovery_state,
},
visiting,
memo,
expected,
)
.into_iter()
.map(|mut outcome| {
if let Some(rule_index) = boundary {
outcome.nodes.prepend(Rc::new(
FastRecognizedNode::LeftRecursiveBoundary { rule_index },
Found a 20 line (107 tokens) duplication in the following files:
let start_state = atn
.rule_to_start_state()
.get(rule_index)
.copied()
.ok_or_else(|| {
AntlrError::Unsupported(format!("rule {rule_index} has no start state"))
})?;
let stop_state = atn
.rule_to_stop_state()
.get(rule_index)
.copied()
.filter(|state| *state != usize::MAX)
.ok_or_else(|| {
AntlrError::Unsupported(format!("rule {rule_index} has no stop state"))
})?;
let start_index = self.current_visible_index();
self.clear_prediction_diagnostics();
self.reset_per_parse_caches();
let first_pass = self.fast_recognize_top(atn, start_state, stop_state, start_index);
Found a 15 line (103 tokens) duplication in the following files:
FastRecognizedNode::MissingToken {
token_type,
at_index,
text,
} => {
let current = self.token_at(*at_index);
let token = CommonToken::new(*token_type)
.with_text(text)
.with_span(usize::MAX, usize::MAX)
.with_position(
current.as_ref().map(Token::line).unwrap_or_default(),
current.as_ref().map(Token::column).unwrap_or_default(),
);
Ok(ParseTree::Error(ErrorNode::new(token)))
}
Resolves the `clippy::excessive-nesting` failure CI hit on PR #5: the inline pruning block was eight levels deep once the `match expected.index` branch was added. Lift the body into `should_skip_via_lookahead` plus `record_pruned_transition_expected` so the recognize loop reads the prefilter as a single boolean check. Behavior is unchanged — confirmed by re-running `cargo test --release` and the Kotlin parity dumps; parse-only timings stay at ~10 ms / ~19 ms.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fa4beaf327
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/parser.rs`:
- Around line 1157-1186: FastRecognizeKey's PartialEq falls back to value
equality on recovery_symbols while Hash uses only pointer identity (Rc::as_ptr),
violating Hash/Eq; fix by ensuring every Rc<BTreeSet<i32>> that can become
recovery_symbols is interned so identical sets share the same Rc pointer: modify
the producer(s) of these Rc values (notably cached_state_expected_symbols and
fast_next_recovery_context) to run their BTreeSet<i32> through the same interner
used elsewhere (return the interned Rc) before storing or returning, so
Pointer-equality in PartialEq matches the pointer-only Hash and the Hash/Eq
contract holds.
- Around line 2289-2317: Extract the nested logic that updates the diagnostic
accumulator into a helper function (suggested name:
record_pruned_expected_symbols) that takes &mut ExpectedTokens, index: usize,
and symbols: &BTreeSet<i32>, and moves the match/merge logic (checks for empty
set, matching expected.index, extend vs clone_from) into it; then replace the
deeply nested block inside the prune_non_consuming branch (the code that
inspects lookahead_filter, entry.transitions.get(transition_index), and
set.symbols/nullability checks) by calling record_pruned_expected_symbols(&mut
expected, index, &set.symbols) before the continue; also reuse the same helper
to replace the duplicated block in the Rule transition branch to eliminate the
excessive nesting reported by clippy.
In `@tests/kotlin-parity/dumper/src/main.rs`:
- Around line 62-68: The "--iters" arm currently unconditionally consumes the
next token via args.next() and silently falls back to 1 on parse failure; change
it to explicitly check for a present non-flag token and report an error on
missing/invalid values. Make the iterator a Peekable (e.g., args =
args.peekable()), then in the "--iters" match use args.peek() to ensure the next
item exists and does not start with '-' before calling args.next(); attempt to
parse with value.parse::<usize>() and on Err or if missing, print a clear error
(or return Err) and exit (or propagate), instead of silently defaulting to 1;
reference the "--iters" match arm and the iters variable when making this
change.
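The helper suggested in the second `src/parser.rs` comment could look roughly like the sketch below. The `ExpectedTokens` shape shown here (an optional furthest index plus a symbol set) and the "furthest index wins" policy are assumptions for illustration; the real type in `src/parser.rs` may differ.

```rust
use std::collections::BTreeSet;

/// Hypothetical diagnostic accumulator: the furthest token index at which a
/// mismatch was recorded, and the symbols that would have been accepted there.
#[derive(Debug, Default)]
pub struct ExpectedTokens {
    pub index: Option<usize>,
    pub symbols: BTreeSet<i32>,
}

/// Records the look-1 symbols of a pruned transition so error messages still
/// list them, using the empty-set check, index match, and extend vs clone_from
/// split named in the review comment.
pub fn record_pruned_expected_symbols(
    expected: &mut ExpectedTokens,
    index: usize,
    symbols: &BTreeSet<i32>,
) {
    if symbols.is_empty() {
        return; // nothing to report for this transition
    }
    match expected.index {
        // An earlier index than the one already recorded adds no information.
        Some(seen) if seen > index => {}
        // Same index: merge into the existing set.
        Some(seen) if seen == index => expected.symbols.extend(symbols.iter().copied()),
        // Further index (or first record): replace, reusing the allocation.
        _ => {
            expected.index = Some(index);
            expected.symbols.clone_from(symbols);
        }
    }
}
```

Calling it from both the prune branch and the Rule transition branch would keep the recognize loop at a single level of nesting for this step.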
📒 Files selected for processing (3)
- src/parser.rs
- src/token_stream.rs
- tests/kotlin-parity/dumper/src/main.rs
Code Review
This pull request introduces significant performance optimizations to the ANTLR parser runtime, focusing on reducing allocations and redundant computations during speculative recognition. Key changes include the introduction of a fast FxHasher, interning of recovery symbol sets using Rc, and the implementation of several internal caches for FIRST sets and decision lookahead. The recognizer now utilizes lookahead-based pruning and optimized token stream access to minimize cursor movement. Review feedback identifies a critical compilation error in the hasher's integer casting, a Hash trait contract violation in FastRecognizeKey, and potential memory issues when expanding large token ranges. Suggestions were also made to use the Entry API for more efficient cache lookups and BTreeMap to ensure deterministic behavior where necessary.
src/parser.rs (47)
The method cast_unsigned is only stable on the i32 primitive type since Rust 1.87, so this call fails to compile on older toolchains. Using value as u32 performs the same bit-cast to the unsigned representation for hashing and compiles on every toolchain.
self.write_u64(u64::from(value as u32));
src/parser.rs (1157-1186)
The Hash implementation for FastRecognizeKey uses the pointer address of recovery_symbols (Rc::as_ptr), while the PartialEq implementation allows for deep equality comparison (self.recovery_symbols == other.recovery_symbols). This violates the Hash trait contract where a == b must imply hash(a) == hash(b). Since you are interning these sets, you should rely solely on pointer equality in PartialEq for consistency and performance, aligning with the repository's strategy for deduplicating objects with heap-allocated fields.
impl PartialEq for FastRecognizeKey {
    fn eq(&self, other: &Self) -> bool {
        self.state_number == other.state_number
            && self.stop_state == other.stop_state
            && self.index == other.index
            && self.rule_start_index == other.rule_start_index
            && self.decision_start_index == other.decision_start_index
            && self.precedence == other.precedence
            && self.recovery_state == other.recovery_state
            && Rc::ptr_eq(&self.recovery_symbols, &other.recovery_symbols)
    }
}
References
- When deduplicating objects with heap-allocated fields, bucket candidates by simple keys and compare complex fields via borrowed slices to minimize clones and allocations.
src/parser.rs (725)
Collecting a large range into a BTreeSet can be extremely expensive in terms of both time and memory. For Unicode grammars, a single Transition::Range could span over a million code points (e.g., 0..=0x10FFFF), leading to massive allocations. Since TransitionLookSet is designed for a look-1 prefilter, consider storing the range itself or using an IntervalSet to perform membership checks without expanding the full set of symbols.
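One way to follow this suggestion is to keep ranges unexpanded and test membership against their bounds. The sketch below is a hedged illustration only: `CompactLookSet` and its methods are hypothetical names, not part of the PR, and a production version would more likely merge overlapping intervals.

```rust
use std::collections::BTreeSet;
use std::ops::RangeInclusive;

/// Hypothetical look-1 set that stores single symbols individually and keeps
/// ranges as (start..=stop) pairs instead of expanding every member.
#[derive(Debug, Default)]
pub struct CompactLookSet {
    singles: BTreeSet<i32>,
    ranges: Vec<RangeInclusive<i32>>,
}

impl CompactLookSet {
    /// Adds one symbol, as an Atom transition would.
    pub fn add_symbol(&mut self, symbol: i32) {
        self.singles.insert(symbol);
    }

    /// Adds a whole range, as a Range transition would; costs one Vec slot
    /// regardless of how many symbols the range covers.
    pub fn add_range(&mut self, start: i32, stop: i32) {
        if start <= stop {
            self.ranges.push(start..=stop);
        }
    }

    /// Membership check used by the prefilter: no per-symbol storage needed.
    pub fn contains(&self, symbol: i32) -> bool {
        self.singles.contains(&symbol) || self.ranges.iter().any(|r| r.contains(&symbol))
    }
}
```

A range such as `0..=0x10FFFF` then costs a single entry rather than over a million `BTreeSet` nodes.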
src/parser.rs (3545-3555)
This method performs a get followed by an insert if the key is missing, resulting in two hash lookups. You can use the Entry API to perform this operation in a single lookup. Additionally, if this cache is part of a deterministic state index, consider using a BTreeMap instead of a HashMap to ensure stable behavior.
fn cached_state_expected_symbols(
    &mut self,
    atn: &Atn,
    state_number: usize,
) -> Rc<BTreeSet<i32>> {
    self.state_expected_cache
        .entry(state_number)
        .or_insert_with(|| {
            let symbols = state_expected_symbols(atn, state_number);
            if symbols.is_empty() {
                Rc::clone(&self.empty_recovery_symbols)
            } else {
                Rc::new(symbols)
            }
        })
        .clone()
}
References
- To avoid performance bottlenecks in hot paths, use a membership index (like BTreeSet) alongside ordered vectors to avoid linear scans while preserving iteration order.
- Use ordered maps (e.g., BTreeMap) instead of hash maps when a deterministic state index is required to preserve stable behavior or traces.
src/parser.rs (3572-3577)
The interner performs a get followed by an insert, leading to redundant hash lookups. Using the Entry API allows you to check and insert in one pass. If deterministic state indexing is required for this interner, BTreeMap is preferred over HashMap.
let candidate = Rc::new(set);
self.recovery_symbols_intern
    .entry(Rc::clone(&candidate))
    .or_insert(candidate)
    .clone()
References
- Use ordered maps (e.g., BTreeMap) instead of hash maps when a deterministic state index is required to preserve stable behavior or traces.
…istent PR #5 review (chatgpt-codex-connector): FastRecognizeKey compared recovery_symbols by pointer-or-content but hashed only the pointer, which violates the Hash/Eq contract any time two equal sets were sourced from different allocations. The fix is to make the source deterministic — route cached_state_expected_symbols through intern_recovery_symbols so equal expected-symbol sets always share one Rc, the same invariant fast_next_recovery_context already relied on. Comment threads on both helpers note that every recovery_symbols Rc that ends up in a FastRecognizeKey must come from intern_recovery_symbols (or the empty singleton) so future callers do not reintroduce the divergence. Verified: cargo test, kotlin-parity dumps (still byte-identical), parse-only timings unchanged at ~10ms / ~19ms.
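The invariant this commit relies on can be sketched as an interner plus a key that hashes and compares by pointer. Everything here is illustrative: `SymbolInterner` and `MemoKey` are hypothetical stand-ins for `intern_recovery_symbols` and `FastRecognizeKey`, with most key fields omitted, and the `debug_assert!` mirrors the guard a later commit on this PR describes.

```rust
use std::collections::{BTreeSet, HashSet};
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Hypothetical interner: equal sets always come back as the same allocation.
#[derive(Default)]
pub struct SymbolInterner {
    table: HashSet<Rc<BTreeSet<i32>>>,
}

impl SymbolInterner {
    pub fn intern(&mut self, set: BTreeSet<i32>) -> Rc<BTreeSet<i32>> {
        // `Rc<T>` hashes and compares by content, so lookup by `&BTreeSet` works.
        if let Some(existing) = self.table.get(&set) {
            return Rc::clone(existing);
        }
        let shared = Rc::new(set);
        self.table.insert(Rc::clone(&shared));
        shared
    }
}

/// Hypothetical, trimmed-down memo key.
#[derive(Clone)]
pub struct MemoKey {
    pub state_number: usize,
    pub index: usize,
    pub recovery_symbols: Rc<BTreeSet<i32>>,
}

impl PartialEq for MemoKey {
    fn eq(&self, other: &Self) -> bool {
        let same_ptr = Rc::ptr_eq(&self.recovery_symbols, &other.recovery_symbols);
        // If the pointers differ, the contents must differ too; otherwise a
        // caller skipped the interner and Hash/Eq would silently diverge.
        debug_assert!(
            same_ptr || self.recovery_symbols != other.recovery_symbols,
            "recovery_symbols must be interned"
        );
        self.state_number == other.state_number && self.index == other.index && same_ptr
    }
}

impl Eq for MemoKey {}

impl Hash for MemoKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.state_number.hash(state);
        self.index.hash(state);
        // Pointer identity only, matching the pointer-only comparison above.
        (Rc::as_ptr(&self.recovery_symbols) as usize).hash(state);
    }
}
```

Because both `Hash` and `Eq` look only at the pointer, `a == b` implies equal hashes as long as every set in a key passed through the interner.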
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b3de2e97e4
PR #5 review (coderabbitai): the dumper's --iters arm consumed the next token unconditionally and fell back to 1 on parse failure, so `--iters --time` silently swallowed --time as the value, parsed it as 1, and dropped the timing flag. Now exit 2 with a clear message when the value is missing, non-numeric, or below 1, matching the dumper's existing handling for missing --input.
Verified all four paths:
--iters 3 -> uses 3
--iters -> "missing value for --iters <n>", exit 2
--iters --time -> "invalid --iters value: --time ...", exit 2
--iters 0 -> "invalid --iters value: 0 ...", exit 2
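The four verified paths can be reproduced with a small value parser along these lines. This is a sketch under assumptions: `parse_iters` is a hypothetical helper, the error strings only approximate the dumper's wording (the elided tail of the real "invalid" message is not reproduced), and the caller is assumed to print the error and exit with status 2.

```rust
/// Parses the value following `--iters`. Returns an error message instead of
/// silently defaulting to 1 when the value is missing, non-numeric, or below 1.
pub fn parse_iters<I>(args: &mut I) -> Result<usize, String>
where
    I: Iterator<Item = String>,
{
    // Missing value: `--iters` was the last argument.
    let Some(value) = args.next() else {
        return Err("missing value for --iters <n>".to_string());
    };
    match value.parse::<usize>() {
        Ok(n) if n >= 1 => Ok(n),
        // Covers flags such as `--time` taken as the value, as well as `0`.
        _ => Err(format!("invalid --iters value: {value}")),
    }
}
```

Returning `Result` keeps the exit-code policy in one place in the caller's argument loop instead of inside the parser.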
Addresses three review threads on PR #5:
* Codex P2 (3284250583): `intern_recovery_symbols` accumulated every recovery union into `recovery_symbols_intern` for the parser's lifetime. Long-lived parsers processing many inputs (especially malformed ones that explore wide recovery sets) would grow it monotonically. Add `reset_recovery_symbol_caches`, called from both `parse_atn_rule` and `parse_atn_rule_with_runtime_options` alongside `clear_prediction_diagnostics`. The state-expected cache is cleared in lockstep because its values are produced through the interner and pinning one without the other would re-break the `FastRecognizeKey` identity invariant.
* Greptile P1 (3284250809): drop the content-equality fallback from `FastRecognizeKey::eq`. Now compares `recovery_symbols` purely by `Rc::ptr_eq` (matching the pointer-only `Hash` impl) and `debug_assert!`s that two `Rc`s can only differ if their contents differ, so any future caller that forgets to intern produces a loud test failure instead of a silent `Hash`/`Eq` divergence.
* Greptile P2 (3284250890): align `FxHasher::write` with the typed methods by folding bytes 8 at a time as little-endian `u64`s. A future key type whose `Hash` impl funnels through `write` (e.g. `String`, `[u8; N]`) now hashes consistently with `write_u64`. Added a unit test that pins this invariant.
Verified: `cargo test` (debug + release, 36 lib tests pass; debug asserts active in debug mode), kotlin-parity dumps unchanged, parse-only timings ~9 ms / ~17 ms.
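The `write`/`write_u64` alignment from the third bullet can be illustrated with a small Fx-style hasher. This is a sketch, not the in-tree `FxHasher`: `SketchFxHasher` is a hypothetical name, the multiplier is the classic 64-bit FxHash constant and may not be the one the PR uses, and zero-padding the trailing partial word is an assumption.

```rust
use std::hash::Hasher;

/// Classic 64-bit FxHash multiplier (assumed; the PR's constant is not shown).
const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

#[derive(Default, Clone, Copy)]
pub struct SketchFxHasher {
    hash: u64,
}

impl SketchFxHasher {
    /// One Fx mixing step: rotate, xor in the word, multiply.
    fn add_to_hash(&mut self, word: u64) {
        self.hash = (self.hash.rotate_left(5) ^ word).wrapping_mul(SEED);
    }
}

impl Hasher for SketchFxHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Fold 8 bytes at a time as little-endian u64 words, so a full-word
        // byte slice hashes exactly like the matching `write_u64` call.
        let mut chunks = bytes.chunks_exact(8);
        for chunk in &mut chunks {
            let mut word = [0u8; 8];
            word.copy_from_slice(chunk);
            self.add_to_hash(u64::from_le_bytes(word));
        }
        // Trailing bytes are zero-padded into one final word.
        let rest = chunks.remainder();
        if !rest.is_empty() {
            let mut word = [0u8; 8];
            word[..rest.len()].copy_from_slice(rest);
            self.add_to_hash(u64::from_le_bytes(word));
        }
    }

    fn write_u64(&mut self, value: u64) {
        self.add_to_hash(value);
    }

    fn finish(&self) -> u64 {
        self.hash
    }
}
```

A unit test pinning the invariant only needs to hash `value.to_le_bytes()` through `write` and `value` through `write_u64`, then compare the two `finish()` results.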
Actionable comments posted: 1
Inline comments:
In `@src/parser.rs`:
- Around line 3983-3999: The reset_recovery_symbol_caches method currently
clears recovery_symbols_intern and state_expected_cache but misses other
ATN-dependent caches; update reset_recovery_symbol_caches (in the BaseParser
implementation) to also clear first_set_cache and decision_lookahead_cache so
that reusing a parser with a different &Atn cannot reuse stale FIRST/lookahead
entries, or alternatively ensure those caches are keyed/scoped by the ATN
identity before reuse.
📒 Files selected for processing (2)
- src/parser.rs
- tests/kotlin-parity/dumper/src/main.rs
…LAUDE.md Captures the reproduce steps a fresh checkout needs: ANTLR jar + grammars-v4 sparse-checkout for the Kotlin parity perf benchmark, the upstream antlr4 checkout for the 357-case runtime testsuite, plus the dumper's --iters/--time stopwatch flow and the "rm -rf target/antlr-runtime-testsuite/*" tip for recovering from a killed run. Also pins the CI clippy invocation so local lint runs stay in sync.
PR #5 review (coderabbitai): `BaseParser::parse_atn_rule` accepts an arbitrary `&Atn` per call, so the same parser can legally be driven against different grammars in sequence. The previous reset only cleared `recovery_symbols_intern` and `state_expected_cache`, but `first_set_cache` and `decision_lookahead_cache` are also keyed by raw ATN coordinates (state numbers, rule indexes) and would silently hand back entries from a previous ATN — pruning lookahead against the wrong transitions. Renamed the helper to `reset_per_parse_caches` to reflect the broader scope and documented the four caches it now covers, calling out which two have to clear in lockstep to preserve `FastRecognizeKey`'s pointer-based hashing invariant. Verified: cargo test (debug + release, 36 lib tests), kotlin parity unchanged, parse-only timings still ~10ms / ~19ms.
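The per-parse reset described in this commit might be shaped like the sketch below. The four cache names come from the commit message, but `ParserCaches`, the value types, and the `is_empty` helper are assumptions for illustration; the real fields live on `BaseParser` and use the Fx-backed maps.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};
use std::rc::Rc;

/// Hypothetical grouping of the four ATN-keyed caches.
#[derive(Default)]
pub struct ParserCaches {
    /// FIRST sets keyed by raw ATN coordinates (here: a state number).
    pub first_set_cache: HashMap<usize, Rc<BTreeSet<i32>>>,
    /// Per-decision look-1 sets, one per outgoing transition.
    pub decision_lookahead_cache: HashMap<usize, Rc<Vec<BTreeSet<i32>>>>,
    /// Expected symbols per state; values are produced through the interner.
    pub state_expected_cache: HashMap<usize, Rc<BTreeSet<i32>>>,
    /// Interner backing pointer-based memo keys.
    pub recovery_symbols_intern: HashSet<Rc<BTreeSet<i32>>>,
}

impl ParserCaches {
    /// Clears everything keyed by ATN coordinates so a parser driven against
    /// a different `&Atn` cannot reuse stale entries.
    pub fn reset_per_parse_caches(&mut self) {
        self.first_set_cache.clear();
        self.decision_lookahead_cache.clear();
        // These two clear in lockstep: keeping cached `Rc`s alive after the
        // interner forgets them would allow equal sets at different pointers.
        self.state_expected_cache.clear();
        self.recovery_symbols_intern.clear();
    }

    /// True when no cache holds an entry.
    pub fn is_empty(&self) -> bool {
        self.first_set_cache.is_empty()
            && self.decision_lookahead_cache.is_empty()
            && self.state_expected_cache.is_empty()
            && self.recovery_symbols_intern.is_empty()
    }
}
```

Grouping the caches in one struct is a design choice of the sketch; it makes it harder to add a fifth ATN-keyed cache without also adding it to the reset.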
Substantial recognizer-internals overhaul shipped on this branch: SLL look-1 prefilter at multi-alt states with per-state caching, ATN-keyed caches reset per parse so the same parser can be driven against different grammars, `Rc<BTreeSet<i32>>` interning for recovery symbols with pointer-keyed `FastRecognizeKey` hashing (debug-asserted to keep the invariant), and an in-tree `FxHasher` whose `write` impl matches `write_u64` for full-word inputs. End result on the issue #4 Kotlin snippets is parse-only timings of ~10 ms / ~19 ms (down from 150 ms / 730 ms full-process), and the runtime testsuite still passes 357/357. No public API removals, but the new caches and per-parse reset behavior are observable enough through `BaseParser` to warrant a minor bump rather than a patch.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dd52634fb1
license = "BSD-3-Clause"
keywords = ["antlr", "antlr4", "parser", "lexer", "runtime"]
categories = ["parser-implementations", "development-tools"]
keywords = [
Reduce package keywords to crates.io maximum of five
This manifest now declares 9 package.keywords, but crates.io accepts at most 5 keywords per crate, so cargo publish/the publish workflow will fail during metadata validation before release. Because this commit also bumps the version to 0.3.0 and updates publishing automation, leaving the keyword count above the limit blocks shipping that release.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d35bf0710
- name: Authenticate with crates.io
  id: auth
  uses: rust-lang/crates-io-auth-action@b7e9a28eded4986ec6b1fa40eeee8f8f165559ec # v1
  uses: rust-lang/crates-io-auth-action@v1.0.4
Pin publish workflow actions to immutable SHAs
This change replaces a previously SHA-pinned action with a mutable tag in the publish pipeline, which runs with crates.io publishing privileges. In .github/workflows/publish.yml, using @v1.0.4 (and similarly @v6 elsewhere in this commit) allows upstream retags or compromised releases to change executed code without any change in this repository, undermining reproducibility and increasing supply-chain risk for releases. Keep these uses: entries pinned to full commit SHAs (with version comments) as before.
Summary
Closes #4. The fast recognizer used to speculatively walk every alternative at every multi-alt state and dedupe outcomes after the fact; on the published Kotlin snippets that meant tens of millions of redundant ATN edge traversals. This PR adds an SLL-style look-1 prefilter (the same primitive ANTLR's adaptive prediction uses in its first phase) plus a few allocation-side wins around it:
- … (`DecisionLookahead`/`TransitionLookSet`). At multi-alt non-rule-start states we cache, per outgoing transition, the FIRST set reachable through that transition's epsilon closure (Rule transitions also fold in the follow-state if the callee is nullable). On every visit we prune non-consuming transitions whose look-1 cannot accept the current lookahead. Consuming transitions (Atom/Range/Set/NotSet/Wildcard) still go through the existing `matches`+recovery path so single-token deletion / insertion repairs remain reachable. The pruned transition's FIRST is still surfaced into `ExpectedTokens`, so failed-parse error messages stay byte-identical with the no-prefilter baseline. Cache is built lazily and shared via `Rc`.
- `recovery_symbols` as `Rc<BTreeSet<i32>>`. Speculative recursion threads the same epsilon-recovery context through hundreds of follow states; sharing one allocation lets clones reduce to a refcount bump and lets the memo key hash by pointer.
- `state_expected_symbols` cache. Removes a repeated DFS that fired on every state visit through `next_recovery_context`.
- `Rc<FirstSet>` for the FIRST cache. Hits no longer deep-copy the symbol `BTreeSet`.
- `FxHasher`-backed `FxHashMap`/`FxHashSet` for the hot caches (decision lookahead, FIRST sets, state-expected, recovery interner, fast-recognize memo + visiting set). Replaces the prior `BTreeMap` (whose lookups compared full `BTreeSet<i32>` keys) and `RandomState` (whose SipHash dominated tiny memo lookups). Inner short-lived sets stay `BTreeSet` because their `RandomState` init was a measurable regression.
- `token_type_at` and `consume_index` skip the seek round-trip by reading/advancing absolute indices against the buffered token vector. The committed cursor is still moved exactly once, in `parse_atn_rule`, after a viable outcome is selected.
- `fold_fast_left_recursive_boundaries` early-out when no boundary markers exist (most rule invocations have none).

The dumper grew a `--iters N --time` switch so parse-only timings can be measured without the ~10 ms process-startup overhead the issue's table called out.

Measurements
Kotlin parity snippets, parse-only (`--iters 5 --time`):
[Table: timings for 01-nested-types.kt and 02-dataframe.kt compared against antlr4-python3-runtime (issue baseline); cell values not preserved.]
Rust now matches Python on the small snippet and beats it ~17x on the larger one.

Test plan
- `cargo test --release`: all 54 unit tests pass
- `cargo run --release --bin antlr4-runtime-testsuite`: full sweep, 357 passed, 0 failed, 0 skipped, 357 run
- `tests/kotlin-parity/run.sh`: Kotlin parse-tree dumps remain byte-identical to `antlr4-python3-runtime`
- `cargo clippy --release`: no new warnings (the `disallowed_types` allow on `HashMap`/`HashSet` is scoped to parser-internal caches that are never iterated externally)