Cache per-decision look-1 to skip non-viable speculative paths by tinovyatkin · Pull Request #5 · ophidiarium/antlr-rust-runtime

tinovyatkin · 2026-05-21T20:34:14Z

Summary

Closes #4. The fast recognizer used to speculatively walk every alternative at every multi-alt state and dedupe outcomes after the fact; on the published Kotlin snippets that meant tens of millions of redundant ATN edge traversals. This PR adds an SLL-style look-1 prefilter — the same primitive ANTLR's adaptive prediction uses in its first phase — plus a few allocation-side wins around it:

Per-decision look-1 cache (DecisionLookahead / TransitionLookSet). At multi-alt non-rule-start states we cache, per outgoing transition, the FIRST set reachable through that transition's epsilon closure (Rule transitions also fold in the follow-state if the callee is nullable). On every visit we prune non-consuming transitions whose look-1 cannot accept the current lookahead. Consuming transitions (Atom/Range/Set/NotSet/Wildcard) still go through the existing matches+recovery path so single-token deletion / insertion repairs remain reachable. The pruned transition's FIRST is still surfaced into ExpectedTokens, so failed-parse error messages stay byte-identical with the no-prefilter baseline. Cache is built lazily and shared via Rc.
Interned recovery_symbols as Rc<BTreeSet<i32>>. Speculative recursion threads the same epsilon-recovery context through hundreds of follow states; sharing one allocation lets clones reduce to a refcount bump and lets the memo key hash by pointer.
Per-state state_expected_symbols cache. Removes a repeated DFS that fired on every state visit through next_recovery_context.
Rc<FirstSet> for the FIRST cache. Hits no longer deep-copy the symbol BTreeSet.
FxHasher-backed FxHashMap/FxHashSet for the hot caches (decision lookahead, FIRST sets, state-expected, recovery interner, fast-recognize memo + visiting set). Replaces the prior BTreeMap (whose lookups compared full BTreeSet<i32> keys) and RandomState (whose SipHash dominated tiny memo lookups). Inner short-lived sets stay BTreeSet because their RandomState init was a measurable regression.
token_type_at and consume_index skip the seek round-trip by reading/advancing absolute indices against the buffered token vector. The committed cursor is still moved exactly once, in parse_atn_rule, after a viable outcome is selected.
fold_fast_left_recursive_boundaries early-out when no boundary markers exist (most rule invocations have none).

The dumper grew a --iters N --time switch so parse-only timings can be measured without the ~10 ms process-startup overhead the issue's table called out.

Measurements

Kotlin parity snippets, parse-only (--iters 5 --time):

Snippet	Python `antlr4-python3-runtime` (issue baseline)	Rust before (issue baseline, full-process w/ ~10 ms startup)	Rust after (parse-only)
`01-nested-types.kt`	min 0.9 ms / avg 14.9 ms	min 150.7 ms / avg 159.2 ms	min 9.5 ms / avg 10.1 ms
`02-dataframe.kt`	min 226.5 ms / avg 317.4 ms	min 732.3 ms / avg 740.3 ms	min 18.1 ms / avg 19.1 ms

Rust now matches Python on the small snippet and beats it ~17x on the larger one.

Test plan

cargo test --release — all 54 unit tests pass
cargo run --release --bin antlr4-runtime-testsuite — full sweep: 357 passed, 0 failed, 0 skipped, 357 run
tests/kotlin-parity/run.sh — Kotlin parse-tree dumps remain byte-identical to antlr4-python3-runtime
cargo clippy --release — no new warnings (the disallowed_types allow on HashMap/HashSet is scoped to parser-internal caches that are never iterated externally)

Summary by CodeRabbit

New Features
- Public token-stream helpers to read token types by absolute index and find the next visible token.
- CLI: added --iters and --time options to control repeated parses and report timing (min/avg).
Refactor
- Parser optimized for lower allocations and faster lookahead/recognition.
Bug Fixes
- CLI validates --iters (invalid → exit code 2) and errors if no iteration produced a parse tree (exit code 1).
Tests
- Added unit test for hasher behavior.
Documentation
- README and CLAUDE.md updates; removed legacy requirements doc.
Chores
- CI/workflow action updates and crate version bump.

Closes #4. The fast recognizer used to fan out every alternative at every multi-alt state, then dedupe outcomes; on the published Kotlin snippets that walked tens of millions of ATN edges. Cache one look-1 set per outgoing transition at decision states, prune non-consuming transitions whose closure cannot accept the current lookahead, and share recovery contexts and FIRST sets through `Rc` to drop the per-visit `BTreeSet` clone churn. Consuming transitions still go through the existing `matches`+recovery path so error diagnostics, single-token recovery, and the no-prefilter retry stay byte-identical with the no-cache baseline. Kotlin parity dumps remain byte-identical to `antlr4-python3-runtime`, all 357 cases of the runtime testsuite pass, and the Kotlin parse-only timings on the issue's snippets land at min 9.5 ms / avg 10.1 ms for 01-nested-types.kt and min 18.1 ms / avg 19.1 ms for 02-dataframe.kt (was 150 ms+ and 730 ms+ full-process before).

coderabbitai · 2026-05-21T20:34:27Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e96e8275-8801-40cd-a14b-832cc80b1754

📥 Commits

Reviewing files that changed from the base of the PR and between d19b5ab and 7d35bf0.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (10)

.github/workflows/antlr-runtime-testsuite.yml
.github/workflows/ci.yml
.github/workflows/cpd.yml
.github/workflows/kotlin-parity.yml
.github/workflows/publish.yml
CLAUDE.md
Cargo.toml
README.md
docs/requirements.md
src/parser.rs

📝 Walkthrough

Walkthrough

Replaces BTree-based memoization with FxHasher-backed FxHashMap/FxHashSet, interns FIRST and recovery symbol sets as shared Rc instances, adds cursor-independent token-stream helpers, threads interned sets through the fast recognizer, and resets per-parse caches at parse entrypoints.

Changes

Fast Recognizer Memoization & Recovery Caching

Layer / File(s)	Summary
FxHasher & cache infrastructure `src/parser.rs`	Introduce `FxHasher`/`FxBuildHasher` aliases, `FxHashMap`/`FxHashSet` use in parser caches, and add BaseParser fields for state-expected caches, decision lookahead, recovery interner, and `empty_recovery_symbols`; include a unit test for `FxHasher::write`.
FIRST-set Interning via Rc `src/parser.rs`	FIRST-set cache now stores `Rc<FirstSet>`; `rule_first_set` returns shared `Rc`, recursion cycles return an empty FIRST without caching; lookahead pruning uses shared FIRST results.
Recovery symbols & expected-symbol helpers `src/parser.rs`	Add interner for `Rc<BTreeSet<i32>>`, `empty_recovery_symbols` singleton, and helpers for cached state expected symbols and cached decision lookahead used by the fast recognizer.
FastRecognizeKey & memo-key pointer identity `src/parser.rs`	`FastRecognizeRequest/Key` carry interned `Rc<BTreeSet<i32>>`; `FastRecognizeKey` `Eq`/`Hash` use pointer identity to avoid iterating sets; visiting/memo containers switch to `FxHashSet`/`FxHashMap`.
recognize_state_fast core & lookahead pruning `src/parser.rs`	Rework `recognize_state_fast` to use `fast_next_recovery_context`/`fast_recovery_expected_symbols`, apply lookahead pruning with cached decision lookahead entries, and thread `Rc`-cloned symbol sets through recursive requests.
Recovery and transition updates `src/parser.rs`	Single-token deletion/insertion/current-token-deletion recovery and non-consuming transitions updated to reuse interned `Rc` symbol sets; many recovery paths now pass `empty_recovery_symbols` when appropriate.
Token stream hot-path & wiring `src/token_stream.rs`, `src/parser.rs`	Add `CommonTokenStream::token_type_at_index` and `next_visible_after`; make parser token access cursor-independent (absolute-index reads) and update `consume_index` to advance without seeking; `BaseParser::new` is non-const and initializes caches; parse entrypoints reset per-parse caches.
Left-recursive boundary early-exit `src/parser.rs`	`fold_fast_left_recursive_boundaries` early-returns when no left-recursive boundary markers exist.
Kotlin-parity dumper timing loop `tests/kotlin-parity/dumper/src/main.rs`	Add `--iters` CLI arg, validate input, run parse in iterations loop with `Instant` timing and print min/avg parse durations; exit codes for invalid iters or no successful parse iterations.

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

ophidiarium/antlr-rust-runtime#1: Adds/changes core CommonTokenStream behavior; this PR extends token-stream API with cursor-independent helpers used by parser optimizations.

Poem

🐰 With Rc hops and FxHash cheer,
Shared sets whisper, "Stay near!"
Memo keys point, light and small,
Parsers run and clones fall.
A rabbit nods—fast paths, hurrah!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely describes the main optimization: implementing per-decision look-1 caching to reduce speculative parsing paths.
Linked Issues check	✅ Passed	All objectives from issue `#4` are met: per-decision look-1 cache implemented, FIRST-set prefilter optimized with Rc caching, recovery/expected-symbol interning added, fast-recognizer memoization improved, and measurements show significant performance gains (9.5-19.1 ms vs previous ~150-732 ms).
Out of Scope Changes check	✅ Passed	All changes directly support the core objective of improving per-decision look-1 caching and reducing speculative ATN traversal; the dumper timing additions are supporting infrastructure for validating the performance improvements.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf/issue-4-decision-lookahead-cache

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-21T20:34:34Z

Copy/Paste Detection

Found 8 duplication(s) across 3 changed Rust file(s) (threshold: 100 tokens).

Show duplications

Found a 22 line (116 tokens) duplication in the following files:

Starting at line 2306 of src/parser.rs
Starting at line 3065 of src/parser.rs

            recovery_symbols: Rc::clone(&recovery_symbols),
            recovery_state,
        };
        if let Some(outcomes) = memo.get(&key) {
            return outcomes.clone();
        }

        let visit_key = key.clone();
        if !visiting.insert(visit_key.clone()) {
            return Vec::new();
        }

        let Some(state) = atn.state(state_number) else {
            visiting.remove(&visit_key);
            return Vec::new();
        };
        let next_decision_start_index = if starts_prediction_decision(state) {
            Some(index)
        } else {
            decision_start_index
        };
        let (epsilon_recovery_symbols, epsilon_recovery_state) =

Found a 15 line (114 tokens) duplication in the following files:

Starting at line 4967 of src/parser.rs
Starting at line 5131 of src/parser.rs

    fn parser_matches_token_and_reports_mismatch() {
        let source = Source {
            tokens: vec![
                CommonToken::new(1).with_text("x"),
                CommonToken::eof("parser-test", 1, 1, 1),
            ],
            index: 0,
        };
        let data = RecognizerData::new(
            "Mini.g4",
            Vocabulary::new([None, Some("'x'")], [None, Some("X")], [None::<&str>, None]),
        );
        let mut parser = BaseParser::new(CommonTokenStream::new(source), data);
        assert_eq!(
            parser.match_token(1).expect("token 1 should match").text(),

Found a 32 line (113 tokens) duplication in the following files:

Starting at line 3136 of src/parser.rs
Starting at line 3197 of src/parser.rs

                        outcomes.extend(
                            self.recognize_state(
                                atn,
                                RecognizeRequest {
                                    state_number: *target,
                                    stop_state,
                                    index,
                                    rule_start_index,
                                    decision_start_index: next_decision_start_index,
                                    init_action_rules,
                                    predicates,
                                    rule_args,
                                    member_actions,
                                    return_actions,
                                    local_int_arg,
                                    member_values: member_values.clone(),
                                    return_values: return_values.clone(),
                                    rule_alt_number: next_alt_number,
                                    track_alt_numbers,
                                    consumed_eof,
                                    precedence,
                                    depth: depth + 1,
                                    recovery_symbols: epsilon_recovery_symbols.clone(),
                                    recovery_state: epsilon_recovery_state,
                                },
                                visiting,
                                memo,
                                expected,
                            )
                            .into_iter()
                            .map(|mut outcome| {
                                prepend_decision(&mut outcome, decision);

Found a 14 line (110 tokens) duplication in the following files:

Starting at line 5175 of src/parser.rs
Starting at line 5198 of src/parser.rs

    fn outcome_ties_keep_later_non_recursive_alternative() {
        let first = RecognizeOutcome {
            index: 1,
            consumed_eof: false,
            alt_number: 0,
            member_values: BTreeMap::new(),
            return_values: BTreeMap::new(),
            diagnostics: Vec::new(),
            decisions: Vec::new(),
            actions: vec![ParserAction::new(1, 0, 0, None)],
            nodes: vec![RecognizedNode::Token { index: 0 }],
        };
        let second = RecognizeOutcome {
            actions: vec![ParserAction::new(2, 0, 0, None)],

Found a 13 line (109 tokens) duplication in the following files:

Starting at line 4967 of src/parser.rs
Starting at line 5106 of src/parser.rs
Starting at line 5131 of src/parser.rs

    fn parser_matches_token_and_reports_mismatch() {
        let source = Source {
            tokens: vec![
                CommonToken::new(1).with_text("x"),
                CommonToken::eof("parser-test", 1, 1, 1),
            ],
            index: 0,
        };
        let data = RecognizerData::new(
            "Mini.g4",
            Vocabulary::new([None, Some("'x'")], [None, Some("X")], [None::<&str>, None]),
        );
        let mut parser = BaseParser::new(CommonTokenStream::new(source), data);

Found a 25 line (108 tokens) duplication in the following files:

Starting at line 2379 of src/parser.rs
Starting at line 2414 of src/parser.rs

                | Transition::Action { target, .. } => {
                    let boundary = left_recursive_boundary(atn, state, *target);
                    outcomes.extend(
                        self.recognize_state_fast(
                            atn,
                            FastRecognizeRequest {
                                state_number: *target,
                                stop_state,
                                index,
                                rule_start_index,
                                decision_start_index: next_decision_start_index,
                                precedence,
                                depth: depth + 1,
                                recovery_symbols: Rc::clone(&epsilon_recovery_symbols),
                                recovery_state: epsilon_recovery_state,
                            },
                            visiting,
                            memo,
                            expected,
                        )
                        .into_iter()
                        .map(|mut outcome| {
                            if let Some(rule_index) = boundary {
                                outcome.nodes.prepend(Rc::new(
                                    FastRecognizedNode::LeftRecursiveBoundary { rule_index },

Found a 20 line (107 tokens) duplication in the following files:

Starting at line 1465 of src/parser.rs
Starting at line 1715 of src/parser.rs

        let start_state = atn
            .rule_to_start_state()
            .get(rule_index)
            .copied()
            .ok_or_else(|| {
                AntlrError::Unsupported(format!("rule {rule_index} has no start state"))
            })?;
        let stop_state = atn
            .rule_to_stop_state()
            .get(rule_index)
            .copied()
            .filter(|state| *state != usize::MAX)
            .ok_or_else(|| {
                AntlrError::Unsupported(format!("rule {rule_index} has no stop state"))
            })?;

        let start_index = self.current_visible_index();
        self.clear_prediction_diagnostics();
        self.reset_per_parse_caches();
        let first_pass = self.fast_recognize_top(atn, start_state, stop_state, start_index);

Found a 15 line (103 tokens) duplication in the following files:

Starting at line 1602 of src/parser.rs
Starting at line 4124 of src/parser.rs

            FastRecognizedNode::MissingToken {
                token_type,
                at_index,
                text,
            } => {
                let current = self.token_at(*at_index);
                let token = CommonToken::new(*token_type)
                    .with_text(text)
                    .with_span(usize::MAX, usize::MAX)
                    .with_position(
                        current.as_ref().map(Token::line).unwrap_or_default(),
                        current.as_ref().map(Token::column).unwrap_or_default(),
                    );
                Ok(ParseTree::Error(ErrorNode::new(token)))
            }

Resolves the `clippy::excessive-nesting` failure CI hit on PR #5: the inline pruning block was eight levels deep once the `match expected.index` branch was added. Lift the body into `should_skip_via_lookahead` plus `record_pruned_transition_expected` so the recognize loop reads the prefilter as a single boolean check. Behavior is unchanged — confirmed by re-running `cargo test --release` and the Kotlin parity dumps; parse-only timings stay at ~10 ms / ~19 ms.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fa4beaf327

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/parser.rs`:
- Around line 1157-1186: FastRecognizeKey's PartialEq falls back to value
equality on recovery_symbols while Hash uses only pointer identity (Rc::as_ptr),
violating Hash/Eq; fix by ensuring every Rc<BTreeSet<i32>> that can become
recovery_symbols is interned so identical sets share the same Rc pointer: modify
the producer(s) of these Rc values (notably cached_state_expected_symbols and
fast_next_recovery_context) to run their BTreeSet<i32> through the same interner
used elsewhere (return the interned Rc) before storing or returning, so
Pointer-equality in PartialEq matches the pointer-only Hash and the Hash/Eq
contract holds.
- Around line 2289-2317: Extract the nested logic that updates the diagnostic
accumulator into a helper function (suggested name:
record_pruned_expected_symbols) that takes &mut ExpectedTokens, index: usize,
and symbols: &BTreeSet<i32>, and moves the match/merge logic (checks for empty
set, matching expected.index, extend vs clone_from) into it; then replace the
deeply nested block inside the prune_non_consuming branch (the code that
inspects lookahead_filter, entry.transitions.get(transition_index), and
set.symbols/nullability checks) by calling record_pruned_expected_symbols(&mut
expected, index, &set.symbols) before the continue; also reuse the same helper
to replace the duplicated block in the Rule transition branch to eliminate the
excessive nesting reported by clippy.

In `@tests/kotlin-parity/dumper/src/main.rs`:
- Around line 62-68: The "--iters" arm currently unconditionally consumes the
next token via args.next() and silently falls back to 1 on parse failure; change
it to explicitly check for a present non-flag token and report an error on
missing/invalid values. Make the iterator a Peekable (e.g., args =
args.peekable()), then in the "--iters" match use args.peek() to ensure the next
item exists and does not start with '-' before calling args.next(); attempt to
parse with value.parse::<usize>() and on Err or if missing, print a clear error
(or return Err) and exit (or propagate), instead of silently defaulting to 1;
reference the "--iters" match arm and the iters variable when making this
change.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0ffd0411-b62e-4202-8aec-9b7a22023718

📥 Commits

Reviewing files that changed from the base of the PR and between a6646fa and fa4beaf.

📒 Files selected for processing (3)

src/parser.rs
src/token_stream.rs
tests/kotlin-parity/dumper/src/main.rs

gemini-code-assist

Code Review

This pull request introduces significant performance optimizations to the ANTLR parser runtime, focusing on reducing allocations and redundant computations during speculative recognition. Key changes include the introduction of a fast FxHasher, interning of recovery symbol sets using Rc, and the implementation of several internal caches for FIRST sets and decision lookahead. The recognizer now utilizes lookahead-based pruning and optimized token stream access to minimize cursor movement. Review feedback identifies a critical compilation error in the hasher's integer casting, a Hash trait contract violation in FastRecognizeKey, and potential memory issues when expanding large token ranges. Suggestions were also made to use the Entry API for more efficient cache lookups and BTreeMap to ensure deterministic behavior where necessary.

I am having trouble creating individual review comments. Click here to see my feedback.

src/parser.rs (47)

The method cast_unsigned does not exist on the i32 primitive type in Rust. This will cause a compilation error. You should use value as u32 to bit-cast the integer to its unsigned representation for hashing.

        self.write_u64(u64::from(value as u32));

src/parser.rs (1157-1186)

The Hash implementation for FastRecognizeKey uses the pointer address of recovery_symbols (Rc::as_ptr), while the PartialEq implementation allows for deep equality comparison (self.recovery_symbols == other.recovery_symbols). This violates the Hash trait contract where a == b must imply hash(a) == hash(b). Since you are interning these sets, you should rely solely on pointer equality in PartialEq for consistency and performance, aligning with the repository's strategy for deduplicating objects with heap-allocated fields.

impl PartialEq for FastRecognizeKey {
    fn eq(&self, other: &Self) -> bool {
        self.state_number == other.state_number
            && self.stop_state == other.stop_state
            && self.index == other.index
            && self.rule_start_index == other.rule_start_index
            && self.decision_start_index == other.decision_start_index
            && self.precedence == other.precedence
            && self.recovery_state == other.recovery_state
            && Rc::ptr_eq(&self.recovery_symbols, &other.recovery_symbols)
    }
}

References

When deduplicating objects with heap-allocated fields, bucket candidates by simple keys and compare complex fields via borrowed slices to minimize clones and allocations.

src/parser.rs (725)

Collecting a large range into a BTreeSet can be extremely expensive in terms of both time and memory. For Unicode grammars, a single Transition::Range could span over a million code points (e.g., 0..=0x10FFFF), leading to massive allocations. Since TransitionLookSet is designed for a look-1 prefilter, consider storing the range itself or using an IntervalSet to perform membership checks without expanding the full set of symbols.

src/parser.rs (3545-3555)

This method performs a get followed by an insert if the key is missing, resulting in two hash lookups. You can use the Entry API to perform this operation in a single lookup. Additionally, if this cache is part of a deterministic state index, consider using a BTreeMap instead of a HashMap to ensure stable behavior.

    fn cached_state_expected_symbols(
        &mut self,
        atn: &Atn,
        state_number: usize,
    ) -> Rc<BTreeSet<i32>> {
        self.state_expected_cache
            .entry(state_number)
            .or_insert_with(|| {
                let symbols = state_expected_symbols(atn, state_number);
                if symbols.is_empty() {
                    Rc::clone(&self.empty_recovery_symbols)
                } else {
                    Rc::new(symbols)
                }
            })
            .clone()
    }

References

To avoid performance bottlenecks in hot paths, use a membership index (like BTreeSet) alongside ordered vectors to avoid linear scans while preserving iteration order.
Use ordered maps (e.g., BTreeMap) instead of hash maps when a deterministic state index is required to preserve stable behavior or traces.

src/parser.rs (3572-3577)

The interner performs a get followed by an insert, leading to redundant hash lookups. Using the Entry API allows you to check and insert in one pass. If deterministic state indexing is required for this interner, BTreeMap is preferred over HashMap.

        let candidate = Rc::new(set);
        self.recovery_symbols_intern
            .entry(Rc::clone(&candidate))
            .or_insert(candidate)
            .clone()

References

Use ordered maps (e.g., BTreeMap) instead of hash maps when a deterministic state index is required to preserve stable behavior or traces.

…istent PR #5 review (chatgpt-codex-connector): FastRecognizeKey compared recovery_symbols by pointer-or-content but hashed only the pointer, which violates the Hash/Eq contract any time two equal sets were sourced from different allocations. The fix is to make the source deterministic — route cached_state_expected_symbols through intern_recovery_symbols so equal expected-symbol sets always share one Rc, the same invariant fast_next_recovery_context already relied on. Comment threads on both helpers note that every recovery_symbols Rc that ends up in a FastRecognizeKey must come from intern_recovery_symbols (or the empty singleton) so future callers do not reintroduce the divergence. Verified: cargo test, kotlin-parity dumps (still byte-identical), parse-only timings unchanged at ~10ms / ~19ms.

greptile-apps · 2026-05-21T20:42:02Z

Confidence Score: 5/5

Safe to merge — the parser logic and Hash/Eq invariants are sound, all 357 conformance tests pass, and the only finding is a CI hardening regression in the workflow files.

The Rust source changes are well-reasoned and correct: the FastRecognizeKey pointer-identity Hash/Eq contract holds because every Rc<BTreeSet> entering a key is routed through the interner; token_type_at_index and next_visible_after are semantically equivalent to the old seek-based approach given the maintained visible-index invariant; and reset_per_parse_caches correctly clears all four caches at each parse entry. The only concern is that the GitHub Actions workflows now use floating version tags instead of the pinned commit SHAs they previously used — a CI supply chain hardening regression — but this does not affect published crate correctness.

.github/workflows/*.yml — all five workflow files switched from SHA-pinned actions to floating version tags, which should be revisited for supply chain hygiene.

Security Review

Supply chain / CI hardening regression (.github/workflows/*.yml): All five workflow files replaced immutable commit-SHA action pins with floating version tags (@v6, @v5, @v1.0.4). A compromised or silently redirected tag would allow arbitrary code execution in CI — including the publish.yml workflow that authenticates to crates.io and runs cargo publish. No other security issues were identified in the Rust source changes.

_{Reviews (7): Last reviewed commit: "chore: fix back" | Re-trigger Greptile}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b3de2e97e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

PR #5 review (coderabbitai): the dumper's --iters arm consumed the next token unconditionally and fell back to 1 on parse failure, so `--iters --time` silently swallowed --time as the value, parsed it as 1, and dropped the timing flag. Now exit 2 with a clear message when the value is missing, non-numeric, or below 1, matching the dumper's existing handling for missing --input. Verified all four paths: --iters 3 -> uses 3 --iters -> "missing value for --iters <n>", exit 2 --iters --time -> "invalid --iters value: --time ...", exit 2 --iters 0 -> "invalid --iters value: 0 ...", exit 2

Addresses three review threads on PR #5: * Codex P2 (3284250583): `intern_recovery_symbols` accumulated every recovery union into `recovery_symbols_intern` for the parser's lifetime. Long-lived parsers processing many inputs (especially malformed ones that explore wide recovery sets) would grow it monotonically. Add `reset_recovery_symbol_caches`, called from both `parse_atn_rule` and `parse_atn_rule_with_runtime_options` alongside `clear_prediction_diagnostics`. The state-expected cache is cleared in lockstep because its values are produced through the interner and pinning one without the other would re-break the `FastRecognizeKey` identity invariant. * Greptile P1 (3284250809): drop the content-equality fallback from `FastRecognizeKey::eq`. Now compares `recovery_symbols` purely by `Rc::ptr_eq` (matching the pointer-only `Hash` impl) and `debug_assert!`s that two `Rc`s can only differ if their contents differ — so any future caller that forgets to intern produces a loud test failure instead of a silent `Hash`/`Eq` divergence. * Greptile P2 (3284250890): align `FxHasher::write` with the typed methods by folding bytes 8 at a time as little-endian `u64`s. A future key type whose `Hash` impl funnels through `write` (e.g. `String`, `[u8; N]`) now hashes consistently with `write_u64`. Added a unit test that pins this invariant. Verified: `cargo test` (debug + release, 36 lib tests pass — debug asserts active in debug mode), kotlin-parity dumps unchanged, parse-only timings ~9 ms / ~17 ms.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/parser.rs`:
- Around line 3983-3999: The reset_recovery_symbol_caches method currently
clears recovery_symbols_intern and state_expected_cache but misses other
ATN-dependent caches; update reset_recovery_symbol_caches (in the BaseParser
implementation) to also clear first_set_cache and decision_lookahead_cache so
that reusing a parser with a different &Atn cannot reuse stale FIRST/lookahead
entries, or alternatively ensure those caches are keyed/scoped by the AtN
identity before reuse.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ba211a26-d0f5-4b52-a4b2-e32422e577da

📥 Commits

Reviewing files that changed from the base of the PR and between fa4beaf and d19b5ab.

📒 Files selected for processing (2)

src/parser.rs
tests/kotlin-parity/dumper/src/main.rs

…LAUDE.md Captures the reproduce steps a fresh checkout needs: ANTLR jar + grammars-v4 sparse-checkout for the Kotlin parity perf benchmark, the upstream antlr4 checkout for the 357-case runtime testsuite, plus the dumper's --iters/--time stopwatch flow and the "rm -rf target/antlr-runtime-testsuite/*" tip for recovering from a killed run. Also pins the CI clippy invocation so local lint runs stay in sync.

PR #5 review (coderabbitai): `BaseParser::parse_atn_rule` accepts an arbitrary `&Atn` per call, so the same parser can legally be driven against different grammars in sequence. The previous reset only cleared `recovery_symbols_intern` and `state_expected_cache`, but `first_set_cache` and `decision_lookahead_cache` are also keyed by raw ATN coordinates (state numbers, rule indexes) and would silently hand back entries from a previous ATN — pruning lookahead against the wrong transitions. Renamed the helper to `reset_per_parse_caches` to reflect the broader scope and documented the four caches it now covers, calling out which two have to clear in lockstep to preserve `FastRecognizeKey`'s pointer-based hashing invariant. Verified: cargo test (debug + release, 36 lib tests), kotlin parity unchanged, parse-only timings still ~10ms / ~19ms.

Substantial recognizer-internals overhaul shipped on this branch: SLL look-1 prefilter at multi-alt states with per-state caching, ATN-keyed caches reset per parse so the same parser can be driven against different grammars, `Rc<BTreeSet<i32>>` interning for recovery symbols with pointer-keyed `FastRecognizeKey` hashing (debug-asserted to keep the invariant), and an in-tree `FxHasher` whose `write` impl matches `write_u64` for full-word inputs. End result on the issue #4 Kotlin snippets is parse-only timings of ~10 ms / ~19 ms (down from 150 ms / 730 ms full-process), and the runtime testsuite still passes 357/357. No public API removals, but the new caches and per-parse reset behavior are observable enough through `BaseParser` to warrant a minor bump rather than a patch.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dd52634fb1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T23:16:08Z

 license = "BSD-3-Clause"
-keywords = ["antlr", "antlr4", "parser", "lexer", "runtime"]
-categories = ["parser-implementations", "development-tools"]
+keywords = [


Reduce package keywords to crates.io maximum of five

This manifest now declares 9 package.keywords, but crates.io accepts at most 5 keywords per crate, so cargo publish/the publish workflow will fail during metadata validation before release. Because this commit also bumps the version to 0.3.0 and updates publishing automation, leaving the keyword count above the limit blocks shipping that release.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d35bf0710

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T23:53:23Z

      - name: Authenticate with crates.io
        id: auth
-        uses: rust-lang/crates-io-auth-action@b7e9a28eded4986ec6b1fa40eeee8f8f165559ec # v1
+        uses: rust-lang/crates-io-auth-action@v1.0.4


Pin publish workflow actions to immutable SHAs

This change replaces a previously SHA-pinned action with a mutable tag in the publish pipeline, which runs with crates.io publishing privileges. In .github/workflows/publish.yml, using @v1.0.4 (and similarly @v6 elsewhere in this commit) allows upstream retags or compromised releases to change executed code without any change in this repository, undermining reproducibility and increasing supply-chain risk for releases. Keep these uses: entries pinned to full commit SHAs (with version comments) as before.

Useful? React with 👍 / 👎.

chatgpt-codex-connector Bot reviewed May 21, 2026

View reviewed changes

Comment thread src/parser.rs Outdated

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Comment thread src/parser.rs

Comment thread src/parser.rs Outdated

Comment thread tests/kotlin-parity/dumper/src/main.rs

gemini-code-assist Bot reviewed May 21, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 21, 2026

View reviewed changes

Comment thread src/parser.rs

greptile-apps Bot reviewed May 21, 2026

View reviewed changes

Comment thread src/parser.rs Outdated

Comment thread src/parser.rs Outdated

tinovyatkin added 2 commits May 21, 2026 23:15

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Comment thread src/parser.rs Outdated

tinovyatkin added 7 commits May 22, 2026 00:36

chore: update keywords

cc4573b

chore: update actions

b048756

chore: try arm runner

3afb7f7

chore: try GraalVM

dd52634

chatgpt-codex-connector Bot reviewed May 21, 2026

View reviewed changes

greptile-apps Bot reviewed May 21, 2026

View reviewed changes

Comment thread Cargo.toml Outdated

chore: fix back

7d35bf0

tinovyatkin merged commit cc7e48a into main May 21, 2026
5 of 6 checks passed

chatgpt-codex-connector Bot reviewed May 21, 2026

View reviewed changes

coderabbitai Bot mentioned this pull request May 25, 2026

perf: lexer + recognizer micro-optimizations on the C# hot path #10

Merged

5 tasks

Conversation

tinovyatkin commented May 21, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Measurements

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated Code Review Effort

Possibly Related PRs

Poem

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Copy/Paste Detection

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

src/parser.rs (47)

src/parser.rs (1157-1186)

src/parser.rs (725)

src/parser.rs (3545-3555)

src/parser.rs (3572-3577)

Uh oh!

greptile-apps Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Confidence Score: 5/5

Security Review

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

tinovyatkin commented May 21, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 21, 2026 •

edited

Loading

github-actions Bot commented May 21, 2026 •

edited

Loading

greptile-apps Bot commented May 21, 2026 •

edited

Loading