perf: lexer + recognizer micro-optimizations on the C# hot path #10
Targeted micro-optimizations to the existing recursive recognizer:
- IntervalSet::contains uses binary search (was linear)
- Loop-collapse single-epsilon transition chains in recognize_state_fast
- Strip recovery/rule-context fields from memo key in pass 1
- Skip memoize for single-transition states
- u64-packed visit_id for cycle guard
- Conditional cycle guard (multi-alt only)
- Single hash op for FIRST set lookup at rule transitions
- Cross-parse FIRST-set + decision-lookahead caches keyed by static ATN pointer
- Rc<[FastRecognizeOutcome]> in memo so writes reuse Vec allocation

Results vs baseline:
- Kotlin: 32-53% faster, now 5-11x faster than Go (was 3-7x)
- C#: 47-65% faster, now 1.8-4.0x slower than Go (was 3-7x slower)

Still doesn't beat Go on C#; the next commit replaces the recursive backtracker with an adaptive-LL ATN simulator + DFA cache.
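For readers unfamiliar with the first item, here is a minimal sketch of a binary-search membership test over sorted intervals. The `IntervalSet` shown is a hypothetical stand-in (a sorted, non-overlapping list of inclusive ranges), not the runtime's actual type.

```rust
/// Hypothetical stand-in for the runtime's interval set: sorted,
/// non-overlapping, inclusive `(start, end)` ranges.
pub struct IntervalSet {
    ranges: Vec<(i32, i32)>,
}

impl IntervalSet {
    /// Assumes `ranges` is already sorted by start and non-overlapping.
    pub fn new(ranges: Vec<(i32, i32)>) -> Self {
        Self { ranges }
    }

    /// O(log n) membership test instead of a linear scan.
    pub fn contains(&self, value: i32) -> bool {
        // Index of the first range whose start is greater than `value`.
        let idx = self.ranges.partition_point(|&(start, _)| start <= value);
        // The only candidate is the range just before that index.
        idx > 0 && value <= self.ranges[idx - 1].1
    }
}
```

The sketch relies on the ranges being kept sorted at insertion time; if that invariant does not hold, the binary search gives wrong answers.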
Lightweight FxHash-style hasher used by lexer DFA-trace maps and the epsilon_closure seen set to avoid SipHash overhead on the hot lexer path. Subsequent lexer commits switch BTreeMap/BTreeSet fields to FxHashMap/FxHashSet built on this hasher.
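As an illustration of what an FxHash-style hasher looks like, here is a minimal sketch. `SketchFxHasher` and the two aliases are hypothetical names, and the constant is the 64-bit multiplier commonly associated with FxHash; the PR's `PredictionFxHasher` may differ in detail.

```rust
use std::collections::{HashMap, HashSet};
use std::hash::{BuildHasherDefault, Hasher};

/// 64-bit multiplier commonly associated with FxHash (an assumption here).
const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

/// Hypothetical name; a non-cryptographic rotate-xor-multiply hasher.
#[derive(Default, Clone, Copy)]
pub struct SketchFxHasher {
    hash: u64,
}

impl SketchFxHasher {
    #[inline]
    fn add_to_hash(&mut self, word: u64) {
        self.hash = (self.hash.rotate_left(5) ^ word).wrapping_mul(SEED);
    }
}

impl Hasher for SketchFxHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Fold the input eight bytes at a time, zero-padding the tail.
        for chunk in bytes.chunks(8) {
            let mut buf = [0u8; 8];
            buf[..chunk.len()].copy_from_slice(chunk);
            self.add_to_hash(u64::from_le_bytes(buf));
        }
    }
    fn write_u32(&mut self, v: u32) {
        self.add_to_hash(u64::from(v));
    }
    fn write_u64(&mut self, v: u64) {
        self.add_to_hash(v);
    }
    fn write_usize(&mut self, v: usize) {
        self.add_to_hash(v as u64);
    }
    fn finish(&self) -> u64 {
        self.hash
    }
}

pub type SketchFxHashMap<K, V> = HashMap<K, V, BuildHasherDefault<SketchFxHasher>>;
pub type SketchFxHashSet<T> = HashSet<T, BuildHasherDefault<SketchFxHasher>>;
```

A hasher like this trades HashDoS resistance for speed, which is usually acceptable for maps keyed by internal state numbers rather than untrusted input.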
Profiling C# parsing showed cached_lexer_dfa_state cloning the entire Vec of LexerDfaConfigKey on every lexer step. Wrapping the cached state in Rc makes lookup an Rc bump instead of a deep clone.
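The difference between a deep clone and an Rc bump can be sketched as follows. `CachedState` and `DfaCache` are hypothetical stand-ins for the lexer's cached DFA state and its map, not the actual types.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for the cached lexer DFA state.
pub struct CachedState {
    pub configs: Vec<u64>,
}

#[derive(Default)]
pub struct DfaCache {
    states: HashMap<usize, Rc<CachedState>>,
}

impl DfaCache {
    pub fn store(&mut self, id: usize, configs: Vec<u64>) {
        self.states.insert(id, Rc::new(CachedState { configs }));
    }

    /// Returns a shared handle; only the reference count changes,
    /// the `configs` vector itself is never copied.
    pub fn lookup(&self, id: usize) -> Option<Rc<CachedState>> {
        self.states.get(&id).map(Rc::clone)
    }
}
```

Two lookups of the same id return handles to the same allocation, which is what makes the hot path cheap.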
Profile showed BTreeSet<LexerConfig> insertions were ~10% of inclusive time on C# parsing. Lexer closure walks dedupe many configs per token; an O(1) hash lookup beats the O(log n) tree comparisons against Vec<usize> stack fields.
mono-codegen.cs: 62ms -> 42ms (32% faster)
mono-statement.cs: 332ms -> 250ms (25% faster)
The cached_lexer_dfa_transition lookup is the inner-loop call inside the lexer DFA walker; lookup-heavy hash maps win over O(log n) tree maps.
Adds OUTCOMES_RETURN_0/1/N counters to recognize_state_fast and an opt-in dump on ANTLR_PERF_DUMP=1 from fast_recognize_top, so future investigations can see the 0/1/many split without rebuilding.
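A rough sketch of how such counters might be wired up is shown below. The counter names and the `ANTLR_PERF_DUMP` variable come from the commit message; `record_outcome_count` and `dump_if_requested` are hypothetical helper names, and the real code may gate this behind a Cargo feature.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

pub static OUTCOMES_RETURN_0: AtomicU64 = AtomicU64::new(0);
pub static OUTCOMES_RETURN_1: AtomicU64 = AtomicU64::new(0);
pub static OUTCOMES_RETURN_N: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: bucket a call by how many outcomes it returned.
pub fn record_outcome_count(len: usize) {
    let counter = match len {
        0 => &OUTCOMES_RETURN_0,
        1 => &OUTCOMES_RETURN_1,
        _ => &OUTCOMES_RETURN_N,
    };
    counter.fetch_add(1, Ordering::Relaxed);
}

/// Hypothetical helper: print the split only when ANTLR_PERF_DUMP=1 is set.
pub fn dump_if_requested() {
    if matches!(std::env::var("ANTLR_PERF_DUMP").as_deref(), Ok("1")) {
        eprintln!(
            "outcomes: 0={} 1={} N={}",
            OUTCOMES_RETURN_0.load(Ordering::Relaxed),
            OUTCOMES_RETURN_1.load(Ordering::Relaxed),
            OUTCOMES_RETURN_N.load(Ordering::Relaxed),
        );
    }
}
```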
The recognize_state_fast loop already collapses single-Epsilon chains. Extending it to single-Atom/Range/Set/NotSet/Wildcard transitions saves ~17K recursive calls per C# parse (atom transitions are common in flat grammars). The inline path tracks consumed token indices so token nodes still appear in the resulting parse tree. Limited to pass 1 (no recovery); the recovery path needs the existing no-match/expected-symbols machinery.
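The idea of collapsing single-transition chains into a loop can be sketched like this. The `Transition` enum and `collapse_chain` function are simplified, hypothetical stand-ins: the real recognizer handles more transition kinds and builds parse-tree nodes rather than a plain index list.

```rust
/// Hypothetical, simplified transition type; the real runtime has more variants.
pub enum Transition {
    Epsilon { target: usize },
    Atom { token_type: i32, target: usize },
}

/// Follows single-transition states in a loop instead of recursing.
/// Returns the state and token index where the chain ends, or `None` when
/// an atom fails to match. Consumed token indices are pushed to `consumed`
/// so the caller can still emit token nodes.
pub fn collapse_chain(
    states: &[Vec<Transition>],
    mut state: usize,
    tokens: &[i32],
    mut index: usize,
    consumed: &mut Vec<usize>,
) -> Option<(usize, usize)> {
    // Bound on consecutive epsilon steps so an epsilon cycle cannot spin forever.
    let mut epsilon_steps = 0;
    while let [only] = states[state].as_slice() {
        match *only {
            Transition::Epsilon { target } => {
                epsilon_steps += 1;
                if epsilon_steps > states.len() {
                    break;
                }
                state = target;
            }
            Transition::Atom { token_type, target } => {
                if tokens.get(index) != Some(&token_type) {
                    return None;
                }
                consumed.push(index);
                index += 1;
                epsilon_steps = 0;
                state = target;
            }
        }
    }
    Some((state, index))
}
```

The loop stops at the first state with zero or several transitions, which is where the general (recursive) path would take over.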
The follow-state outcome loop unconditionally cloned and appended the child's diagnostics; on pass 1 (no recovery) child_diagnostics is always empty, so the clone allocated a Vec we then immediately replaced. Skip the dance when the source vec is empty.
Each NodeList::Cons created from an Empty list paid for an Rc::new (NodeList::Empty) allocation. Sharing a thread-local Rc<NodeList::Empty> means Empty -> Cons transitions are now a cheap Rc::clone instead.
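A minimal sketch of the shared-empty-tail pattern follows. The `NodeList` here is a hypothetical two-variant list over `u32` payloads, and `shared_empty` and `singleton` are illustrative names.

```rust
use std::rc::Rc;

/// Hypothetical simplified list; the real NodeList carries recognized nodes.
pub enum NodeList {
    Empty,
    Cons(u32, Rc<NodeList>),
}

thread_local! {
    // One Empty allocation per thread, shared by every list tail.
    static SHARED_EMPTY: Rc<NodeList> = Rc::new(NodeList::Empty);
}

/// Cheap: bumps a reference count instead of allocating.
pub fn shared_empty() -> Rc<NodeList> {
    SHARED_EMPTY.with(Rc::clone)
}

/// Builds a one-element list whose tail is the shared Empty.
pub fn singleton(node: u32) -> NodeList {
    NodeList::Cons(node, shared_empty())
}
```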
The decision_lookahead_cache field on BaseParser was unused; the cache was always taken from the shared thread-local SharedAtnCache, which costs a borrow/entry per multi-trans visit. Use the per-instance cache as a fast path so hot decisions skip the RefCell+hashmap dance.
When a multi-trans decision's FIRST sets are disjoint and no transition is nullable, the lookahead deterministically picks one alternative. Commit to that alt directly and skip the per-transition filter probe loop. Same behavior, fewer probes per visit on dense decisions.
…e lists dedupe_clean_fast_outcomes is called per multi-trans visit on pass 1; with typical N=2-4 outcomes the BTreeSet allocation + balancing was the dominant cost. An inline 8-entry scan avoids the heap allocation entirely for the common case while preserving outcome order for downstream selection logic.
The ll1_unique_alt scan was repeated per visit; now its result is memoized per parser instance keyed by (state_number, lookahead_token). The shared SharedAtnCache field is reserved for cross-parse warming (future commit) — the per-instance cache wins on hot decisions within a single parse already.
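The per-instance memoization keyed by (state_number, lookahead_token) might look roughly like the sketch below. `Decision`, `Ll1Cache`, and `scan_unique_alt` are hypothetical names, and the sketch assumes no alternative is nullable.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical decision: one FIRST set per alternative, none nullable.
pub struct Decision {
    pub first_sets: Vec<BTreeSet<i32>>,
}

#[derive(Default)]
pub struct Ll1Cache {
    entries: HashMap<(usize, i32), Option<usize>>,
}

impl Ll1Cache {
    /// Returns the single viable alternative, scanning at most once per key.
    pub fn unique_alt(
        &mut self,
        state_number: usize,
        lookahead: i32,
        decision: &Decision,
    ) -> Option<usize> {
        *self
            .entries
            .entry((state_number, lookahead))
            .or_insert_with(|| scan_unique_alt(decision, lookahead))
    }

    /// State numbers are grammar-specific, so clear between parses.
    pub fn clear(&mut self) {
        self.entries.clear();
    }
}

/// `Some(alt)` only when exactly one FIRST set contains the lookahead.
fn scan_unique_alt(decision: &Decision, lookahead: i32) -> Option<usize> {
    let mut hits = decision
        .first_sets
        .iter()
        .enumerate()
        .filter(|(_, first)| first.contains(&lookahead))
        .map(|(alt, _)| alt);
    let first = hits.next()?;
    if hits.next().is_some() { None } else { Some(first) }
}
```

Caching `None` as well as `Some` matters: ambiguous or non-matching lookaheads are then also answered without rescanning.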
…> alloc Most outcomes carry a single node; using NodeList::One(node) directly avoids the Rc::new(NodeList::Empty) allocation that the Cons-from-Empty case used to make. Saves one heap alloc per single-node prepend on the hot path.
The inline-scan optimization only checked the first 8 distinct keys. Beyond that, new outcomes silently passed through without being deduplicated against each other — masking duplicates on grammars with many ambiguous alts (e.g. ktor-openapi Kotlin parse). The duplicates explode speculative work one step up the recursion. Promote to a heap Vec for the overflow case so all kept entries continue to participate in dedup. Also folds in two clippy fixes that surfaced during rebase: re-using a doc comment and collapsing a nested if into a match guard.
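The inline scan with a heap fallback can be sketched as below. `dedupe_preserving_order` is a hypothetical generic helper, not the PR's `dedupe_clean_fast_outcomes`; the point is that keys past the eighth still take part in deduplication.

```rust
const INLINE_CAPACITY: usize = 8;

/// Drops later duplicates (by key) while preserving first-seen order.
/// The first eight distinct keys live in a stack array; further keys
/// spill into a heap Vec so they are still compared against.
pub fn dedupe_preserving_order<T, K: PartialEq>(
    items: Vec<T>,
    key: impl Fn(&T) -> K,
) -> Vec<T> {
    let mut inline: [Option<K>; INLINE_CAPACITY] = std::array::from_fn(|_| None);
    let mut inline_len = 0;
    // Stays unallocated unless more than eight distinct keys appear.
    let mut overflow: Vec<K> = Vec::new();
    let mut kept = Vec::with_capacity(items.len());
    for item in items {
        let k = key(&item);
        let seen = inline[..inline_len]
            .iter()
            .any(|slot| slot.as_ref() == Some(&k))
            || overflow.contains(&k);
        if seen {
            continue;
        }
        if inline_len < INLINE_CAPACITY {
            inline[inline_len] = Some(k);
            inline_len += 1;
        } else {
            overflow.push(k);
        }
        kept.push(item);
    }
    kept
}
```

The linear scans are only a win for small outcome counts such as the N=2-4 cited above; a hash set would be the better choice if overflow became common.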
📝 Walkthrough
Adds PredictionFxHasher and FxHash aliases, introduces an opt-in
Changes
Performance optimization and instrumentation
Sequence Diagram(s)
sequenceDiagram
participant Client as Parser::parse_atn_rule
participant Recognize as recognize_state_fast
participant Shared as SHARED_ATN_CACHES
participant LL1 as ll1_decision_cache
participant Memo as memo_cache
Client->>Recognize: invoke fast-recognition
Recognize->>Shared: with_shared_atn_caches -> check FIRST/lookahead
alt LL(1) deterministic
Recognize->>LL1: consult ll1_decision_cache -> select unique alt
else fallback
Recognize->>Memo: check memo (Rc<[FastRecognizeOutcome]>)
Memo-->>Recognize: return cached outcomes
end
Recognize->>Memo: memoize outcomes as Rc slice
Recognize-->>Client: return recognition outcome(s)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed
Copy/Paste Detection
Found 10 duplication(s) across 5 changed Rust file(s) (threshold: 100 tokens).
Found a 20 line (125 tokens) duplication in the following files:
pub fn next_token_with_hooks<I, F, A, P, E>(
lexer: &mut BaseLexer<I, F>,
atn: &Atn,
mut custom_action: A,
mut semantic_predicate: P,
mut accept_adjuster: E,
) -> CommonToken
where
I: CharStream,
F: TokenFactory,
A: FnMut(&mut BaseLexer<I, F>, LexerCustomAction),
P: FnMut(&BaseLexer<I, F>, LexerPredicate) -> bool,
E: FnMut(&mut BaseLexer<I, F>, i32, usize),
{
next_token_with_hooks_impl(
lexer,
atn,
&mut custom_action,
&mut semantic_predicate,
&mut accept_adjuster,
Found a 20 line (120 tokens) duplication in the following files:
let Some(state) = atn.state(config.state) else {
continue;
};
for transition in &state.transitions {
if !transition.matches(symbol, MIN_CHAR_VALUE, MAX_CHAR_VALUE) {
continue;
}
let mut advanced = config.clone();
set_config_state(atn, &mut advanced, transition.target());
if symbol == EOF {
advanced.consumed_eof = true;
} else {
advanced.position += 1;
}
next.push(advanced);
}
}
let closure = epsilon_closure(lexer, atn, next, semantic_predicate);
let target_has_semantic_context = closure.has_semantic_context;
Found a 30 line (120 tokens) duplication in the following files:
let boundary = left_recursive_boundary(atn, state, *target);
outcomes.extend(
self.recognize_state_fast(
atn,
FastRecognizeRequest {
state_number: *target,
stop_state,
index,
rule_start_index,
decision_start_index: next_decision_start_index,
precedence,
depth: depth + 1,
recovery_symbols: Rc::clone(&epsilon_recovery_symbols),
recovery_state: epsilon_recovery_state,
},
visiting,
memo,
expected,
)
.into_iter()
.map(|mut outcome| {
if let Some(rule_index) = boundary {
outcome.nodes.prepend(Rc::new(
FastRecognizedNode::LeftRecursiveBoundary { rule_index },
));
}
outcome
}),
);
}
Found a 15 line (114 tokens) duplication in the following files:
fn parser_matches_token_and_reports_mismatch() {
let source = Source {
tokens: vec![
CommonToken::new(1).with_text("x"),
CommonToken::eof("parser-test", 1, 1, 1),
],
index: 0,
};
let data = RecognizerData::new(
"Mini.g4",
Vocabulary::new([None, Some("'x'")], [None, Some("X")], [None::<&str>, None]),
);
let mut parser = BaseParser::new(CommonTokenStream::new(source), data);
assert_eq!(
parser.match_token(1).expect("token 1 should match").text(),
Found a 32 line (113 tokens) duplication in the following files:
outcomes.extend(
self.recognize_state(
atn,
RecognizeRequest {
state_number: *target,
stop_state,
index,
rule_start_index,
decision_start_index: next_decision_start_index,
init_action_rules,
predicates,
rule_args,
member_actions,
return_actions,
local_int_arg,
member_values: member_values.clone(),
return_values: return_values.clone(),
rule_alt_number: next_alt_number,
track_alt_numbers,
consumed_eof,
precedence,
depth: depth + 1,
recovery_symbols: epsilon_recovery_symbols.clone(),
recovery_state: epsilon_recovery_state,
},
visiting,
memo,
expected,
)
.into_iter()
.map(|mut outcome| {
prepend_decision(&mut outcome, decision);
Found a 14 line (110 tokens) duplication in the following files:
fn outcome_ties_keep_later_non_recursive_alternative() {
let first = RecognizeOutcome {
index: 1,
consumed_eof: false,
alt_number: 0,
member_values: BTreeMap::new(),
return_values: BTreeMap::new(),
diagnostics: Vec::new(),
decisions: Vec::new(),
actions: vec![ParserAction::new(1, 0, 0, None)],
nodes: vec![RecognizedNode::Token { index: 0 }],
};
let second = RecognizeOutcome {
actions: vec![ParserAction::new(2, 0, 0, None)],
Found a 13 line (109 tokens) duplication in the following files:
fn parser_matches_token_and_reports_mismatch() {
let source = Source {
tokens: vec![
CommonToken::new(1).with_text("x"),
CommonToken::eof("parser-test", 1, 1, 1),
],
index: 0,
};
let data = RecognizerData::new(
"Mini.g4",
Vocabulary::new([None, Some("'x'")], [None, Some("X")], [None::<&str>, None]),
);
let mut parser = BaseParser::new(CommonTokenStream::new(source), data);
Found a 19 line (106 tokens) duplication in the following files:
let start_state = atn
.rule_to_start_state()
.get(rule_index)
.copied()
.ok_or_else(|| {
AntlrError::Unsupported(format!("rule {rule_index} has no start state"))
})?;
let stop_state = atn
.rule_to_stop_state()
.get(rule_index)
.copied()
.filter(|state| *state != usize::MAX)
.ok_or_else(|| {
AntlrError::Unsupported(format!("rule {rule_index} has no stop state"))
})?;
let start_index = self.current_visible_index();
self.clear_prediction_diagnostics();
self.reset_per_parse_caches();
Found a 16 line (104 tokens) duplication in the following files:
FastRecognizedNode::Rule {
rule_index,
invoking_state,
start_index,
stop_index,
children,
} => {
let mut context = ParserRuleContext::new(*rule_index, *invoking_state);
if let Some(token) = self.token_at(*start_index) {
context.set_start(token);
}
if let Some(token) = stop_index.and_then(|index| self.token_at(index)) {
context.set_stop(token);
}
if children.has_left_recursive_boundary() {
let folded = fold_fast_left_recursive_boundaries(children.to_vec());
Found a 15 line (103 tokens) duplication in the following files:
FastRecognizedNode::MissingToken {
token_type,
at_index,
text,
} => {
let current = self.token_at(*at_index);
let token = CommonToken::new(*token_type)
.with_text(text)
.with_span(usize::MAX, usize::MAX)
.with_position(
current.as_ref().map(Token::line).unwrap_or_default(),
current.as_ref().map(Token::column).unwrap_or_default(),
);
Ok(ParseTree::Error(ErrorNode::new(token)))
}
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cfd5cd9692
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Code Review
This pull request introduces significant performance optimizations to the lexer and parser. Key changes include replacing standard collections with custom-hashed versions (FxHashMap/FxHashSet), optimizing IntervalSet lookups with binary search, and implementing a thread-local cache for ATN-related computations to share data across parses. The recognize_state_fast function was refactored to reduce recursion through loop-unrolling of epsilon chains and an LL(1) fast path. Feedback highlights a potential bug where the new ll1_decision_cache is not reset between parses, safety concerns regarding the use of raw pointers as global cache keys, and redundant logic in FIRST set retrieval.
fn with_shared_first_set_cache<R>(
    atn: &Atn,
    f: impl FnOnce(&mut FirstSetCache) -> R,
) -> R {
    SHARED_ATN_CACHES.with(|cell| {
        let key = std::ptr::from_ref::<Atn>(atn) as usize;
        let mut map = cell.borrow_mut();
        let cache = map.entry(key).or_default();
        f(&mut cache.first_set)
    })
}

fn with_shared_atn_caches<R>(
    atn: &Atn,
    f: impl FnOnce(&mut SharedAtnCache) -> R,
) -> R {
    SHARED_ATN_CACHES.with(|cell| {
        let key = std::ptr::from_ref::<Atn>(atn) as usize;
        let mut map = cell.borrow_mut();
        let cache = map.entry(key).or_default();
        f(cache)
    })
}
Using raw pointers (std::ptr::from_ref::<Atn>(atn) as usize) as keys in the global SHARED_ATN_CACHES map poses a risk of memory leaks and stale data. Since Atn objects are not guaranteed to be 'static by the type system, a dropped Atn could leave an entry in the map indefinitely. Furthermore, if a new Atn is allocated at the same memory address, the parser might incorrectly use cached data from a different grammar. Consider using a unique identifier within the Atn struct or a safer caching strategy that accounts for object lifetimes.
Addressed in c704581 — see reply on the same line. The compound key includes a stable secondary identity (states pointer/length, max_token_type) so a reused ATN allocation cannot pick up entries from the previous grammar.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lexer.rs`:
- Around line 125-127: The cache uses Rc which prevents BaseLexer from being
Send/Sync; change FxHashMap<usize, Rc<LexerDfaCachedState>> in the LexerDfaTrace
type to FxHashMap<usize, Arc<LexerDfaCachedState>> (add use std::sync::Arc) and
update any Rc::new / Rc::clone sites to Arc::new / Arc::clone; then update the
helper functions cached_lexer_dfa_state and cache_lexer_dfa_state to
return/store Arc<LexerDfaCachedState> instead of Rc, and adjust any callsites to
accept Arc (or clone the Arc where sharing is needed) so the cache remains
equivalent but uses thread-safe ownership.
In `@src/parser.rs`:
- Around line 958-985: The cache key uses the raw Atn pointer
(std::ptr::from_ref) which can be reused after an Atn is dropped; instead use a
stable, per-Atn unique identifier or require a 'static/owned handle. Modify
SHARED_ATN_CACHES lookups in with_shared_first_set_cache and
with_shared_atn_caches to use a stable key produced by the Atn itself (e.g., add
or use an existing Atn::unique_id() / id() usize field set at Atn construction)
rather than the pointer, or change the APIs to accept an Arc<Atn>/'static
reference and use a guaranteed-unique generation id from that handle; update all
callsites to provide the new stable id/handle and keep using SharedAtnCache,
FirstSetCache, and the same map semantics.
- Around line 442-447: The ll1_decision_cache field (FxHashMap<(usize, i32),
Option<usize>>) is grammar-specific and must be cleared when a parser is reset;
update BaseParser::reset_per_parse_caches() to clear or reinitialize
ll1_decision_cache (e.g., call self.ll1_decision_cache.clear() or assign
Default::default()) so that cached LL(1) alt selections from a previous
Atn/grammar are not reused across parse boundaries.
- Around line 3312-3319: The code assumes rule_first_set populated the shared
cache and unconditionally calls expect on cache.get(&first_key), causing a panic
when rule_first_set skips insertion on cycles; update rule_first_set to return
the computed FIRST set (e.g., Option<Rc<...>> or Rc<...> when produced) and
change the with_shared_first_set_cache closure (the block using first_key,
rule_first_set, and cache.get) to use the returned FIRST value as a fallback
when cache.get(&first_key) is absent instead of calling expect — i.e., call let
maybe_first = rule_first_set(...); then clone cache entry if present else clone
maybe_first to avoid panics on recursive grammars.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3a8c598c-830a-4c4a-b336-22fad6a3e64a
📒 Files selected for processing (6)
Cargo.toml
src/atn/lexer.rs
src/atn/mod.rs
src/lexer.rs
src/parser.rs
src/prediction.rs
Match the call style used in src/parser.rs's local FxHasher (`i32::cast_unsigned(value)` rather than method-call form). Same behavior, just keeps the two parallel hasher impls visually aligned. Addresses gemini-code-assist comment on PR #10.
The LL(1) cache is keyed by ATN state numbers, which are grammar-specific. A BaseParser reused with a different Atn would otherwise commit to alts chosen for the previous grammar. Same fix as for first_set_cache and decision_lookahead_cache, which already clear on reset. Addresses comments from chatgpt-codex (P1), gemini-code-assist (high), greptile (P1), and coderabbit (Major) on PR #10.
rule_first_set intentionally skips inserting into the cache when its FIRST-set walk hits a cycle. The previous code probed the cache after the call, which would .expect()-panic on recursive grammars. Use the returned Rc<FirstSet> directly — it's correct in both cycle and non-cycle cases. Addresses gemini-code-assist (high) and coderabbit (Critical) comments on PR #10.
The previous code hashed (state_number, index) into a single u64 with a multiplicative mix. While the collision probability is ~10^-19, a collision would silently treat a fresh visit as a cycle and drop all outcomes — i.e., produce a wrong parse with no diagnostic. Storing the exact (usize, usize) pair preserves O(1) FxHashSet lookup without giving up exactness; the bookkeeping is still tiny because the visiting set holds at most a few hundred entries at any time. Addresses greptile (P2) comment on PR #10.
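To illustrate the exact-key cycle guard, here is a small sketch over a hypothetical epsilon-only graph. `reaches` is an illustrative function, not part of the runtime; it only shows the insert-on-entry, remove-on-exit pattern with an exact (state, index) pair.

```rust
use std::collections::HashSet;

/// Hypothetical epsilon-only graph: `edges[s]` lists states reachable from `s`.
/// Returns whether `target` is reachable from `state` at token `index`.
pub fn reaches(
    edges: &[Vec<usize>],
    state: usize,
    target: usize,
    index: usize,
    visiting: &mut HashSet<(usize, usize)>,
) -> bool {
    if state == target {
        return true;
    }
    // Exact key: two distinct visits can never be mistaken for each other.
    if !visiting.insert((state, index)) {
        return false; // already on the current path, so this is a cycle
    }
    let found = edges[state]
        .iter()
        .any(|&next| reaches(edges, next, target, index, visiting));
    // Leave the path so sibling branches may visit this state again.
    visiting.remove(&(state, index));
    found
}
```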
Use a compound (atn ptr, states ptr, states len, max_token_type) key instead of just the Atn pointer. Real-world generated parsers use 'static OnceLock<Atn>', so collisions are impossible for them. But for non-'static Atns (dropped between parses), the previous key would let a new Atn allocated at the same address pick up the previous grammar's FIRST/lookahead entries. Adding a stable secondary identity catches that pointer collision. Addresses chatgpt-codex (P2), gemini-code-assist (medium), greptile (P1), and coderabbit (Major) comments on PR #10.
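A compound key of this shape could be sketched as follows. The `Atn` struct, `AtnCacheKey`, and `cache_key` are hypothetical simplifications; only the four key components are taken from the commit message. Note that such a key makes an accidental match after address reuse much less likely rather than giving each ATN a guaranteed-unique identity.

```rust
/// Hypothetical, heavily simplified ATN.
pub struct Atn {
    states: Vec<u32>,
    max_token_type: i32,
}

impl Atn {
    pub fn new(states: Vec<u32>, max_token_type: i32) -> Self {
        Self { states, max_token_type }
    }
}

/// Hypothetical key type combining the four components named above.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct AtnCacheKey {
    atn_ptr: usize,
    states_ptr: usize,
    states_len: usize,
    max_token_type: i32,
}

pub fn cache_key(atn: &Atn) -> AtnCacheKey {
    AtnCacheKey {
        atn_ptr: std::ptr::from_ref(atn) as usize,
        states_ptr: atn.states.as_ptr() as usize,
        states_len: atn.states.len(),
        max_token_type: atn.max_token_type,
    }
}
```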
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c704581b24
let visit_id = (state_number, index);
if needs_cycle_guard && !visiting.insert(visit_id) {
Include parse context in cycle-guard key
The new cycle guard keys visiting only by (state_number, index), but viability at a state also depends on request context (notably precedence, and in recovery mode the recovery context). In this function, precedence is passed through recursive rule calls and directly gates Transition::Precedence handling, so revisiting the same state/index with a different precedence can be a valid path; this guard now returns Vec::new() instead, pruning that path and potentially producing a wrong parse or spurious no-viable error on left-recursive precedence grammars.
Summary
Targeted micro-optimizations to the lexer and recursive recognizer that
came out of profiling the C# parse benchmark. Each commit is a
self-contained, measurable win; together they produce a meaningful
speedup on C# while leaving Kotlin (already fast) untouched or slightly
faster.
This is not a full Adaptive-LL implementation. I scaffolded one
in earlier iterations, couldn't make it fast enough on C#'s grammar
shape (closures explode the merge graph past the per-call deadline)
and stripped it out before sending this PR. Closing the remaining gap
to Go on C# requires that architectural change (per-decision DFA
caching across parses, or generated parser methods that dispatch via
AdaptivePredict), which is out of scope here.
Benchmark deltas (release, 5 iters + 2 warmups)
C#, vs upstream Go ANTLR runtime:
Kotlin: still 9-17× faster than Go; no regression.
What's in the commits
Lexer (cumulative ~30% on C# parse time):
- … Vec clone
- seen set from BTreeSet<LexerConfig> to FxHashSet
- LexerDfaTrace BTreeMap fields to FxHashMap
- PredictionFxHasher providing the hash function

Parser (cumulative ~25-30% on C# parse time):
- … loop, avoiding a recursive recognize_state_fast call per visit
- … produced no diagnostics (always true on pass 1)
- … Rc::new(NodeList::Empty) per prepend onto an empty list
- … decision_lookahead_cache, eliminating thread-local + RefCell + entry overhead on every multi-trans visit
- … disjoint and not nullable; collapses the per-transition filter loop to a single hash probe
- … (with a heap Vec fallback for the overflow case; fixed a bug where duplicates leaked past 8 distinct entries)
- NodeList::One inline variant to avoid the Rc tail allocation when the list has a single node

Tooling:
- perf-counters feature exposing outcome-distribution counters via ANTLR_PERF_DUMP=1 for future investigation

Test plan
- cargo test --release --lib: 37 passing
- cargo clippy --locked --all-targets --all-features -- -D warnings: clean
- cargo build --release --features perf-counters: clean

Summary by CodeRabbit
New Features
Performance
Documentation