Idiomatic Structural Handling for Special Tokens via Macro-Driven Dispatch in Token Library Struct by sempervictus · Pull Request #260 · guoqingbao/xinfer

sempervictus · 2026-03-09T14:06:42Z

@guoqingbao the pain of magic strings not passing through code agent tools is over; this should put that mess to bed once and for all. It is quite handy for dynamic grammar generation and quashes the presumption that "there is only one EOS", since q35 has several. One of the weird artifacts we see with things like DeepResearch is actually the re-trained model emitting the string representation of <|im_start|> due to some SFT quirk, when it should have been using the properly masked <\0xFF|im_start|> AddedToken (which IIRC also has .special == true).

I can't easily drop this in without #232, but that's all set for merge IMO and actually did a bunch of the work for this PR (the logic/structure/macro stuff is all me, but it did a nice job locating callers and splicing in the remedy).

This implements the full llguidance integration, enabling grammar-constrained inference for structured outputs, tool calling, and custom constraints.

Architecture:
- TopLevelGrammar serialized via rmp_serde across RPC boundaries
- Grammar flows: Server → params.grammar → Runner → GuidanceState → Matcher
- Inline correction via logits masking during sampling
- Post-process correction via rollback on validation failure

Key components:
- params.grammar field in SamplingParams for RPC serialization

GuidanceState:
- GuidanceState::new() with Matcher state management
- GuidanceState::reset() for proper state cleanup
- Rollback counter (MAX_ROLLBACK_ATTEMPTS=3) preventing infinite loops
- guidance_failed/guidance_mismatch sets cleared on rollback
- Vocab size validation in build_llg_factory()
- Lark grammar generation from tools via build_tool_call_lark_grammar()

CLI flags:
- --enable-tool-grammar: Auto-build LLG grammar from MCP tools
- --allow-constraint-api: Accept client-provided structured_outputs/response_format
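To make the two correction paths concrete, here is a minimal sketch of inline logits masking and a bounded rollback counter. It is not the PR's `GuidanceState`: `mask_logits` and `RollbackGuard` are hypothetical names, and the `allowed` slice stands in for whatever token mask the matcher produces. Only the `MAX_ROLLBACK_ATTEMPTS` value of 3 comes from the description above.

```rust
/// Upper bound on rollback attempts, matching the value named in the PR description.
const MAX_ROLLBACK_ATTEMPTS: u32 = 3;

/// Inline correction: set every logit the grammar disallows to -inf so that
/// sampling can only pick permitted tokens. `allowed[i]` is true when token
/// id `i` is acceptable in the current grammar state. Entries beyond the
/// shorter of the two slices are left untouched.
pub fn mask_logits(logits: &mut [f32], allowed: &[bool]) {
    for (logit, &ok) in logits.iter_mut().zip(allowed) {
        if !ok {
            *logit = f32::NEG_INFINITY;
        }
    }
}

/// Hypothetical rollback bookkeeping (an assumption, not the PR's type).
#[derive(Default)]
pub struct RollbackGuard {
    attempts: u32,
}

impl RollbackGuard {
    /// Counts one rollback and returns true, or returns false once the
    /// limit is reached so the caller can stop retrying.
    pub fn try_rollback(&mut self) -> bool {
        if self.attempts >= MAX_ROLLBACK_ATTEMPTS {
            return false;
        }
        self.attempts += 1;
        true
    }

    /// Clears the counter, e.g. when a new request starts.
    pub fn reset(&mut self) {
        self.attempts = 0;
    }
}
```

Capping the counter is what keeps a post-process validation failure from looping forever: after the third rollback the caller has to give up or fall back to unconstrained output.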

Expand SpecialTokens usage to cover EOS uses across the codebase, including the chat template. This gates access to the EOS tokens through a single common API, providing an interdiction point to add or remove them as required per model or model family.

- Replace manual EOS token extraction logic with centralized SpecialTokens::new() and idiomatic accessors
- Eliminate EosTokenId enum and related complex serialization logic in favor of direct Vec<u32>
- Update all callers to use SpecialTokens for tool start/end token IDs
- Remove stop_token_ids from SamplingParams and related logic (now handled via SpecialTokens)
- Simplify tokenizer config by replacing EosTokenEntry with Option<String>
- Add comprehensive SpecialTokens API with category-based accessors, ID/string sets, and search methods
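As an illustration of category-based accessors generated by a macro, the sketch below shows one way such a struct could look. Everything here is an assumption for illustration: the `Category` variants, the `SpecialToken` fields, the `category_accessors!` macro, and the accessor names are hypothetical and are not taken from special_tokens.rs. The point is only that EOS becomes a `Vec<u32>` rather than a single ID.

```rust
use std::collections::HashSet;

/// Hypothetical token categories; the real set in the PR may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Eos,
    ToolStart,
    ToolEnd,
}

/// One special token: its vocabulary id, its string form, and its category.
#[derive(Debug, Clone)]
pub struct SpecialToken {
    pub id: u32,
    pub text: String,
    pub category: Category,
}

pub struct SpecialTokens {
    tokens: Vec<SpecialToken>,
}

/// Generates one `fn name(&self) -> Vec<u32>` accessor per category,
/// so adding a category does not mean hand-writing another method.
macro_rules! category_accessors {
    ($($name:ident => $cat:ident),* $(,)?) => {
        impl SpecialTokens {
            $(
                pub fn $name(&self) -> Vec<u32> {
                    self.ids_for(Category::$cat)
                }
            )*
        }
    };
}

impl SpecialTokens {
    pub fn new(tokens: Vec<SpecialToken>) -> Self {
        Self { tokens }
    }

    /// Shared lookup used by every generated accessor.
    fn ids_for(&self, category: Category) -> Vec<u32> {
        self.tokens
            .iter()
            .filter(|t| t.category == category)
            .map(|t| t.id)
            .collect()
    }

    /// True if `id` is any of the model's EOS tokens, not just "the" EOS.
    pub fn is_eos(&self, id: u32) -> bool {
        self.tokens
            .iter()
            .any(|t| t.category == Category::Eos && t.id == id)
    }

    /// All special token ids as a set, for fast membership checks.
    pub fn id_set(&self) -> HashSet<u32> {
        self.tokens.iter().map(|t| t.id).collect()
    }
}

category_accessors! {
    eos_ids => Eos,
    tool_start_ids => ToolStart,
    tool_end_ids => ToolEnd,
}
```

With a shape like this, a caller that previously compared against one stop token would ask `is_eos(id)` instead, and per-model adjustments happen in one place where the token list is built.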

Improve the binary example to be a handy extractor for models which developers can use to update special_tokens.rs quickly. Add tags extracted from Qwen3.5 0.8B

Narrow the Common category search specifically to find string dups of actually special tokens (handle "aftermarket" models/merges). Add and test Llama4 and Qwen3.5 MoE
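The "string dups" search can be pictured roughly as follows. This is a sketch under assumptions, not the PR's code: `VocabEntry` and `string_dups_of_special` are hypothetical names, and the `special` flag is assumed to mirror the tokenizer's AddedToken `.special` field mentioned earlier in the thread.

```rust
use std::collections::HashSet;

/// One vocabulary entry. `special` is assumed to mirror the AddedToken flag.
pub struct VocabEntry {
    pub id: u32,
    pub text: String,
    pub special: bool,
}

/// Returns the ids of non-special entries whose text is identical to the
/// text of some special token, i.e. plain-string duplicates of a tag.
pub fn string_dups_of_special(vocab: &[VocabEntry]) -> Vec<u32> {
    // Collect the string forms of the genuinely special tokens first.
    let special_texts: HashSet<&str> = vocab
        .iter()
        .filter(|e| e.special)
        .map(|e| e.text.as_str())
        .collect();

    // Then pick out ordinary entries that reuse one of those strings.
    vocab
        .iter()
        .filter(|e| !e.special && special_texts.contains(e.text.as_str()))
        .map(|e| e.id)
        .collect()
}
```

Restricting the match to strings that already belong to a special token keeps the search from flagging ordinary vocabulary that merely looks tag-like.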

sempervictus · 2026-03-11T21:44:01Z

replaced by #262

RageLtMan and others added 4 commits March 9, 2026 10:12

Support Qwen3.5 Dense models on Metal (guoqingbao#258)

9a1ab79

sempervictus force-pushed the specialtokens/pr branch from a908fa2 to 64d9397 March 9, 2026 14:14

sempervictus mentioned this pull request Mar 9, 2026

Implement LLGuidance #232

Closed

RageLtMan added 3 commits March 9, 2026 13:21

More SpecialTokens, Improve Example/Binary

2956d5c

Improve the binary example to be a handy extractor for models which developers can use to update special_tokens.rs quickly. Add tags extracted from Qwen3.5 0.8B

SpecialTokens Strings for Llama4 and Qwen3.5 MoE

0eda75b

Narrow the Common category search specifically to find string dups of actually special tokens (handle "aftermarket" models/merges). Add and test Llama4 and Qwen3.5 MoE

ToolConfig Population w/ SpecialTokens

dbb961d

sempervictus mentioned this pull request Mar 10, 2026

Idiomatic Special Tokenization #259

Open

4 tasks

sempervictus closed this Mar 11, 2026

sempervictus mentioned this pull request Mar 11, 2026

Enable Reasoning via Guided Enforcement #262

Merged

sempervictus wants to merge 7 commits into guoqingbao:main from sempervictus:specialtokens/pr

2 participants
