Idiomatic Structural Handling for Special Tokens via Macro-Driven Dispatch in Token Library Struct #260

Closed
sempervictus wants to merge 7 commits into guoqingbao:main from sempervictus:specialtokens/pr

@sempervictus
Contributor

@guoqingbao the pain of magic strings not passing through code agent tools is over; this should put that mess to bed once and for all. It is quite handy for dynamic grammar generation, and it quashes the presumption that "there is only one EOS", since q35 has several. One of the weird artifacts we see with things like DeepResearch is actually the re-trained model emitting the plain-string representation of <|im_start|>, due to some SFT quirk, when it should have been using the properly masked <\0xFF|im_start|> AddedToken (which, iirc, also has .special == true).

I can't easily drop this in without #232, but that's all set for merge IMO and actually did a bunch of the work for this PR (the logic/structure/macro stuff is all me, but it did a nice job locating callers and splicing in the remedy).
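
To make the two points above concrete, here is a minimal Rust sketch of (a) treating EOS as a set of IDs rather than a single ID and (b) resolving a marker such as <|im_start|> only through entries flagged as special. The `Token` struct, `EosSet`, `special_id`, and all IDs are hypothetical names chosen for illustration; they are not the types introduced by this PR.

```rust
use std::collections::HashSet;

/// Hypothetical vocabulary entry; `special` mirrors the flag on an AddedToken.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub id: u32,
    pub content: String,
    pub special: bool,
}

/// A set of EOS IDs instead of a single ID (assumption: a model may define several).
pub struct EosSet {
    ids: HashSet<u32>,
}

impl EosSet {
    pub fn new(ids: impl IntoIterator<Item = u32>) -> Self {
        Self { ids: ids.into_iter().collect() }
    }

    /// True if `id` is any of the registered EOS tokens.
    pub fn is_eos(&self, id: u32) -> bool {
        self.ids.contains(&id)
    }
}

/// Look up a marker by its text, accepting only entries flagged as special.
/// A plain-text vocabulary entry with the same string is ignored.
pub fn special_id(vocab: &[Token], content: &str) -> Option<u32> {
    vocab
        .iter()
        .find(|t| t.special && t.content == content)
        .map(|t| t.id)
}
```

With this shape, a stop check becomes `eos.is_eos(next_id)` regardless of how many EOS tokens the model defines, and a non-special string that merely looks like a marker never resolves to a control token.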

RageLtMan and others added 4 commits March 9, 2026 10:12
This implements the full llguidance integration enabling
grammar-constrained inference for structured outputs, tool calling,
and custom constraints.

Architecture:
- TopLevelGrammar serialized via rmp_serde across RPC boundaries
- Grammar flows: Server → params.grammar → Runner → GuidanceState → Matcher
- Inline correction via logits masking during sampling
- Post-process correction via rollback on validation failure
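
As an illustration of the "inline correction via logits masking" step, the sketch below forces every token the grammar disallows to negative infinity before a greedy pick. It assumes a plain `&[bool]` mask for readability; llguidance has its own mask representation, and `apply_token_mask` and `argmax` are hypothetical helpers, not functions from this PR.

```rust
/// Set the logit of every disallowed token to negative infinity so that it
/// can never be sampled. `allowed[i]` says whether token `i` is permitted
/// by the grammar at the current position.
pub fn apply_token_mask(logits: &mut [f32], allowed: &[bool]) {
    assert_eq!(
        logits.len(),
        allowed.len(),
        "mask must cover the whole vocabulary"
    );
    for (logit, &ok) in logits.iter_mut().zip(allowed) {
        if !ok {
            *logit = f32::NEG_INFINITY;
        }
    }
}

/// Greedy pick over the masked logits; returns None if every token is masked.
pub fn argmax(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .filter(|(_, l)| l.is_finite())
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}
```

The length assertion plays the same role as the vocab size validation mentioned in the commit: a mask built for a different vocabulary would silently constrain the wrong tokens.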

Key components:
- params.grammar field in SamplingParams for RPC serialization
GuidanceState
- GuidanceState::new() with Matcher state management
- GuidanceState::reset() for proper state cleanup
- Rollback counter (MAX_ROLLBACK_ATTEMPTS=3) preventing infinite loops
- guidance_failed/guidance_mismatch sets cleared on rollback
- Vocab size validation in build_llg_factory()
- Lark grammar generation from tools via build_tool_call_lark_grammar()
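
The bounded rollback described above can be sketched as follows. Only the constant name and its value of 3 come from the commit message; `Outcome`, `generate_with_rollback`, and the closure-based interface are assumptions made to keep the example self-contained, and the real GuidanceState logic is likely more involved.

```rust
/// Upper bound on rollbacks, as named in the commit message.
pub const MAX_ROLLBACK_ATTEMPTS: u32 = 3;

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// A candidate passed validation.
    Accepted(Vec<u32>),
    /// Validation kept failing and the rollback budget was used up.
    GaveUp { attempts: u32 },
}

/// Hypothetical driver: `generate` produces candidate tokens for a given
/// rollback count and `validate` checks them against the grammar.
pub fn generate_with_rollback<G, V>(mut generate: G, validate: V) -> Outcome
where
    G: FnMut(u32) -> Vec<u32>,
    V: Fn(&[u32]) -> bool,
{
    let mut rollbacks = 0;
    loop {
        let candidate = generate(rollbacks);
        if validate(&candidate) {
            return Outcome::Accepted(candidate);
        }
        // Stop instead of looping forever once the budget is spent.
        if rollbacks >= MAX_ROLLBACK_ATTEMPTS {
            return Outcome::GaveUp { attempts: rollbacks };
        }
        rollbacks += 1;
    }
}
```

In this sketch the initial attempt does not count against the budget, so `generate` runs at most four times. Whether the PR counts the first attempt is not stated in the commit message.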

CLI flags:
- --enable-tool-grammar: Auto-build LLG grammar from MCP tools
- --allow-constraint-api: Accept client-provided structured_outputs/response_format

Expand SpecialTokens usage to cover EOS uses across the codebase, including the chat template. This gates access to the EOS tokens through a single common API, providing an interdiction point to add or remove them as required per model or model family.
- Replace manual EOS token extraction logic with centralized SpecialTokens::new() and idiomatic accessors
- Eliminate EosTokenId enum and related complex serialization logic in favor of direct Vec<u32>
- Update all callers to use SpecialTokens for tool start/end token IDs
- Remove stop_token_ids from SamplingParams and related logic (now handled via SpecialTokens)
- Simplify tokenizer config by replacing EosTokenEntry with Option<String>
- Add comprehensive SpecialTokens API with category-based accessors, ID/string sets, and search methods
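
A rough idea of what macro-driven, category-based accessors over plain `Vec<u32>` fields could look like is sketched below. The macro `special_token_categories!`, the struct `SpecialTokenIds`, and the category names are all hypothetical; the actual SpecialTokens struct and its `new()` constructor in this PR may be shaped quite differently.

```rust
use std::collections::HashSet;

/// Generate a struct with one `Vec<u32>` field per category, plus shared
/// lookup methods that dispatch over every category.
macro_rules! special_token_categories {
    ($name:ident { $($field:ident),+ $(,)? }) => {
        #[derive(Debug, Default, Clone)]
        pub struct $name {
            $(pub $field: Vec<u32>,)+
        }

        impl $name {
            /// All IDs across every category, deduplicated.
            pub fn all_ids(&self) -> HashSet<u32> {
                let mut ids = HashSet::new();
                $(ids.extend(self.$field.iter().copied());)+
                ids
            }

            /// Name of the first category that contains `id`, if any.
            pub fn category_of(&self, id: u32) -> Option<&'static str> {
                $(
                    if self.$field.contains(&id) {
                        return Some(stringify!($field));
                    }
                )+
                None
            }
        }
    };
}

// Hypothetical categories; adding one here extends every generated method.
special_token_categories!(SpecialTokenIds { eos, tool_start, tool_end });

impl SpecialTokenIds {
    /// EOS check over a plain Vec<u32>, with no dedicated enum needed.
    pub fn is_eos(&self, id: u32) -> bool {
        self.eos.contains(&id)
    }
}
```

The appeal of this design is that callers ask one struct for token IDs by category, so adding a category means touching a single macro invocation rather than every call site.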
RageLtMan added 3 commits March 9, 2026 13:21
Improve the binary example into a handy extractor that developers can run against a model to update special_tokens.rs quickly.

Add tags extracted from Qwen3.5 0.8B

Narrow the Common category search specifically to find string duplicates of genuinely special tokens (to handle "aftermarket" models/merges).
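
One possible reading of the duplicate search is sketched below: find non-special vocabulary entries whose text equals that of a special token. This is an interpretation of the commit message, not the PR's code, and `VocabEntry` and `string_duplicates` are hypothetical names.

```rust
/// Hypothetical vocabulary entry; `special` mirrors the AddedToken flag.
#[derive(Debug, Clone, PartialEq)]
pub struct VocabEntry {
    pub id: u32,
    pub content: String,
    pub special: bool,
}

/// Return `(special_id, duplicate_id)` pairs where a non-special entry
/// carries exactly the same text as a special token.
pub fn string_duplicates(vocab: &[VocabEntry]) -> Vec<(u32, u32)> {
    let mut pairs = Vec::new();
    for s in vocab.iter().filter(|e| e.special) {
        for d in vocab
            .iter()
            .filter(|e| !e.special && e.content == s.content)
        {
            pairs.push((s.id, d.id));
        }
    }
    pairs
}
```

The nested scan is quadratic in the worst case, which is acceptable for a one-off extractor but would warrant a map keyed by content if it ran on a hot path.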

Add and test Llama4 and Qwen3.5 MoE
@sempervictus
Contributor Author

replaced by #262

2 participants