Skip to content

feat: Logprobs API + structural tag constraint for tool calls#5

Merged
raullenchai merged 1 commit intomainfrom
feat/grammar-error-handling
Feb 25, 2026
Merged

feat: Logprobs API + structural tag constraint for tool calls#5
raullenchai merged 1 commit intomainfrom
feat/grammar-error-handling

Conversation

@raullenchai
Copy link
Copy Markdown
Owner

Summary

  • Logprobs API (fix: init crash, JSON corruption, GC leak, cloud routing gaps #11): Propagate mlx-lm per-token logprobs through the full stack (StreamingOutput → GenerationOutput → API response). Supports both streaming and non-streaming chat/completion endpoints with logprobs: true and top_logprobs: 0-20 per OpenAI spec.
  • Structural tag constraint (feat: TTFT cache fix, MiniMax reasoning parser, logprobs API, tool logits #7): Extend MiniMaxToolLogitsProcessor with parameter value schema tracking — biases toward valid JSON types (string/number/boolean/object/array) at the start of parameter values during generation. SimpleEngine gets post-generation validation with warning logs for schema mismatches.
  • Fix: Prevent server crash from malformed response_format schemas (wraps build_json_system_prompt and parse_json_output in try/except).

Changes

File Change
api/models.py Add TopLogProb, TokenLogProb, ChoiceLogProbs models; add logprobs/top_logprobs to request/response types
models/llm.py Pass response.logprobs (mx.array) through StreamingOutput
engine/base.py Add logprobs: Any to GenerationOutput
engine/simple.py Propagate logprobs + token ID in streaming
api/tool_logits.py Parameter value state tracking, JSON type bias, validate_param_value(), _extract_param_schemas()
server.py _extract_token_logprob() helper, top_logprobs validation, streaming/non-streaming logprobs wiring, _validate_tool_call_params() post-generation check
api/tool_calling.py Wrap json.dumps(schema) to prevent crash on non-serializable schemas

Test plan

  • pytest tests/test_api_models.py — model schema tests pass (72 tests)
  • pytest tests/test_tool_logits.py — tool logits processor tests pass (9 tests)
  • pytest tests/test_server.py — server endpoint tests pass (34 tests)
  • Full test suite: 904 passed, 14 skipped (4 pre-existing failures: missing torch/async framework)
  • Manual: curl with logprobs: true, top_logprobs: 5 returns token-level logprobs
  • Manual: streaming with logprobs returns per-chunk logprobs
  • Manual: tool calls with schema-typed params get correct JSON bias

🤖 Generated with Claude Code

@raullenchai raullenchai force-pushed the feat/grammar-error-handling branch from 57cadf4 to ecedbf5 Compare February 25, 2026 16:27
Logprobs: Propagate mlx-lm per-token logprobs through the full stack
(StreamingOutput → GenerationOutput → API response). Supports both
streaming and non-streaming chat completions with top_logprobs (0-20).

Structural tags: Extend MiniMaxToolLogitsProcessor with parameter value
schema tracking — biases toward valid JSON types (string/number/boolean/
object/array) at the start of parameter values. SimpleEngine gets
post-generation validation with warning logs for schema mismatches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@raullenchai raullenchai force-pushed the feat/grammar-error-handling branch from ecedbf5 to ccceab9 Compare February 25, 2026 16:29
@raullenchai raullenchai merged commit d164355 into main Feb 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant