Inference: gemma4 tool/reasoning parsers; add flatclaw-inference-dev image by skytruax · Pull Request #2 · skytruax/FlatClaw

skytruax · 2026-05-04T21:50:16Z

Summary

infra/inference/entrypoint.sh: `--tool-call-parser pythonic` → `gemma4`, plus `--reasoning-parser gemma4`. Source-of-truth fix to match what production has been running — past session JSONLs show prod's deployed pod was already using `gemma4` (125+ structured toolCall blocks captured, zero raw `<|tool_call|>` envelope leaks). The repo had drifted to "pythonic" which is wrong for Gemma 4 and would have broken the next image republish. See SGLang's Gemma 4 cookbook and sgl-project/sglang#21952.
infra/inference-dev/{Dockerfile,entrypoint.sh}: new dev-inference image. Mirrors the prod Dockerfile/entrypoint shape exactly — same SGLang base, same crane-mutate publish pattern, same parser flags. Diverges only on dtype (`bfloat16` vs `fp8`) and the `GEMMA_DIR_NAME` default (`gemma-4-e4b-it` vs `gemma-4-31b-it`). Same plumbing both lanes; switching never breaks.
.github/workflows/publish-inference-dev.yml: companion publish workflow for `ghcr.io/skytruax/flatclaw-inference-dev:latest`. Same shape as `publish-inference.yml`, auto-runs on push-to-main when the inference-dev files change.

Notes on thinking mode

Per Gemma 4's chat template, `enable_thinking` is OFF by default. Even with `--reasoning-parser gemma4` set, callers must pass `extra_body.chat_template_kwargs.enable_thinking=true` per request to activate it. openclaw's `thinkingDefault: "high"` config does not (today) translate to that extra_body field — that's a follow-up for the openclaw config plumbing, separate from this PR.

Test plan

Workflow `publish-inference` runs on merge → republishes `ghcr.io/skytruax/flatclaw-inference:latest` with the corrected parser flags.
Workflow `publish-inference-dev` runs on merge → publishes `ghcr.io/skytruax/flatclaw-inference-dev:latest` for the first time.
After merge, redeploy inference-dev and inference services pulling `:latest` and verify a tool-required chat completion returns structured `tool_calls` (not `<|tool_call|>` raw text).

Source-of-truth fix to match the parser flags that production has been running. Past session JSONLs show prod's deployed pod was already using --tool-call-parser gemma4 (125+ structured toolCall blocks captured, zero raw <|tool_call|> envelope leaks). The repo's infra/inference/entrypoint.sh had drifted to "pythonic" — wrong for Gemma 4, would have broken the next image republish. - infra/inference/entrypoint.sh: --tool-call-parser pythonic → gemma4, plus --reasoning-parser gemma4. Per SGLang's published Gemma 4 cookbook (PR sgl-project/sglang#21952). Thinking still defaults OFF in the chat template — callers must pass extra_body.chat_template_kwargs.enable_thinking=true to activate. - infra/inference-dev/{Dockerfile,entrypoint.sh}: new dev-inference image, mirrors the prod Dockerfile shape exactly. Same SGLang base, same crane-mutate publish pattern, same parser flags. Diverges only on dtype (bfloat16 vs fp8) and the GEMMA_DIR_NAME default (gemma-4-e4b-it vs gemma-4-31b-it). - .github/workflows/publish-inference-dev.yml: companion publish workflow for ghcr.io/<owner>/flatclaw-inference-dev:latest. Same shape as publish-inference.yml, runs on push-to-main when the inference-dev files change.

skytruax merged commit f597261 into main May 4, 2026

skytruax deleted the public/inference-gemma4-parser branch May 21, 2026 20:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inference: gemma4 tool/reasoning parsers; add flatclaw-inference-dev image#2

Inference: gemma4 tool/reasoning parsers; add flatclaw-inference-dev image#2
skytruax merged 1 commit into
mainfrom
public/inference-gemma4-parser

skytruax commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

skytruax commented May 4, 2026

Summary

Notes on thinking mode

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant