studio: auto-continue when model stops mid-plan#5549
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a8a648c216
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| r"let me|now let me|i['’]ll now|next,?\s*i['’]ll|" | ||
| r"i['’]m going to|i will now|let['’]s now" | ||
| r")[^.!?\n]*[.!?]?\s*$" |
There was a problem hiding this comment.
Exclude closing 'let me know' from auto-continue
When a complete answer ends with a common closing such as If you need anything else, let me know., this regex still matches the trailing let me phrase because it is searched over the final window and does not exclude let me know. That sends the turn through _trailing_plan_hit() and can add up to three spurious Continue. generations after an already-finished response, increasing latency/cost and appending unrelated text for normal user-facing completions.
Useful? React with 👍 / 👎.
| f"model ended turn mid-plan " | ||
| f"({len(_stripped)} chars)" | ||
| ) | ||
| _nudge = "Continue." |
There was a problem hiding this comment.
Reset Anthropic cursors before auto-continued text
For Anthropic Messages requests with tools enabled, this new Continue. branch starts another assistant generation after yielding only a status event; however AnthropicStreamEmitter and _anthropic_tool_non_streaming ignore status and reset their cumulative-text cursor only on tool_end. When auto-continue fires without a tool call, the continuation is diffed against the previous assistant turn length, so shorter/equal continuations are dropped and longer ones lose their prefix. Either emit a boundary the Anthropic adapters handle or reset their cursor on this empty status.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to detect and handle mid-plan End-Of-Sequence (EOS) events by implementing auto-continue logic alongside existing tool-coercive re-prompts. It adds regex-based detectors for trailing intents, lists, and colons, and updates the inference loop to nudge the model with a "Continue." message when these patterns are detected. Feedback was provided to refactor the _trailing_plan_hit function using any() with a generator expression to improve conciseness and align with efficiency guidelines.
| if _TRAILING_PLAN_INTENT.search(tail) is not None: | ||
| return True | ||
| if _TRAILING_PLAN_LIST.search(tail) is not None: | ||
| return True | ||
| if _TRAILING_PLAN_COLON.search(tail) is not None: | ||
| return True | ||
| return False |
There was a problem hiding this comment.
For improved conciseness and readability, you can use any() with a generator expression to check against the multiple regex patterns. This avoids the series of if statements and combines checks into a single iteration, aligning with repository efficiency guidelines regarding redundant data iterations.
| if _TRAILING_PLAN_INTENT.search(tail) is not None: | |
| return True | |
| if _TRAILING_PLAN_LIST.search(tail) is not None: | |
| return True | |
| if _TRAILING_PLAN_COLON.search(tail) is not None: | |
| return True | |
| return False | |
| return any( | |
| regex.search(tail) is not None | |
| for regex in (_TRAILING_PLAN_INTENT, _TRAILING_PLAN_LIST, _TRAILING_PLAN_COLON) | |
| ) |
References
- To improve efficiency, avoid redundant data iterations. Combine checks and transformations into a single loop and return computed values for callers to reuse.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6e6090d5f7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| r")[^.!?\n]*[.!?]?\s*$" | ||
| ) | ||
| _TRAILING_PLAN_LIST = re.compile( | ||
| r"(?ims)" |
There was a problem hiding this comment.
Remove multiline mode from trailing-list end detector
_TRAILING_PLAN_LIST is compiled with (?ims) and then searched with a terminal \s*$; the m flag makes $ match end-of-line, so this regex can match a list block even when more response text follows later in the same turn. That causes _trailing_plan_hit() to fire on already-complete answers (for example a list followed by a concluding sentence), triggering unnecessary Continue. retries and potentially appending unrelated extra output while increasing latency/cost.
Useful? React with 👍 / 👎.
| _extra = _MAX_REPROMPTS + _MAX_CONTINUES if max_tool_iterations > 0 else 0 | ||
| for iteration in range(max_tool_iterations + _extra): |
There was a problem hiding this comment.
Avoid inflating tool-iteration cap with unused continue budget
The loop bound now always adds _MAX_CONTINUES whenever max_tool_iterations > 0, so the agent can run up to max_tool_iterations + 6 full iterations even if no auto-continue is used. Because each extra iteration can still execute tool calls, this weakens the caller-provided cap (max_tool_iterations) and can produce unexpected extra tool executions/cost during long tool loops.
Useful? React with 👍 / 👎.
Codex review on unslothai#5549 flagged that endings like "If you need anything else, let me know." match the trailing-plan intent pattern (the regex matches "let me <anything>." at end of buffer). On a finished, user-facing closing this fires the auto- continue branch up to three times, costing latency / tokens and appending unrelated text after the response. Add a negative lookahead so "let me" only counts as a mid-plan signal when it is NOT immediately followed by "know". Other intent phrases ("now let me", "i'll now", "i'm going to", "i will now", "let's now") already require a planning verb so they are unaffected. Verified against `scripts/r6_trailing_plan_regex_test.py`: closing "let me know" variants no longer match; "let me clone/check/run …" still does.
Codex P2 on unslothai#5549 flagged that the auto-continue branch yields only a `{"type":"status","text":""}` event between turns; the Anthropic streaming emitter (`AnthropicStreamEmitter`) and the non-streaming tool path (`_anthropic_tool_non_streaming`) both ignore `status` events, so their cumulative-text cursor still holds the previous turn's full length when the continuation starts streaming. Shorter continuations get dropped entirely and longer ones lose their prefix. Treat the empty-text status as an auto-continue boundary in both paths: - `AnthropicStreamEmitter`: close any open text block, open a fresh one (matches the `tool_end` reset pattern), and clear `_prev_text`. - `_anthropic_tool_non_streaming`: clear `prev_text` so the next `content` event's diff baseline is empty. Non-empty status events (tool progress text) keep their existing no-op semantics.
Codex P2 review on #5549 surfaced two related risks in the auto-continue plumbing: 1. `_TRAILING_PLAN_LIST` was compiled with `(?ims)`. The `m` flag makes the terminal `\s*$` match end-of-line, so a complete answer like "Here's my plan:\n- a\n- b\n\nDone, that should work." still matched the list-block sub-pattern and tripped a spurious `Continue.` retry. Drop the `m` (and the unused `s`) flag and re-anchor with `\Z` so the list pattern only fires when the list is genuinely the last thing in the buffer. 2. The agent loop pre-reserved `_MAX_REPROMPTS + _MAX_CONTINUES` (= 6) extra iterations on top of the caller's `max_tool_iterations` unconditionally. That weakens the caller-provided budget: a turn that never trips the reprompt or continue path could still run up to N+6 full iterations and execute their tool calls. Switch the bound to a dynamic cap that grows only as reprompts / continues are actually consumed: `iteration < max_tool_iterations + _reprompt_count + _continue_count`. With both counters at zero the loop honors the caller cap exactly; once a continue or reprompt fires it earns its own slot back. Implemented with `itertools.count()` so the existing `continue` statements in the loop body keep their semantics. Regex behaviour pinned by `scripts/r6_trailing_plan_regex_test.py` (updated separately for the new list-tail case).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2784872494
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| elif etype == "status" and not event.get("text"): | ||
| # Auto-continue boundary marker emitted by | ||
| # generate_chat_completion_with_tools — the next "content" | ||
| # event resets to a fresh cumulative baseline, so close any | ||
| # open text block and clear the diff cursor. Without this | ||
| # the next continuation gets diffed against the previous | ||
| # turn's length (shorter continuations are dropped, longer | ||
| # ones lose their prefix). | ||
| return self._handle_boundary() |
There was a problem hiding this comment.
Don’t treat every empty status as a content boundary
Empty status events are emitted for more than auto-continue boundaries: the GGUF tool loop also yields them to clear the status badge at normal completion and after each tool call (llama_cpp.py lines 4501 and 4766). With Anthropic streaming, this branch now closes the currently open text block and immediately opens a new one for all of those cases, so a normal final answer or any tool call produces extra empty content_block_start/content_block_stop pairs in the Messages SSE stream. Reset the diff cursor only for the actual continuation boundary, or make the generator emit a distinct boundary event instead of overloading every empty status.
Useful? React with 👍 / 👎.
…am cleanly Codex P2 on #5549 flagged that the cycle-5 cursor-reset (efa43c4) keyed on every empty-status event, but generate_chat_completion_with_tools emits empty status events in five places, only two of which are real iteration boundaries: - Line 4497: emitted right before `continue` after a re-prompt / auto-continue. The next iteration starts a fresh assistant turn. BOUNDARY: reset cursor. - Line 4766: emitted right before `continue` after a tool call. The next iteration regenerates with tool results in history. BOUNDARY: reset cursor. - Line 4501: emitted at metadata-yield after normal streaming. Stream is about to end, no new iteration follows. NOT a boundary. - Line 4584: emitted in DRAINING-no-tool-call fallback path. Stream is about to end with buffered content_accum. NOT a boundary. - Line 4794: emitted at the final exit of the generator. NOT a boundary. Treating all five as boundaries gave every Anthropic-streaming response an extra content_block_stop + content_block_start pair around its final text and around every tool call. Fix by tagging the two real boundary sites with `"boundary": True` and tightening both the Anthropic emitter (`anthropic_compat.py`) and the OpenAI-compat tool path + Anthropic non-streaming path (`routes/inference.py`) to reset the cumulative-text cursor only when that flag is set. Plain empty-status events keep their existing badge- clear semantics on the frontend (`tool_status` SSE with content ""). Add two regression tests in `backend/tests/test_anthropic_messages.py::TestAnthropicStreamEmitter`: - test_boundary_flag_closes_block_and_resets_cursor: a boundary=True status closes the open text block and the next content delta streams from zero. - test_empty_status_without_boundary_does_not_close_block: a plain empty status leaves block_index unchanged and the next content delta is diffed against the previous text length. 107 tests pass across the four anthropic + trailing-plan test files.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 610c3873b5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # Clear tool status badge before next generation iteration. | ||
| # boundary=True: the model is about to start a fresh turn | ||
| # so cumulative-text adapters must reset their cursor. | ||
| yield {"type": "status", "text": "", "boundary": True} |
There was a problem hiding this comment.
Avoid marking post-tool badge clears as boundaries
Fresh evidence in this revision is that the post-tool badge clear is now explicitly emitted with boundary=True. In the Anthropic streaming path, tool_end already resets the cursor and opens the next text block, so this immediately sends the same event through _handle_boundary(), closing that empty block and opening another one before any model text arrives; any streamed Anthropic request that executes a tool now gets an extra empty text content block in the SSE sequence before the real post-tool response.
Useful? React with 👍 / 👎.
… tool_end Codex 02:42Z on #5549 caught that flagging the post-tool empty-status event with boundary=True (cycle-12 commit 610c387) double-handles the cursor reset in the Anthropic streaming path. AnthropicStreamEmitter._handle_tool_end already: - closes the open tool_use block, - emits tool_result, - increments block_index, - opens a fresh text block, and - resets _prev_text = "". When llama_cpp.py then yielded boundary=True on the very next event, _handle_boundary fired _close_block + _open_text_block on that freshly opened (still-empty) text block. Result: every tool call produced a spurious content_block_stop + content_block_start pair before the post-tool model text streamed. Fix: - llama_cpp.py: drop boundary=True from the post-tool status emit (line ~4768). Keep boundary=True only at the auto-continue site (line ~4500), which has no preceding tool_end to do the cursor work. - routes/inference.py OpenAI-compat tool stream: mirror the Anthropic semantics by resetting prev_text on BOTH tool_start AND tool_end, so the post-tool empty-status no longer needs to do it. Add backend/tests/test_anthropic_messages.py::TestAnthropicStreamEmitter:: test_post_tool_empty_status_does_not_double_close as a regression test: content -> tool_start -> tool_end -> empty status -> content must not bump block_index past tool_end's increment, and the post-tool content must land in the text block tool_end opened. 95 tests pass across the anthropic + trailing-plan suites.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 003ae8a3c6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # not a plan signal -- excluded via negative lookahead. | ||
| _TRAILING_PLAN_INTENT = re.compile( | ||
| r"(?i)(" | ||
| r"let me(?!\s+know\b)|now let me|i['’]ll now|next,?\s*i['’]ll|" |
There was a problem hiding this comment.
Exclude prefixed 'now let me know' closers
Fresh evidence in this revision is that only the bare let me alternative has the (?!\s+know\b) guard; the new now let me alternative still matches complete closers such as Now let me know if you want another example.. When a GGUF response ends with that common closing, _trailing_plan_hit() treats the turn as mid-plan and can issue up to three unnecessary Continue. generations, adding latency/cost and potentially appending unrelated text after an already-finished answer.
Useful? React with 👍 / 👎.
The existing intent-signal re-prompt fires only when tools are armed and the response is short. Models often stop mid-plan in other shapes too: a trailing "Let me clone the repo.", a "Let me ...:" header followed by a numbered list, or a bare trailing colon. When this happens the turn ends with the structured workload only partially delivered. Add a neutral "Continue." nudge that runs alongside the tool-coercive re-prompt: - _TRAILING_PLAN_INTENT, _TRAILING_PLAN_LIST, _TRAILING_PLAN_COLON cover the three observed shapes, scanned over the last 600 chars. - _trailing_plan_hit() returns True if any of them match. - _MAX_CONTINUES (3) is independent of _MAX_REPROMPTS so the two paths cannot starve each other. - _continue_count threads through the agentic loop; auto-continue fires with "Continue." regardless of tool armament. Regex tested against the patterns above plus negative controls (complete sentences, benign "let me" earlier in the buffer) before landing.
for more information, see https://pre-commit.ci
Codex review on #5549 flagged that endings like "If you need anything else, let me know." match the trailing-plan intent pattern (the regex matches "let me <anything>." at end of buffer). On a finished, user-facing closing this fires the auto- continue branch up to three times, costing latency / tokens and appending unrelated text after the response. Add a negative lookahead so "let me" only counts as a mid-plan signal when it is NOT immediately followed by "know". Other intent phrases ("now let me", "i'll now", "i'm going to", "i will now", "let's now") already require a planning verb so they are unaffected. Verified against `scripts/r6_trailing_plan_regex_test.py`: closing "let me know" variants no longer match; "let me clone/check/run …" still does.
Codex P2 on #5549 flagged that the auto-continue branch yields only a `{"type":"status","text":""}` event between turns; the Anthropic streaming emitter (`AnthropicStreamEmitter`) and the non-streaming tool path (`_anthropic_tool_non_streaming`) both ignore `status` events, so their cumulative-text cursor still holds the previous turn's full length when the continuation starts streaming. Shorter continuations get dropped entirely and longer ones lose their prefix. Treat the empty-text status as an auto-continue boundary in both paths: - `AnthropicStreamEmitter`: close any open text block, open a fresh one (matches the `tool_end` reset pattern), and clear `_prev_text`. - `_anthropic_tool_non_streaming`: clear `prev_text` so the next `content` event's diff baseline is empty. Non-empty status events (tool progress text) keep their existing no-op semantics.
Codex P2 review on #5549 surfaced two related risks in the auto-continue plumbing: 1. `_TRAILING_PLAN_LIST` was compiled with `(?ims)`. The `m` flag makes the terminal `\s*$` match end-of-line, so a complete answer like "Here's my plan:\n- a\n- b\n\nDone, that should work." still matched the list-block sub-pattern and tripped a spurious `Continue.` retry. Drop the `m` (and the unused `s`) flag and re-anchor with `\Z` so the list pattern only fires when the list is genuinely the last thing in the buffer. 2. The agent loop pre-reserved `_MAX_REPROMPTS + _MAX_CONTINUES` (= 6) extra iterations on top of the caller's `max_tool_iterations` unconditionally. That weakens the caller-provided budget: a turn that never trips the reprompt or continue path could still run up to N+6 full iterations and execute their tool calls. Switch the bound to a dynamic cap that grows only as reprompts / continues are actually consumed: `iteration < max_tool_iterations + _reprompt_count + _continue_count`. With both counters at zero the loop honors the caller cap exactly; once a continue or reprompt fires it earns its own slot back. Implemented with `itertools.count()` so the existing `continue` statements in the loop body keep their semantics. Regex behaviour pinned by `scripts/r6_trailing_plan_regex_test.py` (updated separately for the new list-tail case).
for more information, see https://pre-commit.ci
While drafting backend/tests/test_trailing_plan.py for the changes landed in b4e0985, the new tests surfaced a deeper false-positive the earlier regex tightening missed. For input "Let me explain:\n1. The function returns 42.\n\nThat's the answer." the previous `(?:\s*(?:[-*•]|\d+\.)\s+[^\n]+\n?)+` allowed the regex engine to backtrack and treat the in-prose "42." substring as a second list-item marker: iter 1 consumed "1. The function returns 4" and iter 2 consumed "2.\n\nThat's the answer." (with `\s+` greedily crossing the empty-line break). The pattern then satisfied `\s*\Z` and the buffer fired a spurious `Continue.` retry on what was already a fully-formed answer. Tighten the per-item boundary: - `[ \t]*` before the marker (no newlines): forces the marker to sit at the start of its own line. A mid-prose "42." cannot satisfy this because the engine cannot rewind past the preceding `\n` without invalidating the previous iteration's `[^\n]+\n` close. - `[ \t]+` between the marker and content: blocks an `\s+`-driven cross-newline reach into a closing paragraph. - `(?:\n|\Z)` at end of item: a real line break OR end of buffer. Preserves the "list at EOB with no trailing newline" case while eliminating the backtrack route. Land backend/tests/test_trailing_plan.py at the same time, covering: - `_TRAILING_PLAN_INTENT` "let me know" closer exclusion (cycle-3 fix). - `_TRAILING_PLAN_LIST` correctly fires on genuine trailing lists (dash, asterisk, unicode bullet, numeric). - `_TRAILING_PLAN_LIST` does NOT fire on list + closing paragraph, list + closing sentence, list embedded mid-text, or the "42." in-prose digit case. - `_TRAILING_PLAN_COLON` fires on bare trailing intent-colons only. - `_trailing_plan_hit` composite cases. - The 600-char window slicing. 31 cases, all pass. Pins the regex behaviour against future regressions inside the repo (the prior pin script lived only in the probing workspace, not the studio tree).
…am cleanly Codex P2 on #5549 flagged that the cycle-5 cursor-reset (efa43c4) keyed on every empty-status event, but generate_chat_completion_with_tools emits empty status events in five places, only two of which are real iteration boundaries: - Line 4497: emitted right before `continue` after a re-prompt / auto-continue. The next iteration starts a fresh assistant turn. BOUNDARY: reset cursor. - Line 4766: emitted right before `continue` after a tool call. The next iteration regenerates with tool results in history. BOUNDARY: reset cursor. - Line 4501: emitted at metadata-yield after normal streaming. Stream is about to end, no new iteration follows. NOT a boundary. - Line 4584: emitted in DRAINING-no-tool-call fallback path. Stream is about to end with buffered content_accum. NOT a boundary. - Line 4794: emitted at the final exit of the generator. NOT a boundary. Treating all five as boundaries gave every Anthropic-streaming response an extra content_block_stop + content_block_start pair around its final text and around every tool call. Fix by tagging the two real boundary sites with `"boundary": True` and tightening both the Anthropic emitter (`anthropic_compat.py`) and the OpenAI-compat tool path + Anthropic non-streaming path (`routes/inference.py`) to reset the cumulative-text cursor only when that flag is set. Plain empty-status events keep their existing badge- clear semantics on the frontend (`tool_status` SSE with content ""). Add two regression tests in `backend/tests/test_anthropic_messages.py::TestAnthropicStreamEmitter`: - test_boundary_flag_closes_block_and_resets_cursor: a boundary=True status closes the open text block and the next content delta streams from zero. - test_empty_status_without_boundary_does_not_close_block: a plain empty status leaves block_index unchanged and the next content delta is diffed against the previous text length. 107 tests pass across the four anthropic + trailing-plan test files.
… tool_end Codex 02:42Z on #5549 caught that flagging the post-tool empty-status event with boundary=True (cycle-12 commit 610c387) double-handles the cursor reset in the Anthropic streaming path. AnthropicStreamEmitter._handle_tool_end already: - closes the open tool_use block, - emits tool_result, - increments block_index, - opens a fresh text block, and - resets _prev_text = "". When llama_cpp.py then yielded boundary=True on the very next event, _handle_boundary fired _close_block + _open_text_block on that freshly opened (still-empty) text block. Result: every tool call produced a spurious content_block_stop + content_block_start pair before the post-tool model text streamed. Fix: - llama_cpp.py: drop boundary=True from the post-tool status emit (line ~4768). Keep boundary=True only at the auto-continue site (line ~4500), which has no preceding tool_end to do the cursor work. - routes/inference.py OpenAI-compat tool stream: mirror the Anthropic semantics by resetting prev_text on BOTH tool_start AND tool_end, so the post-tool empty-status no longer needs to do it. Add backend/tests/test_anthropic_messages.py::TestAnthropicStreamEmitter:: test_post_tool_empty_status_does_not_double_close as a regression test: content -> tool_start -> tool_end -> empty status -> content must not bump block_index past tool_end's increment, and the post-tool content must land in the text block tool_end opened. 95 tests pass across the anthropic + trailing-plan suites.
003ae8a to
a1d79b7
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a1d79b761e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if self._text_block_open: | ||
| events.append(self._close_block()) | ||
| self.block_index += 1 | ||
| events.extend(self._open_text_block()) |
There was a problem hiding this comment.
Defer opening boundary text blocks until content arrives
When a server-tool Anthropic stream first emits intent text and the reprompt then produces a tool call, the boundary status emitted by llama_cpp.py line 4463 enters _handle_boundary(). Opening a new text block here means the following tool_start immediately closes that just-opened block before emitting the tool_use, so clients receive a zero-length text content block between the intent text and the tool call. Reset the cursor and close the current block at the boundary, but wait to open a text block until _handle_content() actually sees text.
Useful? React with 👍 / 👎.
…undary PR 5549 cycle-15 codex review surfaced two P2 issues: 1. Trailing-plan regex matched "Now let me know if you want another example" (the closer "let me know" was negative-lookahead-guarded only on the bare alternative, not on "now let me"). Refactor to a single `(?:now\s+)?let me(?!\s+know\b)` alternative so both prefixes share the same guard. 2. Anthropic adapter opened a new text block immediately on the iteration boundary marker. If the reprompted iteration started with a tool_call rather than text, that just-opened text block was immediately closed, emitting a zero-length text content block between intent text and tool_use. Defer the open: close the current block + bump block_index + reset cursor at the boundary, and let `_handle_content` lazy-open when real text arrives.
Summary
The existing intent-signal re-prompt at
studio/backend/core/inference/llama_cpp.pyfires only when (a) tools are armed and (b) the response is short. In practice models often stop mid-plan in three other shapes that this path misses:"Let me clone the repo.""Let me now proceed to:\n1. ...\n6. Run validation""Let me check the repo:"When the model stops on any of these the turn ends with the workload only partially delivered.
Changes
_TRAILING_PLAN_INTENT,_TRAILING_PLAN_LIST,_TRAILING_PLAN_COLON) covering the observed shapes, with a 600-char trailing window so list endings are caught even when the cue is several lines above the last item._trailing_plan_hit()helper that ORs the three matchers._MAX_CONTINUES = 3and a separate_continue_countso the auto-continue budget does not steal from the tool-coercive re-prompt budget.generate_chat_completion_with_tools, branch the existing_S_STREAMINGfollow-up block: tool-coercive re-prompt keeps the old behavior, the new auto-continue path nudges with a neutral"Continue."and works with or without tools.Test plan
python -c 'import ast; ast.parse(open(path).read())'clean"Let me check the repo structure."-> match"Let me now proceed to:\n1. Clone\n...\n6. Run validation"-> match"I have the answer. Let me show you the diff:"-> match"Here's my plan:\n- Investigate\n- Fix\n- Test"-> match"The answer is 42. I have addressed all 9 sections."-> no match"In the first place, let me clarify. The fix is X. The tests all pass. We are done."-> no match