Studio: agentic web_search action variants + image overlay polish (OpenAI) #5787

Open
danielhanchen wants to merge 5 commits into main from studio/openai-web-search-actions-and-image-overlay

@danielhanchen

Member

Summary

Three Studio fixes for OpenAI gpt-5.x chat, all scoped to OpenAI paths (Anthropic / local llama.cpp paths untouched):

  1. Empty Web Search cards (Searching for ""): gpt-5.5 agentic search emits three action.type variants per the Responses-API docs: search (carries query), open_page (carries url), and find_in_page (carries url + pattern). The previous code only read action.query, so 6/6 or 15/15 invocations rendered as Searching for "" whenever the model browsed pages. The backend now dispatches on action.type and forwards the right fields (query / url / pattern / action_type) to the frontend, with fallbacks for action.queries[0] and item.query shapes plus a backfill from any prior event for the same id. The frontend renders three distinct labels:

    | action.type | collapsed | running |
    |---|---|---|
    | search (or older query) | Searched "<query>" | Searching for "<query>"... |
    | open_page | Read <domain> | Reading <domain>... |
    | find_in_page | Found "<pattern>" in <domain> | Finding "<pattern>" in <domain>... |
    | unknown / empty | Web Search | Searching... |

  2. OpenAI citation markers leaking into final text: the backend rewriter (_replace_openai_citation_markers, added in Studio: rewrite OpenAI Responses citation markers to markdown links #5713) handles complete streams correctly, but if the SSE connection drops between the last text delta and response.completed / [DONE], deferred segments in pending_citation_segments can leak raw cite<sid> markers, wrapped in the private-use characters U+E200, U+E202 and U+E201, into the persisted message. Added a defensive scrub in MarkdownText that drops any such cite block plus orphan PUA characters before they reach Streamdown, so no garbled glyph reaches the user regardless of stream timing.

  3. Image overlay polish (follow-up to Improve image generation UI #5784):

    • Download and close buttons on the expanded generated-image overlay enlarged from size-7 + size-3.5 icons to size-10 + size-5 icons, with bg-primary/10 text-primary ring-primary/30 styling so they visually match the "Type edits below, then send" pill rather than being barely-visible muted ghost buttons.
    • Show more / Show less on the image generation prompt caption now uses ChevronRight (collapsed) and ChevronDown (expanded) icons, matching the other tool cards.

Stats

+192 / -44 across 6 files (3 backend, 3 frontend).

Test plan

  • pytest studio/backend/tests/test_openai_tool_result_fallbacks.py tests/test_openai_citation_markers.py tests/test_openai_citation_markers_edge.py — 55/55 pass (3 new web_search action-variant tests added)
  • tsc -b and bun run build — clean (no new warnings)
  • Local smoke: ./install.sh --local then unsloth studio -H 0.0.0.0 -p 8888 confirms editable install picks up changes; main and Web Search cards now show meaningful labels for every invocation
  • Did not touch the Anthropic web_search / web_fetch path, the Kimi server-side search path, or any local llama.cpp streaming handlers

Schema reference

OpenAI Web Search guide confirms action.type in {search, open_page, find_in_page} for gpt-5.x agentic search.

…enAI)

Backend: dispatch OpenAI Responses web_search_call on action.type
({search, open_page, find_in_page}) so agentic gpt-5.x calls render
meaningful per-card labels (`Read <url>`, `Find "<pattern>" in <url>`)
instead of empty quotes. Probe action.queries[0]/item.query as
fallbacks for older shapes. Backfill query from any prior event for the
same id.
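
The dispatch described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the function name `extract_web_search_arguments`, the `previous` parameter, and the flat dict shapes are assumptions made for illustration; only the field names (`action.type`, `query`, `url`, `pattern`, `action.queries[0]`, `item.query`, `action_type`) come from the description.

```python
from typing import Any, Optional


def extract_web_search_arguments(
    item: dict[str, Any], previous: Optional[dict[str, str]] = None
) -> dict[str, str]:
    """Map one web_search_call item to the fields a card needs (hypothetical helper)."""
    action = item.get("action") or {}
    action_type = action.get("type") or ""
    args: dict[str, str] = {}
    if action_type:
        args["action_type"] = action_type

    if action_type in ("open_page", "find_in_page"):
        # Page-browsing variants carry a url; find_in_page adds a pattern.
        if action.get("url"):
            args["url"] = action["url"]
        if action_type == "find_in_page" and action.get("pattern"):
            args["pattern"] = action["pattern"]
    else:
        # "search" or an older/unknown shape: probe the known query locations.
        queries = action.get("queries") or []
        query = (
            action.get("query")
            or (queries[0] if queries else "")
            or item.get("query")
            or ""
        )
        if query:
            args["query"] = query

    # Backfill anything still missing from an earlier event with the same id.
    for key, value in (previous or {}).items():
        args.setdefault(key, value)
    return args
```

Called with `{"action": {"type": "search"}}` and `previous={"query": "earlier"}`, the sketch returns the earlier query, so a sparse later event does not blank a card that already had one.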

Frontend: web-search card renders all three action variants in both
collapsed trigger and running states; falls back to `Searching...` when
nothing is known. Defensive scrubber in MarkdownText strips any
leftover U+E200/U+E201/U+E202 citation markers that survive the
backend rewriter (e.g. SSE dropped before the end-of-stream flush).
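
The label mapping for the web-search card can be illustrated as below. The real frontend is TypeScript; this Python sketch only mirrors the collapsed/running labels listed in the PR description. The names `card_label` and `domain_of`, and the use of the URL host as `<domain>`, are assumptions.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the host part of a URL, or the raw string if it has none."""
    return urlparse(url).netloc or url


def card_label(args: dict[str, str], running: bool) -> str:
    """Pick the card label for one web_search invocation (hypothetical helper)."""
    action_type = args.get("action_type", "")
    query = args.get("query", "")
    url = args.get("url", "")
    pattern = args.get("pattern", "")

    if action_type == "open_page" and url:
        domain = domain_of(url)
        return f"Reading {domain}..." if running else f"Read {domain}"
    if action_type == "find_in_page" and url and pattern:
        domain = domain_of(url)
        if running:
            return f'Finding "{pattern}" in {domain}...'
        return f'Found "{pattern}" in {domain}'
    if query:
        # Covers action.type == "search" and older payloads that only had a query.
        return f'Searching for "{query}"...' if running else f'Searched "{query}"'
    # Nothing usable is known about this invocation.
    return "Searching..." if running else "Web Search"
```

Falling through to the generic label when a field is missing means an incomplete event never produces empty quotes.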

Image overlay: download + close buttons enlarged to size-10 with green
primary styling so they match the "Type edits below, then send" pill.
Image generation Show more/less now uses ChevronRight/Down icons to
match the other tool cards.

Tests: 3 new web_search action-variant tests; 55/55 pass.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces support for agentic web search actions (such as open_page and find_in_page) across the backend and frontend, adds defensive scrubbing of OpenAI citation markers in markdown text, and refines UI components for image generation and overlay buttons. The review feedback suggests replacing literal Private Use Area (PUA) characters in the citation scrubbing regex and string checks with explicit Unicode escape sequences to improve readability and prevent encoding issues.

Comment on lines +416 to +422
const OPENAI_CITE_MARKER_RE = /cite[^]*/g;
const OPENAI_PUA_ORPHAN_RE = /[]/g;
function scrubOpenAICitationMarkers(text: string): string {
if (!text) return text;
if (!text.includes("") && !text.includes("")) return text;
return text.replace(OPENAI_CITE_MARKER_RE, "").replace(OPENAI_PUA_ORPHAN_RE, "");
}

medium

Using literal Private Use Area (PUA) characters in regular expressions and string checks can lead to readability and maintainability issues, as these characters are often invisible or rendered as placeholder boxes in many editors and IDEs. They can also be prone to corruption or accidental modification during git operations or file encoding changes.

Using explicit Unicode escape sequences (\uE200, \uE201, \uE202) is much more robust, readable, and safe.

Suggested change
const OPENAI_CITE_MARKER_RE = /\uE200cite\uE202[^\uE201]*\uE201/g;
const OPENAI_PUA_ORPHAN_RE = /[\uE200\uE201\uE202]/g;
function scrubOpenAICitationMarkers(text: string): string {
  if (!text) return text;
  if (!text.includes("\uE201") && !text.includes("\uE200")) return text;
  return text.replace(OPENAI_CITE_MARKER_RE, "").replace(OPENAI_PUA_ORPHAN_RE, "");
}

@danielhanchen

Member Author

Follow-up commit ade719b9 addressing reviewer feedback:

Per-call web search sources — request now opts into include=[web_search_call.action.sources, web_search_call.results]. Each card carries its own URLs + snippets formatted as Title/URL/Snippet blocks, so every search card surfaces real sources instead of repeating the trigger label.
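
As a rough illustration of the Title/URL/Snippet formatting, consider the sketch below. The exact block layout the source-pill parser expects is not shown in this PR text, so the `Title:` / `URL:` / `Snippet:` line prefixes, the entry keys `title`, `url`, `snippet`, the de-duplication by URL, and the name `format_sources` are all assumptions.

```python
from typing import Any


def format_sources(entries: list[dict[str, Any]]) -> str:
    """Render source entries as blank-line-separated Title/URL/Snippet blocks."""
    blocks: list[str] = []
    seen: set[str] = set()
    for entry in entries:
        url = (entry.get("url") or "").strip()
        if not url or url in seen:
            # Skip entries without a URL and repeats of one already listed
            # (de-duplication is an assumption, not something the PR states).
            continue
        seen.add(url)
        lines = [f"Title: {entry.get('title') or url}", f"URL: {url}"]
        snippet = (entry.get("snippet") or "").strip()
        if snippet:
            lines.append(f"Snippet: {snippet}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
```

An entry without a title falls back to its URL as the title, and an empty list yields an empty string so the caller can decide what to show instead.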

Code-execution card body — full untruncated command now rendered in its own block above the output, so long heredocs clipped in the trigger are reachable when expanded. Exit code is always shown (not only on failure). stdout is now read from stdout / text / content for tolerance, and an unrecognised entry shape is dumped as raw JSON so the user always sees what OpenAI returned. Empty output renders as italic "Command completed with no output." instead of nothing.
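
The tolerant output handling can be sketched as follows. This is an illustration under assumptions, not the PR's `_format_shell_output`: the flat entry shape with `stdout` / `text` / `content`, `stderr`, and `exit_code` keys, the `exit_code:` line format, and the name `format_shell_entry` are hypothetical.

```python
import json
from typing import Any

# Keys probed, in order, for the command's standard output.
TEXT_KEYS = ("stdout", "text", "content")


def format_shell_entry(entry: dict[str, Any]) -> str:
    """Format one shell output entry, dumping raw JSON for unrecognised shapes."""
    text = next((entry[k] for k in TEXT_KEYS if isinstance(entry.get(k), str)), None)
    stderr = entry.get("stderr") if isinstance(entry.get("stderr"), str) else None

    if text is None and stderr is None:
        # No recognised text field at all: show exactly what was received.
        return json.dumps(entry, indent=2, sort_keys=True, default=str)

    parts = [p.rstrip() for p in (text, stderr) if p and p.strip()]
    if "exit_code" in entry:
        # Always report the exit code, including 0.
        parts.append(f"exit_code: {entry['exit_code']}")
    return "\n".join(parts)
```

With `{"stdout": "", "exit_code": 0}` the sketch returns only the exit-code line, leaving it to the caller to render a placeholder for the missing output.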

Image overlay — container is now transparent (no muted background, no border ring); the image itself carries the rounded corners and the buttons float over its top-right corner, so no yellow frame appears around narrow images.

Tests: +2 new web_search source-formatting tests. 66/66 OpenAI tests pass.

danielhanchen and others added 3 commits May 26, 2026 12:53
…_code, transparent image overlay

Backend (external_provider.py):
- Request `include=[web_search_call.action.sources, web_search_call.results]`
  so each web_search card has its consulted URLs and (for reasoning
  models) the search-result snippets attached. Format them as the
  Title/URL/Snippet blocks the frontend's source-pill parser already
  understands, so every search card now surfaces real sources instead
  of repeating the trigger label.
- _format_shell_output now always emits exit_code (not just non-zero),
  reads stdout from `stdout`/`text`/`content` for tolerance across
  shell_call_output revisions, and dumps the raw entry dict as a
  fallback when no recognised text fields are present so the user
  still sees what OpenAI sent. Drops the literal "(no output)" string
  in favour of an empty result; the frontend renders a friendlier
  placeholder.

Frontend:
- tool-ui-code-execution.tsx: expanded card now renders the full
  untruncated command in its own labelled block above the output, so
  long heredocs clipped in the trigger are reachable. Empty output is
  rendered as italic "Command completed with no output." instead of
  showing nothing.
- tool-ui-web-search.tsx: removed the <pre> fallback that just echoed
  the trigger label back; non-search cards now link to the page URL
  when no sources are present.
- thread.tsx: image overlay container is now transparent (no muted
  background, no border ring); the image itself carries the rounded
  corners and the buttons float over its top-right corner so no
  yellow frame appears around narrow images.

Tests: 2 new web_search source-formatting tests; 66/66 pass.

Previous overlay used `max-h-full max-w-full object-contain`, which fills
whatever flex area is available. On large viewports that scales a
generated image past its intrinsic resolution and turns it soft.

Explicit caps: `h-auto`, `w-auto`, `max-w-[520px]`, `max-h-[min(70vh, 620px)]`.
Aspect ratio preserved, only downscales. Wrapped the image in an
`inline-block` relative box so the download/close buttons anchor to the
image's actual top-right corner instead of the wider flex container.
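
The effect of these caps on the displayed size can be worked through with a little arithmetic. The sketch below assumes the image is constrained only by the two caps named above (520px wide, the smaller of 70% of the viewport height and 620px tall) and ignores any further limit from the surrounding layout; `displayed_size` is a hypothetical helper.

```python
def displayed_size(width: float, height: float, viewport_height: float) -> tuple[float, float]:
    """Return the rendered (width, height) in px under the overlay's caps."""
    max_w = 520.0
    max_h = min(0.70 * viewport_height, 620.0)
    # Never scale above 1.0, so small images keep their intrinsic resolution;
    # otherwise shrink by whichever cap is tighter, preserving aspect ratio.
    scale = min(1.0, max_w / width, max_h / height)
    return (width * scale, height * scale)
```

For a 1024x1024 image on a 1000px-tall viewport the width cap is the tighter one, giving 520x520; a 256x256 image stays at 256x256 rather than being stretched.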

assert starts[0]["arguments"]["pattern"] == "population"
assert starts[0]["arguments"]["action_type"] == "find_in_page"
assert "population" in ends[0]["result"]
assert "en.wikipedia.org" in ends[0]["result"]