@aharvard
Collaborator

@aharvard aharvard commented Jun 16, 2025

Integrate MCP UI Resources

This PR integrates https://mcpui.dev/ in order to enable MCP servers to render UIs within the goose message thread (and implements: #3562).

mcp-ui-allbirds-compressed.mov

Demo video shows https://mcpstorefront.com/?store=allbirds.com&style=default set up as a goose extension:

Demo MCP Server

Want to test this out locally?

  1. Add a new goose extension with the following (extension type: HTTP; endpoint: https://mcpstorefront.com/?store=allbirds.com&style=default)
  2. Ask goose to help you shop for some allbirds
image

Important

For more context, this PR aims to advance this popular MCP discussion on a new content type for "UI": https://github.com/orgs/modelcontextprotocol/discussions/287

@aharvard aharvard marked this pull request as ready for review June 16, 2025 18:13
@opdich
Contributor

opdich commented Jun 16, 2025

I think this might warrant its own messaging content type (like we do with MarkdownContent or ToolCall). Barring that, you could set the annotations.priority field to a value > 0.5 in order to force the tool calls to open by default.
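A minimal sketch of the annotations.priority idea, assuming the renderer receives the annotations object alongside the tool result. The `Annotations` shape and the `shouldExpandByDefault` helper are illustrative names, not goose's actual code.

```typescript
// Assumed shape: annotations carry an optional priority between 0 and 1.
interface Annotations {
  priority?: number;
}

// Hypothetical helper: expand only when priority is strictly greater than
// the threshold; a missing priority is treated as 0 (collapsed).
function shouldExpandByDefault(annotations?: Annotations, threshold = 0.5): boolean {
  return (annotations?.priority ?? 0) > threshold;
}
```

With this rule a result annotated with priority 0.8 opens expanded, while one with exactly 0.5 or no annotation stays collapsed.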

/>
)}
</div>
{result.type === 'resource' && result.resource.uri?.startsWith('ui://') ? (
Contributor

Do we HTML escape the result text/data anywhere?

Collaborator Author

No escaping as of now. Does an MCP server returning a resource open us up to prompt injection even though we send the html content to an iframe?

I'd love to learn more about how to be safe here.

Also, we're supporting more than just HTML strings as text. There are four kinds of data that are passed to @mcp-ui/client's iframe component — HTML strings are sent to the srcDoc attribute and URL strings are sent to the src attribute.

HTML string as text

{
    "uri": "ui://component-html-as-text",
    "mimeType": "text/html",
    "text": "<style>\n    * {\n      box-sizing: border-box;\n    }\n    body   {\n      margin: 0;\n      padding: 0;\n    }\n    main {\n      background: var(--bg, black);\n      min-height: 100vh;\n      padding: 20px;\n    }\n    main.a {\n      --bg: linear-gradient(to right, red, orange, yellow, green, blue, indigo, violet);\n    }\n    main.b {\n      --bg: linear-gradient(to right, #1a1a1a, #2d2d2d, #404040);\n    }\n    .wrapper {\n      background: white;\n      padding: 20px;\n      border-radius: 10px;\n    }\n    .button {\n      background: blue;\n      color: white;\n      padding: 10px;\n      border-radius: 5px;\n      cursor: pointer;\n    }\n    .buttons {\n      display: flex;\n      gap: 10px;\n    }\n    .button-actionA {\n      background: green;\n    }\n    .button-actionB {\n      background: red;\n    }\n    .photo-gallery {\n      display: grid;\n      grid-template-columns: repeat(auto-fill, minmax(300px, 1fr));\n      gap: 10px;\n    }\n    .photo-gallery img {\n      width: 100%;\n      height: 100%;\n      object-fit: contain;\n    }\n  </style><main class=\"a\"><div class=\"wrapper wrapper-a\"><h1>Hello World</h1><p>this HTML is in text format</p><button class=\"button button-actionA\" onclick=\"   \n    window.parent.postMessage(\n      { \n        type: 'tool', \n        payload: { \n          toolName: 'some_tool_name', \n          params: { \n            value: Date.now() \n          } \n        } \n      }, '*')\n    \">Tool Call</button> <button class=\"button button-actionB\" onclick=\"\n    window.parent.postMessage(\n      { \n        type: 'intent', \n        payload: {         \n          intent: 'resizeIframe', \n          params: {\n            minHeight: '100vh',\n            value: Date.now() \n          }\n        } \n      }, '*')\">Resize Iframe</button></div></main>"
}

HTML string as base64 blob

{
    "uri": "ui://component-html-as-blob",
    "mimeType": "text/html",
    "blob": "PHN0eWxlPgogICAgKiB7CiAgICAgIGJveC1zaXppbmc6IGJvcmRlci1ib3g7CiAgICB9CiAgICBib2R5ICAgewogICAgICBtYXJnaW46IDA7CiAgICAgIHBhZGRpbmc6IDA7CiAgICB9CiAgICBtYWluIHsKICAgICAgYmFja2dyb3VuZDogdmFyKC0tYmcsIGJsYWNrKTsKICAgICAgbWluLWhlaWdodDogMTAwdmg7CiAgICAgIHBhZGRpbmc6IDIwcHg7CiAgICB9CiAgICBtYWluLmEgewogICAgICAtLWJnOiBsaW5lYXItZ3JhZGllbnQodG8gcmlnaHQsIHJlZCwgb3JhbmdlLCB5ZWxsb3csIGdyZWVuLCBibHVlLCBpbmRpZ28sIHZpb2xldCk7CiAgICB9CiAgICBtYWluLmIgewogICAgICAtLWJnOiBsaW5lYXItZ3JhZGllbnQodG8gcmlnaHQsICMxYTFhMWEsICMyZDJkMmQsICM0MDQwNDApOwogICAgfQogICAgLndyYXBwZXIgewogICAgICBiYWNrZ3JvdW5kOiB3aGl0ZTsKICAgICAgcGFkZGluZzogMjBweDsKICAgICAgYm9yZGVyLXJhZGl1czogMTBweDsKICAgIH0KICAgIC5idXR0b24gewogICAgICBiYWNrZ3JvdW5kOiBibHVlOwogICAgICBjb2xvcjogd2hpdGU7CiAgICAgIHBhZGRpbmc6IDEwcHg7CiAgICAgIGJvcmRlci1yYWRpdXM6IDVweDsKICAgICAgY3Vyc29yOiBwb2ludGVyOwogICAgfQogICAgLmJ1dHRvbnMgewogICAgICBkaXNwbGF5OiBmbGV4OwogICAgICBnYXA6IDEwcHg7CiAgICB9CiAgICAuYnV0dG9uLWFjdGlvbkEgewogICAgICBiYWNrZ3JvdW5kOiBncmVlbjsKICAgIH0KICAgIC5idXR0b24tYWN0aW9uQiB7CiAgICAgIGJhY2tncm91bmQ6IHJlZDsKICAgIH0KICAgIC5waG90by1nYWxsZXJ5IHsKICAgICAgZGlzcGxheTogZ3JpZDsKICAgICAgZ3JpZC10ZW1wbGF0ZS1jb2x1bW5zOiByZXBlYXQoYXV0by1maWxsLCBtaW5tYXgoMzAwcHgsIDFmcikpOwogICAgICBnYXA6IDEwcHg7CiAgICB9CiAgICAucGhvdG8tZ2FsbGVyeSBpbWcgewogICAgICB3aWR0aDogMTAwJTsKICAgICAgaGVpZ2h0OiAxMDAlOwogICAgICBvYmplY3QtZml0OiBjb250YWluOwogICAgfQogIDwvc3R5bGU+PG1haW4gY2xhc3M9InBob3RvLWdhbGxlcnkiPjxpbWcgc3JjPSJodHRwczovL3BsYWNlaG9sZC5jby84MDB4NjAwL3BuZz90ZXh0PXBob3RvKzEiIGFsdD0icGhvdG8gMSI+PGltZyBzcmM9Imh0dHBzOi8vcGxhY2Vob2xkLmNvLzgwMHg2MDAvcG5nP3RleHQ9cGhvdG8rMiIgYWx0PSJwaG90byAyIj48aW1nIHNyYz0iaHR0cHM6Ly9wbGFjZWhvbGQuY28vODAweDYwMC9wbmc/dGV4dD1waG90byszIiBhbHQ9InBob3RvIDMiPjxpbWcgc3JjPSJodHRwczovL3BsYWNlaG9sZC5jby84MDB4NjAwL3BuZz90ZXh0PXBob3RvKzQiIGFsdD0icGhvdG8gNCI+PGltZyBzcmM9Imh0dHBzOi8vcGxhY2Vob2xkLmNvLzgwMHg2MDAvcG5nP3RleHQ9cGhvdG8rNSIgYWx0PSJwaG90byA1Ij48L21haW4+"
}

URL string as text

{
    "uri": "ui://website-url-as-text/https://www.wikipedia.com",
    "mimeType": "text/uri-list",
    "text": "https://www.wikipedia.com"
}

URL string as base64 blob

{
    "uri": "ui://website-url-as-blob/https://en.wikipedia.org/wiki/Model_Context_Protocol",
    "mimeType": "text/uri-list",
    "blob": "aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvTW9kZWxfQ29udGV4dF9Qcm90b2NvbA=="
}
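To make the four cases above concrete, here is a rough sketch of how a host might turn such a resource into either a `srcDoc` or a `src` value. It is not the @mcp-ui/client implementation; the `UiResource` type and the `toIframeSource` and `decodeBase64` helpers are assumptions made for illustration.

```typescript
// Resource shape taken from the four examples above; text or blob is set.
interface UiResource {
  uri: string;
  mimeType: string;
  text?: string;
  blob?: string; // base64-encoded
}

type IframeSource = { srcDoc: string } | { src: string };

// Decode base64 into a UTF-8 string (atob exists in browsers and Node 16+).
function decodeBase64(b64: string): string {
  const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// Hypothetical helper: returns null for anything that is not a ui://
// resource with one of the two MIME types shown above.
function toIframeSource(resource: UiResource): IframeSource | null {
  if (!resource.uri.startsWith('ui://')) return null;
  const content =
    resource.text ?? (resource.blob !== undefined ? decodeBase64(resource.blob) : undefined);
  if (content === undefined) return null;
  if (resource.mimeType === 'text/html') return { srcDoc: content };
  if (resource.mimeType === 'text/uri-list') return { src: content.trim() };
  return null;
}
```

This sketch treats a `text/uri-list` body as a single URL, which matches the examples above but not the general multi-line form of that MIME type.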

Contributor

This implementation seems pretty vulnerable to XSS or redirects via prompt injection.

There are quite a lot of ways to attack this; the iframe example I've pasted below is one of them (as @michaelneale mentioned).

<!DOCTYPE html>
<html lang="en">
<body>
  <h1>Iframe XSS Demo</h1>

  <iframe style="width: 100%; height: 200px;" srcdoc='
    <!DOCTYPE html>
    <html>
    <head>
      <title>Iframe Script</title>
    </head>
    <body>
      <script>
        // Blank script
        alert("bad stuff")
      </script>
    </body>
    </html>
  '></iframe>

</body>
</html>

A much safer implementation would be building a component library (or using an existing one) and then allowing goose to provide input vars to those components (we could expand that component set pretty fast over time)

@tobinsouth

Currently, this just renders html, which has security risks as things get complicated. mcp-ui is looking to support remotedom or web components. Let's pull in some discussion here with @idosal and @liady.

@michaelneale
Collaborator

yeah I like this a lot. I am sure there are ways to avoid it pulling in things which would be risky (if suitably iframe rendered, then ideally it would have no more access than any other tab would in a browser?) - if that would still work?

but this is really cool and important I think

@michaelneale
Collaborator

@tobinsouth mcp-ui looks really interesting would love to see where it goes if there is a way to do this securely, seems amazing.

@idosal

idosal commented Jun 19, 2025

Thanks for bootstrapping this @aharvard ! I love the demo 😄

The security discussions are great. To clarify the current status, the initial mcp-ui implementation focused on proof of value with varying degrees of security -

  • Raw HTML - rendered in <iframe sandbox="allow-scripts"> (no forms, popups, etc.). It doesn't have access to the origin and parent, so XSS isn't possible. Having said that, while the strictly sandboxed iframe offers isolation, it can still execute scripts, so it's not 100% bulletproof.
  • External app - This option was developed to demonstrate the possibilities of rich UI flows quickly. It's also in a sandboxed iframe (that has allow-same-origin). For 3rd party URLs, this option is as secure as the raw HTML method, since it can't access the parent.
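The two sandbox configurations described above could be expressed roughly as follows. This is a sketch based only on the description in this comment; the `UiMimeType` type and `sandboxTokensFor` function are hypothetical names, and the actual mcp-ui attribute values may differ.

```typescript
// The two content kinds discussed in this thread.
type UiMimeType = 'text/html' | 'text/uri-list';

// Hypothetical helper returning the value for the iframe `sandbox` attribute.
function sandboxTokensFor(mimeType: UiMimeType): string {
  // Raw HTML: scripts only. Without allow-same-origin the frame runs in an
  // opaque origin, so it cannot read the embedding page's DOM or storage.
  if (mimeType === 'text/html') return 'allow-scripts';
  // External URL: allow-same-origin lets the remote app keep its own origin
  // (cookies, storage), which is still a different origin from the host.
  return 'allow-scripts allow-same-origin';
}
```

Note that combining `allow-scripts` with `allow-same-origin` is only reasonable when the framed URL is on a different origin than the host; for same-origin content the frame could remove its own sandboxing.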

As far as prompt injection goes -

  • The host decides if the LLM is exposed to the resource content. For example, Goose may render the resource using mcp-ui without passing the resource content to the LLM chat
  • UI components can trigger the onUiAction callback (via post messages). Goose can decide if and how it wants to follow up on intents (e.g., invoke tool calls if it deems them secure, etc.)
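As an illustration of the second point, a host could run every incoming UI action through a small policy function before doing anything with it. The message shapes below follow the postMessage payloads in the HTML example earlier in this thread; the `Decision` type, the `decideUiAction` function and the allowlist policy are assumptions, not part of mcp-ui or goose.

```typescript
type Decision = 'run' | 'ask-user' | 'ignore';

// Hypothetical policy: run allowlisted tools, ask the user about any other
// well-formed action, and ignore everything else.
function decideUiAction(data: unknown, allowedTools: ReadonlySet<string>): Decision {
  if (typeof data !== 'object' || data === null) return 'ignore';
  // Treat the message as untrusted: every field is unknown until checked.
  const msg = data as {
    type?: unknown;
    payload?: { toolName?: unknown; intent?: unknown } | null;
  };
  const toolName = msg.payload?.toolName;
  const intent = msg.payload?.intent;
  if (msg.type === 'tool' && typeof toolName === 'string') {
    return allowedTools.has(toolName) ? 'run' : 'ask-user';
  }
  if (msg.type === 'intent' && typeof intent === 'string') return 'ask-user';
  return 'ignore';
}
```

In a real host this would sit behind a `message` event listener that first checks `event.source` against the iframe's `contentWindow`, since a sandboxed frame without allow-same-origin reports its origin as the string "null".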

We're currently working on significantly better delivery methods. The north star -

  • Enable servers to deliver rich, interactive UIs with ergonomic APIs
  • Allow hosts to own the look and feel
  • Eliminate security concerns (limit/remove local code execution)

As @tobinsouth mentioned, the current focus is on a web components or remote-dom implementation that builds on host-provided component libraries. This should allow the server to set the UI with the host's look and feel without executing untrusted code.

The new version allows the host to whitelist acceptable content types (in the future it'll also be communicated to the server as part of a larger move for content type support).
Goose can choose to start by whitelisting the secure raw HTML method and add the richer web components/remote-dom implementations when they become available (soon!). @liady WDYT?

We'd love to hear your thoughts!

@aharvard
Collaborator Author

@idosal, thanks for providing the context and roadmap!

I'd love to hear your thoughts (@liady too) on an idea we're considering.

At Block, we've had some success getting LLMs to generate UIs on demand within a chat. It’s similar to how your typical build-a-UI-with-an-LLM tool kinda works.

Considering how this might work for Goose, a sketch of the architecture could be:

  1. The MCP server offers a description or schema of an ideal UI (without returning HTML).
  2. The MCP host (Goose) supplies a component registry with details about props, usage, composition, etc.
  3. The LLM acts as the "layout engine," mapping the server's UI request to available components in the host and rendering the UI dynamically.
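One way to picture steps 2 and 3 is a host-side check that the layout produced by the LLM only uses components and props from the registry. Everything here is a sketch under assumptions: the `Registry` and `UiNode` shapes and the `validateLayout` function are hypothetical and only meant to show the kind of guard a host could apply before rendering.

```typescript
// Hypothetical registry: component name -> allowed prop names.
type Registry = Record<string, readonly string[]>;

// Hypothetical layout tree as it might come back from the LLM.
interface UiNode {
  component: string;
  props?: Record<string, unknown>;
  children?: UiNode[];
}

// Returns a list of problems; an empty list means the tree only uses
// registered components and registered props.
function validateLayout(node: UiNode, registry: Registry, path = 'root'): string[] {
  if (!Object.prototype.hasOwnProperty.call(registry, node.component)) {
    return [`${path}: unknown component "${node.component}"`];
  }
  const allowedProps = registry[node.component];
  const errors = Object.keys(node.props ?? {})
    .filter((prop) => allowedProps.indexOf(prop) === -1)
    .map((prop) => `${path}: unknown prop "${prop}" on ${node.component}`);
  (node.children ?? []).forEach((child, i) => {
    errors.push(...validateLayout(child, registry, `${path}.children[${i}]`));
  });
  return errors;
}
```

A host could refuse to render, or ask the LLM to retry, whenever the returned list is non-empty.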

Building on that approach, I have been considering how we could utilize the MCP sampling spec (2025-06-18). This could help us follow the conventions of keeping a human-in-the-loop at two key points in time.

  1. Users need to be able to approve/reject a request for an MCP server to spend model tokens
  2. Users need to be able to approve/reject the risk of allowing the MCP host (Goose) to render UI from a trusted (and untrusted) MCP server

A new sampling/generateUI method could support a workflow. I'm unsure if we need it; the sampling/createMessage method might work just fine.

Here's what it might look like for a user to use Goose to search for photos in a digital asset manager.

sequenceDiagram
    participant User
    participant Goose
    participant LLM
    participant Server
    
    Note over User,Server: Typical Tool Call Flow
    User->>Goose: prompt: "find photos of sellers <br>with Square handhelds"
    Goose->>LLM: forward message to LLM
    LLM-->>Goose: try the search_assets tool
    Goose->>Server: call search_assets tool
    Server-->>Goose: return tool results

    rect rgb(234 243 246)
      Note over User,Server: Leverage sampling for UI generation
      Server->>Goose: initiate sampling/generateUI request (render a photo gallery UI)
      Note over User,Goose: Human-in-the-loop (part 1 of 2)
      Goose->>User: present request for sampling/generateUI
      User-->>Goose: review and approve/modify
      Note over Goose,LLM: Reconcile UI Request with Goose UI component registry
      Goose->>LLM: forward approved sampling/generateUI request
      LLM-->>Goose: return generated UI
      Note over User,Goose: Human-in-the-loop (part 2 of 2)
      Goose->>User: present request to render UI 
      User-->>Goose: review and approve/modify
      Goose-->>User: render UI (a photo gallery)
    end

    Note over User,Goose: UI Interaction
    User->>Goose: click on UI element
      alt UI state change
      Goose->>Goose: re-render UI
      else UI-driven LLM messaging
      Goose->>Goose: dispatch event
      Goose->>Goose: capture event
      Goose->>LLM: forward message to LLM
      LLM-->>Goose: respond based on message received
      Note over Goose,LLM: Call tools, send prompts, send resources, etc
    end
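The highlighted part of the diagram, with its two human-in-the-loop checkpoints, could be sketched like this. The `sampling/generateUI` method is only a proposal in this thread, so the function below does not call any real MCP API; `runGenerateUiFlow`, `Approve` and `FlowResult` are hypothetical names, and the LLM call is passed in as a plain callback.

```typescript
// Callback that asks the user a yes/no question.
type Approve = (question: string) => boolean;

type FlowResult =
  | { status: 'rendered'; ui: string }
  | { status: 'rejected'; stage: 'sampling' | 'render' };

// generateUi stands in for the LLM call; both callbacks are injected so the
// flow itself stays free of UI and network code.
function runGenerateUiFlow(
  request: string,
  approve: Approve,
  generateUi: (request: string) => string,
): FlowResult {
  // Human-in-the-loop (part 1 of 2): approve spending model tokens.
  if (!approve(`Allow the server to request UI generation: "${request}"?`)) {
    return { status: 'rejected', stage: 'sampling' };
  }
  const ui = generateUi(request);
  // Human-in-the-loop (part 2 of 2): approve rendering the generated UI.
  if (!approve('Render the generated UI?')) {
    return { status: 'rejected', stage: 'render' };
  }
  return { status: 'rendered', ui };
}
```

Rejecting at the first checkpoint means the model is never called, which is the point of asking before tokens are spent.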

Curious what folks think!

@idosal

idosal commented Jul 6, 2025

Thanks @aharvard! Sorry about the delay, things have been a little hectic.
Giving more control to the host is a great direction! We have two main challenges -

  1. What’s the optimal way for servers to define the UI?
    It should be expressive enough for servers to control their brand and UX. Preferably, it should be understandable to LLMs for further manipulation. HTML/JS/DOM trees are existing options, but we can develop a new custom protocol.
  2. How should the hosts render it?
    If the schema is very abstract, we’ll need something like generative UI to fill in the gaps (I’d love to hear your experience with it and learn more). That does feel like something that calls for user approval since it "costs" tokens, which Sampling can facilitate.
    However, if we’re going with a richer schema, we can let the server use the host’s component library statically without relying on the LLM as the layout engine. For example, we can leverage RemoteDOM to allow the server to respond with a JS script that builds the UI using schematic components. The host renders its own components in place of the schematic ones, within its own tree. In the future, it may be possible to replace server-sent JavaScript with the schematic tree or any other schema. The "registry" can be set by convention and capability negotiation.
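The idea of the host rendering its own components in place of schematic ones can be illustrated without the RemoteDOM library itself. The sketch below does not use the actual remote-dom API; the `SchematicNode` tree, the `HostLibrary` map and `renderWithHost` are assumptions that only show how one server-provided tree could be rendered with different host component libraries.

```typescript
// Hypothetical schematic tree sent by the server (no executable code in it).
interface SchematicNode {
  type: string; // e.g. "ui-button"
  text?: string;
  children?: SchematicNode[];
}

// A host "component library": schematic type -> renderer producing markup.
type HostLibrary = Record<string, (inner: string) => string>;

// Escape server-provided text so it cannot inject markup.
function escapeText(s: string): string {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function renderWithHost(node: SchematicNode, library: HostLibrary): string {
  const inner =
    escapeText(node.text ?? '') +
    (node.children ?? []).map((child) => renderWithHost(child, library)).join('');
  const render = Object.prototype.hasOwnProperty.call(library, node.type)
    ? library[node.type]
    : undefined;
  // Unknown schematic types are dropped rather than rendered.
  return render ? render(inner) : '';
}
```

Swapping the `HostLibrary` argument changes the look and feel while the server response stays the same, which is the property the demo below shows with React and Web Components.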

To see something like that in action, there's an mcp-ui demo for the RemoteDOM implementation, where the same server tool response is rendered with different component libraries (React/Web Components) -

remote-dom-demo.2.mp4

For UI rendering approval, we can either use Sampling or opt to use a custom client-side mechanism instead (e.g., an approval button before the component is rendered).

Overall, both directions have their advantages, and probably optimize for slightly different use cases. The exciting part is that we can experiment with both options within the library and see which one works best. What do you think?

P.S. We revised the security model, and now all content types restrict remote code execution to sandboxed iframes, preventing XSS and parent access altogether.

@michaelneale
Collaborator

what if we had out of the box widgets for json-schema based form rendering (and other assets that are not arbitrary) and then an iframe/browser like escape hatch for the rest? (which I guess could have the MCP process serving up a localhost) - that browser/iframe would have no access to goose itself in the electron app/GUI, be just a conveniently located web view?
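A JSON-schema based form widget could start from something as small as the mapping below. It is a sketch that assumes a flat object schema with primitive properties only; `FlatSchema`, `FormField` and `schemaToFields` are hypothetical names, and a real implementation would need to cover far more of JSON Schema.

```typescript
// Minimal subset of JSON Schema assumed for this sketch.
interface FlatSchema {
  type: 'object';
  properties: Record<
    string,
    { type: 'string' | 'number' | 'integer' | 'boolean'; title?: string; enum?: string[] }
  >;
  required?: string[];
}

// What a host-owned form widget would need to draw one input.
interface FormField {
  name: string;
  label: string;
  widget: 'text' | 'number' | 'checkbox' | 'select';
  required: boolean;
  options?: string[];
}

function schemaToFields(schema: FlatSchema): FormField[] {
  const required = new Set(schema.required ?? []);
  return Object.keys(schema.properties).map((name): FormField => {
    const prop = schema.properties[name];
    // enum wins over the primitive type; numbers and integers share a widget.
    const widget: FormField['widget'] = prop.enum
      ? 'select'
      : prop.type === 'boolean'
        ? 'checkbox'
        : prop.type === 'string'
          ? 'text'
          : 'number';
    return {
      name,
      label: prop.title ?? name,
      widget,
      required: required.has(name),
      options: prop.enum,
    };
  });
}
```

Because the host draws every input itself, nothing from the server is executed; the server only contributes data.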

@aharvard
Collaborator Author

Hey folks, we've identified a few work streams related to unlocking UI rendering in Goose Desktop. I've opened up an issue to mind-meld and find a path forward. Please share your thoughts! #3562
(cc @idosal, @liady, @tobinsouth)

@aharvard
Collaborator Author

I just force-pushed a lot of updates to this PR to bring it current with all the new changes in Goose Desktop and to align it with phase one of our strategy outlined here: #3562. I also updated the main description above to match phase one of our strategy.

Reposting the same video as shown above:

mcp-ui-allbirds-compressed.mov

@michaelneale
Collaborator

this change is looking nice and lean @aharvard - what would be the downside of bringing it in while it is so early? Is it usable in early state and safe enough? Would be great with some canonical MCP server examples or even a template that lets people play with it. Worst case could be a feature flag?

@DOsinga @zanesq - for consideration.

@DOsinga
Collaborator

DOsinga commented Jul 31, 2025

My main concern here is that I don't think we're ready to commit that we want to keep this. At the same time, I'd love to see what people do with this and if we see marvelous stuff, that's a good reason to leave it in. Is there a place we can put a warning up here? Like maybe add a little icon to the renderer that says experimental might go away?

@DOsinga
Collaborator

DOsinga commented Jul 31, 2025

also /cc @alexhancock & @jamadeo for MCP integration

@aharvard aharvard force-pushed the feat/integrate-mcp-ui branch from fbb842a to e8f193e Compare July 31, 2025 11:57
@acekyd
Contributor

acekyd commented Jul 31, 2025

As @michaelneale mentioned, starting off with it as an experimental feature flag could be a great starting point that covers the areas being considered at the moment. We've always had one or two (experimental features) like this across desktop and CLI. To @DOsinga's point, it's also the only way for us to be able to see what people can do with it enough to keep it in. We have MCP night coming up in a few days and having this in even as an experimental feature would have some decent impact as well.

@aharvard aharvard force-pushed the feat/integrate-mcp-ui branch 3 times, most recently from ccd5c6a to 54e0673 Compare July 31, 2025 15:54
@aharvard
Collaborator Author

aharvard commented Jul 31, 2025

@DOsinga, I added a message to the UI:

image image

@aharvard aharvard closed this Jul 31, 2025
@aharvard aharvard reopened this Jul 31, 2025
Contributor

@Kvadratni Kvadratni left a comment

This implementation is extremely lightweight and safe for now, as it doesn't implement the actions from the UIs.
Ship--it!

@michaelneale michaelneale added p0 Priority 0 - Critical/Urgent ready labels Jul 31, 2025
@aharvard aharvard force-pushed the feat/integrate-mcp-ui branch 2 times, most recently from 1a0c6d4 to 63eead0 Compare August 1, 2025 00:48
@aharvard aharvard force-pushed the feat/integrate-mcp-ui branch from 63eead0 to 5bda0e9 Compare August 1, 2025 00:53
}

#[tokio::test]
#[ignore = "Databricks context truncation tests are flaky - skip in CI"]
Collaborator

these seem ok - do you want to get rid of these ignores for now @aharvard ?

Collaborator Author

removed and checks are passing. ty!

  id: generateId(),
  role: apiMessage.role as Role,
- created: apiMessage.created ?? 0,
+ created: apiMessage.created ?? Math.floor(Date.now() / 1000),
Collaborator

yeah this is better than my change, thanks
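For readers skimming the snippet above, the fallback can be isolated into a small function. This is a sketch: the `ApiMessage` shape is reduced to the one relevant field, and `resolveCreated` with its injectable clock is a hypothetical helper rather than code from the PR.

```typescript
// Hypothetical minimal shape of the API message; only the relevant field.
interface ApiMessage {
  created?: number | null;
}

// Returns a Unix timestamp in seconds. `now` is injectable (milliseconds,
// like Date.now) so the fallback can be exercised deterministically.
function resolveCreated(apiMessage: ApiMessage, now: () => number = Date.now): number {
  // `??` only falls back on null or undefined, so an explicit 0 is kept.
  return apiMessage.created ?? Math.floor(now() / 1000);
}
```

The division by 1000 matters because `Date.now()` returns milliseconds, while the fallback expression in the snippet produces seconds.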

mod tests {
use super::ModelConfig;
use super::*;
use temp_env::with_var;
Collaborator

not sure if relevant change?

Collaborator Author

This change occurred when I ran cargo fmt to fix one of the failed merge checks that look for formatting issues. I recall it being mentioned as an unused import.

Collaborator

@michaelneale michaelneale left a comment

I think this is nice and lean and something to get out there. am satisfied with the work done and the integration is very low friction if/as things change.

@aharvard aharvard merged commit 9006987 into block:main Aug 1, 2025
8 checks passed
zanesq added a commit that referenced this pull request Aug 1, 2025
…ipe-chat-via-deeplink

* 'main' of github.com:block/goose:
  Ensure more client (#3787)
  fix(ui): extension command text overflow (#3785)
  No tool role means we should not collapse messages (#3778)
  fix: bundle workflows (#3780)
  Update goose hints (#3758)
  integrate MCP UI (#2948)
  Fix claude model names (#3765)
  fix: don't return full shell output when very large (#3750)
  fix: cli tool logging (#3749)

Labels

p0 Priority 0 - Critical/Urgent ready

10 participants