Add a way to push data to Mesop client when using Web Sockets. #1175

richard-to · 2025-01-05T02:25:22Z

With Gemini 2.0's bidirectional API, it would be nice to have a way to push data directly to the client when using web sockets.

One thing I've experimented with is just using a really long-running async event handler: on click, start the Gemini Live connection. Basically, I would like to be able to stream the audio responses to the Mesop client. Right now I'm still working out how to yield the audio responses from the async loop. I think it could be possible, so I'm still trying.

Alternatives considered:

Just use the normal input and response flow and return audio. I think that should be OK for my use case, since I would be able to stream the audio output to a custom web component using the AudioContext API. However, I'm curious whether we can get something like what AI Studio has.

Other considerations:

So far I've only focused on audio. I may add a screen-sharing aspect to my demo as well, but I think it should be doable without WebRTC.

richard-to · 2025-01-06T16:15:20Z

Actually, I was able to get the audio streaming to work with essentially a handler that continuously loops and yields. It's still pretty hacky in terms of implementation. You need to click a button to trigger the initialization of the Gemini API websocket connection, but I think that's actually OK in this case.
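
For illustration, here is a minimal sketch of the "loop and yield" pattern described above. It is plain asyncio and not the actual demo code: `stream_chunks`, the `None` end-of-stream sentinel, and `demo` are hypothetical names, and no Mesop or Gemini APIs are used. The idea is that a long-running async generator waits on a queue of audio chunks and yields each one as it arrives.

```python
import asyncio
from typing import AsyncIterator, List, Optional


async def stream_chunks(
    queue: "asyncio.Queue[Optional[bytes]]",
) -> AsyncIterator[bytes]:
    """Yield audio chunks from the queue until a None sentinel arrives.

    Hypothetical helper: in a real app the queue would be filled by the
    task that reads responses from the upstream websocket.
    """
    while True:
        chunk = await queue.get()
        if chunk is None:  # assumed end-of-stream marker
            break
        yield chunk


async def demo() -> List[bytes]:
    """Fill a queue with fake audio chunks and drain it via the generator."""
    queue: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue()
    for chunk in (b"chunk-1", b"chunk-2", None):
        queue.put_nowait(chunk)
    return [chunk async for chunk in stream_chunks(queue)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real handler, each yielded chunk would presumably be written to state so the client picks it up. That part depends on the framework and is left out here.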

richard-to · 2025-01-10T05:39:08Z

Haven't been able to work on this much this week. The current issue is handling the audio input: streaming it to the Mesop server and then forwarding it to the Gemini API websocket. The problem is that the audio loop that streams output runs on a different async loop. I was hoping I could push the incoming audio into the input queue, but it seems I can't add to that queue from a different async loop. I feel like there should be a way to do this, but I haven't had a chance to look into it further.
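
One general asyncio approach to this kind of cross-loop problem, sketched under assumptions and not necessarily what the demo ended up using: `asyncio.Queue` is not thread-safe, so code running outside the queue's event loop can schedule the `put_nowait` call on the owning loop with `loop.call_soon_threadsafe`. In the sketch below, `producer` and `consume` are hypothetical names, a plain thread stands in for the "other" loop, and `None` is an assumed end-of-stream marker.

```python
import asyncio
import threading
from typing import List, Optional


def producer(
    loop: asyncio.AbstractEventLoop,
    queue: "asyncio.Queue[Optional[bytes]]",
    chunks: List[bytes],
) -> None:
    """Runs outside the consumer's event loop (here: a plain thread).

    Each put is scheduled on the loop that owns the queue instead of
    touching the queue directly from this thread.
    """
    for chunk in chunks:
        loop.call_soon_threadsafe(queue.put_nowait, chunk)
    loop.call_soon_threadsafe(queue.put_nowait, None)  # end-of-stream marker


async def consume(chunks: List[bytes]) -> List[bytes]:
    """Own the queue on this loop and receive items pushed from elsewhere."""
    loop = asyncio.get_running_loop()
    queue: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue()
    thread = threading.Thread(target=producer, args=(loop, queue, chunks))
    thread.start()

    received: List[bytes] = []
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        received.append(chunk)
    thread.join()  # producer has already finished by the time None arrives
    return received


if __name__ == "__main__":
    print(asyncio.run(consume([b"mic-1", b"mic-2"])))
```

If the producer is itself a coroutine on another loop, `asyncio.run_coroutine_threadsafe(queue.put(chunk), loop)` is the analogous call for submitting a coroutine to the owning loop.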

richard-to · 2025-01-17T01:46:59Z

Finally got a somewhat usable demo working. Only audio input / output for now.

I was planning to add the demo to this repo, but for some reason the code doesn't work with Python 3.10, which I think is still the minimum Python version for Mesop.

So I created a separate repo (I need to update the readme with some usage instructions, but basically you just need to enable websockets and set a Google API key): https://github.com/richard-to/mesop-gemini-2-experiments

richard-to · 2025-01-18T23:42:12Z

After looking at the code in https://github.com/heiko-hotz/gemini-multimodal-live-dev-guide, it seems I didn't need to proxy the streams to the Python side.

I guess the drawback is that right now web components can't directly communicate with each other, so we'd have to create one giant web component, which slightly defeats the purpose of using web components and Mesop. So for now, proxying the streams to a web socket may still be the easier option for Mesop. I guess we could create a web component that creates the web socket connection, but then we'd be sending data from client to Mesop to client and then to the Gemini API. I guess it may still be better to have the web socket connection on the backend, just to hide the API key.

richard-to added the "feature request" (New feature / API) label on Jan 5, 2025
