Add a way to push data to Mesop client when using Web Sockets. #1175

Open
richard-to opened this issue Jan 5, 2025 · 4 comments
Labels
feature request New feature / API

Comments

@richard-to
Collaborator

With Gemini 2.0's bidirectional API, it would be nice to have a way to push data directly to the client when using web sockets.


One thing I've experimented with is a very long-running async event handler: on click, start the Gemini Live connection. Basically, I would like to be able to stream the audio responses to the Mesop client. Right now I'm still working out how to yield the audio responses from the async loop. I think it's possible, so I'm still trying.

Alternatives considered:

  1. Just use the normal input and response flow and return audio. I think that should be OK for my use case, since I would be able to stream the audio output to a custom web component using the AudioContext API. However, I'm curious whether we can get something like the experience in AI Studio.

Other considerations:

So far I've only focused on audio, but I may add a screen-sharing aspect to my demo as well. I think that should be doable without WebRTC.

@richard-to richard-to added the feature request New feature / API label Jan 5, 2025
@richard-to
Collaborator Author

Actually, I was able to get the audio streaming to work with, essentially, a handler that continuously loops and yields. The implementation is still pretty hacky. You need to click a button to trigger initialization of the Gemini API web socket connection, but I think that's actually OK in this case.
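
For readers who want to picture the loop-and-yield pattern, here is a minimal sketch in plain asyncio. It is not the code from the demo and does not use the Mesop API: `fake_live_session`, `on_click`, and the dict-based `state` are hypothetical stand-ins for the Gemini Live connection, the click handler, and the app state.

```python
import asyncio
from typing import AsyncIterator


async def fake_live_session(num_chunks: int) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a live API connection that emits audio chunks."""
    for i in range(num_chunks):
        await asyncio.sleep(0)  # simulate waiting on the network
        yield f"chunk-{i}".encode()


async def on_click(state: dict, num_chunks: int = 3) -> AsyncIterator[None]:
    """Long-running handler: loop over incoming audio and yield after each update.

    In a framework that streams an update on every yield, each yield would
    flush the latest state to the client.
    """
    async for chunk in fake_live_session(num_chunks):
        state["audio"] = chunk  # latest chunk for the client to play
        state["count"] = state.get("count", 0) + 1
        yield


async def main() -> list:
    state: dict = {}
    seen = []
    async for _ in on_click(state):
        seen.append(state["audio"])  # what a client would observe per yield
    return seen


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The handler only returns when the session ends, which is why the connection has to be started by an explicit user action such as a button click.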

@richard-to
Collaborator Author

I haven't been able to work on this much this week. The current issue is streaming the audio input to the Mesop server and then forwarding it to the Gemini API web socket. The problem is that the audio loop that streams output runs on a different async loop. I was hoping I could push the audio stream into the input queue, but it seems I can't add to the input queue from a different async loop. I feel like there should be a way to do this, but I haven't had a chance to look into it further.
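
One general asyncio technique for this kind of cross-loop handoff is `loop.call_soon_threadsafe`, sketched below. This is an illustration under assumptions, not necessarily how the demo solved it: it assumes the input queue is an `asyncio.Queue` and that the two loops run on separate threads. All function names here are hypothetical.

```python
import asyncio
import threading


def start_loop_in_thread() -> asyncio.AbstractEventLoop:
    """Start an event loop on its own daemon thread and return it."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


async def make_queue() -> asyncio.Queue:
    """Create the queue on the loop that will consume from it."""
    return asyncio.Queue()


async def consume(queue: asyncio.Queue, count: int) -> list:
    """Runs on the owning loop, e.g. the task forwarding audio upstream."""
    return [await queue.get() for _ in range(count)]


async def produce(owner_loop, queue: asyncio.Queue, chunks: list) -> None:
    """Runs on a different loop and hands each chunk to the owning loop."""
    for chunk in chunks:
        # asyncio.Queue is not thread-safe, so schedule the put on the owner loop.
        owner_loop.call_soon_threadsafe(queue.put_nowait, chunk)
        await asyncio.sleep(0)


def run_demo(chunks: list) -> list:
    owner_loop = start_loop_in_thread()
    producer_loop = start_loop_in_thread()
    try:
        queue = asyncio.run_coroutine_threadsafe(make_queue(), owner_loop).result(timeout=5)
        consumer = asyncio.run_coroutine_threadsafe(consume(queue, len(chunks)), owner_loop)
        asyncio.run_coroutine_threadsafe(
            produce(owner_loop, queue, chunks), producer_loop
        ).result(timeout=5)
        return consumer.result(timeout=5)
    finally:
        for loop in (owner_loop, producer_loop):
            loop.call_soon_threadsafe(loop.stop)


if __name__ == "__main__":
    print(run_demo([b"a", b"b", b"c"]))
```

Calling `queue.put_nowait` directly from the producer loop would touch the queue from the wrong thread; scheduling it on the owning loop keeps every queue operation on one loop.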

@richard-to
Collaborator Author

I finally got a somewhat usable demo working. It only supports audio input and output for now.

I was planning to add the demo to this repo, but for some reason the code doesn't work with Python 3.10, which I think is still the minimum Python version for Mesop.

So I created a separate repo (I still need to update the README with usage instructions, but basically you just need to enable web sockets and set a Google API key): https://github.com/richard-to/mesop-gemini-2-experiments


@richard-to
Collaborator Author

After looking at the code in https://github.com/heiko-hotz/gemini-multimodal-live-dev-guide, it seems I didn't need to proxy the streams through the Python side.

I guess the drawback is that web components currently can't communicate with each other directly, so we'd have to create one giant web component, which somewhat defeats the purpose of using web components and Mesop. So for now, proxying the streams to a web socket may still be the easier option for Mesop. We could create a web component that opens the web socket connection, but then we'd be sending data from the client to Mesop, back to the client, and then to the Gemini API. It may still be better to keep the web socket connection on the backend, if only to hide the API key.
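
As a rough illustration of the backend-proxy option, the sketch below relays data in both directions while the key stays on the server. It uses `asyncio.Queue` objects in place of real web socket connections, and `API_KEY`, `fake_upstream`, `pump`, and `relay` are all hypothetical names, not part of Mesop or the Gemini API.

```python
import asyncio

API_KEY = "server-side-secret"  # hypothetical value; never sent to the browser


async def pump(source: asyncio.Queue, sink: asyncio.Queue) -> None:
    """Forward items until a None sentinel arrives, then pass the sentinel on."""
    while True:
        item = await source.get()
        await sink.put(item)
        if item is None:
            return


async def fake_upstream(api_key: str, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Hypothetical upstream service: checks the key, then answers each chunk."""
    if api_key != API_KEY:
        raise PermissionError("bad key")
    while (item := await inbox.get()) is not None:
        await outbox.put(b"reply:" + item)
    await outbox.put(None)


async def relay(client_in: asyncio.Queue, client_out: asyncio.Queue) -> None:
    """Backend relay: the client never sees the key, only its own data and replies."""
    to_upstream: asyncio.Queue = asyncio.Queue()
    from_upstream: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(
        fake_upstream(API_KEY, to_upstream, from_upstream),
        pump(client_in, to_upstream),
        pump(from_upstream, client_out),
    )


async def main() -> list:
    client_in: asyncio.Queue = asyncio.Queue()
    client_out: asyncio.Queue = asyncio.Queue()
    for item in (b"a", b"b", None):  # None marks the end of the client stream
        client_in.put_nowait(item)
    await relay(client_in, client_out)
    replies = []
    while (item := client_out.get_nowait()) is not None:
        replies.append(item)
    return replies


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The extra hop costs latency compared with a direct browser connection, which is the trade-off weighed above.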
