Add a way to push data to Mesop client when using Web Sockets. #1175
Comments
Actually, I was able to get the audio streaming to work with essentially a handler that continuously loops and yields. The implementation is still pretty hacky. You need to click a button to trigger the initialization of the Gemini API websocket connection, but I think that's actually OK in this case.
Haven't been able to work on this much this week. The current issue is streaming the audio input to the Mesop server and then forwarding it to the Gemini API websocket. The problem is that the loop streaming the audio out runs on a different async event loop. I was hoping I could push the audio stream into the input queue, but it seems I can't add to the input queue from a different async loop. I feel like there should be a way to do this, but I haven't had a chance to look into it further.
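For reference, a minimal sketch of one way to hand data to a queue owned by another event loop, using only the standard library. It assumes the consuming loop runs in its own thread; `start_background_loop`, `collect`, and `demo` are hypothetical names for illustration and are not part of Mesop or the Gemini API. The key point is that `asyncio.Queue` is not thread-safe, so the put is scheduled on the owning loop with `loop.call_soon_threadsafe`.

```python
import asyncio
import threading


def start_background_loop():
    """Start a second event loop in a daemon thread (stands in for the
    loop that owns the input queue)."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


async def collect(queue, n):
    """Consumer running on the owning loop: read n chunks from the queue."""
    return [await queue.get() for _ in range(n)]


def demo():
    loop, thread = start_background_loop()

    # Create the queue from inside the loop that will consume it.
    async def make_queue():
        return asyncio.Queue()

    queue = asyncio.run_coroutine_threadsafe(make_queue(), loop).result(timeout=5)
    consumer = asyncio.run_coroutine_threadsafe(collect(queue, 3), loop)

    # Producer running on a *different* event loop (the one asyncio.run creates).
    async def produce():
        for i in range(3):
            chunk = bytes([i]) * 4  # placeholder for an audio chunk
            # Do not call queue.put_nowait directly from here; schedule it
            # on the loop that owns the queue instead.
            loop.call_soon_threadsafe(queue.put_nowait, chunk)
            await asyncio.sleep(0)

    asyncio.run(produce())
    result = consumer.result(timeout=5)

    # Shut the background loop down cleanly.
    loop.call_soon_threadsafe(loop.stop)
    thread.join(timeout=5)
    loop.close()
    return result
```

Whether this maps cleanly onto the way Mesop runs its handlers is an open question; it only shows the general cross-loop pattern.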
So I finally got a somewhat usable demo working. Only audio input / output for now. I was planning to add the demo to this repo, but for some reason the code doesn't work with Python 3.10, which I think is still the minimum Python version for Mesop. So I created a separate repo (I need to update the README with some usage instructions, but basically you just need to enable websockets and set a Google API key): https://github.com/richard-to/mesop-gemini-2-experiments
After looking at the code in https://github.com/heiko-hotz/gemini-multimodal-live-dev-guide, it seems I didn't need to proxy the streams to the Python side. I guess the drawback is that right now web components can't directly communicate with each other, so we'd have to create one giant web component, which slightly defeats the purpose of using web components and Mesop. So for now, proxying the streams through a web socket may still be the easier option for Mesop. I guess we could create a web component that creates the web socket connection, but then we'd be sending data from client to Mesop to client and then to the Gemini API. It still may be better to have the web socket connection on the backend just to hide the API key.
With Gemini 2.0's bidirectional API, it would be nice to have a way to push data directly to the client when using web sockets.
One thing I've experimented with is just using a really long-running async event handler. So on click, start the Gemini Live connection. Basically, I would like to be able to stream the audio responses to the Mesop client. Right now I'm still working out how to yield the audio responses from the async loop. I think it could be possible, so I'm still trying.
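As a rough illustration of the long-running handler idea, here is a sketch in plain `asyncio` with no Mesop or Gemini imports. It assumes the handler can be written as an async generator that drains a queue and yields once per audio chunk; in a real Mesop handler, the state update would happen just before each `yield`. `fake_gemini_receiver`, `long_running_handler`, `run_demo`, and the `_END` sentinel are all hypothetical stand-ins.

```python
import asyncio

_END = None  # hypothetical sentinel marking the end of the stream


async def fake_gemini_receiver(queue):
    """Stand-in for the task that reads audio from the Gemini websocket."""
    for i in range(3):
        await asyncio.sleep(0)  # simulate waiting on the network
        await queue.put(f"chunk-{i}".encode())
    await queue.put(_END)


async def long_running_handler(queue):
    """Async generator that stays alive for the whole session and yields
    each audio chunk as it arrives."""
    receiver = asyncio.create_task(fake_gemini_receiver(queue))
    try:
        while True:
            chunk = await queue.get()
            if chunk is _END:
                break
            # In a UI handler, this is where state would be updated
            # before yielding so the client sees the new chunk.
            yield chunk
    finally:
        # Stop the receiver if the handler exits early.
        receiver.cancel()


async def run_demo():
    queue = asyncio.Queue()
    return [chunk async for chunk in long_running_handler(queue)]
```

Running `asyncio.run(run_demo())` collects the chunks in arrival order. Here the receiver and the handler share one event loop, which sidesteps the cross-loop problem but may not match how the real connection is set up.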
Alternatives considered:
Other considerations:
So far I've only focused on audio, but I may add a screenshare aspect to my demo as well. I think it should be doable without WebRTC.