-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Adds Audio Capability to Conversable Agents #2098
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2098 +/- ##
==========================================
+ Coverage 38.17% 48.04% +9.87%
==========================================
Files 78 81 +3
Lines 7812 8123 +311
Branches 1669 1865 +196
==========================================
+ Hits 2982 3903 +921
+ Misses 4581 3894 -687
- Partials 249 326 +77
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Amazing PR!! |
#1929 also introduces audio transcription in the context of documents in the websurfer. I'm willing to adjust that PR to use the same libraries or configurations or helper methods, but we should coordinate. I'm also super leery of adding an HTML-like prompting DSL right now, as explained here #2026 (comment) |
Sounds good. 1929 is big, and slightly unwieldly. Here's the relevant bit: autogen/autogen/browser_utils/mdconvert.py Lines 560 to 676 in 852ee33
I'm not super-attached to this solution, so happy to adopt any other approach that works reasonably well. I've found this transcription to better than Whisper etc. For me, file format support is very important, but I can handle any conversions needed. |
|
GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
---|---|---|---|---|---|
12853598 | Triggered | Generic High Entropy Secret | 79dbb7b | test/oai/test_utils.py | View secret |
10404693 | Triggered | Generic High Entropy Secret | e43a86c | test/oai/test_utils.py | View secret |
10404693 | Triggered | Generic High Entropy Secret | bdb40d7 | test/oai/test_utils.py | View secret |
10404693 | Triggered | Generic High Entropy Secret | 954ca45 | test/oai/test_utils.py | View secret |
10404693 | Triggered | Generic High Entropy Secret | 79dbb7b | test/oai/test_utils.py | View secret |
10404662 | Triggered | Generic CLI Secret | eff19ac | .github/workflows/dotnet-release.yml | View secret |
10404662 | Triggered | Generic CLI Secret | 06a0a5d | .github/workflows/dotnet-release.yml | View secret |
10404662 | Triggered | Generic CLI Secret | 0524c77 | .github/workflows/dotnet-release.yml | View secret |
10404662 | Triggered | Generic CLI Secret | d7ea410 | .github/workflows/dotnet-release.yml | View secret |
10404662 | Triggered | Generic CLI Secret | e43a86c | .github/workflows/dotnet-build.yml | View secret |
10404662 | Triggered | Generic CLI Secret | 841ed31 | .github/workflows/dotnet-release.yml | View secret |
10404662 | Triggered | Generic CLI Secret | 802f099 | .github/workflows/dotnet-release.yml | View secret |
10404662 | Triggered | Generic CLI Secret | 9a484d8 | .github/workflows/dotnet-build.yml | View secret |
10404662 | Triggered | Generic CLI Secret | e973ac3 | .github/workflows/dotnet-release.yml | View secret |
10404662 | Triggered | Generic CLI Secret | 89650e7 | .github/workflows/dotnet-release.yml | View secret |
10404662 | Triggered | Generic CLI Secret | e07b06b | .github/workflows/dotnet-release.yml | View secret |
10404662 | Triggered | Generic CLI Secret | abe4c41 | .github/workflows/dotnet-build.yml | View secret |
10404662 | Triggered | Generic CLI Secret | 7362fb9 | .github/workflows/dotnet-release.yml | View secret |
12853599 | Triggered | Generic High Entropy Secret | 79dbb7b | test/oai/test_utils.py | View secret |
10404694 | Triggered | Generic High Entropy Secret | e43a86c | test/oai/test_utils.py | View secret |
10404694 | Triggered | Generic High Entropy Secret | 954ca45 | test/oai/test_utils.py | View secret |
10404694 | Triggered | Generic High Entropy Secret | bdb40d7 | test/oai/test_utils.py | View secret |
10404695 | Triggered | Generic High Entropy Secret | abad9ff | test/oai/test_utils.py | View secret |
10404695 | Triggered | Generic High Entropy Secret | 954ca45 | test/oai/test_utils.py | View secret |
10404695 | Triggered | Generic High Entropy Secret | c7bb588 | test/oai/test_utils.py | View secret |
10404695 | Triggered | Generic High Entropy Secret | b97b99d | test/oai/test_utils.py | View secret |
10404695 | Triggered | Generic High Entropy Secret | e43a86c | test/oai/test_utils.py | View secret |
12853600 | Triggered | Generic High Entropy Secret | 79dbb7b | test/oai/test_utils.py | View secret |
12853601 | Triggered | Generic High Entropy Secret | 79dbb7b | test/oai/test_utils.py | View secret |
10493810 | Triggered | Generic Password | 49e8053 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 501610b | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 49e8053 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 501610b | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | d422c63 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 97fa339 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 49e8053 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | d422c63 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 97fa339 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | d422c63 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 97fa339 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 501610b | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10404696 | Triggered | Generic High Entropy Secret | 954ca45 | test/oai/test_utils.py | View secret |
10404696 | Triggered | Generic High Entropy Secret | bdb40d7 | test/oai/test_utils.py | View secret |
10404696 | Triggered | Generic High Entropy Secret | 79dbb7b | test/oai/test_utils.py | View secret |
10404696 | Triggered | Generic High Entropy Secret | e43a86c | test/oai/test_utils.py | View secret |
10422482 | Triggered | Generic High Entropy Secret | 79dbb7b | test/oai/test_utils.py | View secret |
10422482 | Triggered | Generic High Entropy Secret | bdb40d7 | test/oai/test_utils.py | View secret |
12853602 | Triggered | Generic High Entropy Secret | 79dbb7b | test/oai/test_utils.py | View secret |
11616921 | Triggered | Generic High Entropy Secret | a86d0fd | notebook/agentchat_agentops.ipynb | View secret |
11616921 | Triggered | Generic High Entropy Secret | 394561b | notebook/agentchat_agentops.ipynb | View secret |
11616921 | Triggered | Generic High Entropy Secret | 3eac646 | notebook/agentchat_agentops.ipynb | View secret |
11616921 | Triggered | Generic High Entropy Secret | f45b553 | notebook/agentchat_agentops.ipynb | View secret |
11616921 | Triggered | Generic High Entropy Secret | 6563248 | notebook/agentchat_agentops.ipynb | View secret |
12853598 | Triggered | Generic High Entropy Secret | 2b3a9ae | test/oai/test_utils.py | View secret |
12853598 | Triggered | Generic High Entropy Secret | c03558f | test/oai/test_utils.py | View secret |
10404693 | Triggered | Generic High Entropy Secret | c03558f | test/oai/test_utils.py | View secret |
10404693 | Triggered | Generic High Entropy Secret | 2b3a9ae | test/oai/test_utils.py | View secret |
10404693 | Triggered | Generic High Entropy Secret | 0a3c6c4 | test/oai/test_utils.py | View secret |
10404693 | Triggered | Generic High Entropy Secret | 76f5f5a | test/oai/test_utils.py | View secret |
10404662 | Triggered | Generic CLI Secret | 954ca45 | .github/workflows/dotnet-build.yml | View secret |
12853599 | Triggered | Generic High Entropy Secret | 2b3a9ae | test/oai/test_utils.py | View secret |
12853599 | Triggered | Generic High Entropy Secret | c03558f | test/oai/test_utils.py | View secret |
10404694 | Triggered | Generic High Entropy Secret | 76f5f5a | test/oai/test_utils.py | View secret |
10404694 | Triggered | Generic High Entropy Secret | 0a3c6c4 | test/oai/test_utils.py | View secret |
10404695 | Triggered | Generic High Entropy Secret | 3b79cc6 | test/oai/test_utils.py | View secret |
10404695 | Triggered | Generic High Entropy Secret | 11baa52 | test/oai/test_utils.py | View secret |
12853600 | Triggered | Generic High Entropy Secret | c03558f | test/oai/test_utils.py | View secret |
12853600 | Triggered | Generic High Entropy Secret | 2b3a9ae | test/oai/test_utils.py | View secret |
12853601 | Triggered | Generic High Entropy Secret | c03558f | test/oai/test_utils.py | View secret |
12853601 | Triggered | Generic High Entropy Secret | 2b3a9ae | test/oai/test_utils.py | View secret |
10493810 | Triggered | Generic Password | 3b79cc6 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 11baa52 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 11baa52 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 3b79cc6 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10404696 | Triggered | Generic High Entropy Secret | 0a3c6c4 | test/oai/test_utils.py | View secret |
10404696 | Triggered | Generic High Entropy Secret | 76f5f5a | test/oai/test_utils.py | View secret |
10404696 | Triggered | Generic High Entropy Secret | c03558f | test/oai/test_utils.py | View secret |
10404696 | Triggered | Generic High Entropy Secret | 2b3a9ae | test/oai/test_utils.py | View secret |
10422482 | Triggered | Generic High Entropy Secret | 2b3a9ae | test/oai/test_utils.py | View secret |
10422482 | Triggered | Generic High Entropy Secret | c03558f | test/oai/test_utils.py | View secret |
and 16 others.
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn here the best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Why are these changes needed?
This PR introduces the
SpeechToText
andTextToSpeech
capabilities. These new capabilities allow the agent to process audio inputs from users by transcribing the audio into text or produce audio outputs by synthesizing text into speech.How does it work?
An agent can pass an audio tag to a message to indicate there's an audio file that needs transcription, or a text_file/text that needs to be converted to an audio file.
Transcription example:
Hello what does this audio file say? <audio file_path=''hello_autogen.mp3' task='transcribe'>
It says hello AutoGen!
Generation example:
Can you synthesize this audio? <audio text_file=''hello_autogen.txt' task='generate'>
The audio file is generated and can be found in output.mp3
Design choices
I had to make some interesting design choices that were not used in the rest of the codebase. It was mainly guided by the need to handle a variety of failure modes gracefully.
Pydantic for Configuration Validation: Used Pydantic models to validate the audio transcriber and generator configurations provided by agents. This ensures LLM agents don't hallucinate random parameters and allows us to fail fast when the configuration is invalid. Reduces the risk of runtime errors due to bad configurations.
Returning None for Exception Handling: In many places in the code, there’s a risk of exceptions being raised. To handle these situations gracefully, the functions are designed to return None when an exception occurs. This approach allows the agent to continue operating even when one part of the system encounters an error. It’s up to the calling code to check for a None return value and handle it appropriately.
Modular Design for Audio Processing Components: Initially, I had conceptualized a single class,
AudioCapability
, to handle all audio-related functionalities. However, as the development progressed, I noticed that this approach led to an unwieldy class with too many responsibilities. I decided to break down theAudioCapability
into smaller classes, namelySpeechToText
andTextToSpeech
. This also opens up possibilities for future enhancements, such as adding translation capabilities.Related issue number
Checks
Tasks