Restructure readme for better and easier use #104

Merged on Jul 31, 2024 (10 commits)
51 changes: 26 additions & 25 deletions README.md
@@ -68,16 +68,16 @@ docker run --restart=always -itd -p 8080:8080 \
 --name astra_agents_server \
 agoraio/astra_agents_server:latest

-# Here are two TTS options, either one will work
-# Make sure to comment out the one you don't use
-# 1. using Azure
--e TTS_VENDOR_CHINESE=azure
--e AZURE_TTS_KEY=<your_azure_tts_key>
--e AZURE_TTS_REGION=<your_azure_tts_region>
-
-# 2. using ElevenLabs
--e TTS_VENDOR_ENGLISH=elevenlabs
--e ELEVENLABS_TTS_KEY=<your_elevanlabs_tts_key>
+# For Chinese, using either Azure or ElevenLabs
+-e TTS_VENDOR_CHINESE=azure
+# -e TTS_VENDOR_CHINESE=elevenlabs
+-e AZURE_TTS_KEY=<your_azure_tts_key>
+-e AZURE_TTS_REGION=<your_azure_tts_region>
+# For English, using either ElevenLabs or Azure
+-e TTS_VENDOR_ENGLISH=elevenlabs
+# -e TTS_VENDOR_ENGLISH=azure
+-e ELEVENLABS_TTS_KEY=<your_elevenlabs_tts_key>
 ```

 This should start an agent server running on port 8080.
@@ -112,17 +112,19 @@ npm i && npm run dev
 <br>
 <h2>Agent Customization</h2>

+
 To explore further, the ASTRA voice agent is an excellent starting point. It incorporates the following extensions, some of which will be interchangeable in the near future. Feel free to choose the ones that best suit your needs and maximize ASTRA’s capabilities.

-| Extension | Feature | Description |
-| ------------------ | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Extension | Feature | Description |
+| ------------------ | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | openai_chatgpt | LLM | [ GPT-4o ](https://platform.openai.com/docs/models/gpt-4o), [ GPT-4 Turbo ](https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4), [ GPT-3.5 Turbo ](https://platform.openai.com/docs/models/gpt-3-5-turbo) |
-| elevenlabs_tts | Text-to-speech | [ElevanLabs text to speech](https://elevenlabs.io/) converts text to audio |
-| azure_tts | Text-to-speech | [Azure text to speech](https://azure.microsoft.com/en-us/products/ai-services/text-to-speech) converts text to audio |
-| azure_stt | Speech-to-text | [Azure speech to text](https://azure.microsoft.com/en-us/products/ai-services/speech-to-text) converts audio to text |
-| chat_transcriber | Transcriber | A utility ext to forward chat logs into channel |
-| agora_rtc | Transporter | A low latency transporter powered by agora_rtc |
-| interrupt_detector | Interrupter | A utility ext to help interrupt agent |
+| elevenlabs_tts | Text-to-speech | [ElevenLabs text to speech](https://elevenlabs.io/) converts text to audio |
+| azure_tts | Text-to-speech | [Azure text to speech](https://azure.microsoft.com/en-us/products/ai-services/text-to-speech) converts text to audio |
+| azure_stt | Speech-to-text | [Azure speech to text](https://azure.microsoft.com/en-us/products/ai-services/speech-to-text) converts audio to text |
+| chat_transcriber | Transcriber | A utility extension to forward chat logs into channel |
+| agora_rtc | Transporter | A low latency transporter powered by agora_rtc |
+| interrupt_detector | Interrupter | A utility extension to help interrupt agent |

+
 <h3>Voice Agent Diagram</h3>

@@ -167,17 +169,16 @@ export OPENAI_API_KEY=<your_openai_api_key>
 export AZURE_STT_KEY=<your_azure_stt_key>
 export AZURE_STT_REGION=<your_azure_stt_region>

-# Here are two TTS options, either one will work
-# Make sure to comment out the one you don't use
-
-# 1. using Azure
+# For Chinese, using either Azure or ElevenLabs
 export TTS_VENDOR_CHINESE=azure
+# export TTS_VENDOR_CHINESE=elevenlabs
 export AZURE_TTS_KEY=<your_azure_tts_key>
 export AZURE_TTS_REGION=<your_azure_tts_region>

-# 2. using ElevenLabs
+# For English, using either ElevenLabs or Azure
 export TTS_VENDOR_ENGLISH=elevenlabs
-export ELEVENLABS_TTS_KEY=<your_elevanlabs_tts_key>
+# export TTS_VENDOR_ENGLISH=azure
+export ELEVENLABS_TTS_KEY=<your_elevenlabs_tts_key>

 # agent is ready to start on port 8080
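
The restructured export block lets each language pick its vendor independently, so it becomes easy to select a vendor and forget its credentials. A minimal sketch of a pre-flight check follows. The function name `check_tts_env` is hypothetical and not part of the repository. It assumes that both `TTS_VENDOR_CHINESE` and `TTS_VENDOR_ENGLISH` are set, that `azure` needs `AZURE_TTS_KEY` and `AZURE_TTS_REGION`, and that `elevenlabs` needs `ELEVENLABS_TTS_KEY`, as the README block suggests. How the agent server itself validates these variables is not shown in this diff.

```shell
# Hypothetical helper (not part of the repository): checks that every
# TTS vendor selected via TTS_VENDOR_<LANG> has the keys the README lists.
check_tts_env() {
  missing=0
  for lang in CHINESE ENGLISH; do
    # Read TTS_VENDOR_CHINESE / TTS_VENDOR_ENGLISH indirectly (POSIX sh).
    eval "vendor=\${TTS_VENDOR_${lang}:-}"
    case "$vendor" in
      azure)      required="AZURE_TTS_KEY AZURE_TTS_REGION" ;;
      elevenlabs) required="ELEVENLABS_TTS_KEY" ;;
      *)
        echo "TTS_VENDOR_${lang}: unset or unsupported value '${vendor}'" >&2
        missing=1
        continue
        ;;
    esac
    for name in $required; do
      eval "value=\${$name:-}"
      if [ -z "$value" ]; then
        echo "TTS_VENDOR_${lang}=${vendor} requires ${name}" >&2
        missing=1
      fi
    done
  done
  # Return 0 when every selected vendor has its keys, 1 otherwise.
  return "$missing"
}
```

Running `check_tts_env` after the exports and before starting the agent reports any missing key on stderr and returns a non-zero status, so it can gate the start command with `&&`.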
