Merge pull request #1 from cyfyifanchen/main
Update README.md
plutoless authored Jun 20, 2024
2 parents d6149d1 + b30814a commit 00afc10
Showing 1 changed file with 76 additions and 43 deletions.
119 changes: 76 additions & 43 deletions README.md
@@ -2,29 +2,42 @@
<img alt="astra.ai" width="300px" height="auto" src="https://github.com/rte-design/ASTRA.ai/assets/471561/ef098c57-9e5c-479d-8ca5-0ad62a1a1423">
</div>

# ASTRA.ai
ASTRA.ai is an agent framework that supports the creation of real-time multimodal AI Agents. It enables the rapid orchestration and reuse of the latest large model capabilities, achieving low-latency, real-time multimodal interaction with AI Agents.
<h1 align="center">Astra AI</h1>

ASTRA.ai is the perfect framework for building multimodal AI agents that communicate through text, vision, and audio using the latest AI capabilities, such as those from OpenAI, in real time.
<div align="center">

[![](https://dcbadge.limes.pink/api/server/6k6xtWtF)](https://discord.gg/6k6xtWtF)

</div>

<div align="center">
🎉 Creation of real-time multi-modal AI Agents 🎉

Enables rapid orchestration and reuse of the latest large model capabilities, achieving low-latency, real-time multi-modal interactions with AI Agents.

</div>

</br>

## Quick Start
### Try out ASTRA.ai playground demo we deployed
We provide [a web playground](https://astra-agents.agora.io/) for you to experience.

### Run the example agent locally
Currently, the agent we build runs on Linux only, while we have prepared a Docker image so that you can build and run the agent on Windows / MacOS too.
### Playground

We have prepared a prebuilt agent to help you get started right away.
We provide a [playground](https://astra-agents.agora.io/) for you to play with.

To start, ensure you have following prepared,
- [Docker](https://www.docker.com/)
- We use [Agora](https://console.agora.io/) as RTC transport, so we need an agora APP_ID / APP_CERTIFICATE.
### Local Agent

Currently, the agent we have built runs only on Linux. However, we have a Docker image ready for you to build and run the agent on Windows and macOS.

To start, make sure you have:

- Agora App ID and App Certificate ([read here on how to get them](https://docs.agora.io/en/3.x/video-calling/reference/manage-agora-account?platform=android))
- Azure's [speech-to-text](https://azure.microsoft.com/en-us/products/ai-services/speech-to-text) and [text-to-speech](https://azure.microsoft.com/en-us/products/ai-services/text-to-speech) API keys
- [OpenAI](https://openai.com/index/openai-api/) API keys
- [Docker](https://www.docker.com/)


```
# run the prebuilt agent image
```shell
# run the pre-built agent image
docker run --restart=always -itd -p 8080:8080 \
-v /tmp:/tmp \
-e AGORA_APP_ID=<your_agora_appid> \
@@ -40,45 +53,52 @@ docker run --restart=always -itd -p 8080:8080 \

This should start an agent server running on port 8080.
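
To confirm that the container is actually listening before moving on, a small TCP check can help. This is only a sketch: it assumes the server is reachable on `127.0.0.1` and the published port 8080, and it tests nothing beyond whether a connection can be opened. The helper name `port_is_open` is made up for this example and is not part of Astra.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8080 is the port published by the `docker run` command above.
    print("agent server reachable:", port_is_open("127.0.0.1", 8080))
```

A `False` result usually means the container has not started yet or the port mapping differs from the one shown above.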

### Use ASTRA.ai playground to connect to your agent
### Use Astra AI playground to connect to your agent

You can use the playground project to test with the server you just started.

The playground project is built based on Next.js, it requires Node.js 18.17 or above.
The playground project is built on Next.js 14, so it requires Node.js 18.17 or later.

```
```shell
# set up an .env file
cp ./playground/.env.example ./playground/.env
cd playground

# install npm dependencies & start
npm i
npm run dev
npm i && npm run dev
```

Greetings ASTRA.ai Agent!
🎉 Now you have our Astra Agent.

</br>

## Concepts
The ASTRA App/Service is built from various ASTRA extensions developed in different programming languages. The concept of a graph is used to describe the relationships between these extensions and illustrate the flow of data. Additionally, sharing and downloading extensions are made easy through the ASTRA cloud store and ASTRA package manager.

The Astra Service is built from various Astra extensions developed in different programming languages. The concept of a graph is used to describe the relationships between these extensions and illustrate the flow of data. Additionally, sharing and downloading extensions are made easy through the Astra cloud store and Astra package manager.

<div align="center">
<img src="https://github.com/AgoraIO-Community/ASTRA.ai/assets/471561/9fd7fa08-4eff-46b0-bd50-012c8dccfd9a" width="800">
</div>

### Extension
An extension is the fundamental unit of composition. Developers can create extensions in various languages and combine them in different ways to build diverse scenarios and applications. The ASTRA.ai framework emphasizes cross-language collaboration, allowing extensions written in different programming languages to seamlessly work together within the same application or service.

An extension is the fundamental unit of composition. Developers can create extensions in various languages and combine them in different ways to build diverse scenarios and applications. The Astra framework emphasizes cross-language collaboration, allowing extensions written in different programming languages to seamlessly work together within the same application or service.

For example, if an application requires real-time communication (RTC) features and advanced AI capabilities, a developer might choose to write RTC-related extensions in C++ for its performance advantages in processing audio and video data. At the same time, they could develop AI extensions in Python to leverage its extensive libraries and frameworks for data analysis and machine learning tasks.

#### Supported Languages

As of June 2024, we support extensions written in the following languages:

- C++
- Golang
- Python (Planned in July)

### Graph
A graph is used to describe the data flow between extensions, a graph in ASTRA orchestrates how different extensions interact. For example, the text output from a speech-to-text (STT) extension might be directed to a large language model (LLM) extension. Essentially, a graph defines which extensions are involved and the direction of data flow between them. Developers can customize this flow, directing outputs from one extension, such as an STT, into another, like an LLM.

In ASTRA, there are four main types of data flow between extensions:
A graph describes the data flow between extensions and orchestrates how different extensions interact in Astra. For example, the text output from a speech-to-text (STT) extension might be directed to a large language model (LLM) extension. Essentially, a graph defines which extensions are involved and the direction of data flow between them. Developers can customize this flow, directing outputs from one extension, such as an STT, into another, like an LLM.

In Astra, there are four main types of data flow between extensions:

- Command
- Data
@@ -88,36 +108,43 @@ In ASTRA, there are four main types of data flow between extensions:
By specifying the direction of these data types in the graph, developers can enable mutual invocation and unidirectional data flow between extensions. This is especially useful for PCM and image data types, making audio and video processing simpler and more intuitive.
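
To make the idea concrete, here is a rough, framework-independent sketch of a graph as a set of extensions plus typed, directed connections. It is not Astra's actual graph format or API. The class names, the extension names (`stt`, `llm`, `tts`), and the wiring are hypothetical, and the `PCM_FRAME` and `IMAGE_FRAME` labels are assumed names for the PCM and image data types mentioned above.

```python
from dataclasses import dataclass, field
from enum import Enum


class FlowType(Enum):
    COMMAND = "command"
    DATA = "data"
    PCM_FRAME = "pcm_frame"      # assumed name for the PCM data type
    IMAGE_FRAME = "image_frame"  # assumed name for the image data type


@dataclass(frozen=True)
class Connection:
    source: str
    target: str
    flow: FlowType


@dataclass
class Graph:
    extensions: set[str] = field(default_factory=set)
    connections: list[Connection] = field(default_factory=list)

    def connect(self, source: str, target: str, flow: FlowType) -> None:
        """Register both extensions and add a directed, typed edge."""
        self.extensions.update((source, target))
        self.connections.append(Connection(source, target, flow))

    def downstream(self, source: str, flow: FlowType) -> list[str]:
        """List the extensions that receive `flow` output from `source`."""
        return [c.target for c in self.connections
                if c.source == source and c.flow == flow]


# Hypothetical wiring: STT text goes to the LLM, LLM text goes to TTS,
# and an interrupt detector sends commands to TTS.
graph = Graph()
graph.connect("stt", "llm", FlowType.DATA)
graph.connect("llm", "tts", FlowType.DATA)
graph.connect("interrupt_detector", "tts", FlowType.COMMAND)
```

Because each connection has a source, a target, and a flow type, one-way flows (audio frames into an STT, for example) and two-way command exchanges can be described in the same structure.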

### Agent App
A runnable server-side participant application compiled to combine multiple **Extensions** following **Graph** rules to accomplish more sophisticated operations.

A runnable server-side participant application compiled to combine multiple **Extensions** following **Graph** rules to accomplish more sophisticated operations.

### Cloud Store
Cloud Store is a hub for developers to share their extensions or use extensions from other developers.

Cloud Store is a centralized platform for developers to share their extensions and access those created by others.

### Package Manager
Simplifies the process of uploading, sharing, downloading, and installing ASTRA extensions. Extensions can specify dependencies on other extensions and the environment, and the package manager automatically manages these dependencies, making the installation and release of extensions extremely convenient.

Simplifies the process of uploading, sharing, downloading, and installing Astra extensions. Extensions can specify dependencies on other extensions and the environment, and the package manager automatically manages these dependencies, making the installation and release of extensions extremely convenient.
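
As an illustration of what automatic dependency management involves, the sketch below computes an install order in which every extension comes after the extensions it depends on. It uses Python's standard `graphlib` module, not the Astra package manager, and the extension names and dependency map are invented for the example.

```python
from graphlib import CycleError, TopologicalSorter


def install_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which each extension follows its dependencies.

    `dependencies` maps an extension name to the set of extensions it
    depends on. Raises graphlib.CycleError on circular dependencies.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical dependency declarations.
deps = {
    "my_agent": {"llm_ext", "tts_ext"},
    "llm_ext": {"http_client"},
    "tts_ext": {"http_client"},
    "http_client": set(),
}

order = install_order(deps)
print(order)
```

The relative order of independent extensions (here `llm_ext` and `tts_ext`) is not significant. Only the constraint that dependencies come first matters.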

</br>

# Fine-tune your agent

## Fine-tune your agent
### Example

This project provides an example Agent App to help you get started.
It uses the following extensions:
- *agora_rtc* / [Agora](https://docs.agora.io/en) for RTC transport + VAD + Azure speech-to-text (STT)
- *azure_tts* / [Azure](https://azure.microsoft.com/en-us/products/ai-services/text-to-speech) for text-to-speech (TTS)
- *openai_chatgpt* / [OpenAI](https://openai.com/index/openai-api/) for LLM
- *chat_transcriber* / A utility ext to forward chat logs into channel
- *interrupt_detector* / A utility ext to help interrupt agent

- _agora_rtc_ / [Agora](https://docs.agora.io/en) for RTC transport + VAD + Azure speech-to-text (STT)
- _azure_tts_ / [Azure](https://azure.microsoft.com/en-us/products/ai-services/text-to-speech) for text-to-speech (TTS)
- _openai_chatgpt_ / [OpenAI](https://openai.com/index/openai-api/) for LLM
- _chat_transcriber_ / A utility extension that forwards chat logs into the channel
- _interrupt_detector_ / A utility extension that helps interrupt the agent

<div align="left">
<img src="https://github.com/AgoraIO-Community/ASTRA.ai/assets/471561/bff35c13-e19b-43f7-ba1f-f9f0d7ec095f" width="600">
</div>

### Customize your own agent
We might want to add more flavours and customizations to make the agent better suited to our needs. To achieve this, we need to change the source code of extensions and build the agent ourselves.

We need to prepare the proper `manifest.json` file first.
You might want to add more flavors to make the agent better suited to your needs. To achieve this, you need to change the source code of the extensions and build the agent yourself.

```
You need to prepare the proper `manifest.json` file first.

```shell
# rename manifest example
cp ./agents/manifest.json.example ./agents/manifest.json

@@ -131,11 +158,12 @@ docker exec -it astra_agents_dev bash
make build
```

This will generate an agent executable. We can change the source code in `agents/addon/extension/openai_chatgpt/openai_chatgpt.go` for instance to adjust your prompt and openai parameters.
This will generate an agent executable. You can change the source code, for instance in `agents/addon/extension/openai_chatgpt/openai_chatgpt.go`, to adjust your prompts and OpenAI parameters.

Once done, we can use the following command to start a server which you can test out with ASTRA.ai playground like we did in previous steps.
Once done, use the following commands to start a server, which you can then test with the Astra AI playground as in the previous steps.

```shell

```
export AGORA_APP_ID=<your_agora_appid>
export AGORA_APP_CERTIFICATE=<your_agora_app_certificate>
export AZURE_STT_KEY=<your_azure_stt_key>
@@ -145,17 +173,22 @@ export AZURE_TTS_KEY=<your_azure_tts_key>
export AZURE_TTS_REGION=<your_azure_tts_region>

# agent is ready to start on port 8080

make run-server
```
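
Since the server depends on several environment variables, a quick pre-flight check before `make run-server` can catch a missing export early. The sketch below covers only the variable names visible in this diff. Some exports are collapsed in the hunk above, so the list is likely incomplete and should be extended to match your setup. The helper `missing_vars` is a made-up name for this example.

```python
import os

# Only the names visible in this diff; other exports are collapsed in the
# hunk above, so treat this list as incomplete.
REQUIRED_VARS = (
    "AGORA_APP_ID",
    "AGORA_APP_CERTIFICATE",
    "AZURE_STT_KEY",
    "AZURE_TTS_KEY",
    "AZURE_TTS_REGION",
)


def missing_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("missing environment variables:", ", ".join(missing))
    else:
        print("all listed environment variables are set")
```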

</br>

## TODO

- [ ] Extension Language Support: Python
- [ ] Extension: elevenlabs, google, whisper, moondream
- [ ] Extension: ElevenLabs, Google, Whisper and Moondream
- [ ] Example Agent: real-time video agent
- [ ] Extension Store
- [ ] UI Graph Editor
- ...
Stay tuned!

</br>

## Code Contributors
Thanks to all contributors!

A heartfelt thanks to all contributors!
