This repository has been archived by the owner on Jun 9, 2024. It is now read-only.

Add voice command plugin #136

Open
wants to merge 17 commits into
base: master
Changes from 9 commits
3 changes: 2 additions & 1 deletion .github/CODEOWNERS
@@ -9,4 +9,5 @@
/src/autogpt_plugins/wikipedia_search @pierluigi-failla
/src/autogpt_plugins/api_tools @sidewaysthought
/src/autogpt_plugins/random_values @sidewaysthought
/src/autogpt_plugins/bluesky @hermanschutte
/src/autogpt_plugins/bluesky @hermanschutte
/src/autogpt_plugins/voice_command @armmarov
1 change: 1 addition & 0 deletions README.md
@@ -71,6 +71,7 @@ You can also see the plugins here:
| Random Values | Enable AutoGPT to generate various random numbers and strings. | [autogpt_plugins/random_values](https://github.com/Significant-Gravitas/Auto-GPT-Plugins/tree/master/src/autogpt_plugins/random_values) |
| SceneX | Explore image storytelling beyond pixels with the Auto-GPT SceneX Plugin. | [autogpt_plugins/scenex](https://github.com/Significant-Gravitas/Auto-GPT-Plugins/tree/master/src/autogpt_plugins/scenex) |
| Twitter | AutoGPT is capable of retrieving Twitter posts and other related content by accessing the Twitter platform via the v1.1 API using Tweepy. | [autogpt_plugins/twitter](https://github.com/Significant-Gravitas/Auto-GPT-Plugins/tree/master/src/autogpt_plugins/twitter) |
| Voice Command | Enable two-way conversation with Auto-GPT through the voice command plugin. | [autogpt_plugins/voice_command](https://github.com/Significant-Gravitas/Auto-GPT-Plugins/tree/master/src/autogpt_plugins/voice_command) |
| Wikipedia Search | This allows AutoGPT to use Wikipedia directly. | [autogpt_plugins/wikipedia_search](https://github.com/Significant-Gravitas/Auto-GPT-Plugins/tree/master/src/autogpt_plugins/wikipedia_search) |

Some third-party plugins have been created by contributors that are not included in this repository. For more information about these plugins, please visit their respective GitHub pages.
4 changes: 3 additions & 1 deletion requirements.txt
@@ -18,5 +18,7 @@ validators
pytest
pytest-cov
atproto
requests
bs4
sounddevice
vosk
playsound
62 changes: 62 additions & 0 deletions src/autogpt_plugins/voice_command/README.md
@@ -0,0 +1,62 @@
# autogpt-voice-command

A plugin that adds voice command integration to Auto-GPT.

## Features (more coming soon!)

- Speak to Auto-GPT through a microphone
- Supports the Kaldi recognizer library

## Installation

1. Clone this repo as instructed in the main repository
2. Add this chunk of code along with your voice command API information to the `.env` file within AutoGPT:

```
CHAT_MESSAGES_ENABLED=True

...

################################################################################
### VOICE COMMAND
################################################################################

VOICE_COMMAND_ENABLE=True
VOICE_COMMAND_SDK=kaldi
VOICE_COMMAND_INITCALL=hello
VOICE_COMMAND_CONFIRM=True
```

- `VOICE_COMMAND_ENABLE` enables the voice command plugin.
- `VOICE_COMMAND_SDK` selects the library used for speech recognition. Currently only `kaldi` is available and fully tested.
- `VOICE_COMMAND_INITCALL` is the wake word the user must say before asking a question.
- `VOICE_COMMAND_CONFIRM` enables a confirmation step before the user's question is sent to Auto-GPT. Because of accent or vocabulary limitations, the library may misrecognize the user's speech, so the user can repeat the question if necessary.

3. Download a Vosk model from https://alphacephei.com/vosk/models to the Auto-GPT root directory
4. Extract the model and rename the extracted directory to 'model'

```
# For example:

# Change to the Auto-GPT base folder
cd Auto-GPT

# Copy the downloaded model
cp ~/vosk-model-small-en-us-0.15.zip .

# Unzip the model file
unzip vosk-model-small-en-us-0.15.zip

# Rename the extracted directory to 'model'
mv vosk-model-small-en-us-0.15 model
```
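
The settings and model layout described above could be sanity-checked before starting Auto-GPT. The following is a minimal sketch and not part of the plugin: `check_voice_setup` is a hypothetical helper that relies only on what the steps above state (the `VOICE_COMMAND_*` variables and a directory named `model` in the Auto-GPT root). It reads the process environment, so the `.env` values are assumed to be exported or loaded already.

```python
import os
from pathlib import Path
from typing import List, Mapping


def check_voice_setup(env: Mapping[str, str], root: Path) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if env.get("VOICE_COMMAND_ENABLE") != "True":
        problems.append("VOICE_COMMAND_ENABLE is not set to True")
    if env.get("VOICE_COMMAND_SDK", "").lower() != "kaldi":
        problems.append("VOICE_COMMAND_SDK is not set to kaldi")
    if not env.get("VOICE_COMMAND_INITCALL"):
        problems.append("VOICE_COMMAND_INITCALL (wake word) is empty")
    # Step 4 above renames the extracted model directory to 'model'.
    model_dir = root / "model"
    if not model_dir.is_dir() or not any(model_dir.iterdir()):
        problems.append(f"no extracted model found at {model_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_voice_setup(os.environ, Path.cwd()):
        print("WARNING:", problem)
```

Run from the Auto-GPT root directory, the script prints one warning per problem and nothing when the setup looks complete.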

## Usage

1. The plugin is more interactive when used with TTS enabled (`--speak`).
2. To authorize commands in Auto-GPT, the user can simply say 'yes' or 'no'. The plugin automatically converts the word to the character 'y' or 'n', which Auto-GPT understands when deciding whether to execute the command.
3. To ensure the system only processes speech the user intends, the user must initiate the call with the word defined by VOICE_COMMAND_INITCALL. The system replies 'yes sir' before the user can start a conversation.
4. To prevent misrecognized input from being processed, the user can enable the VOICE_COMMAND_CONFIRM flag. The system then asks the user to confirm the question or command, and the user replies 'yes' or 'no'. After 'no', the user can immediately give the command again.
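
The wake-word and yes/no behaviour described in the usage notes can be illustrated with a small state machine. This is a hedged sketch only: `VoiceGate` and `feed` are hypothetical names, the recognizer is replaced by plain strings, and requiring the wake word again after every turn is an assumption rather than something this README states.

```python
from typing import Optional


class VoiceGate:
    """Hypothetical gate: ignore speech until the wake word is heard."""

    def __init__(self, init_call: str = "hello") -> None:
        self.init_call = init_call.lower()
        self.awake = False

    def feed(self, recognized: str) -> Optional[str]:
        """Return text to forward to Auto-GPT, or None to keep listening."""
        text = recognized.strip().lower()
        if not self.awake:
            if text == self.init_call:
                # Per the usage notes, the system would reply 'yes sir' here.
                self.awake = True
            return None
        # Assumption: the wake word is required again for the next turn.
        self.awake = False
        # Map spoken confirmations to the single characters Auto-GPT expects.
        return {"yes": "y", "no": "n"}.get(text, text)
```

For example, feeding `"hello"` and then `"yes"` forwards `"y"`, while `"yes"` spoken before the wake word is ignored.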
286 changes: 286 additions & 0 deletions src/autogpt_plugins/voice_command/__init__.py
@@ -0,0 +1,286 @@
import os
from typing import Any, Dict, List, Optional, Tuple, TypedDict, TypeVar

from auto_gpt_plugin_template import AutoGPTPluginTemplate
from colorama import Fore
from .voice_command_kaldi import VoiceCommandKaldi

PromptGenerator = TypeVar("PromptGenerator")


class Message(TypedDict):
    role: str
    content: str


class AutoGPTVoiceCommand(AutoGPTPluginTemplate):
    """
    Voice Command plugin for Auto-GPT.
    """

    def __init__(self):
        super().__init__()
        self._name = "autogpt-voice-command"
        self._version = "0.1.0"
        self._description = "Enable Auto-GPT with voice command."

        self.voice_command_enable = os.getenv("VOICE_COMMAND_ENABLE")
        self.voice_command_sdk = os.getenv("VOICE_COMMAND_SDK")

        # Default to None so later attribute checks do not raise
        # AttributeError when no supported SDK is configured.
        self.voice_command = None
        if self.voice_command_sdk and self.voice_command_sdk.lower() == "kaldi":
            self.voice_command = VoiceCommandKaldi()
        else:
            print("Voice command SDK is not selected")

    def can_handle_on_response(self) -> bool:
        """This method is called to check that the plugin can
        handle the on_response method.
        Returns:
            bool: True if the plugin can handle the on_response method."""
        return False

    def on_response(self, response: str, *args, **kwargs) -> str:
        """This method is called when a response is received from the model."""
        pass

    def can_handle_post_prompt(self) -> bool:
        """This method is called to check that the plugin can
        handle the post_prompt method.
        Returns:
            bool: True if the plugin can handle the post_prompt method."""
        return False

    def post_prompt(self, prompt: PromptGenerator) -> PromptGenerator:
        """This method is called just after the generate_prompt is called,
            but actually before the prompt is generated.

        Args:
            prompt (PromptGenerator): The prompt generator.

        Returns:
            PromptGenerator: The prompt generator.
        """
        pass

    def can_handle_on_planning(self) -> bool:
        """This method is called to check that the plugin can
        handle the on_planning method.
        Returns:
            bool: True if the plugin can handle the on_planning method."""
        return False

    def on_planning(
        self, prompt: PromptGenerator, messages: List[str]
    ) -> Optional[str]:
        """This method is called before the planning chat completion is done.
        Args:
            prompt (PromptGenerator): The prompt generator.
            messages (List[str]): The list of messages.
        """
        pass

    def can_handle_post_planning(self) -> bool:
        """This method is called to check that the plugin can
        handle the post_planning method.
        Returns:
            bool: True if the plugin can handle the post_planning method."""
        return False

    def post_planning(self, response: str) -> str:
        """This method is called after the planning chat completion is done.
        Args:
            response (str): The response.
        Returns:
            str: The resulting response.
        """
        pass

    def can_handle_pre_instruction(self) -> bool:
        """This method is called to check that the plugin can
        handle the pre_instruction method.
        Returns:
            bool: True if the plugin can handle the pre_instruction method."""
        return False

    def pre_instruction(self, messages: List[str]) -> List[str]:
        """This method is called before the instruction chat is done.
        Args:
            messages (List[str]): The list of context messages.
        Returns:
            List[str]: The resulting list of messages.
        """
        pass

    def can_handle_on_instruction(self) -> bool:
        """This method is called to check that the plugin can
        handle the on_instruction method.
        Returns:
            bool: True if the plugin can handle the on_instruction method."""
        return False

    def on_instruction(self, messages: List[str]) -> Optional[str]:
        """This method is called when the instruction chat is done.
        Args:
            messages (List[str]): The list of context messages.
        Returns:
            Optional[str]: The resulting message.
        """
        pass

    def can_handle_post_instruction(self) -> bool:
        """This method is called to check that the plugin can
        handle the post_instruction method.
        Returns:
            bool: True if the plugin can handle the post_instruction method."""
        return False

    def post_instruction(self, response: str) -> str:
        """This method is called after the instruction chat is done.
        Args:
            response (str): The response.
        Returns:
            str: The resulting response.
        """
        pass

    def can_handle_pre_command(self) -> bool:
        """This method is called to check that the plugin can
        handle the pre_command method.
        Returns:
            bool: True if the plugin can handle the pre_command method."""
        return False

    def pre_command(
        self, command_name: str, arguments: Dict[str, Any]
    ) -> Tuple[str, Dict[str, Any]]:
        """This method is called before the command is executed.
        Args:
            command_name (str): The command name.
            arguments (Dict[str, Any]): The arguments.
        Returns:
            Tuple[str, Dict[str, Any]]: The command name and the arguments.
        """
        pass

    def can_handle_post_command(self) -> bool:
        """This method is called to check that the plugin can
        handle the post_command method.
        Returns:
            bool: True if the plugin can handle the post_command method."""
        return False

    def post_command(self, command_name: str, response: str) -> str:
        """This method is called after the command is executed.
        Args:
            command_name (str): The command name.
            response (str): The response.
        Returns:
            str: The resulting response.
        """
        pass

    def can_handle_chat_completion(
        self,
        messages: list[Dict[Any, Any]],
        model: str,
        temperature: float,
        max_tokens: int,
    ) -> bool:
        """This method is called to check that the plugin can
        handle the chat_completion method.
        Args:
            messages (Dict[Any, Any]): The messages.
            model (str): The model name.
            temperature (float): The temperature.
            max_tokens (int): The max tokens.
        Returns:
            bool: True if the plugin can handle the chat_completion method."""
        return False

    def handle_chat_completion(
        self,
        messages: list[Dict[Any, Any]],
        model: str,
        temperature: float,
        max_tokens: int,
    ) -> str:
        """This method is called when the chat completion is done.
        Args:
            messages (Dict[Any, Any]): The messages.
            model (str): The model name.
            temperature (float): The temperature.
            max_tokens (int): The max tokens.
        Returns:
            str: The resulting response.
        """
        return None

    def can_handle_text_embedding(
        self, text: str
    ) -> bool:
        """This method is called to check that the plugin can
        handle the text_embedding method.
        Args:
            text (str): The text to be converted to embedding.
        Returns:
            bool: True if the plugin can handle the text_embedding method."""
        return False

    def handle_text_embedding(
        self, text: str
    ) -> list:
        """This method is called to create the embedding for a text.
        Args:
            text (str): The text to be converted to embedding.
        Returns:
            list: The text embedding.
        """
        pass

    def can_handle_user_input(self, user_input: str) -> bool:
        """This method is called to check that the plugin can
        handle the user_input method.

        Args:
            user_input (str): The user input.

        Returns:
            bool: True if the plugin can handle the user_input method."""
        # getattr guards against voice_command never having been assigned
        # when no supported SDK was selected in __init__.
        if self.voice_command_enable == "True" and getattr(
            self, "voice_command", None
        ):
            return True

        print(
            Fore.RED
            + f"{self._name} - {self._version} - Voice command plugin not loaded, because VOICE_COMMAND_ENABLE or "
            "VOICE_COMMAND_SDK were not set in env."
            + Fore.RESET
        )
        return False

    def user_input(self, user_input: str) -> str:
        """This method is called to request user input to the user.

        Args:
            user_input (str): The question or prompt to ask the user.

        Returns:
            str: The user input.
        """

        return self.voice_command.run(is_test=False, force_state=None)

    def can_handle_report(self) -> bool:
        """This method is called to check that the plugin can
        handle the report method.

        Returns:
            bool: True if the plugin can handle the report method."""
        return False

    def report(self, message: str) -> None:
        """This method is called to report a message to the user.

        Args:
            message (str): The message to report.
        """
        pass
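
The `can_handle_*` / handler pairs above suggest a dispatch pattern in which the host asks each plugin whether it wants a hook before calling it. The sketch below illustrates that pattern for `user_input` only. It is an assumption about how a host might consume these hooks, not Auto-GPT's actual code; `EchoPlugin` and `ask_user` are hypothetical names introduced for illustration.

```python
from typing import List


class EchoPlugin:
    """Stand-in plugin exposing the same two hooks as the class above."""

    def can_handle_user_input(self, user_input: str) -> bool:
        return True

    def user_input(self, user_input: str) -> str:
        # A real voice plugin would return the recognized speech instead.
        return "y"


def ask_user(prompt: str, plugins: List[object]) -> str:
    """Use the first plugin that claims user input, else fall back to stdin."""
    for plugin in plugins:
        if plugin.can_handle_user_input(user_input=prompt):
            return plugin.user_input(user_input=prompt)
    return input(prompt)
```

With this shape, a plugin that returns `False` from `can_handle_user_input` (for example, when `VOICE_COMMAND_ENABLE` is unset) simply leaves keyboard input in place.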