Significant-Gravitas · armmarov · May 11, 2023
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
@@ -9,4 +9,5 @@
 /src/autogpt_plugins/wikipedia_search @pierluigi-failla
 /src/autogpt_plugins/api_tools @sidewaysthought
 /src/autogpt_plugins/random_values @sidewaysthought
-/src/autogpt_plugins/bluesky @hermanschutte
+/src/autogpt_plugins/bluesky @hermanschutte
+/src/autogpt_plugins/voice_command @armmarov
diff --git a/README.md b/README.md
@@ -71,6 +71,7 @@ You can also see the plugins here:
 | Random Values    | Enable AutoGPT to generate various random numbers and strings.                                                    | [autogpt_plugins/random_values](https://github.com/Significant-Gravitas/Auto-GPT-Plugins/tree/master/src/autogpt_plugins/random_values) |
 | SceneX           | Explore image storytelling beyond pixels with the Auto-GPT SceneX Plugin.                                        | [autogpt_plugins/scenex](https://github.com/Significant-Gravitas/Auto-GPT-Plugins/tree/master/src/autogpt_plugins/scenex)               |
 | Twitter          | AutoGPT is capable of retrieving Twitter posts and other related content by accessing the Twitter platform via the v1.1 API using Tweepy.               | [autogpt_plugins/twitter](https://github.com/Significant-Gravitas/Auto-GPT-Plugins/tree/master/src/autogpt_plugins/twitter)           |
+| Voice Command    | Enable two-way voice conversation with Auto-GPT.                                                                  | [autogpt_plugins/voice_command](https://github.com/Significant-Gravitas/Auto-GPT-Plugins/tree/master/src/autogpt_plugins/voice_command) |
 | Wikipedia Search | This allows AutoGPT to use Wikipedia directly.                                                                    | [autogpt_plugins/wikipedia_search](https://github.com/Significant-Gravitas/Auto-GPT-Plugins/tree/master/src/autogpt_plugins/wikipedia_search) |
 
 Some third-party plugins have been created by contributors that are not included in this repository. For more information about these plugins, please visit their respective GitHub pages.

diff --git a/requirements.txt b/requirements.txt
@@ -18,5 +18,7 @@ validators
 pytest
 pytest-cov
 atproto
-requests
 bs4
+sounddevice
+vosk
+playsound
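The new `sounddevice` and `vosk` dependencies provide microphone capture and Kaldi-based recognition for the voice command plugin. The plugin's own `VoiceCommandKaldi` class is not part of this diff, so the following is only a minimal sketch, loosely following the style of Vosk's microphone example, of how the two libraries can be combined. It assumes an extracted Vosk model in a directory named `model` and a 16 kHz mono microphone; `extract_text` and `listen` are hypothetical helper names.

```python
import json
import queue
from typing import Iterator


def extract_text(result_json: str) -> str:
    """Return the "text" field of a Vosk result JSON string ("" if absent)."""
    return json.loads(result_json).get("text", "").strip()


def listen(model_dir: str = "model", samplerate: int = 16000) -> Iterator[str]:
    """Yield each non-empty utterance recognized from the default microphone."""
    # Deferred imports: this module still loads where vosk/sounddevice are missing.
    import sounddevice as sd
    from vosk import KaldiRecognizer, Model

    audio: queue.Queue = queue.Queue()

    def callback(indata, frames, time, status):
        # Called on the audio thread; hand raw int16 bytes to the main thread.
        audio.put(bytes(indata))

    recognizer = KaldiRecognizer(Model(model_dir), samplerate)
    with sd.RawInputStream(
        samplerate=samplerate, blocksize=8000, dtype="int16", channels=1, callback=callback
    ):
        while True:
            # AcceptWaveform returns True once Vosk detects the end of an utterance.
            if recognizer.AcceptWaveform(audio.get()):
                text = extract_text(recognizer.Result())
                if text:
                    yield text
```

Iterating `for utterance in listen(): ...` blocks on the microphone. Deferring the imports keeps `extract_text` usable on machines without audio libraries installed.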
diff --git a/src/autogpt_plugins/voice_command/README.md b/src/autogpt_plugins/voice_command/README.md
@@ -0,0 +1,62 @@
+# autogpt-voice-command
+
+A plugin that adds voice command integration to Auto-GPT.
+
+## Features (more coming soon!)
+
+- Talk to Auto-GPT through your microphone
+- Supports Kaldi-based speech recognition through the Vosk library
+
+## Installation
+
+1. Clone this repo as instructed in the main repository.
+2. Add the following settings to the `.env` file in your Auto-GPT directory:
+
+```
+CHAT_MESSAGES_ENABLED=True
+
+... 
+
+################################################################################
+### VOICE COMMAND
+################################################################################
+
+VOICE_COMMAND_ENABLE=True
+VOICE_COMMAND_SDK=kaldi
+VOICE_COMMAND_INITCALL=hello
+VOICE_COMMAND_CONFIRM=True
+```
+
+- `VOICE_COMMAND_ENABLE` enables the voice command plugin; the value must be exactly `True`.
+- `VOICE_COMMAND_SDK` selects the speech recognition library. Currently only `kaldi` is
+  available and fully tested.
+- `VOICE_COMMAND_INITCALL` sets the wake word that must be spoken before asking a question.
+- `VOICE_COMMAND_CONFIRM` enables a confirmation step before the user's question is sent to Auto-GPT. Because of
+  accents or vocabulary limitations, the recognizer may mis-transcribe speech; with confirmation enabled, the user
+  can reject the transcription and repeat the question.
+
+3. Download a Vosk model from https://alphacephei.com/vosk/models into the Auto-GPT root directory.
+4. Extract the model and rename the extracted directory to `model`.
+
+```
+For example:
+
+Change to the Auto-GPT base folder
+# cd Auto-GPT
+
+Copy the downloaded model
+# cp ~/vosk-model-small-en-us-0.15.zip .
+
+Unzip the model file
+# unzip vosk-model-small-en-us-0.15.zip
+
+Rename the model directory
+# mv vosk-model-small-en-us-0.15 model
+```
+
+## Usage
+
+1. The plugin is most interactive when used with text-to-speech enabled (`--speak`).
+2. To authorize commands in Auto-GPT, just say 'yes' or 'no'. The plugin automatically converts these words to the characters 'y' or 'n', which Auto-GPT understands as approving or rejecting the command.
+3. To ensure the system only processes speech the user intends for it, the user must first say the wake word defined by `VOICE_COMMAND_INITCALL`. The system replies 'yes sir' before the user can start a conversation.
+4. To prevent misrecognized input from being processed, enable the `VOICE_COMMAND_CONFIRM` flag. The system then asks the user to confirm the recognized question or command, and the user replies 'yes' or 'no'. After a 'no', the user can immediately repeat the command.
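Usage item 2 says spoken 'yes' and 'no' answers are rewritten to the 'y' and 'n' characters Auto-GPT expects when authorizing a command. A minimal sketch of that normalization; `normalize_answer` and `ANSWER_MAP` are hypothetical names, and the plugin's real mapping lives in `voice_command_kaldi`, which is not shown in this diff.

```python
from typing import Dict

# Hypothetical table: the README only documents "yes" -> "y" and "no" -> "n".
ANSWER_MAP: Dict[str, str] = {"yes": "y", "no": "n"}


def normalize_answer(transcript: str) -> str:
    """Turn a spoken 'yes'/'no' into Auto-GPT's 'y'/'n'; pass anything else through."""
    text = transcript.strip()
    return ANSWER_MAP.get(text.lower(), text)
```

Passing non-matching text through unchanged lets the same helper sit in front of every `user_input` call, since the same prompt may also receive free-form replies.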
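Usage items 3 and 4 describe a small state machine: ignore speech until the `VOICE_COMMAND_INITCALL` wake word, reply 'yes sir', capture the question, and, when `VOICE_COMMAND_CONFIRM` is on, ask for a 'yes' or 'no' confirmation before returning it. The sketch below models that flow over a list of already-transcribed strings so it can run without a microphone. The state names, the `next_input` helper, and the confirmation prompt wording are assumptions, not the plugin's actual code.

```python
from enum import Enum, auto
from typing import Callable, Iterable, Optional


class State(Enum):
    IDLE = auto()        # waiting for the wake word
    LISTENING = auto()   # wake word heard, waiting for the question
    CONFIRMING = auto()  # question heard, waiting for "yes" or "no"


def next_input(
    transcripts: Iterable[str],
    wake_word: str = "hello",
    confirm: bool = True,
    say: Callable[[str], None] = print,
) -> Optional[str]:
    """Return the first (optionally confirmed) question, or None if none arrives."""
    wake = wake_word.strip().lower()
    state = State.IDLE
    pending = ""
    for raw in transcripts:
        text = raw.strip().lower()
        if not text:
            continue  # skip empty transcripts, e.g. from silence
        if state is State.IDLE:
            # Ignore everything until the wake word (VOICE_COMMAND_INITCALL).
            if text == wake:
                say("yes sir")
                state = State.LISTENING
        elif state is State.LISTENING:
            if not confirm:
                return raw.strip()
            pending = raw.strip()
            say(f"did you say: {pending}?")  # hypothetical prompt wording
            state = State.CONFIRMING
        elif state is State.CONFIRMING:
            if text == "yes":
                return pending
            if text == "no":
                # Per the README, the user can repeat the command right away.
                say("please repeat the command")  # hypothetical prompt wording
                state = State.LISTENING
    return None
```

Returning to `LISTENING` rather than `IDLE` after a 'no' matches the README's note that the user can provide the command again directly, without repeating the wake word.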
diff --git a/src/autogpt_plugins/voice_command/__init__.py b/src/autogpt_plugins/voice_command/__init__.py
@@ -0,0 +1,286 @@
+import os
+from typing import Any, Dict, List, Optional, Tuple, TypedDict, TypeVar
+
+from auto_gpt_plugin_template import AutoGPTPluginTemplate
+from colorama import Fore
+from .voice_command_kaldi import VoiceCommandKaldi
+
+PromptGenerator = TypeVar("PromptGenerator")
+
+
+class Message(TypedDict):
+    role: str
+    content: str
+
+
+class AutoGPTVoiceCommand(AutoGPTPluginTemplate):
+    """
+    Voice Command plugin for Auto-GPT.
+    """
+
+    def __init__(self):
+        super().__init__()
+        self._name = "autogpt-voice-command"
+        self._version = "0.1.0"
+        self._description = "Enable voice commands in Auto-GPT."
+
+        self.voice_command_enable = os.getenv("VOICE_COMMAND_ENABLE")
+        self.voice_command_sdk = os.getenv("VOICE_COMMAND_SDK")
+        self.voice_command = None  # stays None unless a supported SDK is selected
+        if self.voice_command_sdk and self.voice_command_sdk.lower() == "kaldi":
+            self.voice_command = VoiceCommandKaldi()
+        else:
+            print("Voice command SDK is not selected")
+
+    def can_handle_on_response(self) -> bool:
+        """This method is called to check that the plugin can
+        handle the on_response method.
+        Returns:
+            bool: True if the plugin can handle the on_response method."""
+        return False
+
+    def on_response(self, response: str, *args, **kwargs) -> str:
+        """This method is called when a response is received from the model."""
+        pass
+
+    def can_handle_post_prompt(self) -> bool:
+        """This method is called to check that the plugin can
+        handle the post_prompt method.
+        Returns:
+            bool: True if the plugin can handle the post_prompt method."""
+        return False
+
+    def post_prompt(self, prompt: PromptGenerator) -> PromptGenerator:
+        """This method is called just after the generate_prompt is called,
+            but actually before the prompt is generated.
+
+        Args:
+            prompt (PromptGenerator): The prompt generator.
+
+        Returns:
+            PromptGenerator: The prompt generator.
+        """
+        pass
+
+    def can_handle_on_planning(self) -> bool:
+        """This method is called to check that the plugin can
+        handle the on_planning method.
+        Returns:
+            bool: True if the plugin can handle the on_planning method."""
+        return False
+
+    def on_planning(
+            self, prompt: PromptGenerator, messages: List[str]
+    ) -> Optional[str]:
+        """This method is called before the planning chat completion is done.
+        Args:
+            prompt (PromptGenerator): The prompt generator.
+            messages (List[str]): The list of messages.
+        """
+        pass
+
+    def can_handle_post_planning(self) -> bool:
+        """This method is called to check that the plugin can
+        handle the post_planning method.
+        Returns:
+            bool: True if the plugin can handle the post_planning method."""
+        return False
+
+    def post_planning(self, response: str) -> str:
+        """This method is called after the planning chat completion is done.
+        Args:
+            response (str): The response.
+        Returns:
+            str: The resulting response.
+        """
+        pass
+
+    def can_handle_pre_instruction(self) -> bool:
+        """This method is called to check that the plugin can
+        handle the pre_instruction method.
+        Returns:
+            bool: True if the plugin can handle the pre_instruction method."""
+        return False
+
+    def pre_instruction(self, messages: List[str]) -> List[str]:
+        """This method is called before the instruction chat is done.
+        Args:
+            messages (List[str]): The list of context messages.
+        Returns:
+            List[str]: The resulting list of messages.
+        """
+        pass
+
+    def can_handle_on_instruction(self) -> bool:
+        """This method is called to check that the plugin can
+        handle the on_instruction method.
+        Returns:
+            bool: True if the plugin can handle the on_instruction method."""
+        return False
+
+    def on_instruction(self, messages: List[str]) -> Optional[str]:
+        """This method is called when the instruction chat is done.
+        Args:
+            messages (List[str]): The list of context messages.
+        Returns:
+            Optional[str]: The resulting message.
+        """
+        pass
+
+    def can_handle_post_instruction(self) -> bool:
+        """This method is called to check that the plugin can
+        handle the post_instruction method.
+        Returns:
+            bool: True if the plugin can handle the post_instruction method."""
+        return False
+
+    def post_instruction(self, response: str) -> str:
+        """This method is called after the instruction chat is done.
+        Args:
+            response (str): The response.
+        Returns:
+            str: The resulting response.
+        """
+        pass
+
+    def can_handle_pre_command(self) -> bool:
+        """This method is called to check that the plugin can
+        handle the pre_command method.
+        Returns:
+            bool: True if the plugin can handle the pre_command method."""
+        return False
+
+    def pre_command(
+            self, command_name: str, arguments: Dict[str, Any]
+    ) -> Tuple[str, Dict[str, Any]]:
+        """This method is called before the command is executed.
+        Args:
+            command_name (str): The command name.
+            arguments (Dict[str, Any]): The arguments.
+        Returns:
+            Tuple[str, Dict[str, Any]]: The command name and the arguments.
+        """
+        pass
+
+    def can_handle_post_command(self) -> bool:
+        """This method is called to check that the plugin can
+        handle the post_command method.
+        Returns:
+            bool: True if the plugin can handle the post_command method."""
+        return False
+
+    def post_command(self, command_name: str, response: str) -> str:
+        """This method is called after the command is executed.
+        Args:
+            command_name (str): The command name.
+            response (str): The response.
+        Returns:
+            str: The resulting response.
+        """
+        pass
+
+    def can_handle_chat_completion(
+            self,
+            messages: list[Dict[Any, Any]],
+            model: str,
+            temperature: float,
+            max_tokens: int,
+    ) -> bool:
+        """This method is called to check that the plugin can
+        handle the chat_completion method.
+        Args:
+            messages (list[Dict[Any, Any]]): The messages.
+            model (str): The model name.
+            temperature (float): The temperature.
+            max_tokens (int): The max tokens.
+        Returns:
+            bool: True if the plugin can handle the chat_completion method."""
+        return False
+
+    def handle_chat_completion(
+            self,
+            messages: list[Dict[Any, Any]],
+            model: str,
+            temperature: float,
+            max_tokens: int,
+    ) -> str:
+        """This method is called when the chat completion is done.
+        Args:
+            messages (list[Dict[Any, Any]]): The messages.
+            model (str): The model name.
+            temperature (float): The temperature.
+            max_tokens (int): The max tokens.
+        Returns:
+            str: The resulting response.
+        """
+        return None
+
+    def can_handle_text_embedding(
+            self, text: str
+    ) -> bool:
+        """This method is called to check that the plugin can
+        handle the text_embedding method.
+        Args:
+            text (str): The text to be converted to embedding.
+        Returns:
+            bool: True if the plugin can handle the text_embedding method."""
+        return False
+
+    def handle_text_embedding(
+            self, text: str
+    ) -> list:
+        """This method is called to create an embedding for the given text.
+        Args:
+            text (str): The text to be converted to embedding.
+        Returns:
+            list: The text embedding.
+        """
+        pass
+
+    def can_handle_user_input(self, user_input: str) -> bool:
+        """This method is called to check that the plugin can
+        handle the user_input method.
+
+        Args:
+            user_input (str): The user input.
+
+        Returns:
+            bool: True if the plugin can handle the user_input method."""
+        if self.voice_command_enable == "True" and self.voice_command:
+            return True
+
+        else:
+            print(
+                Fore.RED
+                + f"{self._name} - {self._version} - Voice command plugin not loaded because VOICE_COMMAND_ENABLE or "
+                  f"VOICE_COMMAND_SDK is not set correctly in .env." + Fore.RESET
+            )
+        return False
+
+    def user_input(self, user_input: str) -> str:
+        """This method is called to request input from the user.
+
+        Args:
+            user_input (str): The question or prompt to ask the user.
+
+        Returns:
+            str: The user input.
+        """
+
+        return self.voice_command.run(is_test=False, force_state=None)
+
+    def can_handle_report(self) -> bool:
+        """This method is called to check that the plugin can
+        handle the report method.
+
+        Returns:
+            bool: True if the plugin can handle the report method."""
+        return False
+
+    def report(self, message: str) -> None:
+        """This method is called to report a message to the user.
+
+        Args:
+            message (str): The message to report.
+        """
+        pass
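`__init__` reads `VOICE_COMMAND_ENABLE` and `VOICE_COMMAND_SDK` from the environment, and `can_handle_user_input` only claims user input when the flag is exactly `True` and a backend exists. One way to keep that check safe is to always resolve the backend to an object or `None`. This is a minimal sketch under that assumption: `VoiceCommandSettings`, `load_settings`, `select_backend`, and the registry of backend factories are hypothetical, and parsing `VOICE_COMMAND_CONFIRM` the same way as the enable flag is a guess.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Optional, Protocol


class VoiceBackend(Protocol):
    """Anything exposing run(), like the plugin's VoiceCommandKaldi."""

    def run(self, is_test: bool = False, force_state: Optional[object] = None) -> str:
        ...


@dataclass
class VoiceCommandSettings:
    enabled: bool
    sdk: Optional[str]
    init_call: Optional[str]
    confirm: bool


def load_settings(env: Mapping[str, str] = os.environ) -> VoiceCommandSettings:
    """Read the VOICE_COMMAND_* variables from a mapping (os.environ by default)."""
    return VoiceCommandSettings(
        # Same strict check as can_handle_user_input: only "True" enables it.
        enabled=env.get("VOICE_COMMAND_ENABLE") == "True",
        sdk=env.get("VOICE_COMMAND_SDK"),
        init_call=env.get("VOICE_COMMAND_INITCALL"),
        # Assumption: the confirm flag follows the same "True" convention.
        confirm=env.get("VOICE_COMMAND_CONFIRM") == "True",
    )


def select_backend(
    settings: VoiceCommandSettings,
    registry: Dict[str, Callable[[], VoiceBackend]],
) -> Optional[VoiceBackend]:
    """Return a backend instance, or None when disabled or the SDK is unknown."""
    if not settings.enabled or not settings.sdk:
        return None
    factory = registry.get(settings.sdk.lower())
    return factory() if factory is not None else None
```

With a registry such as `{"kaldi": VoiceCommandKaldi}`, an unsupported SDK name or a disabled plugin yields `None` instead of an unset attribute. Unlike the plugin, this sketch also skips constructing the backend when the plugin is disabled, which avoids loading a speech model that will never be used.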