Restore TeachableAgent tests (microsoft#761)

rickyloynd-microsoft · web-flow · commit 767d06c63678 · 2023-11-27T02:10:02.000Z
* Update chat_with_teachable_agent.py to v2.

* Update agentchat_teachability.ipynb to v2.

* Add test of teachability accuracy.

* Update installation instructions.

* Add to contrib tests.

* Apply pre-commit fixes.

* Apply reviewer suggestions to test workflows.
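
Note on the accuracy test: it retries up to `num_trials` times and passes on the first success, because a single LLM-backed trial is expected to fail about 30% of the time. The sketch below is only an illustration of that retry pattern, not code from this commit. The helper names `run_until_first_success` and `probability_all_trials_fail` are hypothetical, and the probability estimate assumes the trials are independent, which real LLM calls may not be.

```python
from typing import Callable


def run_until_first_success(trial: Callable[[int], int], num_trials: int) -> int:
    """Run `trial` up to `num_trials` times and stop at the first success.

    `trial` receives the 0-based trial index and returns the number of
    errors it observed (0 means success). Returns the 1-based index of the
    first successful trial, or raises AssertionError if every trial fails.
    """
    for i in range(num_trials):
        if trial(i) == 0:
            return i + 1
    raise AssertionError(f"failed on all {num_trials} trials")


def probability_all_trials_fail(p_fail: float, num_trials: int) -> float:
    """Chance that every trial fails, ASSUMING the trials are independent."""
    return p_fail**num_trials


if __name__ == "__main__":
    # Hypothetical stand-in for an LLM-backed trial: fails twice, then succeeds.
    outcomes = [1, 1, 0]
    print(run_until_first_success(lambda i: outcomes[i], num_trials=10))
    # Rough false-failure rate of the whole test at p_fail = 0.3 over 10 trials.
    print(probability_all_trials_fail(0.3, 10))
```

Under the independence assumption, ten trials at a 0.3 failure rate would all fail only about six times in a million runs, which is why the test can assert on the combined result while staying cheap.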
diff --git a/.github/workflows/contrib-openai.yml b/.github/workflows/contrib-openai.yml
@@ -138,3 +138,41 @@ jobs:
         with:
           file: ./coverage.xml
           flags: unittests
+  TeachableAgent:
+    strategy:
+      matrix:
+        os: [ubuntu-latest]
+        python-version: ["3.11"]
+    runs-on: ${{ matrix.os }}
+    environment: openai1
+    steps:
+      # checkout to pr branch
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          ref: ${{ github.event.pull_request.head.sha }}
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install packages and dependencies
+        run: |
+          docker --version
+          python -m pip install --upgrade pip wheel
+          pip install -e .[teachable]
+          python -c "import autogen"
+          pip install coverage
+      - name: Coverage
+        env:
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
+          AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
+          OAI_CONFIG_LIST: ${{ secrets.OAI_CONFIG_LIST }}
+        run: |
+          coverage run -a -m pytest test/agentchat/contrib/test_teachable_agent.py
+          coverage xml
+      - name: Upload coverage to Codecov
+        uses: codecov/codecov-action@v3
+        with:
+          file: ./coverage.xml
+          flags: unittests
diff --git a/.github/workflows/contrib-tests.yml b/.github/workflows/contrib-tests.yml
@@ -109,6 +109,30 @@ jobs:
           pip install -e .
           pip uninstall -y openai
       - name: Test GPTAssistantAgent
-        if: matrix.python-version != '3.10'
         run: |
           pytest test/agentchat/contrib/test_gpt_assistant.py
+
+  TeachableAgent:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, macos-latest, windows-2019]
+        python-version: ["3.8", "3.9", "3.10", "3.11"]
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install packages and dependencies for all tests
+        run: |
+          python -m pip install --upgrade pip wheel
+          pip install pytest
+      - name: Install packages and dependencies for TeachableAgent
+        run: |
+          pip install -e .[teachable]
+          pip uninstall -y openai
+      - name: Test TeachableAgent
+        run: |
+          pytest test/agentchat/contrib/test_teachable_agent.py
diff --git a/notebook/agentchat_teachability.ipynb b/notebook/agentchat_teachability.ipynb
@@ -21,7 +21,7 @@
     "\n",
     "In making decisions about memo storage and retrieval, `TeachableAgent` calls an instance of `TextAnalyzerAgent` to analyze pieces of text in several different ways. This adds extra LLM calls involving a relatively small number of tokens. These calls can add a few seconds to the time a user waits for a response.\n",
     "\n",
-    "This notebook demonstrates how `TeachableAgent` can learn facts, preferences, and skills from users. To chat with `TeachableAgent` yourself, run [chat_with_teachable_agent.py](../test/agentchat/chat_with_teachable_agent.py).\n",
+    "This notebook demonstrates how `TeachableAgent` can learn facts, preferences, and skills from users. To chat with `TeachableAgent` yourself, run [chat_with_teachable_agent.py](../test/agentchat/contrib/chat_with_teachable_agent.py).\n",
     "\n",
     "## Requirements\n",
     "\n",
@@ -38,7 +38,7 @@
    "outputs": [],
    "source": [
     "%%capture --no-stderr\n",
-    "# %pip install \"pyautogen[teachable]"
+    "# %pip install \"pyautogen[teachable]\""
    ]
   },
   {
@@ -142,9 +142,9 @@
     "from autogen import UserProxyAgent\n",
     "\n",
     "llm_config = {\n",
-    "    \"timeout\": 60,\n",
     "    \"config_list\": config_list,\n",
-    "    \"use_cache\": True,  # Use False to explore LLM non-determinism.\n",
+    "    \"timeout\": 60,\n",
+    "    \"cache_seed\": None,  # Use an int to seed the response cache. Use None to disable caching.\n",
     "}\n",
     "\n",
     "teach_config={\n",
@@ -157,6 +157,7 @@
     "try:\n",
     "    from termcolor import colored\n",
     "except ImportError:\n",
+    "\n",
     "    def colored(x, *args, **kwargs):\n",
     "        return x\n",
     "    \n",
@@ -170,8 +171,7 @@
     "    human_input_mode=\"NEVER\",\n",
     "    is_termination_msg=lambda x: True if \"TERMINATE\" in x.get(\"content\") else False,\n",
     "    max_consecutive_auto_reply=0,\n",
-    ")\n",
-    "\n"
+    ")"
    ]
   },
   {
@@ -781,7 +781,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.17"
+   "version": "3.10.12"
   }
  },
  "nbformat": 4,
diff --git a/test/agentchat/contrib/chat_with_teachable_agent.py b/test/agentchat/contrib/chat_with_teachable_agent.py
@@ -1,6 +1,12 @@
 from autogen import UserProxyAgent, config_list_from_json
 from autogen.agentchat.contrib.teachable_agent import TeachableAgent
 
+import os
+import sys
+
+sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
+from test_assistant_agent import OAI_CONFIG_LIST, KEY_LOC  # noqa: E402
+
 
 try:
     from termcolor import colored
@@ -12,21 +18,24 @@ def colored(x, *args, **kwargs):
 
 verbosity = 0  # 0 for basic info, 1 to add memory operations, 2 for analyzer messages, 3 for memo lists.
 recall_threshold = 1.5  # Higher numbers allow more (but less relevant) memos to be recalled.
-use_cache = False  # If True, cached LLM calls will be skipped and responses pulled from cache. False exposes LLM non-determinism.
+cache_seed = None  # Use an int to seed the response cache. Use None to disable caching.
 
 # Specify the model to use. GPT-3.5 is less reliable than GPT-4 at learning from user input.
+# filter_dict = {"model": ["gpt-4-0613"]}
+# filter_dict = {"model": ["gpt-3.5-turbo-0613"]}
 filter_dict = {"model": ["gpt-4"]}
+# filter_dict = {"model": ["gpt-35-turbo-16k", "gpt-3.5-turbo-16k"]}
 
 
 def create_teachable_agent(reset_db=False):
     """Instantiates a TeachableAgent using the settings from the top of this file."""
     # Load LLM inference endpoints from an env variable or a file
     # See https://microsoft.github.io/autogen/docs/FAQ#set-your-api-endpoints
     # and OAI_CONFIG_LIST_sample
-    config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST", filter_dict=filter_dict)
+    config_list = config_list_from_json(env_or_file=OAI_CONFIG_LIST, filter_dict=filter_dict, file_location=KEY_LOC)
     teachable_agent = TeachableAgent(
         name="teachableagent",
-        llm_config={"config_list": config_list, "timeout": 120, "use_cache": use_cache},
+        llm_config={"config_list": config_list, "timeout": 120, "cache_seed": cache_seed},
         teach_config={
             "verbosity": verbosity,
             "reset_db": reset_db,
diff --git a/test/agentchat/contrib/test_teachable_agent.py b/test/agentchat/contrib/test_teachable_agent.py
@@ -1,3 +1,11 @@
+import pytest
+import os
+import sys
+from autogen import ConversableAgent, config_list_from_json
+
+sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
+from test_assistant_agent import OAI_CONFIG_LIST, KEY_LOC  # noqa: E402
+
 try:
     from openai import OpenAI
     from autogen.agentchat.contrib.teachable_agent import TeachableAgent
@@ -6,11 +14,6 @@
 else:
     skip = False
 
-import pytest
-import sys
-from autogen import ConversableAgent, config_list_from_json
-from test_assistant_agent import OAI_CONFIG_LIST, KEY_LOC
-
 try:
     from termcolor import colored
 except ImportError:
@@ -25,8 +28,7 @@ def colored(x, *args, **kwargs):
 
 assert_on_error = False  # GPT-4 nearly always succeeds on these unit tests, but GPT-3.5 is a bit less reliable.
 recall_threshold = 1.5  # Higher numbers allow more (but less relevant) memos to be recalled.
-cache_seed = None
-# If int, cached LLM calls will be skipped and responses pulled from cache. None exposes LLM non-determinism.
+cache_seed = None  # Use an int to seed the response cache. Use None to disable caching.
 
 # Specify the model to use by uncommenting one of the following lines.
 # filter_dict={"model": ["gpt-4-0613"]}
@@ -139,10 +141,10 @@ def use_task_advice_pair_phrasing():
 
 
 @pytest.mark.skipif(
-    skip or not sys.version.startswith("3.11"),
-    reason="do not run if dependency is not installed or py!=3.11",
+    skip,
+    reason="do not run if dependency is not installed",
 )
-def test_all():
+def test_teachability_code_paths():
     """Runs this file's unit tests."""
     total_num_errors, total_num_tests = 0, 0
 
@@ -169,6 +171,49 @@ def test_all():
         )
 
 
+@pytest.mark.skipif(
+    skip,
+    reason="do not run if dependency is not installed",
+)
+def test_teachability_accuracy():
+    """A very cheap and fast test of teachability accuracy."""
+    print(colored("\nTEST TEACHABILITY ACCURACY", "light_cyan"))
+
+    num_trials = 10  # The expected probability of failure is about 0.3 on each trial.
+    for trial in range(num_trials):
+        teachable_agent = create_teachable_agent(
+            reset_db=True, verbosity=0
+        )  # For a clean test, clear the agent's memory.
+        user = ConversableAgent("user", max_consecutive_auto_reply=0, llm_config=False, human_input_mode="NEVER")
+
+        # Prepopulate memory with a few arbitrary memos, just to make retrieval less trivial.
+        teachable_agent.prepopulate_db()
+
+        # Tell the teachable agent something it wouldn't already know.
+        user.initiate_chat(recipient=teachable_agent, message="My favorite color is teal.")
+
+        # Let the teachable agent remember things that should be learned from this chat.
+        teachable_agent.learn_from_user_feedback()
+
+        # Now start a new chat to clear the context, and ask the teachable agent about the new information.
+        print(colored("\nSTARTING A NEW CHAT WITH EMPTY CONTEXT", "light_cyan"))
+        user.initiate_chat(recipient=teachable_agent, message="What's my favorite color?")
+        num_errors = check_agent_response(teachable_agent, user, "teal")
+
+        print(colored(f"\nTRIAL {trial + 1} OF {num_trials} FINISHED", "light_cyan"))
+
+        # Wrap up.
+        teachable_agent.close_db()
+
+        # Exit on the first success.
+        if num_errors == 0:
+            return
+
+    # All trials failed.
+    assert False, "test_teachability_accuracy() failed on all {} trials.".format(num_trials)
+
+
 if __name__ == "__main__":
     """Runs this file's unit tests from the command line."""
-    test_all()
+    test_teachability_code_paths()
+    test_teachability_accuracy()
diff --git a/website/blog/2023-10-26-TeachableAgent/index.mdx b/website/blog/2023-10-26-TeachableAgent/index.mdx
@@ -23,11 +23,11 @@ In order to make effective decisions about memo storage and retrieval, `Teachabl
 
 AutoGen contains three code examples that use `TeachableAgent`.
 
-1. Run [chat_with_teachable_agent.py](https://github.com/microsoft/autogen/blob/main/test/agentchat/chat_with_teachable_agent.py) to converse with `TeachableAgent`.
+1. Run [chat_with_teachable_agent.py](https://github.com/microsoft/autogen/blob/main/test/agentchat/contrib/chat_with_teachable_agent.py) to converse with `TeachableAgent`.
 
 2. Use the Jupyter notebook [agentchat_teachability.ipynb](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_teachability.ipynb) to step through examples discussed below.
 
-3. Run [test_teachable_agent.py](https://github.com/microsoft/autogen/blob/main/test/agentchat/test_teachable_agent.py) for quick unit testing of `TeachableAgent`.
+3. Run [test_teachable_agent.py](https://github.com/microsoft/autogen/blob/main/test/agentchat/contrib/test_teachable_agent.py) for quick unit testing of `TeachableAgent`.
 
 
 ## Basic Usage of TeachableAgent
diff --git a/website/docs/Installation.md b/website/docs/Installation.md
@@ -55,7 +55,7 @@ openai v1 is a total rewrite of the library with many breaking changes. For exam
 Therefore, some changes are required for users of `pyautogen<0.2`.
 
 - `api_base` -> `base_url`, `request_timeout` -> `timeout` in `llm_config` and `config_list`. `max_retry_period` and `retry_wait_time` are deprecated. `max_retries` can be set for each client.
-- MathChat, TeachableAgent are unsupported until they are tested in future release.
+- MathChat is unsupported until it is tested in a future release.
 - `autogen.Completion` and `autogen.ChatCompletion` are deprecated. The essential functionalities are moved to `autogen.OpenAIWrapper`:
 ```python
 from autogen import OpenAIWrapper
@@ -118,6 +118,17 @@ Example notebooks:
 [Automated Code Generation and Question Answering with Qdrant based Retrieval Augmented Agents](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_qdrant_RetrieveChat.ipynb)
 
 
+- #### TeachableAgent
+
+To use TeachableAgent, please install AutoGen with the [teachable] option.
+```bash
+pip install "pyautogen[teachable]"
+```
+
+Example notebook:  [Chatting with TeachableAgent](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_teachability.ipynb)
+
+
+
 - #### Large Multimodal Model (LMM) Agents
 
 We offered Multimodal Conversable Agent and LLaVA Agent. Please install with the [lmm] option to use it.