
Code executors #1405

Merged
merged 72 commits into from
Feb 10, 2024

Conversation

ekzhu
Collaborator

@ekzhu ekzhu commented Jan 25, 2024

Why are these changes needed?

By default, code is executed in a command-line environment inside a Docker container. This has the following limitations:

  1. It cannot keep variables in memory, because each code execution runs a script from disk.
  2. It does not make it easy to move data in and out of the code execution environment. E.g., many people ask where they can find the plots or code scripts that agents generated.
  3. It supports only a limited set of code runners (shell, bash, sh and python).
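
The first limitation can be illustrated with a minimal sketch. It is not the framework's code: the helper `run_as_script` is a hypothetical name, and a shared `exec` namespace merely stands in for a stateful kernel.

```python
import subprocess
import sys


def run_as_script(code: str) -> subprocess.CompletedProcess:
    """Run code in a fresh interpreter process, as a script-per-call executor would."""
    return subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)


# Each call starts a new process, so `x` from the first call is gone in the second.
first = run_as_script("x = 41")
second = run_as_script("print(x + 1)")
script_failed = second.returncode != 0  # the second script raises NameError

# A stateful executor keeps one namespace alive across calls.
namespace: dict = {}
exec("x = 41", namespace)
exec("y = x + 1", namespace)
stateful_result = namespace["y"]
```

The second script fails because nothing survives between processes, while the shared namespace lets later code reuse earlier variables.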

This is why we introduce code executors: they let users select and configure the code execution environment, and they make it easy for people to write their own code executors.

This requires changes to the schema of the code_execution_config configuration. Those changes are backward compatible, so existing code will keep using the legacy code execution module with no change in behavior.

Here is an example of specifying the ipython-embedded code executor for a user proxy.

user_proxy = UserProxyAgent(name="proxy", code_execution_config={"executor": "ipython-embedded", "ipython-embedded": {"output_dir": "coding_output"}})
# Now code messages received by user proxy will be running in an embedded IPython kernel.

Or the local command line code executor:

user_proxy = UserProxyAgent(name="proxy", code_execution_config={"executor": "commandline-local", "commandline-local": {"work_dir": "coding"}})

In some cases, the user of the code-executing agent needs to know how to use the code executor. E.g., the agent needs to know that it is interacting with an IPython notebook in order to make use of notebook-related features such as display, preloaded modules, and ! pip install ..., and to expect rich messages like formatted tables and plots. This requires the user agent to be equipped with a capability, which can be done as follows:

agent = ConversableAgent("agent", ...)
user_proxy.code_executor.user_capability.add_to_agent(agent)

Here the user_capability is an AgentCapability type that modifies the agent's system message to add instructions on using the IPython code executor.
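
A minimal, self-contained sketch of what such a capability might do. `SimpleAgent`, `IPythonUserCapability`, and the instruction text are hypothetical stand-ins for illustration, not the classes added in this PR.

```python
from dataclasses import dataclass


@dataclass
class SimpleAgent:
    """Hypothetical stand-in for an LLM-backed agent with a system message."""

    name: str
    system_message: str = "You are a helpful assistant."

    def update_system_message(self, message: str) -> None:
        self.system_message = message


class IPythonUserCapability:
    """Hypothetical capability that tells an agent how to use an IPython executor."""

    INSTRUCTIONS = (
        "You are writing code for a stateful IPython kernel. "
        "Variables persist between code blocks, and you can run "
        "`! pip install <package>` to install dependencies."
    )

    def add_to_agent(self, agent: SimpleAgent) -> None:
        # Append rather than replace, so the agent's original instructions are kept.
        agent.update_system_message(agent.system_message + "\n" + self.INSTRUCTIONS)


agent = SimpleAgent("agent")
IPythonUserCapability().add_to_agent(agent)
```

Appending to the existing system message keeps the agent's own role description intact while adding executor-specific guidance.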

User-defined code executor

It is also possible to use a user-supplied code executor, so advanced users can plug in their own executor without modifying the framework. Here is an example of a customized notebook executor that executes LLM-generated code within the same notebook it is running in.

from typing import List
from IPython import get_ipython
from autogen.agentchat.agent import LLMAgent
from autogen.agentchat.user_proxy_agent import UserProxyAgent
from autogen.coding import CodeExecutor, MarkdownCodeExtractor, CodeExtractor, CodeBlock, CodeResult

class NotebookExecutor(CodeExecutor):

    class UserCapability:

        def add_to_agent(self, agent: LLMAgent):
            agent.update_system_message(agent.system_message + "\nInstruction on coding.")

    @property
    def code_extractor(self) -> CodeExtractor:
        return MarkdownCodeExtractor()

    @property
    def user_capability(self) -> "NotebookExecutor.UserCapability":
        return NotebookExecutor.UserCapability()

    def __init__(self) -> None:
        self._ipython = get_ipython()

    def execute_code_blocks(self, code_blocks: List[CodeBlock]) -> CodeResult:
        log = ""
        # Default to success so an empty list of code blocks does not leave exitcode unbound.
        exitcode = 0
        for code_block in code_blocks:
            # Capture stdout/stderr of the cell into the variable `cap`.
            result = self._ipython.run_cell("%%capture --no-display cap\n" + code_block.code)
            log += self._ipython.ev("cap.stdout")
            log += self._ipython.ev("cap.stderr")
            if result.result is not None:
                log += str(result.result)
            exitcode = 0 if result.success else 1
            if result.error_before_exec is not None:
                log += f"\n{result.error_before_exec}"
                exitcode = 1
            if result.error_in_exec is not None:
                log += f"\n{result.error_in_exec}"
                exitcode = 1
            # Stop at the first failing code block.
            if exitcode != 0:
                break
        return CodeResult(exit_code=exitcode, output=log)


# Equip the UserProxyAgent with the NotebookExecutor.
proxy = UserProxyAgent("user", code_execution_config={"executor": NotebookExecutor()})
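
The executor above delegates code extraction to a code extractor. As a rough sketch of what extracting fenced code blocks from a Markdown message can look like (an assumption for illustration, not the implementation of MarkdownCodeExtractor; `Block`, `FENCE`, and `extract_blocks` are hypothetical names):

```python
import re
from dataclasses import dataclass
from typing import List

FENCE = "`" * 3  # built programmatically to avoid a literal fence inside this example


@dataclass
class Block:
    """Hypothetical stand-in for a code block with a language tag."""

    language: str
    code: str


# Match an opening fence with an optional language tag, the body, then a closing fence.
_PATTERN = re.compile(FENCE + r"[ \t]*(\w*)[ \t]*\r?\n(.*?)\r?\n[ \t]*" + FENCE, re.DOTALL)


def extract_blocks(message: str) -> List[Block]:
    """Return every fenced code block in the message, in order of appearance."""
    return [Block(language=lang, code=code) for lang, code in _PATTERN.findall(message)]


message = f"Try this:\n{FENCE}python\nprint('hi')\n{FENCE}\nthen:\n{FENCE}sh\nls\n{FENCE}\n"
blocks = extract_blocks(message)
```

Each extracted block carries its language tag, which is what lets an executor decide how to run it.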

Documentation

Documentation will be added in a future PR once the user-defined module work is completed. See #1421.

Backward compatibility

For backward compatibility, existing code that sets code_execution_config to a dictionary without the key "executor" will continue to use the legacy code execution module, and subclasses that override run_code or execute_code_blocks will still have their overriding methods called.

Once we have finished the other tasks in the code execution roadmap (#1421), a deprecation warning will be displayed when code relies on either of these legacy paths, encouraging developers to move away from subclassing as the way to customize code execution.

To turn off code execution, set code_execution_config=False. This is consistent with the current behavior.
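
The rules in this section can be summarized with a small sketch. `describe_execution_mode` is a hypothetical helper written for illustration; it is not how ConversableAgent actually resolves the configuration.

```python
from typing import Any, Dict, Union


def describe_execution_mode(code_execution_config: Union[Dict[str, Any], bool]) -> str:
    """Classify a code_execution_config value by the rules described above."""
    if code_execution_config is False:
        return "disabled"
    if not isinstance(code_execution_config, dict):
        raise ValueError("code_execution_config must be a dict or False")
    if "executor" not in code_execution_config:
        # No "executor" key: existing configurations keep the legacy behavior.
        return "legacy"
    executor = code_execution_config["executor"]
    if isinstance(executor, str):
        # A string names one of the built-in executors.
        return f"built-in:{executor}"
    # Anything else is treated as a user-supplied executor instance.
    return f"user-defined:{type(executor).__name__}"
```

For example, `describe_execution_mode({"work_dir": "coding"})` returns `"legacy"`, while a configuration with `"executor": "commandline-local"` is classified as a built-in executor.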

Additional changes

Per PEP 544, a protocol supports structural subtyping (in effect, an interface) in Python. So we don't have to declare a subclass of a protocol; instead we can rely on a static type checker, or on @runtime_checkable on the protocol, to check the type. E.g.,

from typing import Protocol, runtime_checkable

@runtime_checkable
class Animal(Protocol):

  def speak(self) -> str:
    ...

class Duck:
  def speak(self) -> str:
    return "quack"

isinstance(Duck(), Animal)
# This checks whether Duck implements the Animal protocol; it evaluates to True.

We make Agent and LLMAgent protocols. This allows external code to create its own agent classes without subclassing our ConversableAgent and inheriting all of its underlying methods and variables, while we can still use those classes in our code, for example in GroupChat.
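
A minimal sketch of that idea, assuming a reduced protocol with only `name` and `description`. The actual Agent protocol in this PR has more members, and `ExternalAgent` and `roster` are hypothetical names.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class Agent(Protocol):
    """Hypothetical minimal agent protocol: a name and a description."""

    @property
    def name(self) -> str: ...

    @property
    def description(self) -> str: ...


class ExternalAgent:
    """Defined without inheriting from any framework class."""

    def __init__(self, name: str, description: str) -> None:
        self._name = name
        self._description = description

    @property
    def name(self) -> str:
        return self._name

    @property
    def description(self) -> str:
        return self._description


def roster(agents: List[Agent]) -> List[str]:
    """Stand-in for framework code, such as a group chat, that relies only on the protocol."""
    return [f"{a.name}: {a.description}" for a in agents]


external = ExternalAgent("critic", "Reviews plans.")
lines = roster([external])
```

Note that a runtime `isinstance` check against a protocol only verifies that the members exist; signatures and return types are left to the static type checker.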

Related issue number

#1336
#1396
#1095

Checks

@codecov-commenter

codecov-commenter commented Jan 25, 2024

Codecov Report

Attention: Patch coverage is 88.44985% with 38 lines in your changes missing coverage. Please review.

Project coverage is 69.93%. Comparing base (5d81ed4) to head (2c4ae6f).
Report is 583 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| autogen/agentchat/conversable_agent.py | 65.27% | 11 Missing and 14 partials ⚠️ |
| autogen/coding/local_commandline_code_executor.py | 91.66% | 4 Missing and 1 partial ⚠️ |
| autogen/agentchat/agent.py | 80.95% | 4 Missing ⚠️ |
| autogen/coding/embedded_ipython_code_executor.py | 96.36% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1405       +/-   ##
===========================================
+ Coverage   35.03%   69.93%   +34.89%     
===========================================
  Files          44       50        +6     
  Lines        5383     5677      +294     
  Branches     1247     1381      +134     
===========================================
+ Hits         1886     3970     +2084     
+ Misses       3342     1342     -2000     
- Partials      155      365      +210     
Flag Coverage Δ
unittests 69.89% <88.44%> (+34.85%) ⬆️


@afourney
Member

afourney commented Jan 25, 2024

I like this idea very much.

Given that we use the markdown header to specify the language, should we allow executor to be a dictionary?

{
    "executor": {
        "python": notebook_executor,
        "sh": terminal_executor,
        "typescript": ts_executor,
        "c#": c_sharp_executor
   },
   ...
}
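
As a rough illustration of the dictionary idea, per-language dispatch might look like the sketch below. The `Executor` alias, `route_blocks`, and the lambda executors are hypothetical; nothing like this is part of the PR.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical executors: each takes source code and returns an output string.
Executor = Callable[[str], str]


def route_blocks(blocks: List[Tuple[str, str]], executors: Dict[str, Executor]) -> List[str]:
    """Send each (language, code) pair to the executor registered for its language."""
    outputs = []
    for language, code in blocks:
        if language not in executors:
            raise KeyError(f"no executor registered for language {language!r}")
        outputs.append(executors[language](code))
    return outputs


executors = {
    "python": lambda code: f"[notebook] {code}",
    "sh": lambda code: f"[terminal] {code}",
}
outputs = route_blocks([("python", "print(1)"), ("sh", "ls")], executors)
```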

@rlam3
Collaborator

rlam3 commented Jan 25, 2024

I'm not sure why, with the recent 0.2.8 change, we need to default to using docker. I thought the default was to not use docker.

@afourney
Member

I'm not sure why, with the recent 0.2.8 change, we need to default to using docker. I thought the default was to not use docker.

Running in Docker was always our recommendation -- it's so much safer when dealing with the arbitrary code these agents write. Previously, we printed a prominent warning to console when Docker wasn't explicitly disabled with use_docker=False -- I wrote that PR myself in #172, which was one of my first contributions to AutoGen.

The change in 0.2.8 is to elevate the recommendation to a default. But you can easily opt out by setting use_docker to False or by setting the global environment variable.

@ekzhu ekzhu requested review from afourney and BeibinLi January 26, 2024 00:50
@ekzhu
Collaborator Author

ekzhu commented Jan 26, 2024

@afourney

I like this idea very much.

Given that we use the markdown header to specify the language, should we allow executor to be a dictionary?

{
    "executor": {
        "python": notebook_executor,
        "sh": terminal_executor,
        "typescript": ts_executor,
        "c#": c_sharp_executor
   },
   ...
}

Thanks. This is actually an interesting idea: the dictionary entry could be an instance of an executor, to achieve customization. Currently, though, we assume the code executor is language agnostic, as the LLM could produce code in multiple languages and we assume those will be executed in the same environment. So the code executor is more about the environment in which the code runs. E.g., a command line environment that supports command-line utilities, or an IPython environment that only supports IPython commands (Python code and things like ! pip install package).

We can also introduce Google Code Lab environment and .NET interactive (shout out to @LittleLittleCloud @colombod) in the future. For now, I am expecting mostly community contributions on these cases. Each executor can put in their configuration parameters inside the code_execution_config:

{"executor": "ipython",
  "ipython": {
    "timeout": 50,
    "preload_modules": ["numpy", "pandas", ...],
  }
}

@afourney
Member

That makes sense.

Another question: Right now the default assistant prompt is heavily tuned to suggesting sh and python code, and heavily instructed to making sure the codeblocks "stand alone". Are you imagining that the executors might also contain suggested meta-prompts, or descriptions, that can make this a little more integrated?

@ekzhu
Collaborator Author

ekzhu commented Jan 26, 2024

Are you imagining that the executors might also contain suggested meta-prompts, or descriptions, that can make this a little more integrated?

You are thinking what I am thinking. I just updated the PR description. In short:

agent = ConversableAgent("agent", ...)
user_proxy.code_executor.user_capability.add_to_agent(agent)

@davorrunje
Collaborator

I'm not sure why, with the recent 0.2.8 change, we need to default to using docker. I thought the default was to not use docker.

Running in Docker was always our recommendation -- it's so much safer when dealing with the arbitrary code these agents write. Previously, we printed a prominent warning to console when Docker wasn't explicitly disabled with use_docker=False -- I wrote that PR myself in #172, which was one of my first contributions to AutoGen.

The change in 0.2.8 is to elevate the recommendation to a default. But you can easily opt out by setting use_docker to False or by setting the global environment variable.

This is actually not quite true. Running in docker doesn't necessarily mean running in a separate docker container, which would be a much safer way of doing it. It can also mean running in the same docker container if autogen is already running in one. I discovered that yesterday while implementing some missing tests and thought it was a bug (#1396), but apparently it is not.

@afourney
Member

afourney commented Jan 26, 2024

Yes. There is a difference between use_docker=True with autogen hosted outside, and running everything in Docker (in which case use_docker is effectively ignored). The former is more secure than the latter. We should definitely distinguish this, and I would be happy to discuss this in another thread.

@ekzhu
Collaborator Author

ekzhu commented Feb 9, 2024

@IANTHEREAL I have resolved everything except the functionality to update the producer agent's description field. Please see my comments.

@ekzhu
Collaborator Author

ekzhu commented Feb 10, 2024

@abhijithnair1 You can take a look at the PR description for a notebook executor that runs inside a Jupyter notebook. There is also an IPython executor that runs in a separate, stateful IPython kernel.

Collaborator

@IANTHEREAL IANTHEREAL left a comment


LGTM.

@sonichi sonichi enabled auto-merge February 10, 2024 04:44
@sonichi sonichi added this pull request to the merge queue Feb 10, 2024
Merged via the queue into microsoft:main with commit 609ba7c Feb 10, 2024
57 checks passed
@AaronWard AaronWard mentioned this pull request Feb 11, 2024
3 tasks
whiskyboy pushed a commit to whiskyboy/autogen that referenced this pull request Apr 17, 2024
* code executor

* test

* revert to main conversable agent

* prepare for pr

* kernel

* run open ai tests only when it's out of draft status

* update workflow file

* revert workflow changes

* ipython executor

* check kernel installed; fix tests

* fix tests

* fix tests

* update system prompt

* Update notebook, more tests

* notebook

* raise instead of return None

* allow user provided code executor.

* fixing types

* wip

* refactoring

* polishing

* fixed failing tests

* resolved merge conflict

* fixing failing test

* wip

* local command line executor and embedded ipython executor

* revert notebook

* fix format

* fix merged error

* fix lmm test

* fix lmm test

* move warning

* name and description should be part of the agent protocol, reset is not as it is only used for ConversableAgent; removing accidentally committed file

* version for dependency

* Update autogen/agentchat/conversable_agent.py

Co-authored-by: Jack Gerrits <[email protected]>

* ordering of protocol

* description

* fix tests

* make ipython executor dependency optional

* update document optional dependencies

* Remove exclude from Agent protocol

* Make ConversableAgent consistent with Agent

* fix tests

* add doc string

* add doc string

* fix notebook

* fix interface

* merge and update agents

* disable config usage in reply function

* description field setter

* customize system message update

* update doc

---------

Co-authored-by: Davor Runje <[email protected]>
Co-authored-by: Jack Gerrits <[email protected]>
Co-authored-by: Aaron <[email protected]>
Co-authored-by: Chi Wang <[email protected]>
Labels
code-execution execute generated code