Executing python code directly from AI output #286

Closed
luiz00martins opened this issue Apr 6, 2023 · 15 comments
Labels
enhancement (New feature or request), needs discussion (To be discussed among maintainers), Stale

Comments

@luiz00martins

I've encountered the same problem as #101. GPT thinks that `evaluate_code` will execute Python:

CODEREVIEW-GPT THOUGHTS: Let's find some code to review and suggest improvements.
REASONING: We need to search for files with common code extensions that are not too large and more recent than a certain date.
PLAN:
-  Use the 'evaluate_code' command to run a Python script that searches for files with common code extensions in the current folder and subfolders.
-  Use the 'memory_add' command to save the file paths for later reference and reuse.
-  Review the saved files using 'read_file' command to ensure they are indeed code files.
-  Choose a file to review and suggest improvements.
CRITICISM: We may need to adjust the search criteria if we don't find relevant code or if too many files match the criteria.
NEXT ACTION:  COMMAND = evaluate_code ARGUMENTS = {'code': ...}, where the code argument, unescaped, is:

```python
import pathlib
from typing import List

def find_code_files(root: str, extensions: List[str], max_size: int, min_date_modified: float) -> List[str]:
    """
    Returns a list of file paths for code files with the specified extensions, maximum size,
    and minimum date modified, rooted in the given directory and its subdirectories.
    """
    files = [str(path) for path in pathlib.Path(root).rglob("*")
             if path.suffix in extensions
             and path.stat().st_size <= max_size
             and path.stat().st_mtime >= min_date_modified]
    return files

def print_files(files: List[str]) -> None:
    """
    Prints the list of file paths to the console, one file per line.
    If no relevant files are found, prints a message indicating so.
    """
    for file in files:
        print(file)
    else:
        print("No relevant files found.")

def analyze_code(code: str) -> None:
    """
    Analyzes the given code and provides suggestions for improvements.
    """
    pass

if __name__ == '__main__':
    root: str = '.'
    extensions: List[str] = ['.py', '.js', '.java']
    max_size: int = 10000  # bytes
    min_date_modified: float = 1609459200  # seconds since epoch (01/01/2021)

    files: List[str] = find_code_files(root, extensions, max_size, min_date_modified)
    print_files(files)
```

On one hand, that's a bug, which is addressed in #101.

On the other hand... that's a very interesting idea from GPT. Giving it the ability to execute Python code directly could let it handle a lot of tasks in a far more dynamic way.
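For concreteness, a minimal sketch of what such a command might look like (the name `execute_python_code` and its signature are assumptions, not an existing command in the project):

```python
import subprocess
import sys

def execute_python_code(code: str, timeout: int = 30) -> str:
    """Hypothetical command: run Python source passed as a string and
    return its output, so the agent never has to write a file first.

    Naive sketch only -- there is no sandboxing here, which is the hard
    part (see the discussion further down the thread).
    """
    result = subprocess.run(
        [sys.executable, "-c", code],  # run the snippet directly
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout + result.stderr
```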

@drammen94

What do you mean? It's already a function in the project.

@luiz00martins
Author

Oh, really? My bad then 😄

What's the name of the command?

@drammen94

execute_python_file(arguments["file"])

@luiz00martins
Author

Is the file argument a path or actual code?

The feature I'm proposing is the direct execution of code. "file" makes it seem like it's a path to a file already in the system.
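For contrast, a sketch of the two call shapes (the second command is the hypothetical proposal, not something that exists in the codebase):

```python
# Existing: takes a path to a script already on disk
execute_python_file(arguments["file"])   # e.g. {"file": "scripts/task.py"}

# Proposed (hypothetical): takes the code itself as a string
execute_python_code(arguments["code"])   # e.g. {"code": "print('hello')"}
```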

@yourfavtheo

Same problem here too:

NEXT ACTION:  COMMAND = execute_python_file ARGUMENTS = {'file': '<path_to_python_script>'}
Executing file '<path_to_python_script>' in workspace 'auto_gpt_workspace'
SYSTEM:  Command execute_python_file returned: Error: Invalid file type. Only .py files are allowed.

I asked it to create a Python script, and it just tries to execute "<path_to_python_script>" literally.
Any solution?

@Qoyyuum Qoyyuum added the enhancement and Needs Benchmark labels Apr 16, 2023
@Pwuts
Member

Pwuts commented Apr 18, 2023

Closing as duplicate of #101

@Pwuts Pwuts closed this as not planned Apr 18, 2023
@Pwuts Pwuts removed the Needs Benchmark label Apr 18, 2023
@luiz00martins
Author

Not a duplicate. This is a feature request for direct Python code execution.

@Pwuts
Member

Pwuts commented Apr 18, 2023

That is already implemented

@luiz00martins
Author

As execute_python_file? I'm not sure that's the same thing as what I described in my original message.

As I said:

Is the file argument a path or actual code?

The feature I'm proposing is the direct execution of code. "file" makes it seem like it's a path to a file already in the system.

If you want to close as "won't do", that's okay. But I don't think it's a duplicate.

@Pwuts
Member

Pwuts commented Apr 18, 2023

Ah, thanks for elaborating! I think this is something we could add without too much effort. The tricky thing is to properly sandbox it, in a way equivalent to execute_python_file.
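One plausible approach (an assumption on my part, not a committed design): write the supplied snippet into the agent's workspace and delegate to the existing execute_python_file command, so inline code inherits whatever isolation that path already has (workspace confinement, Docker execution):

```python
import uuid
from pathlib import Path

def execute_python_code(code: str, workspace: Path) -> str:
    """Sketch: reuse execute_python_file's sandbox for inline code.

    Assumes the project's existing execute_python_file command is
    importable here; the snippet then runs under the same restrictions
    as any other .py file the agent executes.
    """
    script = workspace / f"snippet_{uuid.uuid4().hex}.py"
    script.write_text(code)
    try:
        return execute_python_file(str(script))  # existing, sandboxed command
    finally:
        script.unlink(missing_ok=True)           # clean up the temp script
```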

@Pwuts Pwuts reopened this Apr 18, 2023
@Pwuts Pwuts moved this to 📋 Backlog in AutoGPT development kanban Apr 18, 2023
@Pwuts Pwuts added the needs discussion label Apr 18, 2023
@Pwuts Pwuts changed the title Execution of python code Executing python code directly from AI output Apr 18, 2023
Pwuts added a commit that referenced this issue Apr 19, 2023
ChatGPT is less confused by this phrasing

From my own observations and others' (i.e. #101 and #286), ChatGPT seems to think that `evaluate_code` will actually run code, rather than just provide feedback. Since changing the phrasing to `analyze_code`, I haven't seen the AI make this mistake.

---------

Co-authored-by: Reinier van der Leer <[email protected]>
@Boostrix
Contributor

Boostrix commented May 5, 2023

This has more to do with in-memory execution of code that isn't written to disk, I suppose?
If so, that's related to the API discovery idea #56.
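If "in-memory" is taken literally, the snippet would never touch the filesystem at all - roughly like the hedged sketch below (exec() is emphatically not a sandbox):

```python
import contextlib
import io

def run_in_memory(code: str) -> str:
    """Sketch: execute a code string without ever writing it to disk.

    WARNING: exec() runs with the host process's full privileges, so
    this only becomes viable with real sandboxing on top.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):  # capture the snippet's prints
        exec(code, {})                        # fresh globals; NOT a security boundary
    return buffer.getvalue()
```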

@luiz00martins
Author

luiz00martins commented May 5, 2023

Yeah, it is somewhat related.

I think this issue might supersede that one, or that issue might supersede this one, depending on how it's implemented. Although this one is a bit more general (e.g. the agent might spin up a Python instance just to do some calculations, so nothing necessarily related to an API).


Edit: As a matter of fact, now that I think about it, these should probably be separate tasks. Meaning, a search_for_api task, followed by a write_python_code task which would use that knowledge (roughly as sketched below). That would keep the system more general, but still fulfil the capabilities of #56.
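A rough illustration of that split (task names and signatures are hypothetical):

```python
def search_for_api(query: str) -> str:
    """Hypothetical task: research a relevant API (crawl docs, search
    results) and return structured notes: endpoints, auth, examples."""
    ...

def write_python_code(task: str, api_notes: str = "") -> str:
    """Hypothetical task: generate Python code for the given task,
    optionally grounded in the notes from search_for_api."""
    ...

# The agent chains them: discover the API first, then write code that
# uses it -- each task stays general-purpose on its own.
```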

@Boostrix
Contributor

Boostrix commented May 6, 2023

search_for_api would be a specialization of a do_research (crawl) command, whereas the API could be either a classical API or a networking API.

Some of us have already succeeded in getting Agent-GPT to write code by exploring API docs - my recent experiments had it download the GitHub API docs and come up with a CLI tool that filters PRs by excluding those touching the same paths/files: master...Boostrix:Auto-GPT:topic/PRHelper.py

While this is trivial in nature, it can already be pretty helpful for identifying PRs that can be easily reviewed/integrated because they're not stepping on anyone's toes. And it would be easy to extend as well.

The point being: having some sort of search-API / extend-yourself mechanism is exactly what many folks here are suggesting when it comes to "self-improving" in its simplest form - adding features without having to write much/any code.

So, thinking about it, I'm inclined to think that commands should be based on classes that can be extended - a research command would be based on a crawler/spider class (HTTP requests: #2730), and a find_api command would be based on the research command class (see the sketch below).

That way, you can have your cake and eat it too, while also ensuring that the underlying functionality (searching/exploring the solution space) is available for other use cases - like the idea of hooking up the agent to a research paper server (#826) or making it process PDF files (#1353).

Commands in their current form have worked, but to support scaling and reduce code rot, it would make sense to identify overlapping functionality and then use a layered approach for common building blocks.

The "API explorer" you mentioned could also be API based itself, so there is no need to go through HTML scraping - but some folks may need exactly that, so a scraping mechanism would be a higher-level implementation of a crawler #2730

Related talks collated here: #514 (comment)

@github-actions
Contributor

This issue was closed automatically because it has been stale for 10 days with no activity.

@github-actions github-actions bot closed this as not planned Sep 17, 2023
@Boostrix
Copy link
Contributor

Boostrix commented Oct 4, 2023

regarding the API explorer idea: #5536

sindlinger pushed a commit to Orgsindlinger/Auto-GPT-WebUI that referenced this issue Sep 25, 2024