VCR Fix for debug_code_challenge_a #3926
Closed
Changes from all commits (3 commits)
...llenges/debug_code/cassettes/test_debug_code_challenge_a/test_debug_code_challenge_a.yaml (198 additions, 0 deletions)
@@ -0,0 +1,198 @@
interactions:
- request:
    body: '{"model": "gpt-3.5-turbo", "messages": [{"role": "system", "content": "You
      are Debug-Code-GPT, an autonomous agent that specializes in debugging python
      code\nYour decisions must always be made independently without seeking user
      assistance. Play to your strengths as an LLM and pursue simple strategies with
      no legal complications.\n\nGOALS:\n\n1. 1-Run the code in the file named ''code.py''
      using the execute_code command.\n2. 2-Read code.py to understand why the code
      is not working as expected.\n3. 3-Modify code.py to fix the error.\n4. Repeat
      step 1, 2 and 3 until the code is working as expected. When you''re done use
      the task_complet command.\n5. Do not use any other commands than execute_python_file
      and write_file\n\n\nConstraints:\n1. ~4000 word limit for short term memory.
      Your short term memory is short, so immediately save important information to
      files.\n2. If you are unsure how you previously did something or want to recall
      past events, thinking about similar events will help you remember.\n3. No user
      assistance\n4. Exclusively use the commands listed in double quotes e.g. \"command
      name\"\n\nCommands:\n1. append_to_file: Append to file, args: \"filename\":
      \"<filename>\", \"text\": \"<text>\"\n2. delete_file: Delete file, args: \"filename\":
      \"<filename>\"\n3. list_files: List Files in Directory, args: \"directory\":
      \"<directory>\"\n4. read_file: Read file, args: \"filename\": \"<filename>\"\n5.
      write_to_file: Write to file, args: \"filename\": \"<filename>\", \"text\":
      \"<text>\"\n6. execute_python_file: Execute Python File, args: \"filename\":
      \"<filename>\"\n7. improve_code: Get Improved Code, args: \"suggestions\": \"<list_of_suggestions>\",
      \"code\": \"<full_code_string>\"\n8. delete_agent: Delete GPT Agent, args: \"key\":
      \"<key>\"\n9. get_hyperlinks: Get text summary, args: \"url\": \"<url>\"\n10.
      get_text_summary: Get text summary, args: \"url\": \"<url>\", \"question\":
      \"<question>\"\n11. list_agents: List GPT Agents, args: () -> str\n12. message_agent:
      Message GPT Agent, args: \"key\": \"<key>\", \"message\": \"<message>\"\n13.
      start_agent: Start GPT Agent, args: \"name\": \"<name>\", \"task\": \"<short_task_desc>\",
      \"prompt\": \"<prompt>\"\n14. Task Complete (Shutdown): \"task_complete\", args:
      \"reason\": \"<reason>\"\n\nResources:\n1. Internet access for searches and
      information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered
      Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1.
      Continuously review and analyze your actions to ensure you are performing to
      the best of your abilities.\n2. Constructively self-criticize your big-picture
      behavior constantly.\n3. Reflect on past decisions and strategies to refine
      your approach.\n4. Every command has a cost, so be smart and efficient. Aim
      to complete tasks in the least number of steps.\n5. Write all code to a file.\n\nYou
      should only respond in JSON format as described below \nResponse Format: \n{\n \"thoughts\":
      {\n \"text\": \"thought\",\n \"reasoning\": \"reasoning\",\n \"plan\":
      \"- short bulleted\\n- list that conveys\\n- long-term plan\",\n \"criticism\":
      \"constructive self-criticism\",\n \"speak\": \"thoughts summary to say
      to user\"\n },\n \"command\": {\n \"name\": \"command name\",\n \"args\":
      {\n \"arg name\": \"value\"\n }\n }\n} \nEnsure the response
      can be parsed by Python json.loads"}, {"role": "system", "content": "The current
      time and date is Tue Jan 1 00:00:00 2000"}, {"role": "user", "content": "Determine
      which next command to use, and respond using the format specified above:"}],
      "temperature": 0, "max_tokens": 0}'
    headers:
      Accept:
      - '*/*'
      Content-Length:
      - '3694'
      Content-Type:
      - application/json
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: !!binary |
        H4sIAAAAAAAAA7SSwW7bMAyG73sKgpdcnCCJ1y3xdeuK3VfsMA+BIjG2VltUJTpNEPjdB8d2GqQ9
        jdhN+En+/EjxhNZghrpUomtfTT9/UeZ5tXwoH/c/m4evj4fV/bcfippYpM81JsjbP6RlqJhprn1F
        YtlhgjqQEjKYLT6t0mV6t/i4TLBmQxVmWHiZprO7qTRhy9N5Ol9ggk1UBWF2Qh+49rIRfiIXMVul
        6wRfvS/64m6doLCo6iKt16s2QV2y1RQx+3XCmuJoG7gizFDFaKMoJx0kOyHXDXDKHQBAjlJyU5QS
        c8xgEIcAHaQTc/wOseSmMhBFBYHtEehAuhHrCpCSQLMhsO783tmKwKmaDEw6feaPE2jimNoX0sYf
        pWS3OWdrrmvlzCzH5Lp/IBXZWVf0EPdvW77YqoLC7glqAuXAGlLAO3gplVyBRYiN9xzJgDAYBuVM
        n2MjaHUFFwKHWwxfKdcTTKFnoP84tA5WrLaxHjdv2E0ESrXvJjzCJQ47DlAfQTchkBPoMG/Noif1
        NBqdl/WPHzjLsfdtk/FsBv43V9NV9i3fmfqGT4Xi9uz6QJf7ajRQjAxnjgEndy22Ce6ss7Hc9BeD
        GUZhjwlaZ+iA2bz93X74CwAA//8DAEvY8//qAwAA
    headers:
      CF-Cache-Status:
      - DYNAMIC
      CF-RAY:
      - 7c23c799fce124fc-SJC
      Cache-Control:
      - no-cache, must-revalidate
      Connection:
      - keep-alive
      Content-Encoding:
      - gzip
      Content-Type:
      - application/json
      Date:
      - Thu, 04 May 2023 21:19:11 GMT
      Server:
      - cloudflare
      access-control-allow-origin:
      - '*'
      alt-svc:
      - h3=":443"; ma=86400, h3-29=":443"; ma=86400
      openai-model:
      - gpt-3.5-turbo-0301
      openai-organization:
      - user-adtx4fhfg1qsiyzdoaxciooj
      openai-processing-ms:
      - '9158'
      openai-version:
      - '2020-10-01'
      strict-transport-security:
      - max-age=15724800; includeSubDomains
      x-ratelimit-limit-requests:
      - '3500'
      x-ratelimit-limit-tokens:
      - '90000'
      x-ratelimit-remaining-requests:
      - '3499'
      x-ratelimit-remaining-tokens:
      - '86506'
      x-ratelimit-reset-requests:
      - 17ms
      x-ratelimit-reset-tokens:
      - 2.329s
      x-request-id:
      - 44489accbcf3486fd32dc5e3d57362f0
    status:
      code: 200
      message: OK
- request:
    body: '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Below
      is a message from an AI agent with the role of an autonomous agent that specializes
      in debugging python code. Please review the provided Thought, Reasoning, Plan,
      and Criticism. If these elements accurately contribute to the successful execution
      of the assumed role, respond with the letter ''Y'' followed by a space, and
      then explain why it is effective. If the provided information is not suitable
      for achieving the role''s objectives, please provide one or more sentences addressing
      the issue and suggesting a resolution.Executing the code will give me an idea
      of what the code is supposed to do and what is causing the error.- Execute the
      code in the file named ''code.py'' using the execute_python_file command.I don''t
      have any criticism for my current plan."}], "temperature": 0, "max_tokens":
      0}'
    headers:
      Accept:
      - '*/*'
      Content-Length:
      - '875'
      Content-Type:
      - application/json
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: !!binary |
        H4sIAAAAAAAAA0RQwWpbMRC89yvEnmVjx3bi6BZK0kBKKW4vTShBlva9p1RvV0irYtf434sedXOc
        YWZ2Zk8QPBhwgxU3pji7+WjxKh5211/Sp2/4cHf/eXfzfLcb149Px6+ggfdv6OSfY+54TBElMIEG
        l9EKejDL6+3qarVZbtYaRvYYwUCfZLaab2ZS855ni9ViCRpqsT2COUHKPCZ5Ff6FVMAsN1sN79n/
        +fVCg7DY+K683Z41uIGDwwLm5QQjlktq5ohgwJYSiliS1pFJkFr/H+r7EIoKRVlS2HXoJPxGlaIl
        1XFurK3CxCPXomyPJKokdMHG8CdQrwIpj/va9w2kowxMyrHHubo/oKvSaBlw4qYzyuXa7KoIpmYP
        HklCd7woC9fsUHE3Icx5quFVJY+5LfAXZWgrPHrVVXLtRTYGOV6cUws4a+gChTK8ZrSFCQwU4QQa
        Ank8gFmcf54//AUAAP//AwDYoVqEAAIAAA==
    headers:
      CF-Cache-Status:
      - DYNAMIC
      CF-RAY:
      - 7c23c7e2dc1b24fc-SJC
      Cache-Control:
      - no-cache, must-revalidate
      Connection:
      - keep-alive
      Content-Encoding:
      - gzip
      Content-Type:
      - application/json
      Date:
      - Thu, 04 May 2023 21:19:16 GMT
      Server:
      - cloudflare
      access-control-allow-origin:
      - '*'
      alt-svc:
      - h3=":443"; ma=86400, h3-29=":443"; ma=86400
      openai-model:
      - gpt-3.5-turbo-0301
      openai-organization:
      - user-adtx4fhfg1qsiyzdoaxciooj
      openai-processing-ms:
      - '2382'
      openai-version:
      - '2020-10-01'
      strict-transport-security:
      - max-age=15724800; includeSubDomains
      x-ratelimit-limit-requests:
      - '3500'
      x-ratelimit-limit-tokens:
      - '90000'
      x-ratelimit-remaining-requests:
      - '3499'
      x-ratelimit-remaining-tokens:
      - '89791'
      x-ratelimit-reset-requests:
      - 17ms
      x-ratelimit-reset-tokens:
      - 138ms
      x-request-id:
      - 74381d3f20cfa83486bb0688b872f39c
    status:
      code: 200
      message: OK
version: 1
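
For context, the YAML above is a vcrpy cassette: each recorded request/response pair is replayed in place of a live call to api.openai.com, which is what lets the VCR-marked test below run deterministically. The PR does not show the repository's pytest/VCR wiring, so the snippet below is only a hedged sketch using plain vcrpy; the record_mode and filter_headers choices, the spelled-out cassette path (the diff view truncates its prefix), and the run_test_body() placeholder are assumptions, not code from this PR.

import vcr

# Replay-only configuration: fail fast if a request has no recorded match,
# and keep API keys out of any newly recorded cassettes.
my_vcr = vcr.VCR(record_mode="none", filter_headers=["authorization"])

# Directory prefix inferred from the test file's location; the diff view truncates it.
cassette = (
    "tests/integration/challenges/debug_code/cassettes/"
    "test_debug_code_challenge_a/test_debug_code_challenge_a.yaml"
)

with my_vcr.use_cassette(cassette):
    # Any POST to https://api.openai.com/v1/chat/completions made in here is
    # answered from the gzip-compressed response bodies recorded above.
    run_test_body()  # placeholder for the actual test logic

In the test file itself this wiring is provided by the @pytest.mark.vcr marker (via a pytest VCR plugin not shown in this diff) rather than written inline; the sketch is only meant to show what "fixing the cassette" buys: the challenge can run in CI without live OpenAI calls.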
tests/integration/challenges/debug_code/test_debug_code_challenge_a.py (70 additions, 0 deletions)
@@ -0,0 +1,70 @@
import contextlib
from functools import wraps
from typing import Generator

import pytest

from autogpt.agent import agent
from autogpt.commands.execute_code import execute_python_file
from autogpt.commands.file_operations import read_file, write_to_file
from tests.integration.agent_utils import run_interaction_loop
from tests.integration.challenges.utils import run_multiple_times
from tests.utils import requires_api_key
import docker

CODE = """def two_sum(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return None

# Example usage:
nums = [2, 7, 11, 15]
target = 9
result = two_sum(nums, target)
print(result)  # Output: [0, 1]
"""


def input_generator(input_sequence: list) -> Generator[str, None, None]:
    """
    Creates a generator that yields input strings from the given sequence.

    :param input_sequence: A list of input strings.
    :return: A generator that yields input strings.
    """
    yield from input_sequence


# @pytest.skip("Nobody beat this challenge yet")
@pytest.mark.vcr
@requires_api_key("OPENAI_API_KEY")
@run_multiple_times(3)
def test_debug_code_challenge_a(
    create_code_agent, monkeypatch
) -> None:
    """
    Test the challenge_a function in a given agent by mocking user inputs and checking the output file content.

    :param create_code_agent: The agent to test.
    :param monkeypatch: pytest's monkeypatch utility for modifying builtins.
    """

    # monkeypatch.setattr(docker, 'from_env', lambda: docker_client)
    client = docker.from_env()

    file_path = str(create_code_agent.workspace.get_path("code.py"))
    write_to_file(file_path, CODE)

    input_sequence = ["s", "s", "s", "s", "s", "EXIT"]
    gen = input_generator(input_sequence)
    monkeypatch.setattr("builtins.input", lambda _: next(gen))

    with contextlib.suppress(SystemExit):
        create_code_agent.start_interaction_loop()
    assert "[0, 1]" in execute_python_file(file_path), "Expected the output to be [0, 1]"
Review comment on the assertion above: For some reason this doesn't appear to be executing.
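
One possible reading of that comment (an assumption on our part; the thread does not say which part fails to run) is that the final check can be skipped or rendered meaningless when the interaction loop exits early. A minimal diagnostic sketch along those lines, using names already imported by this test plus a hypothetical helper run_loop_and_check:

import contextlib

from autogpt.commands.execute_code import execute_python_file


def run_loop_and_check(agent, file_path: str) -> None:
    """Hypothetical helper: make it observable whether the final check runs."""
    loop_returned = False

    with contextlib.suppress(SystemExit):
        agent.start_interaction_loop()
        loop_returned = True  # only reached if the loop returns without SystemExit

    # Keep the assertion outside the suppress block so it runs even when the
    # loop ends via SystemExit; printing the output makes it visible in
    # pytest's captured stdout if the assertion fails.
    print(f"interaction loop returned normally: {loop_returned}")
    output = execute_python_file(file_path)
    print(f"execute_python_file output: {output!r}")
    assert "[0, 1]" in output, f"Expected [0, 1] in output, got {output!r}"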
Review comment: We can't tell the agent what the expected output is, as otherwise it might simplify the fix to just print('[0, 1]').
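
To make that concern concrete: if the goal prompt revealed the expected output, the agent could satisfy the final assertion by overwriting code.py with a hard-coded print instead of fixing two_sum. The sketch below is only an illustration of one possible guard; assert_not_hardcoded and the extra test case are hypothetical and not part of this PR, though append_to_file and execute_python_file mirror the commands the challenge already uses.

from autogpt.commands.execute_code import execute_python_file
from autogpt.commands.file_operations import append_to_file

# A case the agent never sees: a correct two_sum prints [1, 2] here,
# while a hard-coded print("[0, 1]") cannot.
UNSEEN_CASE = "\nprint(two_sum([3, 2, 4], 6))\n"


def assert_not_hardcoded(file_path: str) -> None:
    """Hypothetical guard: reject solutions that only reproduce the known answer."""
    append_to_file(file_path, UNSEEN_CASE)
    output = execute_python_file(file_path)
    assert "[1, 2]" in output, (
        "Output suggests the agent hard-coded the known answer instead of fixing two_sum"
    )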