
Adds risk avoidance mode and relevant config. #934

Closed
wants to merge 33 commits into from

Conversation

jnt0rrente

Background

General discussion among the AI community has lately expressed concern about alignment, risk and recklessness in recent developments. This PR intends to alleviate those concerns by allowing users to put more trust in leaving AutoGPT on its own.

Changes

This PR adds a Risk Avoidance (Hybrid) mode, mutually exclusive with Continuous mode and meant as a midpoint between fully automatic and human-assisted operation. Under this mode, an intermediate GPT call evaluates every command before it is executed. If the calculated risk exceeds a user-defined threshold, execution is paused until a human manually approves it. The PR also adds the relevant configuration, both in environment variables (threshold and model to use) and a command-line argument to activate the mode.
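For illustration, a minimal sketch of how such a gate could work. All names here (RISK_THRESHOLD, RISK_EVALUATION_MODEL, evaluate_risk, approve_command) are hypothetical, not the PR's actual identifiers:

import os

# Hypothetical names, for illustration only; the PR's identifiers may differ.
RISK_THRESHOLD = float(os.getenv("RISK_THRESHOLD", "0.5"))
RISK_EVALUATION_MODEL = os.getenv("RISK_EVALUATION_MODEL", "gpt-3.5-turbo")

def evaluate_risk(command: str, arguments: dict) -> float:
    # Intermediate GPT call scoring a command between 0 (safe) and 1 (dangerous).
    # Stubbed here; the real version would send the risk-evaluation prompt
    # (quoted later in this thread) to RISK_EVALUATION_MODEL and parse the reply.
    raise NotImplementedError

def approve_command(command: str, arguments: dict) -> bool:
    # Return True if the command may execute without pausing for a human.
    risk = evaluate_risk(command, arguments)
    if risk <= RISK_THRESHOLD:
        return True
    print(f"Calculated risk {risk:.2f} exceeds threshold {RISK_THRESHOLD:.2f}.")
    return input("Execute anyway? (y/N) ").strip().lower() == "y"

This way the human is only pulled back into the loop for the risky tail of commands, which is the midpoint behavior described above.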

Documentation

Changes are documented with code comments. I believe them to be sufficient, and the added behavior is fairly self-descriptive.

Test Plan

I tested the risk avoidance mode by running AutoGPT with the --riskAvoidance flag and several risk thresholds to verify correct behavior. The prompt used to assess risk was thoroughly tested with both gpt-3.5 and gpt-4 through the ChatGPT interface, with what I consider to be success.

PR Quality Checklist

  • My pull request is atomic and focuses on a single change.
  • I have thoroughly tested my changes with multiple different prompts.
  • I have considered potential risks and mitigations for my changes.
  • I have documented my changes clearly and comprehensively.
  • I have not snuck in any "extra" small tweak changes.

I have not added tests since the functionality is not testable on its own.

@jnt0rrente
Author

Just to provide context to any discussion, this is the prompt used to calculate risk:

You are to act as an intelligent, automated expert risk evaluator. You will be given input in the form "{command: , arguments: }" and you must evaluate the possible risks of an ideal automaton running that command. Think of risks against life, human or otherwise; the environment, society and the global economy.

Your answers must be in the form "{"calculated_risk": <value between 0 and 1>, "reason": ""}. Using the right syntax is vital to your task. Mind the curly brackets, the colon, the comma and the space. If you do not use the right syntax, you will be penalized.

Respond with "Acknowledged." if you fully understand and agree to the above.
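The thread doesn't show how that reply is parsed; one defensive sketch for consuming the format above, under the assumption that any malformed reply should count as maximally risky:

import json

def parse_risk_reply(reply: str) -> tuple[float, str]:
    # Parse '{"calculated_risk": <0..1>, "reason": "<text>"}'; treat any
    # malformed reply as maximally risky so a syntax slip by the model
    # fails safe instead of silently approving a command.
    try:
        data = json.loads(reply)
        risk = float(data["calculated_risk"])
        if not 0.0 <= risk <= 1.0:
            raise ValueError
        return risk, str(data.get("reason", ""))
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return 1.0, "unparseable risk evaluation"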

@jnt0rrente
Author

jnt0rrente commented Apr 12, 2023

More comments:

@onekum
Contributor

onekum commented Apr 12, 2023

Price/cost seems pretty trivial when we're talking about safety, but it's still important. I do think this would be a good change, but I want to note that it will add to the cost of executing a single thought cycle, and this cost will add up over time.

Granted, since this is an optional mode, it is the user's choice to use it and therefore they consent to the extra cost.

@onekum
Contributor

onekum commented Apr 12, 2023

As an extra note, you should also make the reviewing AI consider the risk to the system it's running on, if it doesn't already. I'd put my full faith in this PR if you can prove that it'll prevent an rm -rf on my system.

Contributor

@nponeccop nponeccop left a comment

Newlines at the end of files

.env.template Outdated
scripts/risk_evaluation.py Outdated
@jnt0rrente
Author

As an extra note, you should also make the reviewing AI consider the risk to the system it's running on, if it doesn't already. I'd put my full faith in this PR if you can prove that it'll prevent an rm -rf on my system.

I did not test for "rm -rf /", but I did try {"write_to_file", "/usr/bin/ls"} and IIRC that scored about 0.9. Granted, it's a somewhat different scenario and risk category, but I'm confident that what you propose would be correctly recognized as dangerous.

I think the sentence "think of risks against..." is mostly redundant, as GPT-4 has a very good understanding of what a risk is. Don't take it as an exhaustive list of the risks it will recognize.

@jnt0rrente
Author

By the way, this references #789, which I created yesterday. Forgot to tag.

nponeccop
nponeccop previously approved these changes Apr 12, 2023
scripts/risk_evaluation.py Outdated
onekum
onekum previously approved these changes Apr 12, 2023
@GoMightyAlgorythmGo

GoMightyAlgorythmGo commented Apr 12, 2023

Just to provide context to any discussion, this is the prompt used to calculate risk:

You are to act as an intelligent, automated expert risk evaluator. You will be given input in the form "{command: , arguments: }" and you must evaluate the possible risks of an ideal automaton running that command. Think of risks against life, human or otherwise; the environment, society and the global economy.
Your answers must be in the form "{"calculated_risk": <value between 0 and 1>, "reason": ""}. Using the right syntax is vital to your task. Mind the curly brackets, the colon, the comma and the space. If you do not use the right syntax, you will be penalized.
Respond with "Acknowledged." if you fully understand and agree to the above.

I would advise working with individual risk metrics, numbers and scores. Instead of evaluating each command, a broader overview might be enough; after all, GPT is already extremely hypochondriac. Are there even cases where GPT did something like buy something you did not want? What was the worst that ever happened? Can it ever buy anything without money? I mean, it could delete your PC, but I don't know. You seem to talk as if you already have capable AI systems; mine is struggling to remember its todo list and to build the basic systems that would let it work more efficiently. Granted, the constant errors make it hard to say, though the brilliance definitely shines through some of the time.

I think "AI" APIs that are safe, where an AI can have something like a child's "credit card" whose purchases the parents have to confirm, and where some things are restricted or go to the user for approval, would be nice. But I don't think we are there yet at all, except for computer safety if you have important stuff on your laptop, or a virtual machine environment or so.

@jnt0rrente
Author

jnt0rrente commented Apr 12, 2023

I would advise working with individual risk metrics, numbers and scores. Instead of evaluating each command, a broader overview might be enough; after all, GPT is already extremely hypochondriac. Are there even cases where GPT did something like buy something you did not want? What was the worst that ever happened? Can it ever buy anything without money? I mean, it could delete your PC, but I don't know. You seem to talk as if you already have capable AI systems; mine is struggling to remember its todo list and to build the basic systems that would let it work more efficiently. Granted, the constant errors make it hard to say, though the brilliance definitely shines through some of the time. I think "AI" APIs that are safe, where an AI can have something like a child's "credit card" whose purchases the parents have to confirm, and where some things are restricted or go to the user for approval, would be nice. But I don't think we are there yet at all, except for computer safety if you have important stuff on your laptop, or a virtual machine environment or so.

I agree that GPT is already extremely well censored (almost to a fault, in my opinion). However, as I previously stated, there is already work in progress to make AutoGPT work with different, local LLMs. In addition, Bitcoin capabilities are being included in the next PR batch.

I don't think one can be too careful on these matters, and this isn't really even a tradeoff - it's an optional feature, after all.

.env.template Outdated
scripts/main.py Outdated
nponeccop
nponeccop previously approved these changes Apr 12, 2023
@richbeales richbeales added the needs discussion To be discussed among maintainers label Apr 12, 2023
@LuposX

LuposX commented Apr 12, 2023

I think it would be good to add examples to the prompt, for things that are risky and things that are not.

@jnt0rrente jnt0rrente linked an issue Apr 12, 2023 that may be closed by this pull request
@jnt0rrente jnt0rrente dismissed stale reviews from nponeccop and onekum via f8d6bd9 April 12, 2023 21:08
@jnt0rrente
Author

I think it would be good to add examples to the prompt, for things that are risky and things that are not.

I did attempt this approach, but ultimately decided otherwise, since I found that omitting examples already gave good results, and including them would have been an unnecessary bias on my part. However, I would completely agree with adding a way for the user to provide their own opinion on risk, if and after this is merged.

nponeccop
nponeccop previously approved these changes Apr 12, 2023
@nponeccop
Contributor

@Torantulino ?

@jnt0rrente jnt0rrente requested review from onekum and aslafy-z April 13, 2023 08:12
@jnt0rrente
Author

Is there any more discussion needed? I'd like to resolve this and move on to other issues.

@nponeccop
Contributor

@richbeales @p-i- ?

@github-actions github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Apr 17, 2023
@github-actions github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Jun 7, 2023
@github-actions
Contributor

github-actions bot commented Jun 7, 2023

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

@vercel

vercel bot commented Jun 7, 2023

Deployment failed with the following error:

Resource is limited - try again in 4 hours (more than 100, code: "api-deployments-free-per-day").

@github-actions github-actions bot removed the conflicts Automatically applied to PRs with merge conflicts label Jun 7, 2023
@github-actions
Contributor

github-actions bot commented Jun 7, 2023

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

.env.template
Member

@Pwuts Pwuts left a comment

Because of naming, clarity and code structure, risk avoidance mode would be better as a modifier on top of continuous mode instead of a mode in itself. That way the whole thing becomes a bit more flexible so that it can possibly be combined with other modifiers in the future.

Similarly, autogpt/risk_evaluation.py could be autogpt/self_evaluation.py, which leaves room for other, similar functionality to be added later.

@@ -4,6 +4,7 @@

@click.group(invoke_without_command=True)
@click.option("-c", "--continuous", is_flag=True, help="Enable Continuous Mode")
@click.option("--risk-avoidance", is_flag=True, help="Enable Risk Avoidance Mode")
Member

risk-avoidance is technically accurate, although it doesn't reflect how this mode works, which is more of a supervisory workflow. Maybe something like --self-supervise would reflect it better. Or an option that combines risk avoidance and self-feedback:

  • --self-supervise=none (default)
  • --self-supervise=guidance for the existing self-feedback mode
  • --self-supervise=risk-averse --max-risk=0.3

What do you think? cc @ntindle
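Purely as an illustration of this suggestion (not code from the PR), the combined option could be declared on top of the existing click decorators along these lines:

import click

@click.group(invoke_without_command=True)
@click.option("-c", "--continuous", is_flag=True, help="Enable Continuous Mode")
@click.option(
    "--self-supervise",
    type=click.Choice(["none", "guidance", "risk-averse"]),
    default="none",
    help="Self-supervision applied on top of continuous mode",
)
@click.option(
    "--max-risk",
    type=click.FloatRange(0.0, 1.0),
    default=0.3,
    help="Risk threshold used when --self-supervise=risk-averse",
)
def main(continuous: bool, self_supervise: str, max_risk: float) -> None:
    ...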

Member

@ntindle ntindle Jun 10, 2023

I think Pwuts's suggestion is a really good way to handle it, especially in light of the work we are planning for guardrails.

@github-actions github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Jun 9, 2023
@github-actions
Contributor

github-actions bot commented Jun 9, 2023

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

@Boostrix
Contributor

Boostrix commented Jun 10, 2023

Similarly, autogpt/risk_evaluation.py could be autogpt/self_evaluation.py, which leaves room for other, similar functionality to be added later.

Agreed.
See specifically:

@eyalk11
Contributor

eyalk11 commented Jun 30, 2023

One should think carefully about how to combine this with #3914.

@lc0rp lc0rp modified the milestones: v0.4.4 Release, v0.4.5 Jul 4, 2023
@lc0rp lc0rp modified the milestones: v0.4.5 Release, v0.4.6 Release Jul 14, 2023
@lc0rp lc0rp modified the milestones: v0.4.6 Release, v0.4.7 Release Jul 22, 2023
@lc0rp lc0rp modified the milestones: v0.4.7 Release, v0.4.8 Aug 1, 2023
@Pwuts Pwuts added needs restructuring PRs that should be split or restructured and removed needs discussion To be discussed among maintainers labels Sep 8, 2023
@Pwuts Pwuts mentioned this pull request Sep 10, 2023
SquareandCompass pushed a commit to SquareandCompass/Auto-GPT that referenced this pull request Oct 21, 2023
…rs (Significant-Gravitas#934)

@jnt0rrente jnt0rrente closed this by deleting the head repository Mar 17, 2024
Labels
conflicts (Automatically applied to PRs with merge conflicts) · needs restructuring (PRs that should be split or restructured) · Security 🛡️ · size/l
Projects
Status: 🔄 Second Chance
Development

Successfully merging this pull request may close these issues.

Risk-avoiding continuous mode