Adds risk avoidance mode and relevant config. #934
Conversation
Just to provide context to any discussion, this is the prompt used to calculate risk:
More comments:
Price/cost seems pretty trivial when we're talking about safety, but it's still important. I do think that this would be a good change, but I want to note that it will incur an increase in the cost of executing a single thought cycle, and this cost will add up over time. Granted, since this is an optional mode, it is the user's choice to use it and they therefore consent to the extra cost.
As an extra note, you should also make the reviewing AI consider the risk to the system it's running on, if it doesn't already. I'd put my full faith in this PR if you can prove that it'll prevent an rm -rf /.
Newlines at the end of files
I did not test for "rm -rf /", but I did try {"write_to_file", "/usr/bin/ls"} and iirc that scored about 0.9. Granted, it is a somewhat different scenario and risk category, but I'm confident what you propose would be correctly recognized as dangerous. I think the sentence "think of risks against..." is often redundant, as GPT-4 has a very good understanding of what a risk is. Don't take it as a literal set of the risks it will recognize.
By the way, this is referencing #789, which I created yesterday. Forgot to tag.
I would advise working with individual risk metrics, numbers and scores.
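For illustration, one way the idea of individual risk metrics could be read: each command gets per-category scores that are then aggregated into an overall value. The category names and the max() aggregation below are assumptions of this sketch, not something proposed in the PR:

```python
from dataclasses import dataclass


@dataclass
class RiskReport:
    """Per-category risk scores in [0, 1]; the overall score is the worst case."""
    data_loss: float
    system_damage: float
    financial: float

    @property
    def overall(self) -> float:
        # Conservative aggregation: a command is as risky as its riskiest aspect.
        return max(self.data_loss, self.system_damage, self.financial)


# Example: overwriting a system binary scores high on data loss and system damage.
report = RiskReport(data_loss=0.9, system_damage=0.7, financial=0.1)
assert report.overall == 0.9
```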
I agree that GPT is already extremely well censored (almost to a fault, in my opinion). However, as I previously stated, there is already work in progress to make AutoGPT work with different, local LLMs. In addition, Bitcoin capabilities are being included in the next PR batch. I don't think one can be too careful on these matters, and this isn't really even a tradeoff - it's an optional feature, after all.
I think it would be good to add examples to the prompt for things that are risky and things which are not.
I did attempt this approach, but I ultimately decided otherwise since I found omitting it provided good results, and so it would have been an unnecessary bias on my part. However, I would completely agree with adding a way for the user to provide their own opinion on risk, if and after this is merged.
Is there any more discussion needed? I'd like to resolve this and move on to other issues.
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.
Deployment failed with the following error:
Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.
For the sake of naming, clarity and code structure, risk avoidance mode would be better as a modifier on top of continuous mode rather than a mode in itself. That way the whole thing becomes a bit more flexible, so that it can possibly be combined with other modifiers in the future.
Similarly, autogpt/risk_evaluation.py could be autogpt/self_evaluation.py, which leaves room for other, similar functionality to be added later.
@@ -4,6 +4,7 @@
 @click.group(invoke_without_command=True)
 @click.option("-c", "--continuous", is_flag=True, help="Enable Continuous Mode")
+@click.option("--risk-avoidance", is_flag=True, help="Enable Risk Avoidance Mode")
risk-avoidance is technically accurate, although it doesn't reflect how this mode works, which is more of a supervisory workflow. Maybe something like --self-supervise would reflect it better. Or an option that combines risk avoidance and self-feedback:
- --self-supervise=none (default)
- --self-supervise=guidance for the existing self-feedback mode
- --self-supervise=risk-averse with --max-risk=0.3
What do you think? cc @ntindle
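For illustration, a minimal click sketch of what the combined option might look like. The option names follow the suggestion above, but the wiring, defaults and standalone command are assumptions of this sketch, not code from the PR:

```python
import click


@click.command()
@click.option(
    "--self-supervise",
    type=click.Choice(["none", "guidance", "risk-averse"]),
    default="none",
    help="How the agent should supervise its own actions.",
)
@click.option(
    "--max-risk",
    type=click.FloatRange(0.0, 1.0),
    default=0.3,
    help="Risk threshold applied when --self-supervise=risk-averse.",
)
def main(self_supervise: str, max_risk: float) -> None:
    # Only the risk-averse variant uses the numeric threshold.
    if self_supervise == "risk-averse":
        click.echo(f"Pausing for approval when estimated risk exceeds {max_risk}")
    else:
        click.echo(f"Self-supervision mode: {self_supervise}")


if __name__ == "__main__":
    main()
```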
I think pwuts’s suggestion is a really good way to handle it, especially in light of the work we are planning for guardrails.
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.
Agreed.
One should think carefully about how to combine this with #3914.
Background
General discussion among the AI community has lately been expressing concern about alignment, risk and recklessness in recent developments. This PR intends to alleviate those concerns by allowing users to put more trust in leaving AutoGPT on its own.
Changes
This PR adds a Risk Avoidance (Hybrid) mode, mutually exclusive with Continuous mode and meant to be a midpoint between fully autonomous and human-assisted operation. In this mode, an intermediate GPT call is made to evaluate every command before it is executed. If the calculated risk exceeds a user-defined threshold, execution is paused until the human manually approves it. Relevant additions are made to configuration, both in environment variables (threshold and model to use) and in the command-line argument that activates the mode.
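For illustration only, a minimal sketch of the gating flow described above. The prompt wording, the RISK_THRESHOLD variable, and the helper names are assumptions made for this sketch; the PR's actual identifiers may differ:

```python
import os


def evaluate_risk(command: str, arguments: dict, ask_llm) -> float:
    """Score a pending command between 0 (harmless) and 1 (dangerous) with an extra LLM call.

    ask_llm is whatever chat-completion helper the agent already uses; it is
    injected here so the sketch stays independent of AutoGPT internals.
    """
    prompt = (
        "On a scale from 0 to 1, rate the risk of executing the following command. "
        "Reply with only the number.\n"
        f"Command: {command}\nArguments: {arguments}"
    )
    try:
        return float(ask_llm(prompt).strip())
    except ValueError:
        return 1.0  # Unparsable reply: treat as maximally risky.


def should_pause_for_approval(risk: float) -> bool:
    """Pause and wait for human approval when risk exceeds the user-defined threshold."""
    threshold = float(os.getenv("RISK_THRESHOLD", "0.5"))
    return risk > threshold


if __name__ == "__main__":
    # Stub LLM call for demonstration only.
    risk = evaluate_risk("write_to_file", {"path": "/usr/bin/ls"}, lambda _prompt: "0.9")
    print(risk, should_pause_for_approval(risk))
```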
Documentation
Changes are documented with code comments. I believe them to be sufficient, and the added behavior is fairly self-explanatory.
Test Plan
I tested the risk avoidance mode by running AutoGPT with the --risk-avoidance flag and several risk thresholds to calibrate its behavior. The prompt used to assess risk was thoroughly tested with both gpt-3.5 and gpt-4 through the ChatGPT interface, with what I consider to be success.
PR Quality Checklist
I have not added tests since the functionality is not testable on its own.