Fig. 1 illustrates the general flow of AgentEval with the verification step
+ + + +TL;DR: +* As a developer, how can you assess the utility and effectiveness of an LLM-powered application in helping end users with their tasks? +* To shed light on the question above, we previously introduced [`AgentEval`](https://microsoft.github.io/autogen/blog/2023/11/20/AgentEval/) — a framework to assess the multi-dimensional utility of any LLM-powered application crafted to assist users in specific tasks. We have now embedded it as part of the AutoGen library to ease developer adoption. +* Here, we introduce an updated version of AgentEval that includes a verification process to estimate the robustness of the QuantifierAgent. More details can be found in [this paper](https://arxiv.org/abs/2405.02178). + + +## Introduction + +Previously introduced [`AgentEval`](https://microsoft.github.io/autogen/blog/2023/11/20/AgentEval/) is a comprehensive framework designed to bridge the gap in assessing the utility of LLM-powered applications. It leverages recent advancements in LLMs to offer a scalable and cost-effective alternative to traditional human evaluations. The framework comprises three main agents: `CriticAgent`, `QuantifierAgent`, and `VerifierAgent`, each playing a crucial role in assessing the task utility of an application. + +**CriticAgent: Defining the Criteria** + +The CriticAgent's primary function is to suggest a set of criteria for evaluating an application based on the task description and examples of successful and failed executions. For instance, in the context of a math tutoring application, the CriticAgent might propose criteria such as efficiency, clarity, and correctness. These criteria are essential for understanding the various dimensions of the application's performance. It’s highly recommended that application developers validate the suggested criteria leveraging their domain expertise. + +**QuantifierAgent: Quantifying the Performance** + +Once the criteria are established, the QuantifierAgent takes over to quantify how well the application performs against each criterion. This quantification process results in a multi-dimensional assessment of the application's utility, providing a detailed view of its strengths and weaknesses. + +**VerifierAgent: Ensuring Robustness and Relevance** + +VerifierAgent ensures the criteria used to evaluate a utility are effective for the end-user, maintaining both robustness and high discriminative power. It does this through two main actions: + +1. Criteria Stability: + * Ensures criteria are essential, non-redundant, and consistently measurable. + * Iterates over generating and quantifying criteria, eliminating redundancies, and evaluating their stability. + * Retains only the most robust criteria. + +2. Discriminative Power: + + * Tests the system's reliability by introducing adversarial examples (noisy or compromised data). + * Assesses the system's ability to distinguish these from standard cases. + * If the system fails, it indicates the need for better criteria to handle varied conditions effectively. + +## A Flexible and Scalable Framework + +One of AgentEval's key strengths is its flexibility. It can be applied to a wide range of tasks where success may or may not be clearly defined. For tasks with well-defined success criteria, such as household chores, the framework can evaluate whether multiple successful solutions exist and how they compare. For more open-ended tasks, such as generating an email template, AgentEval can assess the utility of the system's suggestions. 
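
To make the criteria-stability check described in the VerifierAgent section above more concrete, here is a minimal, illustrative sketch — not the VerifierAgent implementation, which is still under development. It takes the per-criterion scores produced by repeated QuantifierAgent runs over the same test case and keeps only the criteria whose scores agree across runs. The function name, the example scores, and the 0.8 agreement threshold are all illustrative assumptions.

```
from collections import Counter
from typing import Dict, List

def criteria_stability(runs: List[Dict[str, str]], min_agreement: float = 0.8) -> Dict[str, float]:
    # For each criterion, compute the share of runs that produced the most common score.
    stability = {}
    for criterion in runs[0]:
        scores = [run[criterion] for run in runs]
        most_common_count = Counter(scores).most_common(1)[0][1]
        stability[criterion] = most_common_count / len(scores)
    return stability

# Example: three hypothetical quantifier runs over the same execution chain.
runs = [
    {"Accuracy": "Correct", "Conciseness": "Concise", "Relevance": "Highly relevant"},
    {"Accuracy": "Correct", "Conciseness": "Somewhat verbose", "Relevance": "Highly relevant"},
    {"Accuracy": "Correct", "Conciseness": "Verbose", "Relevance": "Highly relevant"},
]

agreement = criteria_stability(runs)
stable = [name for name, rate in agreement.items() if rate >= 0.8]
print(agreement)  # {'Accuracy': 1.0, 'Conciseness': 0.33..., 'Relevance': 1.0}
print(stable)     # ['Accuracy', 'Relevance']
```

A low agreement rate suggests a criterion is not being measured consistently and may need to be reworded, split into sub-criteria, or dropped — the kind of pruning the VerifierAgent is designed to automate.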
Furthermore, AgentEval allows for the incorporation of human expertise. Domain experts can participate in the evaluation process by suggesting relevant criteria or verifying the usefulness of the criteria identified by the agents. This human-in-the-loop approach ensures that the evaluation remains grounded in practical, real-world considerations.

## Empirical Validation

To validate AgentEval, the framework was tested on two applications: math problem solving and ALFWorld, a household task simulation. The math dataset comprised 12,500 challenging problems, each with step-by-step solutions, while the ALFWorld dataset involved multi-turn interactions in a simulated environment. In both cases, AgentEval successfully identified relevant criteria, quantified performance, and verified the robustness of the evaluations, demonstrating its effectiveness and versatility.

## How to use `AgentEval`

AgentEval currently has two main stages: criteria generation and criteria quantification (criteria verification is still under development). Both stages make use of sequential LLM-powered agents to make their determinations.

**Criteria Generation:**

During criteria generation, AgentEval uses example execution message chains to create a set of criteria for quantifying how well an application performed for a given task.

```
def generate_criteria(
    llm_config: Optional[Union[Dict, Literal[False]]] = None,
    task: Task = None,
    additional_instructions: str = "",
    max_round=2,
    use_subcritic: bool = False,
)
```

Parameters:
* llm_config (dict or bool): llm inference configuration.
* task ([Task](https://github.com/microsoft/autogen/tree/main/autogen/agentchat/contrib/agent_eval/task.py)): The task to evaluate.
* additional_instructions (str, optional): Additional instructions for the criteria agent.
* max_round (int, optional): The maximum number of rounds to run the conversation.
* use_subcritic (bool, optional): Whether to use the Subcritic agent to generate subcriteria. The Subcritic agent will break down each generated criterion into smaller sub-criteria to be assessed.

Example code:
```
config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")
task = Task(
    **{
        "name": "Math problem solving",
        "description": "Given any question, the system needs to solve the problem as concisely and accurately as possible",
        "successful_response": response_successful,
        "failed_response": response_failed,
    }
)

criteria = generate_criteria(task=task, llm_config={"config_list": config_list})
```

Note: Only one sample execution chain (success/failure) is required for the task object, but AgentEval will perform better with an example for each case.
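
Since developers are encouraged to validate the suggested criteria with their own domain expertise, one convenient step is to persist the generated criteria for review before quantification. The sketch below is one way to do that; it assumes each returned criterion exposes `name`, `description`, and `accepted_values` matching the example output shown next, and the file name is arbitrary — adjust it if your version of `generate_criteria` returns plain dictionaries.

```
import json

def criterion_to_dict(criterion) -> dict:
    # Normalize a criterion to a plain dict, whether it is an object or already a dict.
    if isinstance(criterion, dict):
        return criterion
    return {
        "name": criterion.name,
        "description": criterion.description,
        "accepted_values": criterion.accepted_values,
    }

# `criteria` is the list returned by generate_criteria above. Write it to a file
# that a domain expert can review and edit before the quantification step.
with open("criteria_for_review.json", "w") as f:
    json.dump([criterion_to_dict(c) for c in criteria], f, indent=2)
```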
Example Output:
```
[
    {
        "name": "Accuracy",
        "description": "The solution must be correct and adhere strictly to mathematical principles and techniques appropriate for the problem.",
        "accepted_values": ["Correct", "Minor errors", "Major errors", "Incorrect"]
    },
    {
        "name": "Conciseness",
        "description": "The explanation and method provided should be direct and to the point, avoiding unnecessary steps or complexity.",
        "accepted_values": ["Very concise", "Concise", "Somewhat verbose", "Verbose"]
    },
    {
        "name": "Relevance",
        "description": "The content of the response must be relevant to the question posed and should address the specific problem requirements.",
        "accepted_values": ["Highly relevant", "Relevant", "Somewhat relevant", "Not relevant"]
    }
]
```

**Criteria Quantification:**

During the quantification stage, AgentEval will use the generated criteria (or user-defined criteria) to assess a given execution chain to determine how well the application performed.

```
def quantify_criteria(
    llm_config: Optional[Union[Dict, Literal[False]]],
    criteria: List[Criterion],
    task: Task,
    test_case: str,
    ground_truth: str,
)
```

Parameters:
* llm_config (dict or bool): llm inference configuration.
* criteria ([Criterion](https://github.com/microsoft/autogen/tree/main/autogen/agentchat/contrib/agent_eval/criterion.py)): A list of criteria for evaluating the utility of a given task. This can either be generated by the `generate_criteria` function or manually created.
* task ([Task](https://github.com/microsoft/autogen/tree/main/autogen/agentchat/contrib/agent_eval/task.py)): The task to evaluate. It should match the one used during the `generate_criteria` step.
* test_case (str): The execution chain to assess. Typically this is a JSON list of messages but could be any string representation of a conversation chain.
* ground_truth (str): The ground truth for the test case.

Example Code:
```
test_case="""[
    {
        "content": "Find $24^{-1} \\pmod{11^2}$. That is, find the residue $b$ for which $24b \\equiv 1\\pmod{11^2}$.\n\nExpress your answer as an integer from $0$ to $11^2-1$, inclusive.",
        "role": "user"
    },
    {
        "content": "To find the modular inverse of 24 modulo 11^2, we can use the Extended Euclidean Algorithm. Here is a Python function to compute the modular inverse using this algorithm:\n\n```python\ndef mod_inverse(a, m):\n...",
        "role": "assistant"
    }
]"""

quantifier_output = quantify_criteria(
    llm_config={"config_list": config_list},
    criteria=criteria,
    task=task,
    test_case=test_case,
    ground_truth="true",
)
```

The output will be a JSON object consisting of the ground truth and a dictionary mapping each criterion to its score.

```
{
    "actual_success": true,
    "estimated_performance": {
        "Accuracy": "Correct",
        "Conciseness": "Concise",
        "Relevance": "Highly relevant"
    }
}
```

## What is next?
* Enabling AgentEval in AutoGen Studio for a no-code solution.
* Fully implementing VerifierAgent in the AgentEval framework.

## Conclusion

AgentEval represents a significant advancement in the evaluation of LLM-powered applications. By combining the strengths of CriticAgent, QuantifierAgent, and VerifierAgent, the framework offers a robust, scalable, and flexible solution for assessing task utility. This innovative approach not only helps developers understand the current performance of their applications but also provides valuable insights that can drive future improvements.
As the field of intelligent agents continues to evolve, frameworks like AgentEval will play a crucial role in ensuring that these applications meet the diverse and dynamic needs of their users.

## Further reading

Please refer to our [paper](https://arxiv.org/abs/2405.02178) and [codebase](https://github.com/microsoft/autogen/tree/main/autogen/agentchat/contrib/agent_eval) for more details about AgentEval.

If you find this blog useful, please consider citing:
```bibtex
@article{arabzadeh2024assessing,
  title={Assessing and Verifying Task Utility in LLM-Powered Applications},
  author={Arabzadeh, Negar and Huo, Siqing and Mehta, Nikhil and Wu, Qingyun and Wang, Chi and Awadallah, Ahmed and Clarke, Charles LA and Kiseleva, Julia},
  journal={arXiv preprint arXiv:2405.02178},
  year={2024}
}
```
diff --git a/website/blog/2024-06-24-AltModels-Classes/img/agentstogether.jpeg b/website/blog/2024-06-24-AltModels-Classes/img/agentstogether.jpeg
(new binary image file added via Git LFS)
diff --git a/website/blog/2024-06-24-AltModels-Classes/index.mdx b/website/blog/2024-06-24-AltModels-Classes/index.mdx
---
title: Enhanced Support for Non-OpenAI Models
authors:
  - marklysze
  - Hk669
tags: [mistral ai,anthropic,together.ai,gemini]
---

## TL;DR

- **AutoGen has expanded integrations with a variety of cloud-based model providers beyond OpenAI.**
- **Leverage models and platforms from Gemini, Anthropic, Mistral AI, Together.AI, and Groq for your AutoGen agents.**
- **Utilise models specifically for chat, language, image, and coding.**
- **LLM provider diversification can provide cost and resilience benefits.**

In addition to the recently released AutoGen [Google Gemini](https://ai.google.dev/) client, new client classes for [Mistral AI](https://mistral.ai/), [Anthropic](https://www.anthropic.com/), [Together.AI](https://www.together.ai/), and [Groq](https://groq.com/) enable you to utilize over 75 different large language models in your AutoGen agent workflow.

These new client classes tailor AutoGen's underlying messages to each provider's unique requirements and remove that complexity from the developer, who can then focus on building their AutoGen workflow.

Using them is as simple as installing the client-specific library and updating your LLM config with the relevant `api_type` and `model`. We'll demonstrate how to use them below.

The community is continuing to enhance and build new client classes as cloud-based inference providers arrive. So, watch this space, and feel free to [discuss](https://discord.gg/pAbnFJrkgZ) or [develop](https://github.com/microsoft/autogen/pulls) another one.

## Benefits of choice

The need to use only the best models to overcome workflow-breaking LLM inconsistency has diminished considerably over the last 12 months.

These new classes provide access to the very largest trillion-parameter models from OpenAI, Google, and Anthropic, continuing to provide the most consistent and competent agent experiences. However, it's worth trying smaller models from the likes of Meta, Mistral AI, Microsoft, Qwen, and many others.
Perhaps they +are capable enough for a task, or sub-task, or even better suited (such as a coding model)! + +Using smaller models will have cost benefits, but they also allow you to test models that you could run locally, allowing you to determine if you can remove cloud inference costs +altogether or even run an AutoGen workflow offline. + +On the topic of cost, these client classes also include provider-specific token cost calculations so you can monitor the cost impact of your workflows. With costs per million +tokens as low as 10 cents (and some are even free!), cost savings can be noticeable. + +## Mix and match + +How does Google's Gemini 1.5 Pro model stack up against Anthropic's Opus or Meta's Llama 3? + +Now you have the ability to quickly change your agent configs and find out. If you want to run all three in the one workflow, +AutoGen's ability to associate specific configurations to each agent means you can select the best LLM for each agent. + +## Capabilities + +The common requirements of text generation and function/tool calling are supported by these client classes. + +Multi-modal support, such as for image/audio/video, is an area of active development. The [Google Gemini](https://microsoft.github.io/autogen/docs/topics/non-openai-models/cloud-gemini) client class can be +used to create a multimodal agent. + +## Tips + +Here are some tips when working with these client classes: + +- **Most to least capable** - start with larger models and get your workflow working, then iteratively try smaller models. +- **Right model** - choose one that's suited to your task, whether it's coding, function calling, knowledge, or creative writing. +- **Agent names** - these cloud providers do not use the `name` field on a message, so be sure to use your agent's name in their `system_message` and `description` fields, as well as instructing the LLM to 'act as' them. This is particularly important for "auto" speaker selection in group chats as we need to guide the LLM to choose the next agent based on a name, so tweak `select_speaker_message_template`, `select_speaker_prompt_template`, and `select_speaker_auto_multiple_template` with more guidance. +- **Context length** - as your conversation gets longer, models need to support larger context lengths, be mindful of what the model supports and consider using [Transform Messages](https://microsoft.github.io/autogen/docs/topics/handling_long_contexts/intro_to_transform_messages) to manage context size. +- **Provider parameters** - providers have parameters you can set such as temperature, maximum tokens, top-k, top-p, and safety. See each client class in AutoGen's [API Reference](https://microsoft.github.io/autogen/docs/reference/oai/gemini) or [documentation](https://microsoft.github.io/autogen/docs/topics/non-openai-models/cloud-gemini) for details. +- **Prompts** - prompt engineering is critical in guiding smaller LLMs to do what you need. [ConversableAgent](https://microsoft.github.io/autogen/docs/reference/agentchat/conversable_agent), [GroupChat](https://microsoft.github.io/autogen/docs/reference/agentchat/groupchat), [UserProxyAgent](https://microsoft.github.io/autogen/docs/reference/agentchat/user_proxy_agent), and [AssistantAgent](https://microsoft.github.io/autogen/docs/reference/agentchat/assistant_agent) all have customizable prompt attributes that you can tailor. 
Here are some prompting tips from [Anthropic](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview)([+Library](https://docs.anthropic.com/en/prompt-library/library)), [Mistral AI](https://docs.mistral.ai/guides/prompting_capabilities/), [Together.AI](https://docs.together.ai/docs/examples), and [Meta](https://llama.meta.com/docs/how-to-guides/prompting/).
- **Help!** - reach out on the AutoGen [Discord](https://discord.gg/pAbnFJrkgZ) or [log an issue](https://github.com/microsoft/autogen/issues) if you need help with, or can help improve, these client classes.

Now it's time to try them out.

## Quickstart

### Installation

Install the appropriate client based on the model you wish to use.

```sh
pip install pyautogen["mistral"] # for Mistral AI client
pip install pyautogen["anthropic"] # for Anthropic client
pip install pyautogen["together"] # for Together.AI client
pip install pyautogen["groq"] # for Groq client
```

### Configuration Setup

Add your model configurations to the `OAI_CONFIG_LIST`. Ensure you specify the `api_type` to initialize the respective client (Anthropic, Mistral AI, Together.AI, or Groq).

```yaml
[
    {
        "model": "your anthropic model name",
        "api_key": "your Anthropic api_key",
        "api_type": "anthropic"
    },
    {
        "model": "your mistral model name",
        "api_key": "your Mistral AI api_key",
        "api_type": "mistral"
    },
    {
        "model": "your together.ai model name",
        "api_key": "your Together.AI api_key",
        "api_type": "together"
    },
    {
        "model": "your groq model name",
        "api_key": "your Groq api_key",
        "api_type": "groq"
    }
]
```

### Usage

The [`config_list_from_json`](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils/#config_list_from_json) function loads a list of configurations from an environment variable or a JSON file.

```py
import autogen
from autogen import AssistantAgent, UserProxyAgent

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST"
)
```

### Construct Agents

Construct a simple conversation between a User proxy and an Assistant agent

```py
user_proxy = UserProxyAgent(
    name="User_proxy",
    code_execution_config={
        "last_n_messages": 2,
        "work_dir": "groupchat",
        "use_docker": False, # Please set use_docker = True if docker is available to run the generated code. Using docker is safer than running the generated code directly.
    },
    human_input_mode="ALWAYS",
    is_termination_msg=lambda msg: not msg["content"]
)

assistant = AssistantAgent(
    name="assistant",
    llm_config = {"config_list": config_list}
)
```

### Start chat

```py

user_proxy.initiate_chat(assistant, message="Write python code to print Hello World!")

```

**NOTE: To integrate this setup into GroupChat, follow the [tutorial](https://microsoft.github.io/autogen/docs/notebooks/agentchat_groupchat) with the same config as above.**


## Function Calls

Now, let's look at how Anthropic's Sonnet 3.5 is able to suggest multiple function calls in a single response.

This example is a simple travel agent setup with an agent for function calling and a user proxy agent for executing the functions.

One thing you'll note here is that Anthropic's models are more verbose than OpenAI's and will typically provide chain-of-thought reasoning or general verbiage when replying. Therefore we provide more explicit instructions to `functionbot` to not reply with more than necessary. Even so, it can't always help itself!

Let's start with setting up our configuration and agents.
+ +```py +import os +import autogen +import json +from typing import Literal +from typing_extensions import Annotated + +# Anthropic configuration, using api_type='anthropic' +anthropic_llm_config = { + "config_list": + [ + { + "api_type": "anthropic", + "model": "claude-3-5-sonnet-20240620", + "api_key": os.getenv("ANTHROPIC_API_KEY"), + "cache_seed": None + } + ] +} + +# Our functionbot, who will be assigned two functions and +# given directions to use them. +functionbot = autogen.AssistantAgent( + name="functionbot", + system_message="For currency exchange tasks, only use " + "the functions you have been provided with. Do not " + "reply with helpful tips. Once you've recommended functions " + "reply with 'TERMINATE'.", + is_termination_msg=lambda x: x.get("content", "") and (x.get("content", "").rstrip().endswith("TERMINATE") or x.get("content", "") == ""), + llm_config=anthropic_llm_config, +) + +# Our user proxy agent, who will be used to manage the customer +# request and conversation with the functionbot, terminating +# when we have the information we need. +user_proxy = autogen.UserProxyAgent( + name="user_proxy", + system_message="You are a travel agent that provides " + "specific information to your customers. Get the " + "information you need and provide a great summary " + "so your customer can have a great trip. If you " + "have the information you need, simply reply with " + "'TERMINATE'.", + is_termination_msg=lambda x: x.get("content", "") and (x.get("content", "").rstrip().endswith("TERMINATE") or x.get("content", "") == ""), + human_input_mode="NEVER", + max_consecutive_auto_reply=10, +) +``` + +We define the two functions. +```py +CurrencySymbol = Literal["USD", "EUR"] + +def exchange_rate(base_currency: CurrencySymbol, quote_currency: CurrencySymbol) -> float: + if base_currency == quote_currency: + return 1.0 + elif base_currency == "USD" and quote_currency == "EUR": + return 1 / 1.1 + elif base_currency == "EUR" and quote_currency == "USD": + return 1.1 + else: + raise ValueError(f"Unknown currencies {base_currency}, {quote_currency}") + +def get_current_weather(location, unit="fahrenheit"): + """Get the weather for some location""" + if "chicago" in location.lower(): + return json.dumps({"location": "Chicago", "temperature": "13", "unit": unit}) + elif "san francisco" in location.lower(): + return json.dumps({"location": "San Francisco", "temperature": "55", "unit": unit}) + elif "new york" in location.lower(): + return json.dumps({"location": "New York", "temperature": "11", "unit": unit}) + else: + return json.dumps({"location": location, "temperature": "unknown"}) +``` + +And then associate them with the `user_proxy` for execution and `functionbot` for the LLM to consider using them. 
+ +```py +@user_proxy.register_for_execution() +@functionbot.register_for_llm(description="Currency exchange calculator.") +def currency_calculator( + base_amount: Annotated[float, "Amount of currency in base_currency"], + base_currency: Annotated[CurrencySymbol, "Base currency"] = "USD", + quote_currency: Annotated[CurrencySymbol, "Quote currency"] = "EUR", +) -> str: + quote_amount = exchange_rate(base_currency, quote_currency) * base_amount + return f"{quote_amount} {quote_currency}" + +@user_proxy.register_for_execution() +@functionbot.register_for_llm(description="Weather forecast for US cities.") +def weather_forecast( + location: Annotated[str, "City name"], +) -> str: + weather_details = get_current_weather(location=location) + weather = json.loads(weather_details) + return f"{weather['location']} will be {weather['temperature']} degrees {weather['unit']}" +``` + +Finally, we start the conversation with a request for help from our customer on their upcoming trip to New York and the Euro they would like exchanged to USD. + +Importantly, we're also using Anthropic's Sonnet to provide a summary through the `summary_method`. Using `summary_prompt`, we guide Sonnet to give us an email output. + +```py +# start the conversation +res = user_proxy.initiate_chat( + functionbot, + message="My customer wants to travel to New York and " + "they need to exchange 830 EUR to USD. Can you please " + "provide them with a summary of the weather and " + "exchanged currently in USD?", + summary_method="reflection_with_llm", + summary_args={ + "summary_prompt": """Summarize the conversation by + providing an email response with the travel information + for the customer addressed as 'Dear Customer'. Do not + provide any additional conversation or apologise, + just provide the relevant information and the email.""" + }, +) +``` + +After the conversation has finished, we'll print out the summary. + +```py +print(f"Here's the LLM summary of the conversation:\n\n{res.summary['content']}") +``` + +Here's the resulting output. + +```text +user_proxy (to functionbot): + +My customer wants to travel to New York and they need to exchange 830 EUR +to USD. Can you please provide them with a summary of the weather and +exchanged currently in USD? + +-------------------------------------------------------------------------------- +functionbot (to user_proxy): + +Certainly! I'd be happy to help your customer with information about the +weather in New York and the currency exchange from EUR to USD. Let's use +the available tools to get this information. + +***** Suggested tool call (toolu_016wBUKVX2TToBaMqmiGvhav): weather_forecast ***** +Arguments: +{"location": "New York"} +********************************************************************************** +***** Suggested tool call (toolu_01Nxjeew2BspfKdZ85on3XkP): currency_calculator ***** +Arguments: +{"base_amount": 830, "base_currency": "EUR", "quote_currency": "USD"} +************************************************************************************* + +-------------------------------------------------------------------------------- + +>>>>>>>> EXECUTING FUNCTION weather_forecast... + +>>>>>>>> EXECUTING FUNCTION currency_calculator... 
+user_proxy (to functionbot): + +user_proxy (to functionbot): + +***** Response from calling tool (toolu_016wBUKVX2TToBaMqmiGvhav) ***** +New York will be 11 degrees fahrenheit +*********************************************************************** + +-------------------------------------------------------------------------------- +user_proxy (to functionbot): + +***** Response from calling tool (toolu_01Nxjeew2BspfKdZ85on3XkP) ***** +913.0000000000001 USD +*********************************************************************** + +-------------------------------------------------------------------------------- +functionbot (to user_proxy): + +Thank you for providing the information. I can now give your customer a +summary of the weather in New York and the currency exchange from EUR to USD. + +Weather in New York: +The current forecast for New York indicates that it will be 11 degrees +Fahrenheit. This is quite cold, so your customer should be prepared with +warm clothing for their trip. + +Currency Exchange: +For 830 EUR, your customer will receive approximately 913 USD. This is based +on the current exchange rate provided by our currency calculator. + +To summarize: +1. Weather in New York: 11°F (very cold) +2. Currency exchange: 830 EUR = 913 USD + +Your customer should pack warm clothes for the cold weather in New York and +can expect to have about 913 USD for their trip after exchanging 830 EUR. + +TERMINATE + +-------------------------------------------------------------------------------- +Here's the LLM summary of the conversation: + +Certainly. I'll provide an email response to the customer with the travel +information as requested. + +Dear Customer, + +We are pleased to provide you with the following information for your +upcoming trip to New York: + +Weather Forecast: +The current forecast for New York indicates a temperature of 11 degrees +Fahrenheit. Please be prepared for very cold weather and pack appropriate +warm clothing. + +Currency Exchange: +We have calculated the currency exchange for you. Your 830 EUR will be +equivalent to approximately 913 USD at the current exchange rate. + +We hope this information helps you prepare for your trip to New York. Have +a safe and enjoyable journey! + +Best regards, +Travel Assistance Team +``` + +So we can see how Anthropic's Sonnet is able to suggest multiple tools in a single response, with AutoGen executing them both and providing the results back to Sonnet. Sonnet then finishes with a nice email summary that can be the basis for continued real-life conversation with the customer. + +## More tips and tricks + +For an interesting chess game between Anthropic's Sonnet and Mistral's Mixtral, we've put together a sample notebook that highlights some of the tips and tricks for working with non-OpenAI LLMs. [See the notebook here](https://microsoft.github.io/autogen/docs/notebooks/agentchat_nested_chats_chess_altmodels). 
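
As a final, hands-on illustration of the "mix and match" idea from earlier, here is a hedged sketch of assigning different providers to different agents by filtering the same `OAI_CONFIG_LIST` on `api_type`. The agent roles and the choice of Anthropic for the writer and Mistral AI for the reviewer are just one possible arrangement, not a recommendation.

```py
import autogen
from autogen import AssistantAgent

config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")

# Select provider-specific configurations from the one config list.
anthropic_configs = [c for c in config_list if c.get("api_type") == "anthropic"]
mistral_configs = [c for c in config_list if c.get("api_type") == "mistral"]

# Give each agent the LLM best suited to its role.
writer = AssistantAgent(
    name="writer",
    system_message="You are a technical writer. You are the agent named 'writer'.",
    llm_config={"config_list": anthropic_configs},
)

reviewer = AssistantAgent(
    name="reviewer",
    system_message="You review technical drafts for errors. You are the agent named 'reviewer'.",
    llm_config={"config_list": mistral_configs},
)
```

Because these providers don't receive the `name` field on messages, restating each agent's name in its `system_message` (as above) follows the agent-naming tip from earlier.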
diff --git a/website/blog/authors.yml b/website/blog/authors.yml index 302bb8fceaaf..0e023514465d 100644 --- a/website/blog/authors.yml +++ b/website/blog/authors.yml @@ -13,8 +13,8 @@ qingyunwu: yiranwu: name: Yiran Wu title: PhD student at Pennsylvania State University - url: https://github.com/kevin666aa - image_url: https://github.com/kevin666aa.png + url: https://github.com/yiranwu0 + image_url: https://github.com/yiranwu0.png jialeliu: name: Jiale Liu @@ -123,3 +123,20 @@ yifanzeng: title: PhD student at Oregon State University url: https://xhmy.github.io/ image_url: https://xhmy.github.io/assets/img/photo.JPG + +jluey: + name: James Woffinden-Luey + title: Senior Research Engineer at Microsoft Research + url: https://github.com/jluey1 + +Hk669: + name: Hrushikesh Dokala + title: CS Undergraduate Based in India + url: https://github.com/Hk669 + image_url: https://github.com/Hk669.png + +marklysze: + name: Mark Sze + title: AI Freelancer + url: https://github.com/marklysze + image_url: https://github.com/marklysze.png diff --git a/website/docs/Examples.md b/website/docs/Examples.md index 2ec83d1e0f21..3b71dd682e97 100644 --- a/website/docs/Examples.md +++ b/website/docs/Examples.md @@ -80,6 +80,9 @@ Links to notebook examples: - OpenAI Assistant in a Group Chat - [View Notebook](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_oai_assistant_groupchat.ipynb) - GPTAssistantAgent based Multi-Agent Tool Use - [View Notebook](https://github.com/microsoft/autogen/blob/main/notebook/gpt_assistant_agent_function_call.ipynb) +### Non-OpenAI Models +- Conversational Chess using non-OpenAI Models - [View Notebook](/docs/notebooks/agentchat_nested_chats_chess_altmodels) + ### Multimodal Agent - Multimodal Agent Chat with DALLE and GPT-4V - [View Notebook](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_dalle_and_gpt4v.ipynb) diff --git a/website/docs/Getting-Started.mdx b/website/docs/Getting-Started.mdx index 0f8c7322411d..3e162a098327 100644 --- a/website/docs/Getting-Started.mdx +++ b/website/docs/Getting-Started.mdx @@ -3,11 +3,12 @@ import TabItem from "@theme/TabItem"; # Getting Started -AutoGen is a framework that enables development of LLM applications using -multiple agents that can converse with each other to solve tasks. AutoGen agents -are customizable, conversable, and seamlessly allow human participation. They -can operate in various modes that employ combinations of LLMs, human inputs, and -tools. +AutoGen is an open-source programming framework for building AI agents and facilitating +cooperation among multiple agents to solve tasks. AutoGen aims to provide an easy-to-use +and flexible framework for accelerating development and research on agentic AI, +like PyTorch for Deep Learning. It offers features such as agents that can converse +with other agents, LLM and tool use support, autonomous and human-in-the-loop workflows, +and multi-agent conversation patterns.  diff --git a/website/docs/contributor-guide/contributing.md b/website/docs/contributor-guide/contributing.md index b90d81f227c8..b1b6b848f667 100644 --- a/website/docs/contributor-guide/contributing.md +++ b/website/docs/contributor-guide/contributing.md @@ -1,6 +1,6 @@ # Contributing to AutoGen -This project welcomes and encourages all forms of contributions, including but not limited to: +The project welcomes contributions from developers and organizations worldwide. 
Our goal is to foster a collaborative and inclusive community where diverse perspectives and expertise can drive innovation and enhance the project's capabilities. Whether you are an individual contributor or represent an organization, we invite you to join us in shaping the future of this project. Together, we can build something truly remarkable. Possible contributions include but not limited to: - Pushing patches. - Code review of pull requests. @@ -32,3 +32,7 @@ To see what we are working on and what we plan to work on, please check our ## Becoming a Reviewer There is currently no formal reviewer solicitation process. Current reviewers identify reviewers from active contributors. If you are willing to become a reviewer, you are welcome to let us know on discord. + +## Contact Maintainers + +The project is currently maintained by a [dynamic group of volunteers](https://butternut-swordtail-8a5.notion.site/410675be605442d3ada9a42eb4dfef30?v=fa5d0a79fd3d4c0f9c112951b2831cbb&pvs=4) from several different organizations. Contact project administrators Chi Wang and Qingyun Wu via auto-gen@outlook.com if you are interested in becoming a maintainer. diff --git a/website/docs/ecosystem/agentops.md b/website/docs/ecosystem/agentops.md index 76995b6eb5e4..581fb2671e97 100644 --- a/website/docs/ecosystem/agentops.md +++ b/website/docs/ecosystem/agentops.md @@ -1,29 +1,37 @@ -# AgentOps 🖇️ +# Agent Monitoring and Debugging with AgentOps - +