Commit 6ca4785

Improving website navigation and help click-through analytics collection (#2205)
* Add changes
* Update
1 parent 44acab4 commit 6ca4785

13 files changed: +304 −169 lines

README.md

+3 −3
@@ -2,7 +2,7 @@
 [![Build](https://github.com/microsoft/autogen/actions/workflows/python-package.yml/badge.svg)](https://github.com/microsoft/autogen/actions/workflows/python-package.yml)
 ![Python Version](https://img.shields.io/badge/3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue)
 [![Downloads](https://static.pepy.tech/badge/pyautogen/week)](https://pepy.tech/project/pyautogen)
-[![Discord](https://img.shields.io/discord/1153072414184452236?logo=discord&style=flat)](https://discord.gg/pAbnFJrkgZ)
+[![Discord](https://img.shields.io/discord/1153072414184452236?logo=discord&style=flat)](https://aka.ms/autogen-dc)
 [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/cloudposse.svg?style=social&label=Follow%20%40pyautogen)](https://twitter.com/pyautogen)


@@ -62,7 +62,7 @@ AutoGen is powered by collaborative [research studies](https://microsoft.github.
 ## Roadmaps

 To see what we are working on and what we plan to work on, please check our
-[Roadmap Issues](https://github.com/microsoft/autogen/issues?q=is%3Aopen+is%3Aissue+label%3Aroadmap).
+[Roadmap Issues](https://aka.ms/autogen-roadmap).

 ## Quickstart
 The easiest way to start playing is

@@ -172,7 +172,7 @@ In addition, you can find:

 - [Research](https://microsoft.github.io/autogen/docs/Research), [blogposts](https://microsoft.github.io/autogen/blog) around AutoGen, and [Transparency FAQs](https://github.com/microsoft/autogen/blob/main/TRANSPARENCY_FAQS.md)

-- [Discord](https://discord.gg/pAbnFJrkgZ)
+- [Discord](https://aka.ms/autogen-dc)

 - [Contributing guide](https://microsoft.github.io/autogen/docs/Contribute)
website/blog/2023-04-21-LLM-tuning-math/index.md

+1 −1
@@ -71,4 +71,4 @@ The need for model selection, parameter tuning and cost saving is not specific t
 * [Research paper about the tuning technique](https://arxiv.org/abs/2303.04673)
 * [Documentation about inference tuning](/docs/Use-Cases/enhanced_inference)

-*Do you have any experience to share about LLM applications? Do you like to see more support or research of LLM optimization or automation? Please join our [Discord](https://discord.gg/pAbnFJrkgZ) server for discussion.*
+*Do you have any experience to share about LLM applications? Do you like to see more support or research of LLM optimization or automation? Please join our [Discord](https://aka.ms/autogen-dc) server for discussion.*

website/blog/2023-05-18-GPT-adaptive-humaneval/index.mdx

+26 −16
@@ -7,19 +7,19 @@ tags: [LLM, GPT, research]
 ![An adaptive way of using GPT-3.5 and GPT-4 outperforms GPT-4 in both coding success rate and inference cost](img/humaneval.png)

 **TL;DR:**
-* **A case study using the HumanEval benchmark shows that an adaptive way of using multiple GPT models can achieve both much higher accuracy (from 68% to 90%) and lower inference cost (by 18%) than using GPT-4 for coding.**

+- **A case study using the HumanEval benchmark shows that an adaptive way of using multiple GPT models can achieve both much higher accuracy (from 68% to 90%) and lower inference cost (by 18%) than using GPT-4 for coding.**

 GPT-4 is a big upgrade of foundation model capability, e.g., in code and math, accompanied by a much higher (more than 10x) price per token to use over GPT-3.5-Turbo. On a code completion benchmark, [HumanEval](https://huggingface.co/datasets/openai_humaneval), developed by OpenAI, GPT-4 can successfully solve 68% tasks while GPT-3.5-Turbo does 46%. It is possible to increase the success rate of GPT-4 further by generating multiple responses or making multiple calls. However, that will further increase the cost, which is already nearly 20 times of using GPT-3.5-Turbo and with more restricted API call rate limit. Can we achieve more with less?

 In this blog post, we will explore a creative, adaptive way of using GPT models which leads to a big leap forward.

 ## Observations

-* GPT-3.5-Turbo can already solve 40%-50% tasks. For these tasks if we never use GPT-4, we can save nearly 40-50% cost.
-* If we use the saved cost to generate more responses with GPT-4 for the remaining unsolved tasks, it is possible to solve some more of them while keeping the amortized cost down.
+- GPT-3.5-Turbo can already solve 40%-50% tasks. For these tasks if we never use GPT-4, we can save nearly 40-50% cost.
+- If we use the saved cost to generate more responses with GPT-4 for the remaining unsolved tasks, it is possible to solve some more of them while keeping the amortized cost down.

-The obstacle of leveraging these observations is that we do not know *a priori* which tasks can be solved by the cheaper model, which tasks can be solved by the expensive model, and which tasks can be solved by paying even more to the expensive model.
+The obstacle of leveraging these observations is that we do not know _a priori_ which tasks can be solved by the cheaper model, which tasks can be solved by the expensive model, and which tasks can be solved by paying even more to the expensive model.

 To overcome that obstacle, one may want to predict which task requires what model to solve and how many responses are required for each task. Let's look at one example code completion task:

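To make the Observations in the hunk above concrete, here is a back-of-the-envelope sketch (not part of this commit) using only the figures quoted in the post: GPT-4 at roughly 20x the per-call price of GPT-3.5-Turbo, and GPT-3.5-Turbo solving about 46% of tasks. The escalation scheme is a simplifying assumption, and real costs depend on token counts per request.

```python
# Back-of-the-envelope illustration of the cost observation; the figures are
# those quoted in the post, the escalation scheme is an assumption.
cost_gpt35 = 1.0         # normalized cost of one GPT-3.5-Turbo call per task
cost_gpt4 = 20.0         # GPT-4 is ~20x the price of GPT-3.5-Turbo
solve_rate_gpt35 = 0.46  # share of HumanEval tasks GPT-3.5-Turbo solves

# Try GPT-3.5-Turbo on every task first, escalating unsolved tasks to GPT-4.
adaptive_cost = cost_gpt35 + (1 - solve_rate_gpt35) * cost_gpt4  # = 11.8
saving = 1 - adaptive_cost / cost_gpt4                           # ~= 0.41

print(f"amortized cost vs GPT-4 only: {adaptive_cost:.1f} vs {cost_gpt4:.1f}")
print(f"saving: {saving:.0%}")  # ~41%, in line with the 40-50% noted above
```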
@@ -49,8 +49,8 @@ Some simple example test cases are provided in the docstr. If we already have a

 Combining these observations, we can design a solution with two intuitive ideas:

-* Make use of auto-generated feedback, i.e., code execution results, to filter responses.
-* Try inference configurations one by one, until one response can pass the filter.
+- Make use of auto-generated feedback, i.e., code execution results, to filter responses.
+- Try inference configurations one by one, until one response can pass the filter.

 ![Design](img/design.png)

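The two ideas in the hunk above form the core control loop of the adaptive solution. A minimal sketch of that loop follows; it is not part of this commit and does not use autogen's actual API: `generate`, the config portfolio, and the assertion handling are hypothetical stand-ins.

```python
from typing import Callable, Iterable, List, Optional

def solve_adaptively(
    definition: str,                     # function signature and docstring to complete
    assertions: str,                     # auto-generated assert statements for filtering
    portfolio: Iterable[dict],           # inference configs, cheapest first
    generate: Callable[..., List[str]],  # hypothetical LLM call returning n completions
) -> Optional[str]:
    """Try configurations one by one; return the first completion that
    passes the auto-generated assertions, or None if every config fails."""
    for config in portfolio:
        for completion in generate(definition, **config):
            candidate = definition + completion + "\n" + assertions
            try:
                exec(candidate, {})  # code execution result is the feedback/filter
            except Exception:
                continue             # failed the filter; try the next completion
            return completion        # first passing completion wins
    return None

# The portfolio implied by the numbered examples later in this diff,
# ordered from cheapest to most expensive:
portfolio = [
    {"model": "gpt-3.5-turbo", "n": 1, "temperature": 0},
    {"model": "gpt-3.5-turbo", "n": 7, "temperature": 1,
     "stop": ["\nclass", "\ndef", "\nif", "\nprint"]},
    {"model": "gpt-4", "n": 1, "temperature": 0},
    {"model": "gpt-4", "n": 2, "temperature": 1,
     "stop": ["\nclass", "\ndef", "\nif", "\nprint"]},
    {"model": "gpt-4", "n": 1, "temperature": 1,
     "stop": ["\nclass", "\ndef", "\nif", "\nprint"]},
]
```
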
@@ -72,6 +72,7 @@ The inference cost includes the cost for generating the assertions in our soluti
 Here are a few examples of function definitions which are solved by different configurations in the portfolio.

 1. Solved by GPT-3.5-Turbo, n=1, temperature=0
+
 ```python
 def compare(game,guess):
     """I think we all remember that feeling when the result of some long-awaited
@@ -89,8 +90,10 @@ def compare(game,guess):
     compare([0,5,0,0,0,4],[4,1,1,0,0,-2]) -> [4,4,1,0,0,6]
     """
 ```
+
 2. Solved by GPT-3.5-Turbo, n=7, temperature=1, stop=["\nclass", "\ndef", "\nif", "\nprint"]: the `vowels_count` function presented earlier.
 3. Solved by GPT-4, n=1, temperature=0:
+
 ```python
 def string_xor(a: str, b: str) -> str:
     """ Input are two strings a and b consisting only of 1s and 0s.
@@ -99,7 +102,9 @@ def string_xor(a: str, b: str) -> str:
     '100'
     """
 ```
+
 4. Solved by GPT-4, n=2, temperature=1, stop=["\nclass", "\ndef", "\nif", "\nprint"]:
+
 ```python
 def is_palindrome(string: str) -> bool:
     """ Test if given string is a palindrome """
@@ -119,7 +124,9 @@ def make_palindrome(string: str) -> str:
     'catac'
     """
 ```
+
 5. Solved by GPT-4, n=1, temperature=1, stop=["\nclass", "\ndef", "\nif", "\nprint"]:
+
 ```python
 def sort_array(arr):
     """
@@ -135,8 +142,9 @@ def sort_array(arr):
 ```

 The last problem is an example with wrong example test cases in the original definition. It misleads the adaptive solution because a correct implementation is regarded as wrong and more trials are made. The last configuration in the sequence returns the right implementation, even though it does not pass the auto-generated assertions. This example demonstrates that:
-* Our adaptive solution has a certain degree of fault tolerance.
-* The success rate and inference cost for the adaptive solution can be further improved if correct example test cases are used.
+
+- Our adaptive solution has a certain degree of fault tolerance.
+- The success rate and inference cost for the adaptive solution can be further improved if correct example test cases are used.

 It is worth noting that the reduced inference cost is the amortized cost over all the tasks. For each individual task, the cost can be either larger or smaller than directly using GPT-4. This is the nature of the adaptive solution: The cost is in general larger for difficult tasks than that for easy tasks.

@@ -147,22 +155,24 @@ An example notebook to run this experiment can be found at: https://github.com/m
 Our solution is quite simple to implement using a generic interface offered in [`autogen`](/docs/Use-Cases/enhanced_inference#logic-error), yet the result is quite encouraging.

 While the specific way of generating assertions is application-specific, the main ideas are general in LLM operations:
-* Generate multiple responses to select - especially useful when selecting a good response is relatively easier than generating a good response at one shot.
-* Consider multiple configurations to generate responses - especially useful when:
+
+- Generate multiple responses to select - especially useful when selecting a good response is relatively easier than generating a good response at one shot.
+- Consider multiple configurations to generate responses - especially useful when:
   - Model and other inference parameter choice affect the utility-cost tradeoff; or
   - Different configurations have complementary effect.

 A [previous blog post](/blog/2023/04/21/LLM-tuning-math) provides evidence that these ideas are relevant in solving math problems too.
 `autogen` uses a technique [EcoOptiGen](https://arxiv.org/abs/2303.04673) to support inference parameter tuning and model selection.

 There are many directions of extensions in research and development:
-* Generalize the way to provide feedback.
-* Automate the process of optimizing the configurations.
-* Build adaptive agents for different applications.

-*Do you find this approach applicable to your use case? Do you have any other challenge to share about LLM applications? Do you like to see more support or research of LLM optimization or automation? Please join our [Discord](https://discord.gg/pAbnFJrkgZ) server for discussion.*
+- Generalize the way to provide feedback.
+- Automate the process of optimizing the configurations.
+- Build adaptive agents for different applications.
+
+_Do you find this approach applicable to your use case? Do you have any other challenge to share about LLM applications? Do you like to see more support or research of LLM optimization or automation? Please join our [Discord](https://aka.ms/autogen-dc) server for discussion._

 ## For Further Reading

-* [Documentation](/docs/Getting-Started) about `autogen` and [Research paper](https://arxiv.org/abs/2303.04673).
-* [Blog post](/blog/2023/04/21/LLM-tuning-math) about a related study for math.
+- [Documentation](/docs/Getting-Started) about `autogen` and [Research paper](https://arxiv.org/abs/2303.04673).
+- [Blog post](/blog/2023/04/21/LLM-tuning-math) about a related study for math.
