handle context size overflow in AssistantAgent #9
I will handle this problem in microsoft/FLAML#1153. The problem should be in generate_reply, when it returns extra-long messages. My current plan includes the following functionalities:
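A minimal sketch of the kind of check this could involve, assuming tiktoken for counting; the function name, model name, and cutoff behavior are all illustrative, not the actual code in microsoft/FLAML#1153:

```python
# Illustrative sketch only, not the implementation in microsoft/FLAML#1153.
import tiktoken

def truncate_reply(reply: str, max_tokens: int, model: str = "gpt-3.5-turbo") -> str:
    """Trim a reply to at most max_tokens tokens for the given model."""
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(reply)
    if len(tokens) <= max_tokens:
        return reply
    # Keep the head of the reply and mark the truncation explicitly so the
    # receiving agent knows the output was cut off.
    return enc.decode(tokens[:max_tokens]) + "\n...(truncated to fit token limit)"
```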
@thinkall has implemented the tiktoken count in microsoft/FLAML#1158. Should I try to fix this concurrently?
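For reference, counting tokens for a list of chat messages with tiktoken commonly looks roughly like the following; the per-message overhead constants are approximations and not necessarily what microsoft/FLAML#1158 uses:

```python
import tiktoken

def count_message_tokens(messages, model: str = "gpt-3.5-turbo") -> int:
    """Approximate token count for a list of OpenAI-style chat messages."""
    enc = tiktoken.encoding_for_model(model)
    total = 0
    for msg in messages:
        total += 4  # rough per-message overhead for role/formatting tokens
        total += len(enc.encode(msg.get("content") or ""))
    return total + 2  # rough priming overhead for the assistant's reply
```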
Your proposal can solve part of the problem. It does the check on the sender's side in case the receiver requests a length limit.
It'll be good to figure out what we want to support and have a comprehensive design.
Sure, I will discuss with @thinkall and @LeoLjl about it. I just updated microsoft/FLAML#1153 to allow the user to set a pre-defined token limit for outputs from code or function calls; I think that is a different task from handling token_limit in oai_reply.
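As a sketch of that idea (capping code-execution output before it enters the conversation), reusing truncate_reply from the sketch above; the limit value and function name here are assumptions, not the PR's actual API:

```python
import subprocess

CODE_OUTPUT_TOKEN_LIMIT = 1000  # assumed default, not the PR's actual value

def run_python_with_limit(code: str) -> str:
    """Run a Python snippet and cap its combined stdout/stderr at a token limit."""
    proc = subprocess.run(
        ["python", "-c", code], capture_output=True, text=True, timeout=60
    )
    output = (proc.stdout or "") + (proc.stderr or "")
    # Reuses truncate_reply from the sketch above.
    return f"exit code: {proc.returncode}\n{truncate_reply(output, CODE_OUTPUT_TOKEN_LIMIT)}"
```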
@sonichi @qingyun-wu Here is my proposed plan:
On AssistantAgent:
On UserProxyAgent (I already added this in microsoft/FLAML#1153):
From the two changes above, all 3 generate_reply cases are addressed: oai_reply, code execution, and function call. For tasks that involve databases and consume a large number of tokens, such as answering questions over a long text or searching for data in a database, I think we need a special design targeting those applications.
The proposal is a good start. I like the design that covers two options: dealing with the token limit either before or after a reply is made.
On second thought, I don't think we need to pass a token_limit argument. Currently, for function and code execution, I use a class variable "auto_reply_token_limit" to customize the behavior when the limit is reached. When a new agent overloads this behavior, it can use this variable or create a new class variable.
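A sketch of how a subclass could use such a class variable; the subclass name, the hook method, and AssistantAgent's internals shown here are assumptions for illustration:

```python
# Import path as of FLAML at the time of this thread; may differ across versions.
from flaml.autogen import AssistantAgent

class BoundedAssistant(AssistantAgent):
    # Subclass-level override of the default cap; a new agent can set its own
    # value here or introduce a separate class variable for its reply type.
    auto_reply_token_limit = 500

    def _handle_limit(self, reply: str) -> str:
        # Custom behavior when the limit is reached, e.g. truncate or summarize.
        # Reuses truncate_reply from the sketch above.
        return truncate_reply(reply, self.auto_reply_token_limit)
```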
Should the sender tell the receiver the token limit? "token_limit" and ways to handle token_limit should be separated. "token_limit" is a number that should be sent by the sender. Maybe we can make that a field in the message. The way to handle token_limit is decided in the auto reply method. |
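A sketch of that separation; the "token_limit" message field is a proposal here, not an existing field, and produce_reply is a placeholder for the agent's normal reply path:

```python
# Sender side: attach the limit as a field in the message (proposed field).
message = {
    "role": "user",
    "content": "Summarize the attached report.",
    "token_limit": 256,  # cap requested by the sender for the reply
}

def produce_reply(messages) -> str:
    # Stand-in for the agent's usual reply generation (e.g. an LLM call).
    return "A very long draft reply..."

# Receiver side: the auto-reply method decides how to honor the limit.
def generate_bounded_reply(messages, default_limit: int = 1024) -> str:
    limit = messages[-1].get("token_limit", default_limit)
    reply = produce_reply(messages)
    # Reuses truncate_reply from the sketch above.
    return truncate_reply(reply, limit)
```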
I have a few questions when looking at the code:
1. It seems that this messages argument is not used. When would this be used?
Good questions. Regarding 1, yes messages will be used when |
microsoft/FLAML#1098, microsoft/FLAML#1153, and microsoft/FLAML#1158 each address this in some specialized way. Can we integrate these ideas into a generic solution and make AssistantAgent able to overcome this limitation out of the box?

Tasks
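One generic direction would be a sliding window over the chat history before each call, reusing count_message_tokens from the sketch above; the 4096 default and the drop-oldest policy are assumptions, not a committed design:

```python
def fit_to_context(messages, max_context_tokens: int = 4096, model: str = "gpt-3.5-turbo"):
    """Keep only as many of the most recent messages as fit the context window."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk from the newest message backwards
        cost = count_message_tokens([msg], model)
        if used + cost > max_context_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```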