When using RAG or Multi-Agent features with llama models, the script stops replying without meeting the termination condition or raising errors #4422
Comments
@gianx89 thanks for raising the issue. Could you share more about how you are provisioning your local API? Anything that might help with reproducing?
Hi and thanks, @MohMaz. I’ve tried the following provisioning modes:
I’ve been focusing on various Llama versions and tested them all in different "sizes". Here’s what I’ve tried:
I’ve also experimented with Mistral and other models in the RAG-only version of my project, but I consistently encountered problems. Now, I’m shifting my focus to the Multi-Agent aspect of the project. I’ve tried reducing the token size (using an approximate method) and limiting the size of the chat history, but the results remain the same. Here’s my
Reproducing the Problem

This is a link to a Python file to reproduce the problem. Currently, it uses

You must provide some documents in the RAG documents folder; otherwise, you might not be able to reproduce the issue. The problem sometimes worsens when I include context from RAG.
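For context, the approximate history-trimming mentioned above could look something like this (the 4-characters-per-token heuristic and the helper names are assumptions for illustration, not the script's actual method):

```python
# Crude heuristic: roughly 4 characters per token for English text.
# This is an assumption, not a real tokenizer.
def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_history(messages: list, budget: int) -> list:
    """Keep only the most recent messages whose combined approximate
    token count fits within the budget."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = approx_tokens(msg.get("content") or "")
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```

A trimmer like this drops the oldest turns first, which keeps the latest instructions and critique visible to the model but can also discard the part of the conversation that contained the termination cue.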
I think the termination condition was triggered. It could be the message transform that truncates the message and prevents it from displaying fully. @thinkall @WaelKarkoub what do you think?
It happened even without truncation; I added the truncation while trying to solve the problem. I'll post an output without truncation later, or you can try it yourself by disabling the truncation in the provided script.
Here is an output without truncation; the same problem happens.
This is the link containing the code to reproduce the error.
Hi @gianx89 , the chat is terminated by the RAG agent. To fix it, you'll need to pass your own `is_termination_msg` to it, like you've done for GroupChatManager. You may want to check out https://microsoft.github.io/autogen/0.2/docs/topics/retrieval_augmentation , https://github.com/microsoft/autogen/blob/0.2/notebook/agentchat_groupchat_RAG.ipynb and https://github.com/microsoft/autogen/blob/0.2/notebook/agentchat_RetrieveChat.ipynb for some examples of RAG agents. You've mentioned that with openai models everything is fine. I'm curious, because with your code I'd expect the same termination behavior.
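A minimal sketch of such a termination predicate (the exact agent wiring below is an assumption based on the AutoGen 0.2 docs linked above, not the project's actual code):

```python
# Hypothetical termination check: treat any message ending in "TERMINATE"
# (optionally followed by punctuation) as the end of the chat, mirroring
# the convention already used for the GroupChatManager.
def is_termination_msg(message: dict) -> bool:
    content = (message.get("content") or "").strip()
    return content.rstrip(".!").upper().endswith("TERMINATE")

# The same predicate would then be passed to the RAG agent as well,
# e.g. (assumed wiring, following the linked AutoGen 0.2 notebooks):
#
# ragproxyagent = RetrieveUserProxyAgent(
#     name="ragproxyagent",
#     is_termination_msg=is_termination_msg,
#     retrieve_config={"task": "qa", "docs_path": "./docs"},
# )
```

Without this, the RAG agent falls back to its own default termination logic, which can end the chat before the group reaches "TERMINATE".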
@thinkall Hi and thanks. I didn't try this exact code iteration with OpenAI. In general, the same code performed better and more reliably with OpenAI APIs. I tried adding
I tried adding

This is an example output:
This is the revised code:

Any suggestions?
@thinkall I tried to fix the
Now the chat ends correctly. However, sometimes I get this behaviour (I changed the prompt a little, but that's not important):
It seems that sometimes the model doesn't understand the context: the context keeps getting updated until it runs out, and then the chat terminates. Is this normal behaviour?
Hi @gianx89 , it's normal with the current default prompt of the RAG agent. You can either try adding something like "never reply with 'update context'" to your question, or pass a new, simple system message to the RAG agent (I'd recommend the latter, since your question already has detailed prompts in it). You can find an example in the notebooks and docs I shared in my previous response.
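The second option could be sketched roughly as follows (the `customized_prompt` parameter name and placeholder fields are assumptions based on the AutoGen 0.2 RAG notebooks linked earlier; the prompt wording is illustrative):

```python
# Assumed approach: replace the RAG agent's default prompt with a simpler
# one that never asks the model to reply with "UPDATE CONTEXT".
CUSTOM_RAG_PROMPT = (
    "You are a helpful assistant. Answer the user's question using only "
    "the context below. If the context is insufficient, say so and answer "
    "as best you can; never reply with 'UPDATE CONTEXT'.\n"
    "Context: {input_context}\n"
    "Question: {input_question}\n"
)

# Assumed wiring (following the linked AutoGen 0.2 notebooks):
#
# ragproxyagent = RetrieveUserProxyAgent(
#     name="ragproxyagent",
#     retrieve_config={
#         "task": "qa",
#         "customized_prompt": CUSTOM_RAG_PROMPT,
#         "docs_path": "./docs",
#     },
# )
```

The idea is that smaller local models follow a short, explicit instruction more reliably than the longer default RAG prompt, which invites the "UPDATE CONTEXT" loop.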
Hi all! I'm using AutoGen to develop a RAG system, ideally a Multi-Agent one. I must use open-source, preferably local, models like llama 3.1 or llama 3.2. I'm using ChromaDB as my vector database.
I'm developing a system that can write a comic book story in a specific format. There is a writer (or team of writers) that writes the story, a critic (or team of critics) that gives advice on how to improve it, and a manager (or team of managers) that incorporates these suggestions. When the story is deemed satisfactory, an agent writes "TERMINATE".
I don't have any issues using OpenAI APIs and models like GPT-3.5 Turbo or GPT-4. However, when working with open-source or local models, I encounter unpredictable behaviors.
The agents start talking:
The chat ends without error but abruptly, without meeting the termination condition. Sometimes, very rarely, I get the right result and the chat ends with the "TERMINATE" string.
Any suggestions? I can't share all the code at the moment, but I can reply with snippets of it.
P.S.
max_rounds is set pretty high (100).
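For reference, the writer/critic/manager loop described above could be sketched roughly like this (agent names, system messages, and `llm_config` are placeholders, not the actual project code; the wiring follows the AutoGen 0.2 GroupChat API):

```python
# The termination convention: an agent writes "TERMINATE" when satisfied.
def is_final(message: dict) -> bool:
    return "TERMINATE" in (message.get("content") or "")

# Assumed wiring, commented out since it needs `pyautogen` and a model:
#
# import autogen
# writer = autogen.AssistantAgent("writer", llm_config=llm_config,
#     system_message="Write the comic book story in the required format.")
# critic = autogen.AssistantAgent("critic", llm_config=llm_config,
#     system_message="Give advice on how to improve the story.")
# editor = autogen.AssistantAgent("editor", llm_config=llm_config,
#     system_message="Incorporate the critic's suggestions. "
#                    "Write TERMINATE when the story is satisfactory.")
#
# groupchat = autogen.GroupChat(agents=[writer, critic, editor],
#                               messages=[], max_round=100)
# manager = autogen.GroupChatManager(groupchat=groupchat,
#                                    llm_config=llm_config,
#                                    is_termination_msg=is_final)
```

With a setup like this, the chat should only stop when `is_final` fires or `max_round` is exhausted, so an abrupt stop before either suggests some other agent's default termination logic is kicking in.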