Refactor LLama extension + cookbook to map chat messages --> Prompt 1-1 #630

rossdanlm opened this issue Dec 27, 2023 · 3 comments
rossdanlm commented Dec 27, 2023

See comments in #605 (comment)

Right now we only store the last message from the response instead of the full response (if multiple texts are returned).
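As a rough illustration of the problem (the response shape below is made up, not the extension's actual types): if the model returns several texts and we overwrite a single variable for each one, only the last text survives.

```python
# Hypothetical list of texts returned in a single response.
response_texts = ["first part", "second part", "final part"]

# Behaviour described above: each text overwrites the previous one,
# so only "final part" ends up stored.
last_only = None
for text in response_texts:
    last_only = text

# What we presumably want instead: keep every returned text.
all_texts = list(response_texts)
```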

saqadri commented Dec 27, 2023

Just so I understand the impact -- it sounds like our llama extension doesn't support multi-turn messages (i.e. chats)?

rossdanlm commented Dec 27, 2023

TLDR

I am not sure whether the prompt format below is an equivalent mapping to a list of ChatCompletionRequestMessage objects:

"
CONTEXT:
Q: {q_1}
A: {a_1}
...
Q: {q_n}
A: {a_n}

QUESTION:
{resolved_prompt}
"

If yes, then we support multi-turn. If no, then there could be weird things going on. In either case, we're not saving each message as its own individual prompt. Instead, we're storing the Q&A pairs in the model parser object itself (under the self.qa field), not in the prompts.
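For concreteness, here is a purely illustrative sketch (made-up Q/A values, plain dicts) of the two representations being compared: the flattened CONTEXT/QUESTION string above versus a list of chat messages in the ChatCompletionRequestMessage shape.

```python
# Made-up conversation history for illustration only.
qa = [
    ("What does the LLama extension do?", "It runs LLaMA models from an AIConfig."),
    ("Does it keep chat history?", "It stores Q&A pairs on the parser."),
]
resolved_prompt = "Can each message map to its own prompt?"

# (1) Flattened into a single prompt string, matching the template above.
context = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa)
flattened = f"CONTEXT:\n{context}\n\nQUESTION:\n{resolved_prompt}"

# (2) The same history as a list of chat messages (role + content), which is
# what a list of ChatCompletionRequestMessage objects would look like.
messages = []
for q, a in qa:
    messages.append({"role": "user", "content": q})
    messages.append({"role": "assistant", "content": a})
messages.append({"role": "user", "content": resolved_prompt})
```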

Details

I'm not quite sure; I haven't been able to run the cookbook myself to test because of #606.

It's a bit hard for me to debug without being able to run it, but from reading through the code, this is my understanding:

  1. We build the chat history as a single string: https://github.com/lastmile-ai/aiconfig/blob/main/extensions/llama/python/llama.py#L41-L46
  2. That string gets passed into the Llama call: https://github.com/lastmile-ai/aiconfig/blob/main/extensions/llama/python/llama.py#L88. Perhaps this is shorthand for representing a messages object, but I couldn't find an example of this in the API docs: https://docs.llama-api.com/api-reference/endpoint/create. Jonathan probably knows more details since he built it.

One note on step 2: in our other chat-based model parsers we use completion params instead, and we parse a response object like ChatCompletionResponseMessage (https://github.com/abetlen/llama-cpp-python/blob/f952d45c2cd0ccb63b117130c1b1bf4897987e4c/llama_cpp/llama_types.py#L57-L75), which Llama also accepts.
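For reference, here is a minimal sketch of the chat-style call in llama-cpp-python that produces that message-shaped response (the model path and message content are placeholders, not the extension's actual code):

```python
from llama_cpp import Llama

# Placeholder model path; any local GGUF chat model would do.
llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf")

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "placeholder question"},
    ]
)

# The reply comes back as a message object (role + content), i.e. the
# ChatCompletionResponseMessage shape linked above.
print(response["choices"][0]["message"]["content"])
```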

rossdanlm commented:

Also, I notice that we seem to add individual prompts for each message in the TypeScript implementation (https://github.com/lastmile-ai/aiconfig/blob/v1.1.8/extensions/llama/typescript/llama.ts#L131-L154), so this might only apply to Python. Will sync with @jonathanlastmileai on this later.
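For comparison, a hypothetical sketch (plain dicts rather than the real aiconfig Prompt class) of the 1-1 message-to-prompt mapping the issue title asks for, where each user turn becomes its own prompt and the following assistant turn becomes that prompt's output:

```python
history = [
    {"role": "user", "content": "q_1"},
    {"role": "assistant", "content": "a_1"},
    {"role": "user", "content": "q_2"},
]

prompts = []
for message in history:
    if message["role"] == "user":
        # Each user message becomes its own prompt entry.
        prompts.append({
            "name": f"prompt_{len(prompts)}",
            "input": message["content"],
            "outputs": [],
        })
    elif message["role"] == "assistant" and prompts:
        # Assistant replies become the output of the preceding prompt.
        prompts[-1]["outputs"].append(message["content"])
```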
