Bug: InternLM 2.5 Chat Tool Calls: Incorrect and Inconsistent Formatting #8405
Comments
@apresence it looks like you are using an old release. The devs have done a lot of work addressing issues with prompt templates; everything should be working close to perfect right now. Also, main does not exist anymore; use llama-cli. Refer to the manual at the bottom of the README for more details.
Well, that was a n00b mistake. I had pulled the latest from git in-place in the dir where I had cloned it previously, then recompiled. But I didn't realize the binary names changed AND I didn't do a
@apresence no worries, I am using qwen for this test: <|im_start|>system
I'm focusing specifically on the tool calling feature of InternLM 2.5. There are a few main issues:
Regarding Issue 1, there are two ways I see to fix this:
OK, I modified tokenizer_config.json and cleared the "special" flag for the tool tokens. I'll have fixed GGUFs up on HF shortly.
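For anyone wanting to make the same kind of edit by hand, here is a minimal sketch of the idea, assuming the HF layout where added tokens are listed under added_tokens_decoder in tokenizer_config.json (the exact token set in your file may differ):

```python
# Sketch: clear the "special" flag on the tool tokens in tokenizer_config.json.
# Assumes the HF layout where added tokens live under "added_tokens_decoder",
# keyed by token id, each with a "content" string and a "special" bool.
import json

TOOL_TOKENS = {"<|plugin|>", "<|action_start|>", "<|action_end|>"}

with open("tokenizer_config.json") as f:
    cfg = json.load(f)

for entry in cfg.get("added_tokens_decoder", {}).values():
    if entry.get("content") in TOOL_TOKENS:
        entry["special"] = False  # treat these as normal tokens

with open("tokenizer_config.json", "w") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
```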
Here's my experiment. It looks ok. https://github.com/foldl/chatllm.cpp/blob/master/scripts/tool_internlm.py
It looks like you're using a modified version of llama.cpp there. It's possible your version handles special tokens differently, or you're using a different route to the backend ggml libs than I am. Also, your code is using Chinese for the prompts. I doubt that would make a difference, but it's possible.
The Chinese prompt is from their official example: https://github.com/InternLM/lagent/blob/main/lagent/agents/internlm2_agent.py
After some more testing, I've found that the tool call works 100% of the time if rope scaling is turned off. The following config output is from running the sample code in transformers with debugging turned on:
As you can see, rope scaling is set to
I think this might be why tool calls are unreliable with rope scaling turned on. Looking at the configuration_internlm2.py code from transformers, we can see this comment about the rope_scaling parameter:
Since it says that 'linear' is an option, I tried implementing it with that as well. Just to button it all up ... this works 100% of the time:
This works only ~50% of the time:
And it's probably because:
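As an illustration of the rope-scaling experiment described above, here is a minimal sketch of one way to override the setting when loading the model in transformers. The HF repo id and the linear factor are assumptions, not values taken from this thread:

```python
# Sketch: load InternLM 2.5 chat with rope scaling overridden, to compare
# tool-call reliability with and without it. Repo id and factor are assumptions.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm2_5-7b-chat"

config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
config.rope_scaling = None  # disable rope scaling entirely
# config.rope_scaling = {"type": "linear", "factor": 2.0}  # or try linear scaling

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)
```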
@apresence @dspasyuk hi, guys. BTW, we can open a pull request and set the tool special tokens to type of
@apresence hi, this is interesting. How do you test? On a dataset?
Thank you for taking the time to address this topic. You are right. Let's remove it. For the record, this is the version of
Without tool call fix
The tool call tokens are never included, regardless of the --special flag.
With --special
Without --special
With tool call fix
The tool call tokens are always included, regardless of the --special flag.
With --special
Without --special
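For anyone reproducing this comparison, the runs were presumably along these lines, reusing the flags from the command in the original report with the renamed llama-cli binary (drop --special for the "Without" case; paths and sampling values may differ):
./llama-cli --predict 512 --gpu-layers 32 --temp 0.8 --top-p 0.8 --top-k 50 -r '<|im_end|>\n' -if --multiline-input --model models/internlm.internlm2_5-7b-chat-q4_k_m.gguf --special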
@apresence hi,
This issue was closed because it has been inactive for 14 days since being marked as stale.
What happened?
After having spent the better part of two days chasing my tail with this, I figured I'd try to save someone else the trouble.
I have tested the latest llama.cpp with various GGUFs of internlm2_5-7b-chat, including the one provided by internlm and several converted by the community, as well as HF transformers, in both chat and completion modes. I cannot get tool calls to work as described in the paper, summarized here, and outlined further here.
I'm creating this issue for two reasons:
I posted a message about this issue on HF as well.
Name and Version
$ ./main --version
version: 3093 (7672ade)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Tokenization?
For the record, this doesn't appear to be a (de)tokenization issue, as those are (de/en)coding fine:
The HF model is (de/en)coding fine too.
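For reference, a quick round-trip check along these lines is enough to confirm that. This is a sketch using the transformers tokenizer; the HF repo id is an assumption:

```python
# Sketch: round-trip the InternLM 2.5 chat/tool tokens through the HF tokenizer
# to confirm they encode to ids and decode back intact.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)

for s in ["<|im_start|>", "<|im_end|>", "<|action_start|>", "<|plugin|>", "<|action_end|>"]:
    ids = tok.encode(s, add_special_tokens=False)
    print(s, ids, repr(tok.decode(ids)))
```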
Issue 1: Inconsistent Formatting
Command:
./main --predict 512 --gpu-layers 32 --temp 0.8 --top-p 0.8 --top-k 50 -r '<|im_end|>\n' -if --multiline-input --model models/internlm.internlm2_5-7b-chat-q4_k_m.gguf --special
NOTE: top-k and other parameters are taken from the model card on HF.
Input:
Example Output 1:
There should be only one <|im_end|> token there, not two. Let's continue on and simulate the return for that tool call.

Input:
Output:
So it sorta works, but the tokens seem to be jumbled up, and it's linkin' the heck out of that image!
The other problem here is that the special tokens aren't being passed on in the server program when the /completion endpoint is used. I tried the --special option to see if that would work, and it didn't.

Example Output 2:
<|api_name=generate_image|>...<|api_name_end|>
I didn't have --special turned on for this, so the special tokens weren't displayed, but the model made up new ones!

Issue 2: Incorrect Formatting
FWIW, the models/internlm.internlm2_5-7b-chat model does not adhere to the spec even when run on transformers, but at least the output is consistent. That is, it always returns only the JSON part of the tool call without the surrounding <|action_start|><|plugin|>...<|action_end|> tags. It might just not be returning the special tokens. I'm looking into that.
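For context, a tool call that follows the format referenced above would wrap the JSON in those tags, something like the following (the function name and argument shape are illustrative, not taken from the spec):
<|action_start|><|plugin|>
{"name": "generate_image", "parameters": {"prompt": "..."}}
<|action_end|>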
The following is a console snippet from a Docker container created from the image huggingface/transformers-pytorch-gpu and run on a GeForce RTX 4090.