
eos problem when using qwen2_5 sft #3113

Open
YingchaoX opened this issue Feb 14, 2025 · 0 comments

Describe the bug
The eos token does not work after SFT on the Qwen2.5-14B base model.

[screenshot: inference result without a stop token]

Only if I add the stop token explicitly does the finish_reason become stop.

[screenshot: inference result with the stop token added]
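
As a workaround, the eos can also be overridden at generation time. A minimal sketch with plain transformers (the checkpoint path is the one from my machine below; I assume <|im_end|>, id 151645, is the intended stop token, and the exact prompt layout may differ from what swift uses):

from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "/data/xiongyc/checkpoint-31322"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype="auto", device_map="auto")

# Build a chatml-style prompt like the one the qwen2_5 template trains on.
prompt = "<|im_start|>user\nhello<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Pass <|im_end|> (151645) explicitly as eos; the checkpoint's generation_config
# still points eos at <|endoftext|> (151643), so generation does not stop otherwise.
out = model.generate(
    **inputs,
    max_new_tokens=128,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))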

The training script is like the one below; I use the base model for SFT training:

swift sft \
 --model qwen/Qwen2.5-14B \
 --model_type qwen2_5 \
 ...

The SFT data is like below:
[screenshot: a sample of the SFT data]
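
(The screenshot is not reproduced here; the data is in the standard ms-swift 3.x messages format, roughly like this hypothetical one-line sample with the real contents elided:)

{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}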

Your hardware and system info
8*A800 80G
ms-swift 3.0.2.post1
torch 2.4.0
python 3.9
transformers 4.45.2

Additional context
Checking the config files of the saved model, the eos_token_id in generation_config.json is not correct, and neither is the one in config.json.

14:32:15 (base)xiongyc@bridgellm:/data/xiongyc/checkpoint-31322$ cat generation_config.json 
{
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 64,
  "pad_token_id": 151643,
  "transformers_version": "4.45.2"
}
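
Here eos_token_id is 151643, which is <|endoftext|>, not the chat template's <|im_end|> (151645). A quick check against the checkpoint's tokenizer confirms the mapping:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/data/xiongyc/checkpoint-31322")
print(tokenizer.convert_ids_to_tokens(151643))        # <|endoftext|>
print(tokenizer.convert_tokens_to_ids("<|im_end|>"))  # 151645
print(tokenizer.eos_token)                            # <|endoftext|>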
14:32:15 (base)xiongyc@bridgellm:/data/xiongyc/checkpoint-31322$ cat special_tokens_map.json
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
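
The tokenizer side can be patched by hand (a sketch; it assumes <|im_end|> is the eos the chat template actually needs):

from transformers import AutoTokenizer

ckpt = "/data/xiongyc/checkpoint-31322"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
tokenizer.eos_token = "<|im_end|>"  # was <|endoftext|>; rewrites special_tokens_map.json on save
tokenizer.save_pretrained(ckpt)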
14:32:28 (base)xiongyc@bridgellm:/data/xiongyc/checkpoint-31322$ cat added_tokens.json
{
  "</tool_call>": 151658,
  "<tool_call>": 151657,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
14:33:04 (base)xiongyc@bridgellm:/data/xiongyc/checkpoint-31322$ cat config.json
{
  "_name_or_path": "/mnt/juice/models/Qwen2.5-14B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_position_embeddings": 131072,
  "max_window_layers": 48,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 48,
  "num_key_value_heads": 8,
  "pad_token_id": 151643,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.45.2",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 152064
}
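
The two config files can be patched the same way (a sketch; 151645 is <|im_end|> per added_tokens.json above, and I assume it is the intended eos):

from transformers import AutoConfig, GenerationConfig

ckpt = "/data/xiongyc/checkpoint-31322"
IM_END = 151645  # <|im_end|>

config = AutoConfig.from_pretrained(ckpt)
config.eos_token_id = IM_END
config.save_pretrained(ckpt)

gen_config = GenerationConfig.from_pretrained(ckpt)
gen_config.eos_token_id = [IM_END, 151643]  # keep <|endoftext|> as a fallback stop id
gen_config.save_pretrained(ckpt)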

Also, in previous versions the model_type keyword had separate values, qwen2_14b-instruct and qwen2_14b, to distinguish the chat template from the base one, but in this 3.x version there is only qwen2_5. See the command sketch below.
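
If the template now has to be selected explicitly, I would expect something like the command below, but I am not sure --template is the intended replacement (flag name taken from my reading of the 3.x argument docs):

swift sft \
 --model qwen/Qwen2.5-14B \
 --template qwen2_5 \
 ...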
