
eos problem when using qwen2_5 sft #3113

Open
YingchaoX opened this issue Feb 14, 2025 · 0 comments

Describe the bug
The eos token does not work after SFT on the Qwen2.5-14B base model.

[screenshot: inference result without a stop token]

Only if I add the stop token explicitly does the finish_reason become stop.

[screenshot: inference result with the stop token added]
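
As a workaround, the eos can also be overridden at generation time. A minimal sketch with plain transformers (the checkpoint path is the one from my machine below; I assume <|im_end|>, id 151645, is the intended stop token, and the exact prompt layout may differ from what swift uses):

from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "/data/xiongyc/checkpoint-31322"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype="auto", device_map="auto")

# Build a chatml-style prompt like the one the qwen2_5 template trains on.
prompt = "<|im_start|>user\nhello<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Pass <|im_end|> (151645) explicitly as eos; the checkpoint's generation_config
# still points eos at <|endoftext|> (151643), so generation does not stop otherwise.
out = model.generate(
    **inputs,
    max_new_tokens=128,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))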

The training script is like the one below; I use the base model for SFT training:

swift sft \
 --model qwen/Qwen2.5-14B \
 --model_type qwen2_5 \
 ...

The SFT data is like below:
[screenshot: a sample of the SFT data]
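
(The screenshot is not reproduced here; the data is in the standard ms-swift 3.x messages format, roughly like this hypothetical one-line sample with the real contents elided:)

{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}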

Your hardware and system info
8*A800 80G
ms-swift 3.0.2.post1
torch 2.4.0
python 3.9
transformers 4.45.2

Additional context
Checking the config files of the saved model, the eos_token_id in generation_config.json is not correct, and neither is the one in config.json.

14:32:15 (base)xiongyc@bridgellm:/data/xiongyc/checkpoint-31322$ cat generation_config.json 
{
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 64,
  "pad_token_id": 151643,
  "transformers_version": "4.45.2"
}
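
Here eos_token_id is 151643, which is <|endoftext|>, not the chat template's <|im_end|> (151645). A quick check against the checkpoint's tokenizer confirms the mapping:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/data/xiongyc/checkpoint-31322")
print(tokenizer.convert_ids_to_tokens(151643))        # <|endoftext|>
print(tokenizer.convert_tokens_to_ids("<|im_end|>"))  # 151645
print(tokenizer.eos_token)                            # <|endoftext|>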
14:32:15 (base)xiongyc@bridgellm:/data/xiongyc/checkpoint-31322$ cat special_tokens_map.json
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
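
The tokenizer side can be patched by hand (a sketch; it assumes <|im_end|> is the eos the chat template actually needs):

from transformers import AutoTokenizer

ckpt = "/data/xiongyc/checkpoint-31322"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
tokenizer.eos_token = "<|im_end|>"  # was <|endoftext|>; rewrites special_tokens_map.json on save
tokenizer.save_pretrained(ckpt)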
14:32:28 (base)xiongyc@bridgellm:/data/xiongyc/checkpoint-31322$ cat added_tokens.json
{
  "</tool_call>": 151658,
  "<tool_call>": 151657,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
14:33:04 (base)xiongyc@bridgellm:/data/xiongyc/checkpoint-31322$ cat config.json
{
  "_name_or_path": "/mnt/juice/models/Qwen2.5-14B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_position_embeddings": 131072,
  "max_window_layers": 48,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 48,
  "num_key_value_heads": 8,
  "pad_token_id": 151643,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.45.2",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 152064
}
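
The two config files can be patched the same way (a sketch; 151645 is <|im_end|> per added_tokens.json above, and I assume it is the intended eos):

from transformers import AutoConfig, GenerationConfig

ckpt = "/data/xiongyc/checkpoint-31322"
IM_END = 151645  # <|im_end|>

config = AutoConfig.from_pretrained(ckpt)
config.eos_token_id = IM_END
config.save_pretrained(ckpt)

gen_config = GenerationConfig.from_pretrained(ckpt)
gen_config.eos_token_id = [IM_END, 151643]  # keep <|endoftext|> as a fallback stop id
gen_config.save_pretrained(ckpt)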

Also, in previous versions the model_type keyword had separate values, qwen2_14b-instruct and qwen2_14b, to distinguish the chat template from the base one, but in this 3.x version there is only qwen2_5. See the command sketch below.
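
If the template now has to be selected explicitly, I would expect something like the command below, but I am not sure --template is the intended replacement (flag name taken from my reading of the 3.x argument docs):

swift sft \
 --model qwen/Qwen2.5-14B \
 --template qwen2_5 \
 ...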
