[model] support ERNIE-4.5 #4757

Jintao-Huang · 2025-06-30T03:13:32Z

Before you start fine-tuning, make sure your environment is set up.

To install the Megatron-related dependencies, see the Megatron-SWIFT training documentation (a prebuilt image can be used directly): https://swift.readthedocs.io/zh-cn/latest/Instruction/Megatron-SWIFT%E8%AE%AD%E7%BB%83.html

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .

Prepare the fine-tuning dataset in the following format (the system field is optional), then pass it to the training script with --dataset <dataset_path>.

{"messages": [{"role": "user", "content": "Where is the capital of Zhejiang?"}, {"role": "assistant", "content": "The capital of Zhejiang is Hangzhou."}]}
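A minimal sketch of how such records could be checked and serialized to JSONL before training. The helper names `validate_record` and `to_jsonl` are hypothetical and not part of ms-swift, and the accepted role set is an assumption limited to the roles mentioned in this comment.

```python
import json

# Assumption: only the roles mentioned in this thread are accepted here.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(record):
    """Return a list of problems for one dataset record (empty list if none)."""
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: expected an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: 'content' must be a string")
    return problems


def to_jsonl(records):
    """Serialize one record per line; ensure_ascii=False keeps non-ASCII text readable."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


sample = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},  # optional
        {"role": "user", "content": "Where is the capital of Zhejiang?"},
        {"role": "assistant", "content": "The capital of Zhejiang is Hangzhou."},
    ]
}
print(validate_record(sample))      # an empty list means no problems were found
print(to_jsonl([sample]), end="")   # text to save as the file passed to --dataset
```

Writing the returned text to a file gives a path that can be used as `<dataset_path>`.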

Convert the HF-format weights to Megatron format and test the conversion precision:

# 4 * 20GiB
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift export \
    --model PaddlePaddle/ERNIE-4.5-21B-A3B-PT \
    --to_mcore true \
    --torch_dtype bfloat16 \
    --output_dir ERNIE-4.5-21B-A3B-PT-mcore \
    --test_convert_precision true

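Run self-cognition fine-tuning (full-parameter training) on ERNIE-4.5-21B-A3B-PT-mcore. On 4 A800 GPUs this needs 4 * 51GiB of GPU memory and trains at 16s/it. The script is only meant as a quick end-to-end test; for real training, mix in a better general-purpose dataset.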

# 4 * 51GiB, 16s/it
CUDA_VISIBLE_DEVICES=0,1,2,3 \
megatron sft \
    --load ERNIE-4.5-21B-A3B-PT-mcore \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
              'AI-ModelScope/alpaca-gpt4-data-en#500' \
              'swift/self-cognition#500' \
    --expert_model_parallel_size 4 \
    --moe_grouped_gemm true \
    --moe_shared_expert_overlap true \
    --moe_aux_loss_coeff 0.01 \
    --micro_batch_size 4 \
    --global_batch_size 16 \
    --recompute_granularity full \
    --recompute_method uniform \
    --recompute_num_layers 1 \
    --finetune true \
    --cross_entropy_loss_fusion true \
    --lr 1e-5 \
    --lr_warmup_fraction 0.05 \
    --min_lr 1e-6 \
    --save megatron_output/ERNIE-4.5-21B-A3B-PT \
    --eval_interval 100 \
    --save_interval 100 \
    --max_length 2048 \
    --max_epochs 1 \
    --num_workers 8 \
    --dataset_num_proc 8 \
    --no_save_optim true \
    --no_save_rng true \
    --sequence_parallel true \
    --optimizer_cpu_offload true \
    --use_precision_aware_optimizer true \
    --attention_backend flash \
    --model_author swift \
    --model_name swift-robot
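The batch-size flags in the command above can be cross-checked with a short calculation. This is a sketch under stated assumptions: tensor, pipeline, and context parallel sizes are taken to be 1 because the command does not set them, and expert parallelism is assumed not to reduce the data-parallel size, as in Megatron-Core. The function names are hypothetical.

```python
def data_parallel_size(world_size, tensor_parallel=1, pipeline_parallel=1,
                       context_parallel=1):
    """Data-parallel size left after model parallelism (assumed defaults of 1)."""
    model_parallel = tensor_parallel * pipeline_parallel * context_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by TP * PP * CP")
    return world_size // model_parallel


def num_microbatches(global_batch_size, micro_batch_size, dp_size):
    """Micro-batches each rank accumulates before one optimizer step."""
    samples_per_pass = micro_batch_size * dp_size
    if global_batch_size % samples_per_pass != 0:
        raise ValueError("global_batch_size must be divisible by "
                         "micro_batch_size * data_parallel_size")
    return global_batch_size // samples_per_pass


# Values taken from the command: 4 GPUs, micro batch 4, global batch 16.
dp = data_parallel_size(world_size=4)
steps = num_microbatches(global_batch_size=16, micro_batch_size=4, dp_size=dp)
print(dp, steps)  # prints: 4 1
```

Under these assumptions each optimizer step consumes exactly one micro-batch per GPU, so there is no gradient accumulation; raising `--global_batch_size` without changing the other values would introduce it.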

[Screenshot: training GPU memory usage]

[Screenshot: training log]

Convert the Megatron-format weights to HF format and test the conversion precision:

CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift export \
    --mcore_model megatron_output/ERNIE-4.5-21B-A3B-PT/vx-xxx \
    --to_hf true \
    --torch_dtype bfloat16 \
    --output_dir megatron_output/ERNIE-4.5-21B-A3B-PT/vx-xxx-hf \
    --test_convert_precision true

After training completes, run inference with the following command:

CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift infer \
    --model megatron_output/ERNIE-4.5-21B-A3B-PT/vx-xxx-hf \
    --stream true \
    --temperature 0 \
    --max_new_tokens 512

Push the model to ModelScope:

swift export \
    --model megatron_output/ERNIE-4.5-21B-A3B-PT/vx-xxx-hf \
    --push_to_hub true \
    --hub_model_id '<your-model-id>' \
    --hub_token '<your-sdk-token>'

xudongguan202 · 2025-07-18T10:36:08Z

Does ERNIE have to be trained with Megatron? I ran a regular sft and got an error saying this config type is not supported.

Jintao-Huang · 2025-07-19T06:33:15Z

It can be trained that way. Please open an issue and attach the error message.

MJy1023 · 2025-09-23T06:18:39Z

During the hf2mcore weight conversion, are the MoE gate's fp32 parameters converted to bf16?
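One way to see which parameters are stored as fp32 on the HF side is to read the header of a `.safetensors` shard, which records a dtype for every tensor. The sketch below assumes the HF checkpoint is stored as safetensors files; it does not inspect the converted Megatron checkpoint, so it does not by itself answer what the conversion does. The function names and the tensor names in the usage note are hypothetical.

```python
import json
import struct


def read_header_bytes(path):
    """Read only the length prefix and JSON header of a .safetensors file."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return prefix + f.read(header_len)


def parse_header_dtypes(data):
    """Map tensor name -> dtype string (e.g. 'F32', 'BF16') from header bytes."""
    (header_len,) = struct.unpack("<Q", data[:8])
    header = json.loads(data[8:8 + header_len].decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {name: info["dtype"] for name, info in header.items()
            if name != "__metadata__"}


def names_with_dtype(dtypes, dtype="F32"):
    """Sorted tensor names stored with the given dtype."""
    return sorted(name for name, d in dtypes.items() if d == dtype)
```

For a shard at a hypothetical path, `names_with_dtype(parse_header_dtypes(read_header_bytes("model-00001.safetensors")))` lists the fp32 tensors, which can then be compared against the dtypes reported after conversion.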

Jintao-Huang added 4 commits June 30, 2025 11:12

update

529e875

update

a3c0f59

update

505bba4

update

e7d6ef8

Jintao-Huang merged commit 76c56b0 into modelscope:main Jun 30, 2025
1 of 2 checks passed

Jintao-Huang mentioned this pull request Jul 2, 2025

🚀[Fine-tuning] ERNIE-4.5-MoE Megatron Training Implementation and Best Practices👋 PaddlePaddle/ERNIE#966

Closed

Jintao-Huang changed the title ~~support ERNIE~~ [model] support ERNIE-4.5 Jul 2, 2025

Jintao-Huang mentioned this pull request Jul 23, 2025

ERNIE-4.5-21B-A3B-PT LoRA fine-tuning error #5074

Closed

3 participants
