From 51a7d0e0d9ec18b246a460cbb5898f2c4061c0c5 Mon Sep 17 00:00:00 2001
From: root <15750543867@163.com>
Date: Sun, 25 May 2025 16:58:01 +0800
Subject: [PATCH] update ascend quick_start_doc

---
 docs/ascend/ascend_vllm073.rst              | 102 -----------
 docs/ascend_tutorial/ascend_quick_start.rst | 183 ++++++++++++++++++++
 tests/npu/run_qwen2_5_05b_grpo.sh           |   2 +-
 3 files changed, 184 insertions(+), 103 deletions(-)
 delete mode 100644 docs/ascend/ascend_vllm073.rst
 create mode 100644 docs/ascend_tutorial/ascend_quick_start.rst

diff --git a/docs/ascend/ascend_vllm073.rst b/docs/ascend/ascend_vllm073.rst
deleted file mode 100644
index d7c2a60ee1b..00000000000
--- a/docs/ascend/ascend_vllm073.rst
+++ /dev/null
@@ -1,102 +0,0 @@
-verl x Ascend
-========
-
-We have added support for Huawei Ascend devices to verl.
-
-Hardware Support
-================
-
-* Atlas 800T A2
-
-* Atlas 200T A2 Box16
-
-Installation
-============
-
-Environment Preparation
------------------------
-
-+-----------+-------------+
-| software  | version     |
-+-----------+-------------+
-| Python    | == 3.10     |
-+-----------+-------------+
-| torch     | == 2.5.1    |
-+-----------+-------------+
-| torch_npu | == 2.5.1rc1 |
-+-----------+-------------+
-| CANN      | == 8.1.RC1  |
-+-----------+-------------+
-
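-1. To enable flash_attention_2 on Ascend NPUs, the transformers version must be 4.52.0 or later.
-2. SFT and GRPO training of LLMs are currently supported. GRPO training of VLMs will be supported later because of problems in vllm-ascend; the relevant issues are: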
-
-    https://github.com/vllm-project/vllm-ascend/issues/809
-
-    https://github.com/vllm-project/vllm-ascend/issues/825
-
-Installing from Source
-----------------------
-
-.. code-block:: bash
-
-    git clone https://github.com/volcengine/verl.git
-    cd verl
-    pip install -r requirements-npu.txt
-    pip install -e .
-
-vLLM
-------
-
-To use vLLM with verl, build and install vLLM and the vLLM Ascend plugin (`vllm-ascend`) with the following commands.
-
-.. code-block:: bash
-
-    git clone -b v0.7.3 --depth 1 https://github.com/vllm-project/vllm.git
-    cd vllm
-    pip install -r requirements-build.txt
-    # for Atlas 200T A2 Box16
-    VLLM_TARGET_DEVICE=empty pip install -e . --extra-index https://download.pytorch.org/whl/cpu/
-    # for Atlas 800T A2
-    VLLM_TARGET_DEVICE=empty pip install -e .
-
-.. code-block:: bash
-    
-    git clone -b v0.7.3 --depth 1 https://github.com/vllm-project/vllm-ascend.git
-    cd vllm-ascend
-    export COMPILE_CUSTOM_KERNELS=1
-    python setup.py install
-
-Other Third-Party Libraries
----------------------------
-
-+--------------+---------------+
-| software     | description   |
-+--------------+---------------+
-| flash_attn   | not supported |
-+--------------+---------------+
-| liger-kernel | not supported |
-+--------------+---------------+
-
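-Accuracy Comparison
--------------------
-
-Empirically, for fine-tuning algorithms such as SFT, we expect the mean absolute error between the loss on Huawei Ascend devices and the loss on NVIDIA GPUs to be at most 2% under the same configuration. It is computed as follows: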
-
-.. image:: https://github.com/eric-haibin-lin/verl-community/blob/main/docs/loss_comparison.png?raw=true
-   :alt: loss_comparison
-
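-Here, N is the number of training steps. For more information, see the `accuracy calculation guide <https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/LMaccuracy_0001.html>`_.
-
-Empirically, for reinforcement learning algorithms such as GRPO, we expect the mean absolute error between the reward on Huawei Ascend devices and the reward on NVIDIA GPUs to be at most 4% under the same configuration. The calculation follows the one used for the loss.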
-
-Progress
---------
-
-+-----------+-------------+
-| algorithm | description |
-+-----------+-------------+
-|    SFT    |  supported  |
-+-----------+-------------+
-|    GRPO   |  supported  |
-+-----------+-------------+
diff --git a/docs/ascend_tutorial/ascend_quick_start.rst b/docs/ascend_tutorial/ascend_quick_start.rst
new file mode 100644
index 00000000000..f65f427ff09
--- /dev/null
+++ b/docs/ascend_tutorial/ascend_quick_start.rst
@@ -0,0 +1,183 @@
+verl x Ascend
+===================================
+
+
+We have added support for Huawei Ascend devices to verl.
+
+Hardware Support
+-----------------------------------
+
+Atlas 200T A2 Box16
+
+Atlas 800T A2
+
+
+Installation
+-----------------------------------
+
+Basic Environment Setup
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
++-----------+--------------+
+| software  | version      |
++-----------+--------------+
+| Python    | == 3.10      |
++-----------+--------------+
+| CANN      | == 8.1.RC1   |
++-----------+--------------+
+| torch     | == 2.5.1     |
++-----------+--------------+
+| torch_npu | == 2.5.1.RC1 |
++-----------+--------------+
+
+
+vllm & vllm-ascend
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To use vllm with verl, build and install vllm and vllm-ascend with the following commands. Note that the installation method depends on the machine type.
+
+.. code-block:: bash
+    
+    # vllm
+    git clone -b v0.7.3 --depth 1 https://github.com/vllm-project/vllm.git
+    cd vllm
+    pip install -r requirements-build.txt
+
+    # for Atlas 200T A2 Box16
+    VLLM_TARGET_DEVICE=empty pip install -e . --extra-index-url https://download.pytorch.org/whl/cpu/
+    
+    # for Atlas 800T A2
+    VLLM_TARGET_DEVICE=empty pip install -e .
+
+.. code-block:: bash
+    
+    # vllm-ascend
+    git clone -b v0.7.3 --depth 1 https://github.com/vllm-project/vllm-ascend.git
+    cd vllm-ascend
+    export COMPILE_CUSTOM_KERNELS=1
+    python setup.py install
+
+Installing verl
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: bash
+
+    git clone https://github.com/volcengine/verl.git
+    cd verl
+    pip install -r requirements-npu.txt
+    pip install -e .
+
+Other Third-Party Libraries
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
++--------------+---------------+
+| software     | description   |
++--------------+---------------+
+| transformers | >= v4.52.0    |
++--------------+---------------+
+| flash_attn   | not supported |
++--------------+---------------+
+| liger-kernel | not supported |
++--------------+---------------+
+
+1. Enabling --flash_attention_2 through transformers is supported; transformers 4.52.0 or later is required.
+2. Enabling flash attention acceleration through flash_attn is not supported.
+3. liger-kernel is not supported.
+
+
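+Quick Start
+-----------------------------------
+Before regular use, we recommend running a trial GRPO training of Qwen2.5-0.5B to verify that the environment is prepared and installed correctly.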
+
+.. code-block:: bash
+
+    set -x
+
+    export VLLM_ATTENTION_BACKEND=XFORMERS
+
+    python3 -m verl.trainer.main_ppo \
+        algorithm.adv_estimator=grpo \
+        data.train_files=$HOME/data/gsm8k/train.parquet \
+        data.val_files=$HOME/data/gsm8k/test.parquet \
+        data.train_batch_size=128 \
+        data.max_prompt_length=512 \
+        data.max_response_length=128 \
+        data.filter_overlong_prompts=True \
+        data.truncation='error' \
+        actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
+        actor_rollout_ref.actor.optim.lr=5e-7 \
+        actor_rollout_ref.model.use_remove_padding=False \
+        actor_rollout_ref.actor.ppo_mini_batch_size=64 \
+        actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=20 \
+        actor_rollout_ref.actor.use_kl_loss=True \
+        actor_rollout_ref.actor.kl_loss_coef=0.001 \
+        actor_rollout_ref.actor.kl_loss_type=low_var_kl \
+        actor_rollout_ref.model.enable_gradient_checkpointing=True \
+        actor_rollout_ref.actor.fsdp_config.param_offload=False \
+        actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
+        actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=40 \
+        actor_rollout_ref.rollout.enable_chunked_prefill=False \
+        actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
+        actor_rollout_ref.rollout.name=vllm \
+        actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
+        actor_rollout_ref.rollout.n=5 \
+        actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=40 \
+        actor_rollout_ref.ref.fsdp_config.param_offload=True \
+        algorithm.kl_ctrl.kl_coef=0.001 \
+        trainer.critic_warmup=0 \
+        trainer.logger=['console'] \
+        trainer.project_name='verl_grpo_example_gsm8k' \
+        trainer.experiment_name='qwen2_7b_function_rm' \
+        trainer.n_gpus_per_node=8 \
+        trainer.nnodes=1 \
+        trainer.save_freq=-1 \
+        trainer.test_freq=5 \
+        trainer.total_epochs=1 \
+        trainer.device=npu "$@"
+
+
+Support Status
+-----------------------------------
+
++-----------+----------------------+-------------+-------------------+----------------------+
+| algorithm |         model        | rewards mae |  throughput ratio |        hardware      |
++-----------+----------------------+-------------+-------------------+----------------------+
+|   GRPO    | Qwen2.5-7B-instruct  |    0.38%    |        0.588      |  Atlas 200T A2 Box16 |
++-----------+----------------------+-------------+-------------------+----------------------+
+|   GRPO    | Qwen2.5-32B-instruct |    0.30%    |        0.685      |  Atlas 200T A2 Box16 |
++-----------+----------------------+-------------+-------------------+----------------------+
+
+GRPO training of Qwen2.5 is currently supported. GRPO training of Qwen2.5-VL will be supported once vllm-ascend is fixed; the relevant issues are:
+
+1. `issues#809 <https://github.com/vllm-project/vllm-ascend/issues/809>`_
+
+2. `issues#825 <https://github.com/vllm-project/vllm-ascend/issues/825>`_
+
+
+Accuracy Comparison
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
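+For SFT-style algorithms, we expect the mean absolute error between the loss on Huawei Ascend devices and the loss on A100 to be <= 2% under the same configuration. The calculation is shown in the figure below. For more information, see the `accuracy calculation guide <https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/LMaccuracy_0001.html>`_.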
+
+.. image:: https://github.com/eric-haibin-lin/verl-community/blob/main/docs/loss_comparison.png?raw=true
+   :alt: loss_comparison
+
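+Empirically, for RL algorithms such as GRPO, we expect the mean absolute error between the rewards on Huawei Ascend devices and the rewards on A100 to be <= 4% under the same configuration. The calculation follows the figure above.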
+
+
+Throughput Comparison
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+For the Ascend NPU and the A100 respectively, average the "perf/throughput" values of the first 4 steps in the log; throughput ratio = NPU average / A100 average.
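+
+The throughput ratio can be written out as follows. This only restates the definition; the symbol :math:`T_i` (the "perf/throughput" value logged at step :math:`i`) is notation assumed here for illustration and does not appear in the logs.
+
+.. math::
+
+   \text{throughput ratio} = \frac{\frac{1}{4}\sum_{i=1}^{4} T_i^{\mathrm{NPU}}}{\frac{1}{4}\sum_{i=1}^{4} T_i^{\mathrm{A100}}}
+
+Under this definition, a ratio of 0.588 means the NPU average is 58.8% of the A100 average.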
+
+
+
+Roadmap
+-----------------------------------
+
+See the `roadmap <https://github.com/volcengine/verl/discussions/900>`_ for the support progress of more features.
+
+
+
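+Disclaimer
+-----------------------------------
+The Ascend support code provided in verl is a reference sample only. For commercial use, please get in touch through official channels. Thank you.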
\ No newline at end of file
diff --git a/tests/npu/run_qwen2_5_05b_grpo.sh b/tests/npu/run_qwen2_5_05b_grpo.sh
index 6ccaf7b4379..d54102b7506 100644
--- a/tests/npu/run_qwen2_5_05b_grpo.sh
+++ b/tests/npu/run_qwen2_5_05b_grpo.sh
@@ -12,7 +12,7 @@ python3 -m verl.trainer.main_ppo \
     data.filter_overlong_prompts=True \
     data.truncation='error' \
     actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
-    actor_rollout_ref.actor.optim.lr=1e-6 \
+    actor_rollout_ref.actor.optim.lr=5e-7 \
     actor_rollout_ref.model.use_remove_padding=False \
     actor_rollout_ref.actor.ppo_mini_batch_size=64 \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=20 \