[recipe,sglang] feat: add Truncated importance sampling + sglang recipe#4462
[recipe,sglang] feat: add Truncated importance sampling + sglang recipe#4462eternally-z wants to merge 5 commits intoverl-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new training recipe for FlashRL with SGLang. The main addition is the run_flashrl_sglang.sh script. My review focuses on improving the robustness of this script. I've identified a critical issue in the Ray cluster startup logic that could lead to silent failures and hard-to-debug problems. I've also pointed out a potential issue with an unquoted command-line argument that could affect script portability across different shell environments. The proposed changes will make the script more reliable and prevent unexpected behavior.
| echo "Ray server is not running, starting new Ray cluster..." | ||
| # Start new Ray cluster with 8 GPUs | ||
| ray start --head \ | ||
| --port=${RAY_PORT} \ | ||
| --dashboard-port=${RAY_DASHBOARD_PORT} \ | ||
| --num-gpus=8 \ | ||
| --dashboard-host=0.0.0.0 \ | ||
| --temp-dir=${RAY_TEMP_DIR} \ | ||
| --disable-usage-stats || true | ||
| echo "Ray cluster started successfully on localhost:${RAY_PORT}" | ||
| # Wait a moment for Ray to fully initialize | ||
| echo "Waiting for Ray cluster to be ready..." | ||
| sleep 5 | ||
| # Verify Ray is actually running | ||
| if ray status --address="localhost:${RAY_PORT}" >/dev/null 2>&1; then | ||
| echo "Ray cluster is ready and accessible." | ||
| RAY_RUNNING=true | ||
| else | ||
| echo "Warning: Ray cluster may not be fully ready yet, but continuing..." | ||
| RAY_RUNNING=false | ||
| fi |
There was a problem hiding this comment.
The Ray cluster startup logic is not robust and could lead to hard-to-debug failures:
- Error Suppression: The
|| trueon line 45 suppresses the exit code fromray start. Ifray startfails, the script will continue, causing later commands to fail unexpectedly. - Unreliable Wait: Using a fixed
sleep 5is not a reliable way to wait for the cluster to initialize. - Ignoring Failures: The script continues even if the cluster is not ready. The
RAY_RUNNINGvariable is set but never used, so the script will attempt to run the Python application regardless of the cluster's status.
The script should fail fast if the Ray cluster cannot be started and should reliably wait for it to be ready.
echo "Ray server is not running, starting new Ray cluster..."
# Start new Ray cluster with 8 GPUs. The script will exit if this fails due to `set -e`.
ray start --head \
--port=${RAY_PORT} \
--dashboard-port=${RAY_DASHBOARD_PORT} \
--num-gpus=8 \
--dashboard-host=0.0.0.0 \
--temp-dir=${RAY_TEMP_DIR} \
--disable-usage-stats
# Wait for Ray to be ready with a timeout
echo "Waiting for Ray cluster to be ready..."
for i in {1..15}; do
if ray status --address="localhost:${RAY_PORT}" >/dev/null 2>&1; then
echo "Ray cluster is ready and accessible."
break
fi
if [[ $i -eq 15 ]]; then
echo "Error: Timed out waiting for Ray cluster to start on localhost:${RAY_PORT}." >&2
exit 1
fi
sleep 2
done| reward_model.overlong_buffer.enable=True \ | ||
| reward_model.overlong_buffer.len=4096 \ | ||
| reward_model.overlong_buffer.penalty_factor=1.0 \ | ||
| trainer.logger=['console'] \ |
There was a problem hiding this comment.
Co-authored-by: AniZpZ <aniz1905@gmail.com> Co-authored-by: Wilboludriver <wilbolu@outlook.com>
e36aab2 to
027bde5
Compare
|
Better move the script to |
Do you implement sglang fp8 rollout? We already have an implementation in #4415 |
|
@eternally-z please take a look on https://verl.readthedocs.io/en/latest/algo/rollout_corr.html |
Thanks for catching this. The API usage error was due to my local environment running an outdated version of verl. I have updated the code to match the latest API and pushed the fix. |
we have a pr that adds a patch in sglang that apply flash-rl in sglang quantization which supporting various FP8 quantization granularities, while remaining completely non-intrusive to verl |
Thanks for the suggestion.The script has been moved to examples/rollout_correction/ . |
6b4fe51 to
448b410
Compare
What does this PR do?
This PR introduces a recipe for Truncated Importance Sampling (TIS) rollout using SGLang as the inference engine.
Previous TIS configurations and experiments primarily used vLLM as the inference engine. For the detailed principles regarding TIS, please refer to PR #2953. We present the first TIS rollout recipe using SGLang and provide the experimental results below.
We added a patch in SGLang that applies flash-rl to quantization, supporting various FP8 granularities.Please refer to sglang PR 9650 & 15440(implementation) and 14870(logic abstraction).
Configuration
Experiments results
@Wilboludriver @AniZpZ






Observations
Accuracy of Quantization: Increasing val scores and reasonable response logs confirm (not shown here) that training precision remains intact under Blockwise FP8 Rollout. In contrast, Per-channel FP8 quantization leads to significant precision degradation during the generation phase.
Effects on Training: With the same number of training steps, FP8 Rollout yields longer response lengths and consistently higher AIME-2024 val scores compared to BF16. This performance boost might be attributed to noise introduced by FP8; further investigation using mismatch metrics is required.
Gen. Throughput: FP8 Rollout initially exhibits higher throughput than BF16 but is overtaken in later stages. Profiling is currently underway to identify the bottleneck.
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,megatron,sglang,vllm,rollout,trainer,ci,training_utils,recipe,hardware,deployment,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,like[megatron, fsdp, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][fsdp, megatron] feat: dynamic batchingTest
API and Usage Example
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)