feat: KV cache quantization support in fp8 rollout in GRPO #1212
Merged
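The title refers to storing the vLLM KV cache in FP8 during GRPO rollouts. As a rough illustration of what per-tensor FP8 KV-cache quantization involves, the sketch below emulates the E4M3 format in plain Python. A scale is derived from a calibrated absolute maximum, values are divided by it and rounded to E4M3, and multiplying by the scale recovers an approximation. The `amax / 448` scale rule and every function name here are assumptions for illustration (libraries pick different constants, and real kernels perform the cast on the GPU); this is not the implementation from this PR.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite magnitude of the FP8 E4M3 format


def compute_scale(amax: float) -> float:
    """Assumed per-tensor rule: map the calibrated |max| onto the top of the E4M3 range."""
    # Guard against an all-zero calibration batch.
    return max(amax, 1e-12) / FP8_E4M3_MAX


def round_to_e4m3(x: float) -> float:
    """Emulate round-to-nearest-even into E4M3 (3 mantissa bits), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    _, exp = math.frexp(x)        # |x| = m * 2**exp with 0.5 <= |m| < 1
    unbiased = max(exp - 1, -6)   # E4M3 minimum normal exponent is -6; below it, subnormal spacing
    step = 2.0 ** (unbiased - 3)  # gap between neighbouring representable values in this binade
    q = round(x / step) * step    # Python's round() breaks ties to even
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, q))


def quantize(values: list[float], scale: float) -> list[float]:
    """Divide by the scale, then round each value to E4M3."""
    return [round_to_e4m3(v / scale) for v in values]


def dequantize(codes: list[float], scale: float) -> list[float]:
    """Multiply the stored FP8 values back by the scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    keys = [0.5, -1.25, 3.0, 7.9]
    scale = compute_scale(max(abs(k) for k in keys))
    print(dequantize(quantize(keys, scale), scale))
```

With three mantissa bits, any value that lands in the normal range after scaling is recovered with a relative error of at most 2^-4, which is why the choice of scale (and keeping it in sync with the weights) matters for rollout quality.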
Commits (44 total; the file diff below shows changes from 40 of them)
- 4c6b67b kv-cache: prepare clean commit without excluded files
- 312d204 kv cache fp8 code refine and cleanup
- e2cbca8 Fix indentiation error. Enable using environment variables to set FP8…
- c92dc55 Correct typos
- ad09785 Update fp8.py
- dca01e2 Remove dapo.py from response_datasets
- 03e47a9 Update sanity check in grpo.py. Remove redundant code in megatron bac…
- 96955de Remove redundant comments
- b1214cd Remove _hook_builder
- be8853d rebase and update refitting process (zpqiu)
- 4ae3ec0 fix refitting bugs after rebase (zpqiu)
- d66c262 Refactor FP8 KV cache scale handling by centralizing vLLM parameter n…
- bc26b40 lint check (zpqiu)
- 7f709b4 Update to correct BF16 issue with load_weights
- 964d929 WIP: changes before rebase
- ede399c Code draft to support dynamic kv scales calculation
- ff455d6 Refit should only takes care of kv_scales when kv_cache_dtype is fp8 …
- ac2b5ed lint check (zpqiu)
- 9362035 remove debug prints (zpqiu)
- f35662d make refitting with kv scales cleaner (zpqiu)
- 667661e remove debug print; raise errors of calibration process; refine refit… (zpqiu)
- b84b10e remove old hotfix about save_ckpt (zpqiu)
- 50c4abd avoid importing vllm at grpo.py (zpqiu)
- 0dbf7ab add placeholder func and parameter for dtensor path (zpqiu)
- 0ea586b Refit should take care of kv_scales in the validation phase (sharonyu-115)
- 8f759a1 Remote TODO comment (sharonyu-115)
- ea3e500 Merge branch 'main' into kv-cache-fp8 (guyueh1)
- 6f3bed7 Rename the example yaml file to grpo_math_qwen3_8B_fp8_kvcache.yaml a… (sharonyu-115)
- 4089ab5 Add kv_cache fp8 test case to test_vllm_generation_with_megatron_trai… (sharonyu-115)
- ac6f66c update pp>1 assert info (zpqiu)
- af60c9a update guard statements in DTensor path files (zpqiu)
- f150419 Merge branch 'main' into kv-cache-fp8 (zpqiu)
- 3a37119 add l1 test; update config yaml (zpqiu)
- 231a739 at first calibration align with training data processing to ensure pa… (zpqiu)
- 4f1324a remove l1 test; upload missed recipe yaml (zpqiu)
- 47ea0c0 Merge branch 'main' into kv-cache-fp8 (zpqiu)
- 94d16ec resolve fp8 patch conflicts (zpqiu)
- 48b20aa add nightly test (zpqiu)
- 603c366 increase gpu hours for new nightly test (zpqiu)
- 7ca82f3 allow a larger logprob tolerance (zpqiu)
- b34ad76 update kv_cache_dtype with choices (zpqiu)
- 40fa1ac add default kv_cache_dtype; update checking logic code (zpqiu)
- 6d65466 Merge branch 'main' into kv-cache-fp8 (zpqiu)
- db3ea88 add requires_kv_scale_sync property to GenerationInterface (zpqiu)
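Several commits (`Refit should only takes care of kv_scales when kv_cache_dtype is fp8 …`, `update kv_cache_dtype with choices`, `add requires_kv_scale_sync property to GenerationInterface`) point to a pattern where the generation backend reports whether KV scales must travel with the weights during refit. The following is a hypothetical sketch of that gating. Only the property name `requires_kv_scale_sync` comes from the commit list; the accepted values `"auto"` and `"fp8"`, the `"auto"` default, and the classes and `build_refit_payload` helper are assumptions invented for illustration.

```python
from dataclasses import dataclass

# Assumption: accepted kv_cache_dtype values; the PR's real choices may differ.
KV_CACHE_DTYPE_CHOICES = ("auto", "fp8")


@dataclass
class VllmCfgSketch:
    """Hypothetical stand-in for the generation.vllm_cfg config section."""

    kv_cache_dtype: str = "auto"  # assumed default

    def __post_init__(self) -> None:
        # Reject values outside the allowed choices at construction time.
        if self.kv_cache_dtype not in KV_CACHE_DTYPE_CHOICES:
            raise ValueError(
                f"kv_cache_dtype must be one of {KV_CACHE_DTYPE_CHOICES}, "
                f"got {self.kv_cache_dtype!r}"
            )


class GenerationSketch:
    """Hypothetical generation backend exposing requires_kv_scale_sync."""

    def __init__(self, cfg: VllmCfgSketch) -> None:
        self.cfg = cfg

    @property
    def requires_kv_scale_sync(self) -> bool:
        # Only an FP8 KV cache carries k/v scales that must follow the weights.
        return self.cfg.kv_cache_dtype == "fp8"


def build_refit_payload(generation: GenerationSketch, weights: dict, kv_scales: dict) -> dict:
    """Hypothetical refit step: attach KV scales only when the backend needs them."""
    payload = dict(weights)
    if generation.requires_kv_scale_sync:
        payload.update(kv_scales)
    return payload
```

Putting the decision behind a property keeps the training loop free of backend-specific dtype checks, which is consistent with the commit that avoids importing vLLM in grpo.py.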
examples/configs/recipes/llm/grpo-qwen3-8b-base-1n8g-fp8-kvcache-megatron.yaml (new file, 49 additions, 0 deletions)
```yaml
defaults: ../../grpo_math_1B.yaml
grpo:
  val_period: 20
checkpointing:
  enabled: false
  checkpoint_dir: results/grpo_qwen3_8b_fp8_kvcache
loss_fn:
  use_importance_sampling_correction: true
policy:
  model_name: Qwen/Qwen3-8B-Base
  train_micro_batch_size: 1
  logprob_batch_size: 1
  max_total_sequence_length: 8192
  dtensor_cfg:
    enabled: false
  optimizer: null
  scheduler: null
  megatron_cfg:
    enabled: true
    converter_type: Qwen3ForCausalLM
    tensor_model_parallel_size: 4
    optimizer:
      lr: 1.0e-06
      min_lr: 1.0e-06
      weight_decay: 0.1
      use_precision_aware_optimizer: false
    scheduler:
      lr_decay_iters: null
      lr_warmup_iters: 10
      lr_warmup_init: 1.0e-07
  make_sequence_length_divisible_by: ${mul:${policy.megatron_cfg.tensor_model_parallel_size},
    2}
  generation:
    vllm_cfg:
      precision: fp8
      kv_cache_dtype: fp8
      use_deep_gemm: true
data:
  max_input_seq_length: 2048
  prompt_file: null
  dataset_name: DAPOMath17K
env:
  dapo:
    num_workers: 16
  math:
    num_workers: 16
    math_verify_impl: dapo_math_verify
cluster:
  gpus_per_node: 8
```
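The `make_sequence_length_divisible_by` entry is an interpolation over the tensor-parallel size. Assuming `mul` is a custom resolver that multiplies its arguments (it is not an OmegaConf built-in), `${mul:${policy.megatron_cfg.tensor_model_parallel_size}, 2}` evaluates to 4 × 2 = 8 for this recipe, so sequence lengths would be padded up to a multiple of 8. A minimal sketch of that arithmetic; `mul` here only mimics the assumed resolver, and `pad_to_multiple` is a hypothetical helper name:

```python
TENSOR_MODEL_PARALLEL_SIZE = 4  # policy.megatron_cfg.tensor_model_parallel_size in this recipe


def mul(*args: int) -> int:
    """Assumed behaviour of the `mul` resolver: the product of its arguments."""
    result = 1
    for a in args:
        result *= a
    return result


# Mirrors ${mul:${policy.megatron_cfg.tensor_model_parallel_size}, 2}
SEQ_LEN_DIVISOR = mul(TENSOR_MODEL_PARALLEL_SIZE, 2)


def pad_to_multiple(seq_len: int, divisor: int = SEQ_LEN_DIVISOR) -> int:
    """Hypothetical helper: round seq_len up to the next multiple of divisor."""
    return ((seq_len + divisor - 1) // divisor) * divisor
```

Because `max_total_sequence_length: 8192` is already a multiple of 8, the configured maximum needs no extra padding; only shorter, irregular lengths get rounded up.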