…-RL V0.3 (#1301) Signed-off-by: Wenwen Gao <94138584+snowmanwwg@users.noreply.github.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
📝 Walkthrough
Adds four new bullet entries to the README’s News section dated 9/30/2025, 9/27/2025, 8/15/2025, and 7/31/2025. No other files or public entities are modified.
Changes
Estimated code review effort
🎯 1 (Trivial) | ⏱️ ~2 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 1
🧹 Nitpick comments (1)
README.md (1)
8-11: Conform nested list indentation to markdownlint (MD007)
Indent nested list items by 2 spaces (currently 4), to satisfy tooling and keep consistency.
-    * Student generates on-policy sequences and aligns logits to a larger teacher via KL, achieving near-larger-model quality at lower cost than RL. See [On-policy Distillation](#on-policy-distillation).
+  * Student generates on-policy sequences and aligns logits to a larger teacher via KL, achieving near-larger-model quality at lower cost than RL. See [On-policy Distillation](#on-policy-distillation).
As flagged by markdownlint (MD007). [Based on static analysis hints]
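To reproduce the MD007 finding locally without running markdownlint, a rough check could look like the sketch below. It is a simplified approximation of the rule, not markdownlint's implementation: the function name `ul_indent_violations` and the nesting model (a stack of observed indents, reset by any non-blank line that is not a bullet) are assumptions made for illustration.

```python
import re

# Matches an unordered-list item: leading spaces, a bullet marker, a space, then text.
BULLET = re.compile(r"^( *)([*+-]) +\S")


def ul_indent_violations(text, indent=2):
    """Return (line_number, expected_indent, actual_indent) for mis-indented bullets.

    Simplified model of MD007: a bullet at nesting depth d is expected to be
    indented by d * indent spaces.
    """
    violations = []
    stack = []  # actual indents of the list levels that are currently open
    for number, line in enumerate(text.splitlines(), start=1):
        match = BULLET.match(line)
        if match is None:
            if line.strip():
                stack.clear()  # assumption: any other non-blank text ends the list
            continue
        actual = len(match.group(1))
        # Close every level that is at least as deeply indented as this item.
        while stack and stack[-1] >= actual:
            stack.pop()
        expected = len(stack) * indent
        stack.append(actual)
        if actual != expected:
            violations.append((number, expected, actual))
    return violations


sample = "* News item\n    * nested with four spaces\n* Another item\n  * nested with two spaces\n"
print(ul_indent_violations(sample))  # [(2, 2, 4)]
```

The single reported tuple mirrors the shape of the markdownlint message (expected 2, actual 4). For anything beyond a quick sanity check, running markdownlint itself is the safer choice, since this sketch ignores ordered lists, code fences, and continuation paragraphs.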
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
README.md(1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
README.md
9-9: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (2)
README.md (2)
7-7: LGTM on added News items
The FP8 Quantization, MoE optimization, and NeMo‑RL V0.3 entries look correct and well‑linked.
Also applies to: 10-11
6-11: Confirm cherry-pick scope
The PR objective lists three entries (FP8 Quantization 9/27, MoE optimization 8/15, NeMo-RL V0.3 7/31), but README also includes the GCP RL item (9/30). Was this extra entry intended for r0.4.0?
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com>
beep boop [🤖]: Hi @snowmanwwg 👋,
Summary by CodeRabbit