
fix: Fix fp8 after vllm v0.11.2 bump #1660

Merged
terrykong merged 3 commits into NVIDIA-NeMo:main from guyueh1:fix_fp8_for_vllm_v0.5 on Dec 20, 2025

Conversation


@guyueh1 guyueh1 commented Dec 19, 2025

What does this PR do?

Fix FP8 patches after the vLLM bump to v0.11.2.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this
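A hypothetical example of what such a snippet could look like, using the `vllm_cfg` keys that this PR's recipe files exercise (`precision` and `use_deep_gemm`, per the review below); the surrounding YAML structure is illustrative, not copied from a specific recipe:

```yaml
# Illustrative recipe override: enable FP8 generation with DeepGEMM
policy:
  generation:
    vllm_cfg:
      precision: "fp8"
      use_deep_gemm: true
```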

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Added new configuration profiles for DeepSeek v3, Llama 3.1, and Qwen3 models with optimized settings for various cluster sizes
    • Enhanced FP8 quantization support with DeepGEMM optimization for improved inference performance
  • Improvements

    • Optimized weight handling and MoE inference processing for better compatibility with multiple backends


@guyueh1 guyueh1 requested review from a team as code owners December 19, 2025 00:07

coderabbitai bot commented Dec 19, 2025

📝 Walkthrough

Multiple new LLM performance configuration files added for GRPO variants across different cluster scales, one existing config simplified by removing sequence packing settings, and significant refactoring of FP8 weight post-processing logic in the generation module to support DeepGEMM optimization and conditional MoE backend handling.

Changes

• Configuration updates (new performance tuning files)
  Files: examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml, grpo-deepseek-v3-64n4g-async-1off.yaml, grpo-deepseek-v3-64n8g-fp8-async-1off.yaml, grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml, grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml, grpo-qwen3-235b-16n4g.yaml, grpo-qwen3-235b-32n4g-async-1off.yaml
  Summary: New YAML configuration files for GRPO recipe variants specifying cluster topology (GPUs per node, node counts), checkpoint/log directories, Megatron parallelism settings, vLLM generation parameters, and FP8 quantization settings where applicable.
• Configuration cleanup
  Files: examples/configs/recipes/llm/grpo-moonlight-16ba3b-4n8g-megatron-fp8-e2e.yaml
  Summary: Removed the sequence_packing configuration (modified_ffd algorithm) and expert_parallel_size settings from the policy and generation sections.
• FP8 weight post-processing refactor
  Files: nemo_rl/models/generation/fp8.py
  Summary: Unconditional FP8 KV cache patching, in-place weight updates via copy_() instead of direct assignment, replaced the rocm_aiter_moe flag with rocm_aiter_ops.is_fused_moe_enabled(), added conditional flashinfer_moe_backend handling with weight swapping, integrated deepgemm_post_process_fp8_weight_block() for unified weight post-processing, and updated the parameter copying logic for processed weights.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • nemo_rl/models/generation/fp8.py: Complex refactoring with multiple conditional branches for MoE backends (flashinfer vs. non-flashinfer), weight swapping logic, and DeepGEMM integration—requires careful verification of in-place operations and parameter tensor updates.
  • Configuration files: Largely repetitive and straightforward; verify parameter consistency across variants (e.g., pipeline_model_parallel_size, tensor_parallel_size, layer splits, and FP8 settings).

Possibly related PRs

Suggested labels

Low Precision, CI:L2

Suggested reviewers

  • terrykong
  • parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

• Docstring Coverage (⚠️ Warning): Docstring coverage is 75.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
• Test Results For Major Changes (⚠️ Warning): The PR contains major FP8 handling changes that affect numerical accuracy and convergence, but its description lacks test results, performance metrics, or regression validation. Include test results comparing FP8 outputs before/after the vLLM bump, convergence curves from training runs, performance metrics, and configuration validation results.

✅ Passed checks (2 passed)

• Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
• Title Check (✅ Passed): The title clearly identifies the main change as fixing FP8 compatibility after the vllm v0.11.2 dependency bump, which aligns with the core code change in nemo_rl/models/generation/fp8.py and the accompanying configuration updates.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48dbb37 and 92421fe.

📒 Files selected for processing (9)
  • examples/configs/recipes/llm/grpo-moonlight-16ba3b-4n8g-megatron-fp8-e2e.yaml (0 hunks)
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml (1 hunks)
  • nemo_rl/models/generation/fp8.py (2 hunks)
💤 Files with no reviewable changes (1)
  • examples/configs/recipes/llm/grpo-moonlight-16ba3b-4n8g-megatron-fp8-e2e.yaml
🧰 Additional context used
📓 Path-based instructions (5)
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)

Files:

  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml
  • nemo_rl/models/generation/fp8.py
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

  • nemo_rl/models/generation/fp8.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

  • nemo_rl/models/generation/fp8.py
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • nemo_rl/models/generation/fp8.py
🧠 Learnings (5)
📓 Common learnings
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml
📚 Learning: 2025-09-18T14:20:36.297Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml:113-120
Timestamp: 2025-09-18T14:20:36.297Z
Learning: In distillation workflows, the teacher policy does not perform generation - it only does inference/logprob computation on sequences generated by the student policy. Therefore, teacher generation configuration mismatches (like vLLM tensor parallelism settings) and colocation concerns are not relevant.

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/vlm/*.yaml : Recipe YAML files should follow the naming pattern: vlm_<algo>-<model>-<nodes>n<gpus>g-<strategy>[-modifiers][.vN].yaml for VLM recipes

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/**/*.yaml : When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (14)
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml (4)

1-3: ✓ Configuration inheritance and checkpoint naming are consistent.

The file correctly references the base 16n8g configuration and names the checkpoint directory to match the variant.


7-10: Verify pipeline layer distribution for uniform load balancing.

The configuration specifies num_layers_in_first_pipeline_stage: 23 and num_layers_in_last_pipeline_stage: 23, but does not specify the distribution for intermediate stages. With pipeline_model_parallel_size: 4, there are two intermediate stages whose layer counts are undefined. This could result in uneven computation distribution across pipeline stages.

Verify that the layer counts across all four pipeline stages sum correctly and are appropriately balanced for the 235B model.
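The stage balance flagged here can be sanity-checked with quick arithmetic. A minimal sketch, assuming Qwen3-235B-A22B has 94 transformer layers (a figure taken from the public model config, not from this PR):

```python
# Sanity check of the pipeline split questioned above. The total layer count
# (94 for Qwen3-235B-A22B) is an assumption from the public model config.
total_layers = 94
pp_size = 4
first, last = 23, 23  # from the recipe YAML

remaining = total_layers - first - last      # layers left for intermediate stages
middle_stages = pp_size - 2
per_middle = remaining // middle_stages

# Every layer must be assigned to exactly one stage.
assert first + last + middle_stages * per_middle == total_layers
print([first, per_middle, per_middle, last])  # -> [23, 24, 24, 23]
```

Under that assumption the two intermediate stages would each get 24 layers, so the split is nearly uniform; with a different total layer count the assertion would flag the imbalance.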


11-13: Clarify FP8 configuration alignment with PR purpose.

The PR description states the goal is to "Fix FP8 patches after vllm bump to v0.11.2," but this configuration file does not include FP8-specific quantization settings (e.g., weights_dtype, quantization_type, or use_fp8_kv_cache).

Verify whether FP8 configuration is inherited from the base config (grpo-qwen3-235b-16n8g.yaml), applied globally via defaults, or should be explicitly set here for the vllm v0.11.2 compatibility fix.


14-20: ✓ Logger and cluster configuration are consistent and well-organized.

The naming convention is consistently applied across checkpoint, log, and wandb directories, and the cluster definition (16 nodes, 4 GPUs per node) correctly represents the target infrastructure.

examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml (3)

1-20: LGTM! Well-structured FP8 configuration recipe.

The file follows the correct naming convention and is properly located. The configuration cleanly extends the base async recipe with FP8-specific overrides for both training (Megatron) and generation (vLLM) components, with consistent naming across checkpoint, log, and wandb paths.


1-1: Base configuration file exists and is correctly referenced.

The file ./grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml exists in the same directory and is properly referenced by the relative path in defaults.


6-12: FP8 configuration is correct.

The blockwise FP8 recipe with e4m3 format is well-documented in NVIDIA Transformer Engine and appropriate for training. The NVTE_FP8_BLOCK_SCALING_FP32_SCALES=1 environment variable relaxes the default constraint that scales be powers of 2, which is a standard setting. Float8BlockScaling uses block-wise scaling for FP8 tensors where values within each block share a common scaling factor, and blockwise is a valid fp8_recipe choice in training configurations.

examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml (1)

1-20: LGTM!

Configuration follows the naming conventions and the parallelism math is consistent: 128 GPUs (32×4) with TP=8 and PP=4 yields 4 data-parallel replicas. Note that TP=8 with 4 GPUs per node implies tensor parallelism spans across 2 nodes, which is valid for large models but relies on high-speed inter-node interconnects. Based on learnings, the recipe YAML naming pattern is correctly applied.

nemo_rl/models/generation/fp8.py (3)

592-600: Correct use of in-place copy for weight scale updates.

Using copy_() preserves the existing torch.nn.Parameter object and its weight_loader attribute, which is required for model refit functionality. This is consistent with the approach used in maybe_post_process_fp8_weight_block.
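The behavior described here can be illustrated with a small, hypothetical sketch; the `weight_loader` attribute is set manually below to mimic vLLM-style parameters, and this is not the actual NeMo RL code:

```python
import torch

# Sketch: why in-place copy_() matters for refit. Reassigning module.weight
# to a fresh tensor would create a new Parameter and drop any attributes
# attached to the old one; copy_() keeps the same Parameter object alive.
linear = torch.nn.Linear(4, 4, bias=False)
linear.weight.weight_loader = lambda p, w: p.data.copy_(w)  # hypothetical attribute

param_before = linear.weight
new_weights = torch.randn(4, 4)

with torch.no_grad():
    linear.weight.copy_(new_weights)  # in-place update, identity preserved

assert linear.weight is param_before
assert hasattr(linear.weight, "weight_loader")
assert torch.equal(linear.weight.detach(), new_weights)
```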


626-659: LGTM! Clean unified weight post-processing logic.

The refactored logic correctly handles all code paths:

  1. Flashinfer backend (swap w13 to w31) vs. default (use original)
  2. DeepGEMM enabled (apply post-processing) vs. disabled (skip)
  3. Always copy processed weights back using copy_() to preserve Parameters

The unconditional final copy ensures both modified and unmodified weight paths correctly update the layer's parameters while maintaining the weight_loader attribute for refit.
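The w13-to-w31 swap mentioned in point 1 can be sketched as follows; the tensor names and fused layout are assumptions for illustration, not vLLM's actual internals:

```python
import torch

# Hypothetical sketch of the w13 -> w31 swap. A fused MoE weight stacks the
# gate projection (w1) above the up projection (w3) along the intermediate
# dim; a backend expecting the opposite order needs the halves swapped.
num_experts, inter, hidden = 2, 8, 4
w1 = torch.randn(num_experts, inter, hidden)
w3 = torch.randn(num_experts, inter, hidden)
w13 = torch.cat([w1, w3], dim=1)   # [E, 2*inter, hidden], w1 first

gate, up = w13.chunk(2, dim=1)     # recover the two halves
w31 = torch.cat([up, gate], dim=1) # swapped order: w3 first

assert w31.shape == w13.shape
assert torch.equal(w31[:, :inter], w3)
assert torch.equal(w31[:, inter:], w1)
```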


610-621: Verify AITER enablement approach for vLLM v0.11.2 compatibility.

The reference to rocm_aiter_ops.is_fused_moe_enabled() appears inconsistent with vLLM v0.11.2's AITER API design. vLLM uses environment variable switches (VLLM_ROCM_USE_AITER as master switch and VLLM_ROCM_USE_AITER_MOE for specific features) to control AITER kernel selection rather than runtime method calls. Confirm whether the code should query environment variables or if the import path and method signature are correct for the target vLLM version.

examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml (1)

12-13: LGTM: Pipeline parallelism made explicit.

The addition explicitly sets pipeline parallelism to 1, making the configuration clearer and more maintainable.

examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml (1)

1-21: LGTM: Clean configuration with consistent naming.

The configuration properly scales down to 32 nodes × 4 GPUs with appropriate parallelism settings. The log directory and WandB names correctly match the cluster configuration.

examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml (1)

6-17: Both FP8 settings are correctly configured and compatible with this project's vLLM integration.

The FP8 configuration in the file correctly implements end-to-end FP8 for both training and generation. Verification confirms:

  1. vllm_cfg.precision: "fp8" - This parameter is correctly named and actively used in the codebase. The value is validated in nemo_rl/models/generation/fp8.py (line 243) where it checks if vllm_cfg.get("precision") == "fp8" to enable FP8 weight quantization.

  2. vllm_cfg.use_deep_gemm: true - This parameter is correctly spelled and implemented. It's used in nemo_rl/models/generation/fp8.py (line 271) where it sets the environment variables VLLM_USE_DEEP_GEMM="1" and VLLM_USE_DEEP_GEMM_E8M0="0" to enable Deep GEMM optimization for FP8 operations.

Both parameters are used consistently across multiple FP8 configuration files in the repository and are validated against the project's implementation, not the upstream vLLM v0.11.2 API directly (these are custom vllm_cfg parameters specific to NeMo RL's integration layer).
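The plumbing described in point 2 amounts to a few lines; a minimal sketch, with the environment variable names taken from the comment above and the config dict and control flow assumed for illustration:

```python
import os

# Sketch of the use_deep_gemm -> environment variable mapping described above.
# The vllm_cfg dict is illustrative, not the full NeMo RL configuration.
vllm_cfg = {"precision": "fp8", "use_deep_gemm": True}

if vllm_cfg.get("precision") == "fp8" and vllm_cfg.get("use_deep_gemm"):
    os.environ["VLLM_USE_DEEP_GEMM"] = "1"       # enable DeepGEMM kernels
    os.environ["VLLM_USE_DEEP_GEMM_E8M0"] = "0"  # disable e8m0 scale format

print(os.environ["VLLM_USE_DEEP_GEMM"])  # -> 1
```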

@guyueh1 guyueh1 changed the title from "fix: Fix fp8 after vllm v0.11.2 bump" to "fix: [Draft] Fix fp8 after vllm v0.11.2 bump" on Dec 19, 2025

guyueh1 commented Dec 19, 2025

Draft for now because I messed up with rebase, will update

@guyueh1 guyueh1 requested review from a team as code owners December 19, 2025 18:49
Signed-off-by: Guyue Huang <guyueh@nvidia.com>

fix fp8 for vllm v0.5

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Revert "Perf recipe for v0.5"

This reverts commit c35d0f4.

Fix Fp8 sequence padding for PP>1 case

Signed-off-by: root <root@pool0-00514.cm.cluster>

fix patching for MP>1

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 force-pushed the fix_fp8_for_vllm_v0.5 branch from b9b3bad to 8d82ada on December 19, 2025 18:52
@guyueh1 guyueh1 changed the title from "fix: [Draft] Fix fp8 after vllm v0.11.2 bump" to "fix: Fix fp8 after vllm v0.11.2 bump" on Dec 19, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 self-assigned this Dec 20, 2025
@guyueh1 guyueh1 added the CI:L2 Run doctests, unit tests, functional tests, and convergence tests label Dec 20, 2025

guyueh1 commented Dec 20, 2025

@terrykong this is ready but still no coverage on the FP8 code part

@terrykong terrykong enabled auto-merge (squash) December 20, 2025 17:25
@terrykong terrykong merged commit b238e41 into NVIDIA-NeMo:main Dec 20, 2025
40 of 42 checks passed
chtruong814 pushed a commit that referenced this pull request Dec 20, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
parthmannan pushed a commit to parthmannan/RL that referenced this pull request Jan 15, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Parth Mannan <pmannan@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 12, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Labels

• CI:L2: Run doctests, unit tests, functional tests, and convergence tests
• r0.5.0
