Conversation
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
📝 Walkthrough

Introduces Kimi-K2 pretraining configuration infrastructure across multiple modules: base workload configuration presets for GB300, GB200, B200, and H100 GPUs with multiple precision variants, factory functions to assemble GPU-specific pretrain configurations, conditional exports through the package interface, and updates to override and pipeline layout handling logic.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@scripts/performance/configs/kimi/__init__.py`:
- Around lines 1-6: Add the required NVIDIA copyright header to the top of this
Python module, before any imports or code. The header must precede the
try/except block that sets HAVE_MEGATRON_BRIDGE and imports megatron.bridge;
do not remove or modify that existing logic.
In `@scripts/performance/configs/kimi/kimi_workload_base_configs.py`:
- Line 91: The constant name KIMI_K2_PRETRAIN_CONFIG_H100_FP8_SC appears
inconsistent with the FP8_CS suffix used for other GPUs (e.g.,
GB300/GB200/B200); if "SC" is a typo, rename KIMI_K2_PRETRAIN_CONFIG_H100_FP8_SC
to KIMI_K2_PRETRAIN_CONFIG_H100_FP8_CS to match the convention and update any
references; if "SC" is an intentional different precision variant, add a
clarifying comment next to the definition of KIMI_K2_PRETRAIN_CONFIG_H100_FP8_SC
explaining how SC differs from CS and ensure any related tests/config uses the
correct symbol.
- Around line 78-91: KIMI_K2_PRETRAIN_CONFIG_H100 sets
pipeline_model_parallel_size=16 and virtual_pipeline_model_parallel_size=2 which
has no corresponding entry in _get_kimi_k2_pipeline_layout (called during
kimi_k2_pretrain_config_h100 construction) and will raise ValueError; fix by
either adding a (16, 2) layout entry to the layout map inside
_get_kimi_k2_pipeline_layout in kimi_k2.py or change
KIMI_K2_PRETRAIN_CONFIG_H100 to use a valid VP (e.g., set
virtual_pipeline_model_parallel_size=1 to match the existing (16,1) layout) and
update the related aliases (KIMI_K2_PRETRAIN_CONFIG_H100_BF16 / _FP8_CS /
_FP8_SC) if needed.
In `@scripts/performance/utils/overrides.py`:
- Around line 398-409: When model_recipe_name == "kimi_k2" we must avoid passing
sentinel pp_size (None) or vp_size (-1) into _get_kimi_k2_pipeline_layout;
instead derive effective sizes from the recipe's current layout: read
current_layout = recipe.model.pipeline_model_parallel_layout and compute
effective_pp = pp_size if pp_size is not None else (current_layout[0] if
current_layout else 1) and effective_vp = vp_size if vp_size != -1 else
(current_layout[1] if current_layout else 1), then call
_get_kimi_k2_pipeline_layout(effective_pp, effective_vp) and assign
recipe.model.pipeline_model_parallel_layout accordingly (keeping the existing
exception handling and preserving any explicit pipeline_model_parallel_layout
override afterwards).
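The sentinel-resolution logic described above can be sketched as a small pure function. This is illustrative only: the function name is made up, and the recipe's real `pipeline_model_parallel_layout` is a nested layout structure rather than the simple `(pp, vp)` pair assumed here.

```python
def resolve_kimi_k2_layout_sizes(pp_size, vp_size, current_layout):
    """Derive effective pipeline sizes, treating pp_size=None and
    vp_size=-1 as "keep the recipe's current value" sentinels.

    current_layout is assumed to be a (pp, vp) pair or None; the real
    recipe stores a full layout object, so this is a simplification.
    """
    effective_pp = pp_size if pp_size is not None else (current_layout[0] if current_layout else 1)
    effective_vp = vp_size if vp_size != -1 else (current_layout[1] if current_layout else 1)
    return effective_pp, effective_vp
```

The resolved pair would then be passed to `_get_kimi_k2_pipeline_layout` so the sentinels never reach the layout lookup.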
🧹 Nitpick comments (10)
src/megatron/bridge/recipes/kimi/kimi_k2.py (3)
25-25: Type hint does not reflect that `vp_size` can be `None`. Line 37 explicitly handles `vp_size is None`, but the signature declares `vp_size: int`. Update the type hint per coding guidelines (`int | None` instead of `Optional[int]`). As per coding guidelines, "Use 'T | None' for nullable types instead of 'Optional[T]'".

Proposed fix:

```diff
-def _get_kimi_k2_pipeline_layout(pp_size: int, vp_size: int):
+def _get_kimi_k2_pipeline_layout(pp_size: int, vp_size: int | None):
```
44-45: Redundant outer `list()` call. `[list(x) for x in layout]` already produces a `list`; wrapping it in `list(...)` is unnecessary.

Proposed fix:

```diff
-    layout = list([list(x) for x in layout])
+    layout = [list(x) for x in layout]
```
88-88: Line likely exceeds the 119-character limit. The inline comment on `cfg.dataset.blend` makes this line very long. Consider moving the usage example into the docstring or a standalone comment above the assignment. As per coding guidelines, "Maximum line length is 119 characters (matching ruff configuration)".

scripts/performance/utils/overrides.py (1)
22-22: Importing a private (`_`-prefixed) function across package boundaries. `_get_kimi_k2_pipeline_layout` is conventionally private to its module. If it's intended to be used by the performance scripts, consider either making it public (removing the `_` prefix) or providing a public wrapper. This isn't blocking, just a convention note.

scripts/performance/configs/kimi/kimi_llm_pretrain.py (4)
15-23: Import ordering violates coding guidelines. Per guidelines, first-party imports (`megatron.bridge.*`) should come before local folder imports (`utils.*`). Swap the groups.

Proposed fix:

```diff
 import logging

-from utils.overrides import set_workload_base_configs
-from utils.precision import get_precision_config
-from utils.utils import get_workload_base_config
-
 from megatron.bridge.recipes.kimi.kimi_k2 import _get_kimi_k2_pipeline_layout
 from megatron.bridge.recipes.kimi.kimi_k2 import kimi_k2_pretrain_config as pretrain_config
 from megatron.bridge.training.config import ConfigContainer
+
+from utils.overrides import set_workload_base_configs
+from utils.precision import get_precision_config
+from utils.utils import get_workload_base_config
```

As per coding guidelines, "Organize imports in order: future imports, standard library, third-party (including megatron.core, torch, transformers), first-party (megatron.bridge.*), local folder imports, separated by blank lines".
45-84: Unused `mock` parameter in all four factory functions. The `mock` parameter is declared but never used (confirmed by static analysis ARG001). Either remove it or use it to conditionally configure mock data. If it's a placeholder for future use, add a brief comment or prefix it with `_`.

Proposed fix (if not needed):

```diff
 def kimi_k2_pretrain_config_gb300(
-    precision: str = "bf16", mock: bool = True, config_variant: str = "v1"
+    precision: str = "bf16", config_variant: str = "v1"
 ) -> ConfigContainer:
```

Also applies to: 87-126, 129-163, 166-201
45-201: Significant code duplication across four GPU config factories. The four functions are nearly identical — only the GPU name, `overlap_grad_reduce`, and `num_workers`/`pin_memory` differ. Consider extracting a common helper:

Sketch of a refactored approach:

```python
def _kimi_k2_pretrain_config(
    gpu: str,
    precision: str = "bf16",
    config_variant: str = "v1",
    overlap_grad_reduce: bool = True,
    num_workers: int | None = None,
    pin_memory: bool | None = None,
) -> ConfigContainer:
    base_cfg = get_workload_base_config(
        model_family_name="kimi",
        model_recipe_name="kimi_k2",
        gpu=gpu,
        compute_dtype=precision.upper(),
        task="pretrain",
        config_variant=config_variant,
    )
    cfg = pretrain_config()
    cfg.mixed_precision = get_precision_config(precision)
    if base_cfg.moe_flex_dispatcher_backend is not None:
        cfg.model.moe_flex_dispatcher_backend = base_cfg.moe_flex_dispatcher_backend
    if base_cfg.pp_layout:
        cfg.model.pipeline_model_parallel_layout = base_cfg.pp_layout
    else:
        pp_size = base_cfg.pipeline_model_parallel_size
        vp_size = base_cfg.virtual_pipeline_model_parallel_size
        cfg.model.pipeline_model_parallel_layout = _get_kimi_k2_pipeline_layout(pp_size, vp_size)
    set_kimi_k2_common_configs(cfg)
    set_workload_base_configs(cfg, base_cfg)
    cfg.comm_overlap.overlap_grad_reduce = overlap_grad_reduce
    if num_workers is not None:
        cfg.dataset.num_workers = num_workers
    if pin_memory is not None:
        cfg.dataset.pin_memory = pin_memory
    return cfg
```
29-43: `set_kimi_k2_common_configs` has no return value despite modifying `cfg` in place. The function signature declares `-> None` and mutates `cfg`, which is fine. However, some callers (like the factory functions) don't document that common configs override the recipe's `grad_reduce_in_fp32=True` to `False` (lines 38-39). This is a meaningful behavioral change from the recipe default. A brief docstring note would help future maintainers understand the intent.

scripts/performance/configs/kimi/kimi_workload_base_configs.py (1)
40-43: Precision aliases are reference aliases, not distinct configs. All `_BF16`, `_FP8_CS`, etc. variants point to the exact same `WorkloadBaseConfig` object (e.g., `KIMI_K2_PRETRAIN_CONFIG_GB300_BF16 = KIMI_K2_PRETRAIN_CONFIG_GB300`). Mutating one would mutate all. Since `WorkloadBaseConfig` is a frozen dataclass (uses `replace`), this is likely safe, but consider adding a brief comment explaining that precision is applied at a different layer.

Also applies to: 59-61, 73-75, 89-91
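The aliasing behavior can be demonstrated with a standalone frozen dataclass standing in for `WorkloadBaseConfig` (the class and field names here are illustrative, not the real ones):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class WorkloadConfig:  # stand-in for WorkloadBaseConfig; fields are made up
    pipeline_model_parallel_size: int = 16
    virtual_pipeline_model_parallel_size: int = 1

CONFIG_GB300 = WorkloadConfig()
CONFIG_GB300_BF16 = CONFIG_GB300    # alias: same object, not a copy
CONFIG_GB300_FP8_CS = CONFIG_GB300  # precision is applied at a later layer

# frozen=True forbids in-place mutation, so the aliasing is safe;
# deriving a variant requires replace(), which returns a new object.
variant = replace(CONFIG_GB300, virtual_pipeline_model_parallel_size=2)
```

Because the dataclass is frozen, nothing can mutate one alias and silently affect the others; any divergence must go through `replace()`.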
scripts/performance/configs/kimi/__init__.py (1)
8-14: Add `# noqa: F401` to suppress false-positive "imported but unused" warnings on re-exports. Flake8 flags these imports as unused (F401), but they are intentional re-exports for the package's public API. Adding `# noqa: F401` will suppress the false positives.

Proposed fix:

```diff
 if HAVE_MEGATRON_BRIDGE:
-    from .kimi_llm_pretrain import (
-        kimi_k2_pretrain_config_b200,
-        kimi_k2_pretrain_config_gb200,
-        kimi_k2_pretrain_config_gb300,
-        kimi_k2_pretrain_config_h100,
-    )
+    from .kimi_llm_pretrain import (  # noqa: F401
+        kimi_k2_pretrain_config_b200,
+        kimi_k2_pretrain_config_gb200,
+        kimi_k2_pretrain_config_gb300,
+        kimi_k2_pretrain_config_h100,
+    )
```
What does this PR do?
Add Kimi-K2 performance recipes.
Also adds updates to:
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information
Summary by CodeRabbit
New Features
Bug Fixes