
cp: Fix LLAMA3 LoRa TFLOPs Formula (2416) into r0.3.0 #2533

Closed

svcnvidia-nemo-ci wants to merge 1 commit into r0.3.0 from cherry-pick-2416-r0.3.0

Conversation

Contributor

@svcnvidia-nemo-ci commented Feb 25, 2026

beep boop [🤖]: Hi @rhmukundan 👋,

we've cherry-picked #2416 into  for you! 🚀

Please review and approve this cherry pick at your convenience!

Summary by CodeRabbit

  • New Features
    • Added LoRA model support to FLOPs calculations with specialized computation methods that separately track metrics for frozen and unfrozen model components.
    • When LoRA is enabled, calculations now use optimized computation paths for improved accuracy.

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

copy-pr-bot bot commented Feb 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci
Contributor Author

/ok to test 276b77d

Contributor

coderabbitai bot commented Feb 25, 2026

📝 Walkthrough

Walkthrough

This change adds LoRA-aware FLOPs calculation to the training utilities. When LoRA is detected, it bypasses the standard model-specific TFLOPS method and instead uses a specialized computation path that calculates frozen and unfrozen FLOPs separately using predefined statistics and model configuration dimensions.

Changes

Cohort / File(s) / Summary

  • LoRA FLOPs Detection and Calculation — src/megatron/bridge/training/utils/flop_utils.py: added the LoRA type import and conditional branching in the transformer_flops function. When LoRA is active, a _LORA_SEQ_STATS mapping plus model config dimensions (hidden size, layers, heads, FFN size, vocab size) is used to compute frozen FLOPs (weighted 2/3) and unfrozen FLOPs separately, instead of invoking the model-specific TFLOPS method. Existing non-LoRA logic remains intact.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Fix LLAMA3 LoRa TFLOPs Formula #2416 — Directly related modification to the same FLOPs utility file implementing LoRA-specific branching logic with _LORA_SEQ_STATS mapping and model config-derived dimensions.

Suggested labels

performance, r0.3.0

Suggested reviewers

  • rhmukundan
  • guyueh1
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes — ⚠️ Warning: the PR lacks testing documentation and validation for the LoRA FLOPs formula fix despite modifying numerically sensitive calculations. Resolution: add unit/functional test results, before-and-after FLOPs verification for LLAMA3 LoRA configs, and address the three outstanding review comments on correctness issues.

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly identifies this as a cherry-pick of a specific fix (the LLAMA3 LoRA TFLOPs formula fix from #2416) into a release branch (r0.3.0), which aligns with the changeset adding LoRA-aware FLOPs calculation logic.
  • Docstring Coverage — ✅ Passed: docstring coverage is 100.00%, which meets the required threshold of 80.00%.



coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/megatron/bridge/training/utils/flop_utils.py`:
- Around line 204-215: The LoRA FLOPs calculation uses cfg.model.vocab_size
directly which undercounts when vocab padding is applied; compute the padded
vocab size with the existing helper (call calculate_padded_vocab_size with the
model vocab) and replace the plain vocab_size term used inside
model_flops_frozen's logits factor (the expression involving avg_tokens,
n_layers, hs, ... , 6 * vocab_size / (n_layers * hs)) so the logits TFLOPs use
the padded vocabulary size consistently with the non‑LoRA path.
- Around line 190-219: The file fails ruff formatting; run the formatter and
commit the changes: run ruff-format (or pre-commit run --all-files) on
src/megatron/bridge/training/utils/flop_utils.py, fix formatting diffs around
the is_lora block (including _LORA_SEQ_STATS, the seq_len lookup, and the
expressions computing model_flops_frozen, model_flops_unfrozen and the return
line), and re-commit the formatted file so CI no longer reports a formatting
delta.
- Around line 196-199: Replace the hard raise when seq_len is missing from
_LORA_SEQ_STATS with a graceful fallback: log a warning (use the module logger
or warnings.warn) indicating missing LoRA stats for seq_len, then fall back to
the standard transformer path by deriving reasonable defaults for avg_seqlen2
and avg_tokens (e.g., estimate avg_seqlen2 = seq_len * seq_len and avg_tokens =
seq_len, or call an existing transformer stats helper if available) instead of
raising; keep references to _LORA_SEQ_STATS, seq_len, avg_seqlen2, and
avg_tokens so the rest of the function can continue using those values.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c55b3dd and 276b77d.

📒 Files selected for processing (1)
  • src/megatron/bridge/training/utils/flop_utils.py

Comment on lines +190 to +219
        if is_lora:
            _LORA_SEQ_STATS = {
                4096: (842603, 4096),
                2048: (488991, 2030),
            }
            seq_len = cfg.model.seq_length
            if seq_len not in _LORA_SEQ_STATS:
                raise ValueError(f"No LoRA stats for seq_length={seq_len}. Add it to _LORA_SEQ_STATS.")
            avg_seqlen2, avg_tokens = _LORA_SEQ_STATS[seq_len]

            hs = cfg.model.hidden_size
            n_layers = cfg.model.num_layers
            n_heads = cfg.model.num_attention_heads
            ffn_hs = cfg.model.ffn_hidden_size
            vocab_size = cfg.model.vocab_size

            model_flops_frozen = (
                avg_tokens
                * n_layers
                * hs**2
                * (
                    12
                    + 12 * num_query_groups / n_heads
                    + 18 * ffn_hs / hs
                    + 6 * vocab_size / (n_layers * hs)
                )
            )
            model_flops_unfrozen = n_layers * hs**2 * (12 * avg_seqlen2 / hs)

            return batch_size * (model_flops_frozen * (2.0 / 3.0) + model_flops_unfrozen)

⚠️ Potential issue | 🟡 Minor

Please run formatting before merge (ruff-format).

CI already reports this file was reformatted by the pre-commit hook; please run pre-commit run --all-files and commit the formatting delta.

As per coding guidelines: "Use ruff for linting and formatting Python code".

🧰 Tools
🪛 GitHub Actions: CICD NeMo

[error] 207-214: pre-commit ruff-format hook failed: 1 file reformatted. Run 'pre-commit run --all-files' or commit again to apply formatting changes.

🪛 Ruff (0.15.2)

[warning] 197-197: Avoid specifying long messages outside the exception class

(TRY003)


Comment on lines +196 to +199
            if seq_len not in _LORA_SEQ_STATS:
                raise ValueError(f"No LoRA stats for seq_length={seq_len}. Add it to _LORA_SEQ_STATS.")
            avg_seqlen2, avg_tokens = _LORA_SEQ_STATS[seq_len]


⚠️ Potential issue | 🟠 Major

Avoid hard-failing on unsupported LoRA seq_length values.

Line 196–197 currently raises for any seq_length not present in _LORA_SEQ_STATS, which can stop otherwise valid training runs just for metrics computation. Prefer a graceful fallback to the standard transformer path when stats are missing.

Proposed fix
-        if is_lora:
+        if is_lora:
             _LORA_SEQ_STATS = {
                 4096: (842603, 4096),
                 2048: (488991, 2030),
             }
             seq_len = cfg.model.seq_length
-            if seq_len not in _LORA_SEQ_STATS:
-                raise ValueError(f"No LoRA stats for seq_length={seq_len}. Add it to _LORA_SEQ_STATS.")
-            avg_seqlen2, avg_tokens = _LORA_SEQ_STATS[seq_len]
+            seq_stats = _LORA_SEQ_STATS.get(seq_len)
+            if seq_stats is None:
+                # Fallback to standard transformer FLOPs path below when stats are unavailable.
+                pass
+            else:
+                avg_seqlen2, avg_tokens = seq_stats
 
-            hs = cfg.model.hidden_size
-            n_layers = cfg.model.num_layers
-            n_heads = cfg.model.num_attention_heads
-            ffn_hs = cfg.model.ffn_hidden_size
-            vocab_size = cfg.model.vocab_size
+                hs = cfg.model.hidden_size
+                n_layers = cfg.model.num_layers
+                n_heads = cfg.model.num_attention_heads
+                ffn_hs = cfg.model.ffn_hidden_size
+                vocab_size = cfg.model.vocab_size
 
-            model_flops_frozen = (
-                avg_tokens
-                * n_layers
-                * hs**2
-                * (
-                    12
-                    + 12 * num_query_groups / n_heads
-                    + 18 * ffn_hs / hs
-                    + 6 * vocab_size / (n_layers * hs)
+                model_flops_frozen = (
+                    avg_tokens
+                    * n_layers
+                    * hs**2
+                    * (
+                        12
+                        + 12 * num_query_groups / n_heads
+                        + 18 * ffn_hs / hs
+                        + 6 * vocab_size / (n_layers * hs)
+                    )
                 )
-            )
-            model_flops_unfrozen = n_layers * hs**2 * (12 * avg_seqlen2 / hs)
+                model_flops_unfrozen = n_layers * hs**2 * (12 * avg_seqlen2 / hs)
 
-            return batch_size * (model_flops_frozen * (2.0 / 3.0) + model_flops_unfrozen)
+                return batch_size * (model_flops_frozen * (2.0 / 3.0) + model_flops_unfrozen)
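The graceful-fallback pattern proposed above can also be sketched as a small lookup helper. The seq_len**2 and seq_len defaults are the estimates suggested in the review for unpacked batches, not behavior taken from the PR itself:

```python
import warnings

# Stats from the PR: seq_length -> (avg_seqlen2, avg_tokens).
_LORA_SEQ_STATS = {
    4096: (842603, 4096),
    2048: (488991, 2030),
}

def lookup_lora_stats(seq_len):
    """Return (avg_seqlen2, avg_tokens), estimating when no stats exist."""
    stats = _LORA_SEQ_STATS.get(seq_len)
    if stats is None:
        # Warn instead of raising so a metrics lookup never aborts a valid run.
        warnings.warn(
            f"No LoRA stats for seq_length={seq_len}; "
            "falling back to seq_len-derived estimates."
        )
        stats = (seq_len * seq_len, seq_len)  # assumed: unpacked sequences
    return stats
```

With this shape the rest of the function can keep using avg_seqlen2 and avg_tokens unchanged, matching the intent of the review comment.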
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 197-197: Avoid specifying long messages outside the exception class

(TRY003)


Comment on lines +204 to +215
            vocab_size = cfg.model.vocab_size

            model_flops_frozen = (
                avg_tokens
                * n_layers
                * hs**2
                * (
                    12
                    + 12 * num_query_groups / n_heads
                    + 18 * ffn_hs / hs
                    + 6 * vocab_size / (n_layers * hs)
                )

⚠️ Potential issue | 🟠 Major

Use padded vocab size in LoRA FLOPs math for consistency.

Line 204 uses cfg.model.vocab_size, while the non-LoRA path uses calculate_padded_vocab_size(...) for logits FLOPs. This can undercount TFLOPs when vocab padding is enabled.
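For context, here is a minimal sketch of what such padding typically computes, assuming the common Megatron convention of rounding the vocabulary up to a multiple of make_vocab_size_divisible_by × tensor-parallel size; the repository's actual calculate_padded_vocab_size may differ in details (for instance, it also takes a logging_enabled flag):

```python
def padded_vocab_size(vocab_size, make_vocab_size_divisible_by, tp_size):
    """Round vocab_size up to a multiple of divisible_by * tp_size (sketch)."""
    multiple = make_vocab_size_divisible_by * tp_size
    # Ceiling division, then scale back up to the nearest multiple.
    return ((vocab_size + multiple - 1) // multiple) * multiple
```

With a Llama-3-style vocabulary of 128256, make_vocab_size_divisible_by=128, and tp_size=8, this pads to 129024, so using the raw 128256 in the logits term would undercount those FLOPs.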

Proposed fix
-            vocab_size = cfg.model.vocab_size
+            vocab_size = calculate_padded_vocab_size(
+                cfg.model.vocab_size,
+                cfg.model.make_vocab_size_divisible_by,
+                cfg.model.tensor_model_parallel_size,
+                logging_enabled=False,
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-            vocab_size = cfg.model.vocab_size
+            vocab_size = calculate_padded_vocab_size(
+                cfg.model.vocab_size,
+                cfg.model.make_vocab_size_divisible_by,
+                cfg.model.tensor_model_parallel_size,
+                logging_enabled=False,
+            )
             model_flops_frozen = (
                 avg_tokens
                 * n_layers
                 * hs**2
                 * (
                     12
                     + 12 * num_query_groups / n_heads
                     + 18 * ffn_hs / hs
                     + 6 * vocab_size / (n_layers * hs)
                 )
🧰 Tools
🪛 GitHub Actions: CICD NeMo

[error] 207-214: pre-commit ruff-format hook failed: 1 file reformatted. Run 'pre-commit run --all-files' or commit again to apply formatting changes.


Contributor

ko3n1g commented Feb 25, 2026

merged into #2509

@ko3n1g ko3n1g closed this Feb 25, 2026
