Mrope accuracy fix for qwen by hsubramony · Pull Request #1437 · vllm-project/vllm-gaudi

hsubramony · 2026-05-11T20:59:14Z

When mrope_interleaved is enabled, HPUMRotaryEmbedding was still using the non-interleaved split/concat section mapping for cos/sin.
This produced incorrect rotary channel ordering for multimodal MRoPE inputs and could cause sample-level mismatches against upstream vLLM behavior.
Use apply_interleaved_rope for the interleaved branch, and preserve the existing split/concat logic for non-interleaved layouts.
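
To make the channel-ordering difference concrete, here is a minimal sketch that labels each rotary frequency channel with the axis (T, H or W) whose position drives it under the two layouts. The function names and the section sizes `[4, 2, 2]` are illustrative assumptions, not values from this PR. The interleaved rule assumed here is a stride of three, with H at offset 1 and W at offset 2.

```python
def sectioned_axes(mrope_section):
    """Non-interleaved layout: contiguous blocks [T..., H..., W...]."""
    t, h, w = mrope_section
    return ["T"] * t + ["H"] * h + ["W"] * w


def interleaved_axes(mrope_section):
    """Interleaved layout: start from all T, then overwrite strided slots."""
    t, h, w = mrope_section
    axes = ["T"] * (t + h + w)
    for i in range(1, h * 3, 3):   # H takes channels 1, 4, 7, ...
        axes[i] = "H"
    for i in range(2, w * 3, 3):   # W takes channels 2, 5, 8, ...
        axes[i] = "W"
    return axes


section = [4, 2, 2]  # hypothetical sizes, chosen to keep the output short
print("".join(sectioned_axes(section)))    # TTTTHHWW
print("".join(interleaved_axes(section)))  # THWTHWTT
```

Both layouts give each axis the same number of channels, but at different indices, so applying the sectioned mapping to a model that expects the interleaved one rotates most channels with the wrong axis position.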

Co-authored-by: Jimin Ha <jimin.ha@intel.com>
Signed-off-by: Harish Subramony <harish.subramony@intel.com>

Copilot

Pull request overview

This PR updates the HPU implementation of MRotaryEmbedding to improve multimodal MRoPE correctness/accuracy (notably for Qwen-style 3-axis positions) by changing how cos/sin values are assembled for the rotary kernel.

Changes:

- Add an optional prepare_mrope_cache mode to precompute three sparse cos/sin caches (per T/H/W axis) that can be combined by addition.
- Normalize/reshape incoming positions more defensively (handling [3, seq_len, 1], [1, seq_len], and flattened forms).
- Switch the multimodal [3, seq_len] path from per-step split/concat to cache-sum assembly.

+            assert self.mrope_section
+
+            sin_start_idx = self.rotary_dim // 2
+            if getattr(self, "mrope_interleaved", False):
+                mrope1_slice = (
+                    list(range(1, self.mrope_section[1] * 3, 3))
+                    + list(range(sin_start_idx + 1, sin_start_idx + self.mrope_section[1] * 3, 3)))
+                mrope2_slice = (
+                    list(range(2, self.mrope_section[2] * 3, 3))
+                    + list(range(sin_start_idx + 2, sin_start_idx + self.mrope_section[2] * 3, 3)))
+            else:
+                c0 = self.mrope_section[0]
+                c1 = c0 + self.mrope_section[1]
+                c2 = c1 + self.mrope_section[2]
+                mrope1_slice = list(range(c0, c1)) + list(range(sin_start_idx + c0, sin_start_idx + c1))
+                mrope2_slice = list(range(c1, c2)) + list(range(sin_start_idx + c1, sin_start_idx + c2))
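
As a worked instance of the index arithmetic in this hunk, the sketch below computes the H and W column lists for a cache row laid out as `[cos | sin]`. The helper name `mrope_slices` and the values `mrope_section=[4, 2, 2]`, `rotary_dim=16` are assumptions chosen for illustration only.

```python
def mrope_slices(mrope_section, rotary_dim, interleaved):
    """Return (H columns, W columns) into a [cos | sin] cache row."""
    sin_start = rotary_dim // 2  # the sin half starts after the cos half
    if interleaved:
        h = list(range(1, mrope_section[1] * 3, 3))
        w = list(range(2, mrope_section[2] * 3, 3))
    else:
        c0 = mrope_section[0]
        c1 = c0 + mrope_section[1]
        c2 = c1 + mrope_section[2]
        h = list(range(c0, c1))
        w = list(range(c1, c2))
    # The same channel indices are repeated in the sin half.
    return h + [sin_start + i for i in h], w + [sin_start + i for i in w]


print(mrope_slices([4, 2, 2], 16, interleaved=True))
# ([1, 4, 9, 12], [2, 5, 10, 13])
print(mrope_slices([4, 2, 2], 16, interleaved=False))
# ([4, 5, 12, 13], [6, 7, 14, 15])
```

In both branches the H and W lists are disjoint, and every column in neither list is left to the T axis.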


-            cos = torch.cat((cos, cos), dim=-1).unsqueeze(-2)
-            sin = torch.cat((sin, sin), dim=-1).unsqueeze(-2)
+        if offsets is not None:
+            offsets = offsets.view(positions.shape[0], -1)


+            if positions.shape[0] != num_tokens:
+                positions = positions.view(-1, num_tokens)
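
The reshape above can be pictured with plain lists. The following is a list-based analogue of `positions.view(-1, num_tokens)`, not the tensor code itself; the helper name `as_axis_rows` and the sample positions are hypothetical.

```python
def as_axis_rows(flat_positions, num_tokens):
    """Regroup a flat position list into rows of num_tokens entries each."""
    if len(flat_positions) % num_tokens != 0:
        raise ValueError("length must be a multiple of num_tokens")
    return [flat_positions[i:i + num_tokens]
            for i in range(0, len(flat_positions), num_tokens)]


# Flattened T, H, W positions for three tokens (illustrative values).
flat = [0, 1, 2, 0, 0, 1, 0, 1, 0]
print(as_axis_rows(flat, num_tokens=3))
# [[0, 1, 2], [0, 0, 1], [0, 1, 0]]
```

A flattened input of length `3 * num_tokens` therefore becomes three rows, one per axis, which is the `[3, seq_len]` shape the multimodal path checks for.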



+            self.cos_sin_cache_mrope1 = torch.zeros_like(self.cos_sin_cache)
+            self.cos_sin_cache_mrope2 = torch.zeros_like(self.cos_sin_cache)
+            self.cos_sin_cache_mrope1[..., mrope1_slice] = self.cos_sin_cache[..., mrope1_slice]
+            self.cos_sin_cache_mrope2[..., mrope2_slice] = self.cos_sin_cache[..., mrope2_slice]
+            self.cos_sin_cache_mrope0 = self.cos_sin_cache.clone()
+            self.cos_sin_cache_mrope0[..., mrope1_slice] = 0
+            self.cos_sin_cache_mrope0[..., mrope2_slice] = 0
+
+            def repeat_cache(cos_sin_cache: torch.Tensor) -> torch.Tensor:
+                if self.is_neox_style:
+                    cos, sin = cos_sin_cache.chunk(2, dim=-1)
+                    return torch.cat((cos, cos, sin, sin), dim=-1)
+                return torch.repeat_interleave(cos_sin_cache, 2, dim=-1)
+
+            self.cos_sin_cache_mrope0 = repeat_cache(self.cos_sin_cache_mrope0)
+            self.cos_sin_cache_mrope1 = repeat_cache(self.cos_sin_cache_mrope1)
+            self.cos_sin_cache_mrope2 = repeat_cache(self.cos_sin_cache_mrope2)
+            self._mrope_hpu_cache_prepared = True
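
The two branches of `repeat_cache` duplicate the cache columns in different orders. A minimal list-based sketch of one cache row, using string labels instead of numbers so the ordering is visible (the helper name `repeat_cache_row` is an assumption for illustration):

```python
def repeat_cache_row(row, is_neox_style):
    """Duplicate the channels of one [cos | sin] row, as repeat_cache does."""
    if is_neox_style:
        half = len(row) // 2
        cos, sin = row[:half], row[half:]
        # Neox style: each half is tiled as a block.
        return cos + cos + sin + sin
    # Otherwise: each element is repeated in place (repeat_interleave).
    return [v for v in row for _ in range(2)]


row = ["c0", "c1", "s0", "s1"]
print(repeat_cache_row(row, is_neox_style=True))
# ['c0', 'c1', 'c0', 'c1', 's0', 's1', 's0', 's1']
print(repeat_cache_row(row, is_neox_style=False))
# ['c0', 'c0', 'c1', 'c1', 's0', 's0', 's1', 's1']
```

Because the duplication is done once when the caches are prepared, the per-step path no longer needs the `torch.cat((cos, cos), dim=-1)` calls that the diff removes.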


+        use_mrope_cache_sum = positions.ndim == 2 and positions.shape[0] == 3
+        if use_mrope_cache_sum:
+            if not getattr(self, "_mrope_hpu_cache_prepared", False):
+                self.prepare_cos_sin(positions, offsets, prepare_mrope_cache=True)
+            cos_sin = (self.cos_sin_cache_mrope0[positions[0]] + self.cos_sin_cache_mrope1[positions[1]] +
+                       self.cos_sin_cache_mrope2[positions[2]])
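
The cache-sum assembly works because the three caches have disjoint non-zero columns, so adding the gathered rows is equivalent to picking each column from the axis that owns it. A small list-based sketch under assumed values (a 4-column cache whose entry at row `p`, column `j` is `10 * p + j`, with column 1 owned by H and column 2 by W; all names are hypothetical):

```python
def split_cache(cache, h_idx, w_idx):
    """Split a cache into T/H/W tables with disjoint non-zero columns."""
    h_set, w_set = set(h_idx), set(w_idx)
    cache_t = [[0 if (j in h_set or j in w_set) else v
                for j, v in enumerate(row)] for row in cache]
    cache_h = [[v if j in h_set else 0 for j, v in enumerate(row)]
               for row in cache]
    cache_w = [[v if j in w_set else 0 for j, v in enumerate(row)]
               for row in cache]
    return cache_t, cache_h, cache_w


def gather_sum(caches, pos_t, pos_h, pos_w):
    """Look up one row per axis and add them element-wise."""
    cache_t, cache_h, cache_w = caches
    return [a + b + c for a, b, c
            in zip(cache_t[pos_t], cache_h[pos_h], cache_w[pos_w])]


cache = [[10 * p + j for j in range(4)] for p in range(4)]
caches = split_cache(cache, h_idx=[1], w_idx=[2])
print(gather_sum(caches, pos_t=3, pos_h=1, pos_w=2))
# [30, 11, 22, 33]
```

Columns 0 and 3 come from row 3 (the T position), column 1 from row 1 (H) and column 2 from row 2 (W), which matches what a per-column selection would produce.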


github-actions · 2026-05-11T23:44:15Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

github-actions · 2026-05-12T03:08:17Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
54f548e9e58087f0155e4e164e416ad7efdfde6d

github-actions · 2026-05-13T18:36:43Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
54f548e9e58087f0155e4e164e416ad7efdfde6d

Signed-off-by: Harish Subramony <harish.subramony@intel.com>
Co-authored-by: Jimin Ha <jimin.ha@intel.com>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Co-authored-by: Seunghyuk Park (shepark) <separk@habana.ai>

Copilot AI review requested due to automatic review settings May 11, 2026 20:59

hsubramony requested review from PatrykWo, adobrzyn, afierka-intel, iboiko-habana, jbyczkow, kamil-kaczor, ksmusz, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners May 11, 2026 20:59

Copilot started reviewing on behalf of hsubramony May 11, 2026 20:59

Copilot AI reviewed May 11, 2026

github-actions Bot mentioned this pull request May 11, 2026

🚦 Team Review Dashboard #701

Open

hsubramony marked this pull request as draft May 11, 2026 23:25

hsubramony force-pushed the mrope_fix branch from ef61592 to 21697e7 Compare May 11, 2026 23:43

hsubramony marked this pull request as ready for review May 11, 2026 23:44

hsubramony and others added 2 commits May 12, 2026 16:27

Merge branch 'main' into mrope_fix

edc340a

Merge branch 'main' into mrope_fix

6cc1e07

adobrzyn approved these changes May 13, 2026

Merge branch 'main' into mrope_fix

668fd34

adobrzyn merged commit 7a4d2fe into vllm-project:main May 15, 2026
2 checks passed

adobrzyn merged 4 commits into vllm-project:main from hsubramony:mrope_fix

