Make triton_kernels.tensor.convert_layout idempotent #10401

Merged
ptillet merged 7 commits into main from roman/idempotent-convert-layout on May 29, 2026

@roman-openai (Collaborator) commented May 28, 2026

Make triton_kernels.tensor.convert_layout return its input when its storage already represents the requested layout.

Motivation

Callers should not need to guard an idempotent layout conversion. This avoids redundant unswizzle/swizzle copies for already-correct MXFP intermediates.

Strided layouts

StridedLayout identifies the packed dimension, not compact physical strides. For example:

pitched = torch.empty_strided((64, 128), (256, 1))
tensor = wrap_torch_tensor(pitched)
assert tensor.storage.layout == StridedLayout(-1)

converted = convert_layout(tensor, StridedLayout(-1))

Before this PR, the matching-layout conversion still copied through canonical storage and incidentally densified the tensor:

assert converted is not tensor
assert converted.storage.data.stride() == (128, 1)

After this PR, the existing valid storage is preserved:

assert converted is tensor
assert converted.storage.data.stride() == (256, 1)

Both stride tuples, (128, 1) and (256, 1), are valid StridedLayout(-1) storage because the last dimension has unit stride in each. Kernels and TMA checks consume physical strides separately. Invalid strided storage with no contiguous dimension is still rejected.
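The difference between "which dimension is packed" and "are the strides compact" can be sketched in plain Python. The helpers `find_packed_dim` and `is_dense` below are hypothetical and not part of triton_kernels; they only model the rule described above, under the assumption that storage is acceptable whenever some dimension has unit stride.

```python
from typing import Sequence


def find_packed_dim(shape: Sequence[int], strides: Sequence[int]) -> int:
    """Return the negative index of the innermost unit-stride dimension.

    Hypothetical helper: raises if no dimension is contiguous.
    """
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same rank")
    # Scan from the innermost dimension outwards.
    for dim in range(len(shape) - 1, -1, -1):
        if strides[dim] == 1:
            return dim - len(shape)
    raise ValueError("no unit-stride dimension: not valid strided storage")


def is_dense(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """True if `strides` are the compact row-major strides for `shape`."""
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size != 1 and stride != expected:
            return False
        expected *= size
    return True


# (shape, strides) pairs mirroring the example in the PR description.
pitched = ((64, 128), (256, 1))
dense = ((64, 128), (128, 1))

# Both name the last dimension as packed; only one is compact.
print(find_packed_dim(*pitched), is_dense(*pitched))
print(find_packed_dim(*dense), is_dense(*dense))
```

In this model, a view such as shape (32, 64) with strides (256, 2) has no unit-stride dimension, so `find_packed_dim` raises instead of reporting a layout.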

Swizzled layouts

Matching swizzled layouts are also already in the requested storage encoding. For example:

swizzled = convert_layout(tensor, HopperMXScaleLayout(-2, 8))
assert convert_layout(swizzled, HopperMXScaleLayout(-2, 8)) is swizzled

This avoids an unnecessary unswizzle/swizzle round trip and preserves any layout-specific padded storage. A different parameterized layout still converts normally:

assert convert_layout(swizzled, HopperMXScaleLayout(-2, 4)) is not swizzled

convert_layout changes storage encoding. It does not clone, densify, or scrub padding when the existing storage is already valid for the requested layout.
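The identity-preserving behavior can be illustrated with a toy model. Everything below (`ToySwizzledLayout`, `ToyTensor`, `toy_convert_layout`) is a hypothetical stand-in written for this illustration; it is not the triton_kernels implementation and assumes only that a layout is a value object compared by its parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySwizzledLayout:
    """Hypothetical stand-in for a parameterized swizzled layout."""
    axis: int
    param: int


@dataclass
class ToyTensor:
    """Hypothetical tensor wrapper: raw data plus the layout encoding it."""
    data: list
    layout: object


def toy_convert_layout(tensor: ToyTensor, layout) -> ToyTensor:
    # Matching layout: the storage already has the requested encoding,
    # so return the very same object (no clone, no round trip).
    if tensor.layout == layout:
        return tensor
    # Otherwise build new storage. A real conversion would unswizzle and
    # re-swizzle here; this sketch just copies the data.
    return ToyTensor(data=list(tensor.data), layout=layout)


swizzled = ToyTensor([1, 2, 3], ToySwizzledLayout(-2, 8))
same = toy_convert_layout(swizzled, ToySwizzledLayout(-2, 8))
other = toy_convert_layout(swizzled, ToySwizzledLayout(-2, 4))
```

Here `same is swizzled` holds, while `other` is a distinct object carrying the new layout. Callers that relied on the old copy as an implicit clone would need to copy explicitly.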

@roman-openai roman-openai marked this pull request as ready for review May 28, 2026 04:36
@roman-openai roman-openai requested a review from ptillet as a code owner May 28, 2026 04:36

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c80e52647c

Comment on lines +236 to +237
if tensor.storage.layout == layout and not layout_transformation_kwargs:
    return tensor

[P2] Check canonical strides before returning no-op

When wrap_torch_tensor sees a strided view such as base[::2, :], it records only the unit-stride dimension as StridedLayout(-1), even though the storage still has non-dense strides. With this early return, convert_layout(tensor, StridedLayout(-1)) now skips the existing unswizzle/swizzle path that would allocate canonical dense storage, so callers using convert_layout to normalize an already-labeled tensor can pass a view with gaps/padding to downstream kernels or reference paths that expect the layout transform's dense strides.

Collaborator Author

The non-dense case is intentional: StridedLayout identifies the packed/contiguous logical dimension, not compact physical strides. Consumers such as downcast_to_mxfp and matmul use the actual strides, so a valid pitched view remains a valid StridedLayout(-1) tensor. I documented that convert_layout does not densify/canonicalize already-valid storage and added test_downcast_to_mxfp_accepts_pitched_strided_input. I also added test_convert_layout_rejects_strided_view_without_contiguous_dimension so a view with no unit-stride dimension is still rejected rather than incorrectly no-oping.
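Why a pitched view stays usable for stride-aware consumers can be shown with a small addressing sketch. The functions `element_offset` and `read_rows` are hypothetical illustrations, not triton_kernels APIs; they assume a consumer that computes every address from the tensor's actual strides.

```python
def element_offset(index, strides):
    """Flat storage offset of a logical index under the given strides."""
    return sum(i * s for i, s in zip(index, strides))


def read_rows(storage, shape, strides):
    """Gather a logical 2D tensor from flat storage using its strides."""
    rows, cols = shape
    return [
        [storage[element_offset((r, c), strides)] for c in range(cols)]
        for r in range(rows)
    ]


# A 2x3 logical tensor stored with a row pitch of 4 (one padding slot per row).
PAD = -1
pitched_storage = [0, 1, 2, PAD, 3, 4, 5, PAD]
dense_storage = [0, 1, 2, 3, 4, 5]

pitched_view = read_rows(pitched_storage, (2, 3), (4, 1))
dense_view = read_rows(dense_storage, (2, 3), (3, 1))
```

Both reads produce the same logical contents, and the padding slots are never touched because every address is derived from the strides rather than from an assumed compact layout.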



def convert_layout(tensor: Tensor, layout: Layout, **layout_transformation_kwargs):
    if tensor.storage.layout == layout and not layout_transformation_kwargs:

[P2] Avoid dataclass equality on ragged layouts

When both the source and target layouts are BlackwellActMXScaleLayout instances with distinct non-None RaggedTensorMetadata, this == comparison descends into the metadata's torch.Tensor fields. Tensor equality produces a boolean tensor rather than a Python bool, so the layout check can raise before conversion instead of re-swizzling or returning a no-op; this affects ragged activation scale tensors whose target layout is reconstructed from equivalent metadata.

Collaborator Author

Fixed. The no-op decision is now Layout.can_preserve_storage_as(), and BlackwellActMXScaleLayout overrides it with metadata identity (self.ragged_metadata is other.ragged_metadata) instead of tensor-valued dataclass equality. test_act_scale_storage_preservation covers storage reuse with the same metadata object and conversion with independently reconstructed metadata.
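The pitfall and the identity-based fix can be reproduced without torch. In the sketch below, `FakeTensor`, `_Mask`, `ToyRaggedLayout`, and `equality_is_usable` are hypothetical names made up for illustration; only the method name `can_preserve_storage_as` and the `ragged_metadata` identity check come from the reply above. The fake tensor raises `ValueError` on truth testing, which is an assumption of the sketch and may differ from the exception type torch raises.

```python
from dataclasses import dataclass


class _Mask:
    """Result of an elementwise comparison; has no single truth value."""

    def __bool__(self):
        raise ValueError("truth value of an elementwise result is ambiguous")


class FakeTensor:
    """Hypothetical tensor-like object whose == is elementwise."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return _Mask()


@dataclass
class ToyRaggedLayout:
    """Hypothetical layout carrying tensor-valued metadata."""
    ragged_metadata: FakeTensor

    def can_preserve_storage_as(self, other) -> bool:
        # Identity check: always a plain bool, never calls tensor ==.
        return (type(other) is type(self)
                and self.ragged_metadata is other.ragged_metadata)


def equality_is_usable(a, b) -> bool:
    """Report whether `a == b` can serve as an `if` condition."""
    try:
        bool(a == b)
    except ValueError:
        return False
    return True


meta = FakeTensor([0, 4, 9])
same_meta = ToyRaggedLayout(meta)
also_same_meta = ToyRaggedLayout(meta)
rebuilt = ToyRaggedLayout(FakeTensor([0, 4, 9]))
```

With independently rebuilt metadata, the generated dataclass `==` cannot be used as a condition, whereas `can_preserve_storage_as` returns a plain `False` and lets conversion proceed. The trade-off is conservative: equivalent but distinct metadata objects trigger a conversion rather than a no-op.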

@roman-openai roman-openai marked this pull request as draft May 28, 2026 04:59
@roman-openai roman-openai changed the title from "Make triton_kernels convert_layout idempotent" to "Make triton_kernels.tensor.convert_layout idempotent" May 28, 2026
@roman-openai roman-openai marked this pull request as ready for review May 28, 2026 16:06
@ptillet ptillet merged commit 06e12a2 into main May 29, 2026
19 of 20 checks passed
@ptillet ptillet deleted the roman/idempotent-convert-layout branch May 29, 2026 05:24