
Conversation

@xingxing588

detail log:

test_attention_ops.py::attention_ref

    output = torch.einsum("bhts,bshd->bthd", attention_drop, drop_v)
    if query_padding_mask is not None:
        output.masked_fill_((~query_padding_mask)[:, :, None, None], 0.0)
    log: output.shape   torch.Size([1, 2, 4, 128])
    log: output.stride  (256, 128, 256, 1)
    output = output.contiguous()   # the reference output needs .contiguous() here
    log: output.shape   torch.Size([1, 2, 4, 128])
    log: output.stride  (1024, 512, 128, 1)

flag_gems/ops/attention.py::flash_attention_forward

    query = query.transpose(1, 2)  # the transpose produces a temporary, non-contiguous layout
    log: out.shape                         torch.Size([1, 4, 2, 128])
    log: out.stride                        (1024, 128, 512, 1)
    log: out.is_contiguous()               False
    out = out.contiguous()
    log: out.contiguous().shape            torch.Size([1, 4, 2, 128])
    log: out.contiguous().stride           (1024, 128, 512, 1)
    log: out.contiguous().is_contiguous()  False
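
For reference, the allocation behaviour behind this log can be reproduced in plain PyTorch. The sketch below is a standalone illustration (not code from the test file or from flag_gems; the shapes and variable names are mine and simply mirror the log): torch.empty_like defaults to memory_format=torch.preserve_format and therefore copies the transposed query's permuted strides, while torch.empty with an explicit shape always allocates a contiguous buffer.

```python
import torch

# (B, S, H, D) query, transposed to (B, H, S, D) as in flash_attention_forward.
q = torch.randn(1, 2, 4, 128)
q_t = q.transpose(1, 2)          # a view: same storage, permuted strides

print(q_t.shape, q_t.stride(), q_t.is_contiguous())
# torch.Size([1, 4, 2, 128]) (1024, 128, 512, 1) False

# empty_like keeps the permuted strides (memory_format=torch.preserve_format
# by default), so the output buffer is also non-contiguous.
out_like = torch.empty_like(q_t)
print(out_like.stride(), out_like.is_contiguous())
# (1024, 128, 512, 1) False

# empty with an explicit shape allocates a fresh, contiguous buffer.
out_new = torch.empty(q_t.shape, device=q_t.device, dtype=q_t.dtype)
print(out_new.stride(), out_new.is_contiguous())
# (1024, 256, 128, 1) True
```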

PR Category

Type of Change

Description

Issue

Progress

  • Change is properly reviewed (1 reviewer required, 2 recommended).
  • Change responds to an issue.
  • Change is fully covered by a UT.

Performance

@CLAassistant

CLAassistant commented Nov 25, 2025

CLA assistant check
All committers have signed the CLA.

Collaborator

@kiddyjinjin left a comment


lgtm

non_null_window_right = -1

- out = torch.empty_like(query)
+ out = torch.empty(query.shape, device=query.device, dtype=query.dtype)
Collaborator


Why use torch.empty instead of torch.empty_like?
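
As a side note on that trade-off (an illustration only, not part of the PR's code): torch.empty_like follows the source tensor's layout by default, so on the transposed query it returns a non-contiguous buffer, while torch.empty(query.shape, ...) is always contiguous; passing memory_format=torch.contiguous_format to empty_like would be another way to get a dense buffer.

```python
import torch

query = torch.randn(1, 2, 4, 128).transpose(1, 2)   # non-contiguous, as in the detail log

a = torch.empty_like(query)                          # preserves the permuted strides
b = torch.empty(query.shape, device=query.device, dtype=query.dtype)
c = torch.empty_like(query, memory_format=torch.contiguous_format)

print(a.is_contiguous(), b.is_contiguous(), c.is_contiguous())
# False True True
```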

