@pralay-das pralay-das commented Oct 30, 2025

In this PR:

  • Change the interface of the chunk_prefill execution engine so that it is more compatible with the CUDA CUTLASS implementation.
  • Add support for the cutlass_mla_get_workspace_size op.
  • Verified the changes with the test file:
    cmd: python -m pytest tests/test_flash_attention.py
    result: 96 passed, 182 skipped, 1 warning in 3.43s

@pralay-das pralay-das changed the title from "Added support for cutlass_mla_get_workspace_size (with new chunk prefill interface)" to "Added support for cutlass_mla_get_workspace_size (with new execution engine interface)" Oct 30, 2025

pralay-das commented Oct 31, 2025

run-ci

3 participants