
[BugFix] Phaseout unused tests for gqa decode kernels and add the kernels to CI #1515

Merged
LeiWang1999 merged 1 commit into tile-ai:main from tzj-fxz:gqadecode1223
Dec 24, 2025

Conversation

@tzj-fxz tzj-fxz commented Dec 23, 2025

As title

Summary by CodeRabbit

  • New Features

    • Added streamlined entry points for directly executing flash decoding examples with variable-length support
    • Enhanced variable-length decoding as the default test path
    • Updated kernel block sizing defaults
  • Tests

    • Added integration tests for variable-length flash decoding examples


@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai bot commented Dec 23, 2025

Walkthrough

The PR reorganizes flash decoding example entry points by adding main() functions to varlen decode examples, removing equal-sequence-length decode paths, adjusting default parameters (block_size from 64 to 128), updating comparison logic for tensor shape alignment, and adding corresponding test cases.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Varlen decode example reorganization: examples/flash_decoding/example_gqa_decode_varlen_logits.py, examples/flash_decoding/example_gqa_decode_varlen_logits_paged.py | Added main() entry point functions to both files; removed test_equal_seqlen_decode_main() from paged variant; updated __main__ flow to default to varlen tests; changed block_size default from 64 to 128; improved comparison logic with shape-aligned slicing and multi-line assertion formatting |
| Test integration: examples/flash_decoding/test_example_flash_decoding.py | Added module imports and two new test functions (test_example_example_gqa_decode_varlen_logits() and test_example_example_gqa_decode_varlen_logits_paged()) that invoke the new main() entry points |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • LeiWang1999

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.77%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main changes: phasing out old test paths and integrating new varlen decode tests into the CI test suite. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
examples/flash_decoding/example_gqa_decode_varlen_logits.py (1)

200-200: Consider documenting or removing the commented autotune decorator.

The commented-out @autotune decorator suggests temporary disabling. If autotuning is not needed for CI tests, consider removing the line. Otherwise, add a comment explaining why it's disabled.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11f122e and 691448d.

📒 Files selected for processing (3)
  • examples/flash_decoding/example_gqa_decode_varlen_logits.py
  • examples/flash_decoding/example_gqa_decode_varlen_logits_paged.py
  • examples/flash_decoding/test_example_flash_decoding.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-18T04:50:00.512Z
Learnt from: silentCoder-dev
Repo: tile-ai/tilelang PR: 1464
File: testing/python/language/test_tilelang_language_rand.py:14-14
Timestamp: 2025-12-18T04:50:00.512Z
Learning: In `testing/python/language/test_tilelang_language_rand.py`, the TileLang kernel uses `blk_M = M` (single block) and calls `rng_rand()` four times per element to align results with the Triton implementation, which uses `blk_M = 128` (multiple blocks) and calls the RNG once per element. These differences compensate for internal RNG behavior differences between TileLang and Triton.

Applied to files:

  • examples/flash_decoding/example_gqa_decode_varlen_logits.py
🧬 Code graph analysis (3)
examples/flash_decoding/test_example_flash_decoding.py (2)
examples/flash_decoding/example_gqa_decode_varlen_logits.py (1)
  • main (770-784)
examples/flash_decoding/example_gqa_decode_varlen_logits_paged.py (1)
  • main (522-537)
examples/flash_decoding/example_gqa_decode_varlen_logits_paged.py (1)
examples/flash_decoding/example_gqa_decode_varlen_logits.py (2)
  • main (770-784)
  • test_varlen_decode_main (440-641)
examples/flash_decoding/example_gqa_decode_varlen_logits.py (2)
examples/flash_decoding/example_gqa_decode_varlen_logits_paged.py (2)
  • main (522-537)
  • test_varlen_decode_main (202-407)
examples/flash_decoding/example_gqa_decode.py (1)
  • main (440-483)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (8)
examples/flash_decoding/example_gqa_decode_varlen_logits_paged.py (2)

522-538: LGTM! Entry point for CI testing.

The main() function provides a clean entry point for the CI test suite with appropriate default parameters. The addition of page_block_size=128 is specific to this paged variant and aligns with the kernel's requirements.


563-563: LGTM! Simplified test execution path.

The change to directly call test_varlen_decode_main(args) aligns with the PR objective to phase out unused equal-length decode tests and focus on variable-length testing.

examples/flash_decoding/test_example_flash_decoding.py (2)

5-6: LGTM! Module imports for CI tests.

The imports enable the new test functions to call the main() entry points added to these modules.


20-26: No action needed. The varlen test modules (example_gqa_decode_varlen_logits and example_gqa_decode_varlen_logits_paged) show no evidence of SM90 correctness issues, unlike the base gqa_decode module which explicitly documents this issue in line 9 of the test file (# TODO(lei): fix the correctness of gqa decode on sm90). The varlen modules contain unrelated TODOs about max_seqlen_kv validation and split versions, with no SM90 or compute version constraints mentioned anywhere in their code. The decorators are not required for these tests.

Likely an incorrect or invalid review comment.

examples/flash_decoding/example_gqa_decode_varlen_logits.py (4)

623-627: LGTM! Improved shape alignment for comparison.

The change ensures both tensors are sliced to the same size before computing the difference, making the comparison more robust and preventing potential shape mismatch errors.


634-639: LGTM! Consistent shape alignment in assertion.

The multi-line assertion format improves readability, and the slicing on both tensors ensures shape consistency with the max_diff_s_tl calculation above.
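
The idea of slicing both operands to a common size before comparing can be sketched in plain Python. This is an illustration only: the PR's code operates on tensors and the dimension it slices along is not shown in this conversation, so the helper names, the use of 1-D lists, and the tolerance below are all assumptions.

```python
def max_abs_diff_aligned(a, b):
    """Return the max absolute difference over the common prefix of a and b."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    # Slice BOTH sequences to the same length so the comparison never
    # depends on trailing padding present in only one of them.
    return max(abs(x - y) for x, y in zip(a[:n], b[:n]))


def assert_close_aligned(a, b, atol=1e-2):
    """Assert closeness on the aligned region, with a multi-line message."""
    diff = max_abs_diff_aligned(a, b)
    assert diff <= atol, (
        f"max abs diff {diff} exceeds atol {atol} "
        f"over the first {min(len(a), len(b))} elements"
    )
```

Using the same slice in both the difference calculation and the assertion keeps the reported maximum difference and the pass/fail decision consistent with each other.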


770-785: LGTM! Entry point for CI testing.

The main() function provides a clean entry point for the CI test suite with appropriate default parameters. The configuration aligns with the updated defaults in the argument parser.
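
One way to keep a main() entry point aligned with the argument parser's defaults is to have main() parse an empty argument list. The sketch below reuses the three parser arguments quoted in this review; build_parser and run_varlen_check are hypothetical names, and the PR's main() may construct its arguments differently.

```python
import argparse


def build_parser():
    """Parser holding the defaults shared by main() and the CLI path."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--k_seqlen", type=int, default=8192,
                        help="Key sequence length")
    parser.add_argument("--head_size", type=int, default=128,
                        choices=[64, 128, 256], help="Head dimension")
    parser.add_argument("--block_size", type=int, default=128,
                        help="Block size for computation")
    return parser


def run_varlen_check(args):
    # Hypothetical stand-in for test_varlen_decode_main(args); it only
    # echoes the configuration it would run with.
    return {
        "k_seqlen": args.k_seqlen,
        "head_size": args.head_size,
        "block_size": args.block_size,
    }


def main():
    # Parsing an explicit empty list ignores sys.argv and yields the defaults,
    # so a test can call main() without any command-line arguments.
    args = build_parser().parse_args([])
    return run_varlen_check(args)
```

With this layout a change to a parser default, such as block_size, automatically applies to the CI entry point as well.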


802-809: LGTM! Varlen testing now default.

The changes make variable-length testing the default path, aligning with the PR objective to phase out equal-length decode tests. The execution flow now consistently uses test_varlen_decode_main(args).

  parser.add_argument("--k_seqlen", type=int, default=8192, help="Key sequence length")
  parser.add_argument("--head_size", type=int, default=128, choices=[64, 128, 256], help="Head dimension")
- parser.add_argument("--block_size", type=int, default=64, help="Block size for computation")
+ parser.add_argument("--block_size", type=int, default=128, help="Block size for computation")

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if block_size=64 is used elsewhere in the codebase
rg -nP --type=py 'block_size\s*=\s*64' examples/

Repository: tile-ai/tilelang

Length of output: 1496


🏁 Script executed:

#!/bin/bash
# Check the broader context of block_size usage
rg -nP --type=py 'block_size' examples/flash_decoding/ | head -20

Repository: tile-ai/tilelang

Length of output: 2950


🏁 Script executed:

#!/bin/bash
# Check the git history of this file to confirm the change from 64 to 128
git log -p --follow examples/flash_decoding/example_gqa_decode_varlen_logits.py | grep -A 5 -B 5 "block_size.*default" | head -30

Repository: tile-ai/tilelang

Length of output: 42


Parameter default changed from 64 to 128, which is inconsistent with codebase patterns.

The block_size default has been changed from 64 to 128, creating an inconsistency with the rest of the codebase where block_size=64 is the standard default across multiple modules (deepseek_nsa, deepseek_mla, and even the related example_gqa_decode_varlen_logits_paged.py file). Document this deviation and ensure the change is intentional and doesn't break existing workflows.

🤖 Prompt for AI Agents
In examples/flash_decoding/example_gqa_decode_varlen_logits.py around line 794,
the parser default for --block_size was changed from 64 to 128 which is
inconsistent with the rest of the codebase; either revert the default back to 64
to match deepseek_nsa/deepseek_mla and
example_gqa_decode_varlen_logits_paged.py, or if 128 is intentional, add a clear
inline comment and update documentation/examples to explain the deviation and
run a quick smoke test to ensure no workflows break.

@LeiWang1999 LeiWang1999 merged commit feb106b into tile-ai:main Dec 24, 2025
6 of 7 checks passed