[Refactor] re-implement vector subtype and its access method#1722

Merged
LeiWang1999 merged 1 commit into tile-ai:main from LeiWang1999:fp4_0123 on Jan 23, 2026


@LeiWang1999 (Member) commented Jan 23, 2026

As the title says. Thanks to @Hamerlate for providing the test machine.

Summary by CodeRabbit

  • Refactor

    • Improved FP4 tensor element handling with refined data structure organization and access patterns for various vector widths.
    • Enhanced FP4 data type implementation with explicit accessor methods replacing direct member access.
  • Tests

    • Updated FP4 copy operation tests to verify data integrity with corrected input shape parameters.


- Refactored `fp4_e2_2_t` structure to include methods for accessing and setting low and high bits.
- Updated `PrintVecElemLoad` and `PrintVecElemStore` methods in `codegen_cuda.cc` to reflect changes in member access for `fp4_e2_4_t` and `fp4_e2_2_t`.
- Adjusted test for `tilelang_copy_fp4` to ensure correct input dimensions and output validation based on data type consistency.
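
The shape change in the test follows from packing two 4-bit values into each int8 byte, so a logical `(M, N)` FP4 tensor needs only `(M, N // 2)` bytes of storage. A minimal host-side sketch of that arithmetic is given here. The helper names `packed_cols` and `pack_fp4_row` are hypothetical, and placing even-indexed elements in the low nibble is an assumption rather than something this PR states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: bytes per row needed for n_cols FP4 values,
// assuming two 4-bit values share one int8 byte.
inline std::size_t packed_cols(std::size_t n_cols) {
  if (n_cols % 2 != 0) throw std::invalid_argument("N must be even");
  return n_cols / 2;
}

// Pack raw 4-bit codes (one per input byte) into bytes, two per byte.
// Assumption: element 2*j goes to the low nibble, element 2*j+1 to the high.
inline std::vector<std::uint8_t> pack_fp4_row(
    const std::vector<std::uint8_t>& codes) {
  std::vector<std::uint8_t> out(packed_cols(codes.size()));
  for (std::size_t j = 0; j < out.size(); ++j) {
    const unsigned lo = codes[2 * j] & 0x0Fu;
    const unsigned hi = codes[2 * j + 1] & 0x0Fu;
    out[j] = static_cast<std::uint8_t>(lo | (hi << 4));
  }
  return out;
}

// Small self-check: 8 logical columns occupy 4 bytes.
inline void packed_shape_selfcheck() {
  assert(packed_cols(8) == 4);
}
```

Comparing source and destination as int8, as the updated test does, checks the packed bytes directly and does not require decoding the FP4 values.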

@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run `pre-commit run --all-files` in the root directory of the project so that your changes are properly linted and formatted and your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai (Contributor, bot) commented Jan 23, 2026

Walkthrough

This PR refactors FP4-e2m1 tensor storage and access patterns in CUDA. It replaces direct member access with a new encapsulated `fp4_e2_2_t` class providing `x()`/`y()` accessors, updates the code generator to use these accessors with adjusted indexing for larger lane counts, and adjusts the corresponding test data shapes.
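
To make the accessor pattern concrete, here is a host-side sketch, not the code from `cuda_fp4.h`. It assumes the packed storage is a single byte (standing in for `__nv_fp4x2_storage_t`), that `x()` reads the low nibble and `y()` the high nibble, and that the accessors expose raw 4-bit codes. The names `Fp4x2Sketch`, `Fp4x4Sketch`, `lo`, and `hi` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for __nv_fp4x2_storage_t (assumption: one byte, two 4-bit codes).
using fp4x2_storage_sketch = std::uint8_t;

// Hypothetical analogue of fp4_e2_2_t: storage is private, access goes
// through x()/y() and set_x()/set_y() instead of public members.
class Fp4x2Sketch {
 public:
  // Assumption: x is the low nibble, y is the high nibble.
  std::uint8_t x() const { return static_cast<std::uint8_t>(storage_ & 0x0F); }
  std::uint8_t y() const {
    return static_cast<std::uint8_t>((storage_ >> 4) & 0x0F);
  }
  void set_x(std::uint8_t v) {
    storage_ = static_cast<fp4x2_storage_sketch>((storage_ & 0xF0) | (v & 0x0F));
  }
  void set_y(std::uint8_t v) {
    storage_ =
        static_cast<fp4x2_storage_sketch>((storage_ & 0x0F) | ((v & 0x0F) << 4));
  }

 private:
  fp4x2_storage_sketch storage_ = 0;
};

// Hypothetical analogue of fp4_e2_4_t: two nested 2-wide members
// (member names are illustrative, not taken from the PR).
struct Fp4x4Sketch {
  Fp4x2Sketch lo;
  Fp4x2Sketch hi;
};

// Small self-check: writing one nibble leaves the other untouched.
inline void fp4x2_sketch_selfcheck() {
  Fp4x2Sketch p;
  p.set_x(0x3);
  p.set_y(0xA);
  p.set_x(0xF);
  assert(p.x() == 0xF && p.y() == 0xA);
}
```

Routing writes through mutators lets each one mask and merge a single nibble, which a plain public member per 4-bit element cannot express when two elements share one byte.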

Changes

| Cohort / File(s) | Summary |
|---|---|
| FP4 Type Definitions: `src/tl_templates/cuda/cuda_fp4.h` | Replaces `fp4_e2x2_t` and `fp4_e2x4_t` type aliases with new `fp4_e2_2_t` class wrapping `__nv_fp4x2_storage_t`. Introduces `x()`/`y()` accessor methods and `set_x()`/`set_y()` mutators. Restructures `fp4_e2_4_t` to nest two `fp4_e2_2_t` members instead of four `fp4_e2_t` members. Updates packing/assignment semantics throughout. |
| CUDA Code Generation: `src/target/codegen_cuda.cc` | Updates `PrintVecElemLoad` and `PrintVecElemStore` to use new `fp4_e2_2_t` accessors. Shifts element indexing from direct `i % 4` to `(i % 4) / 2` for lanes >= 4, grouping into 2-wide submembers. Replaces direct member access with `x()`/`y()` and `set_x()`/`set_y()` method calls. |
| FP4 Copy Test: `testing/python/language/test_tilelang_language_copy.py` | Adjusts FP4 copy test input shape from `(M, N)` to `(M, N // 2)` for int8 storage alignment. Adds runtime assertion verifying output matches input as int8 when src and dst dtypes match. |
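
The indexing change in the code generator row can be illustrated with a small sketch. It covers only the index arithmetic the summary describes: within each group of four lanes, `(i % 4) / 2` picks one of the two 2-wide submembers. Choosing `x` for even lanes and `y` for odd lanes is an assumption, and `Fp4LaneSketch`, `locate_fp4_lane`, and `fp4_accessor_name` are hypothetical names rather than functions from `codegen_cuda.cc`.

```cpp
#include <cassert>
#include <string>

// Where lane i lives inside a 4-wide FP4 group (hypothetical helper type).
struct Fp4LaneSketch {
  int pair_index;  // which 2-wide submember: 0 or 1
  bool is_high;    // assumption: odd lanes map to y, even lanes to x
};

// Decompose a lane index for vectors with lanes >= 4.
inline Fp4LaneSketch locate_fp4_lane(int i) {
  return Fp4LaneSketch{(i % 4) / 2, (i % 2) != 0};
}

// Name of the accessor a load or store of lane i would call in this sketch.
inline std::string fp4_accessor_name(int i, bool is_store) {
  const Fp4LaneSketch loc = locate_fp4_lane(i);
  const std::string base = loc.is_high ? "y" : "x";
  return is_store ? "set_" + base : base;
}

// Small self-check: lane 3 is the second element of the second pair.
inline void fp4_lane_selfcheck() {
  const Fp4LaneSketch loc = locate_fp4_lane(3);
  assert(loc.pair_index == 1 && loc.is_high);
}
```

Under these assumptions, lanes 0 and 1 share the first submember and lanes 2 and 3 share the second, and the pattern repeats every four lanes.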

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Four bits bundled, two by two,
Nested accessors, fresh and new,
Indexing shifts to group with care,
FP4 storage everywhere! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: refactoring the vector subtypes (`fp4_e2_2_t` and `fp4_e2_4_t`) and their access methods from direct member access to accessor/mutator methods. |


@LeiWang1999 LeiWang1999 merged commit 4ab369d into tile-ai:main Jan 23, 2026
7 checks passed