[Example] Add block level high performance gemv example #1097
Caution: Review failed. The pull request is closed.

Walkthrough: The GEMV example undergoes significant refactoring, replacing a single best-config system with a multi-layer autotuning approach. The changes introduce block-template and thread-template configuration generators, two new autotuned kernels (`gemv_alloc_reducer` and `get_autotuned_kernel`), and update the `main` function signature to support configurable benchmarking modes.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~35 minutes
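The block-template and thread-template configuration generators mentioned in the walkthrough can be pictured as functions that enumerate candidate tiling parameters for the autotuner. The sketch below assumes each generator returns the Cartesian product of a few candidate values as a list of keyword-argument dicts; the knob names (`BLOCK_N`, `BLOCK_K`, `reduce_threads`, `TILE_K`) and their values are hypothetical and may not match the PR's `get_block_template_configs` and `get_thread_template_configs`.

```python
import itertools


def get_block_template_configs():
    # Hypothetical knobs for the block-reduction kernel:
    # BLOCK_N output rows per block, BLOCK_K elements of K per iteration.
    block_n = [2, 4, 8]
    block_k = [128, 256]
    return [
        {"BLOCK_N": n, "BLOCK_K": k}
        for n, k in itertools.product(block_n, block_k)
    ]


def get_thread_template_configs():
    # Hypothetical knobs for the thread-level (SIMT) kernel:
    # reduce_threads lanes cooperate on one output row, TILE_K elements per lane.
    block_n = [2, 4]
    reduce_threads = [4, 8, 32]
    tile_k = [4, 8]
    return [
        {"BLOCK_N": n, "reduce_threads": r, "TILE_K": t}
        for n, r, t in itertools.product(block_n, reduce_threads, tile_k)
    ]


if __name__ == "__main__":
    print(len(get_block_template_configs()))   # 6  (3 x 2 candidates)
    print(len(get_thread_template_configs()))  # 12 (2 x 3 x 2 candidates)
```

Keeping one generator per kernel template lets each variant be tuned over its own search space. The PR describes passing such configurations to kernels decorated with `tl.autotune` and `tl.jit`; the exact decorator arguments are not shown on this page, so they are left out of the sketch.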
* add alloc_reducer gemv example
* test
This pull request refactors and extends the GEMV example to introduce a new block reduction kernel (`gemv_alloc_reducer`), reorganize the autotuning configurations, and improve the benchmarking and testing workflows. The changes make the codebase more modular and make it easier to experiment with different GEMV kernel variants.

New kernel and autotuning infrastructure:
* Added the `gemv_alloc_reducer` kernel, which uses block reduction and allocation-based reduction (`alloc_reducer`), with its own autotuning configuration generator (`get_block_template_configs`) and a kernel definition using the `tl.autotune` and `tl.jit` decorators.
* Added a thread-template configuration generator (`get_thread_template_configs`) and updated the kernel interface to use `get_autotuned_kernel` for clarity and modularity.

Benchmarking and correctness improvements:
* … the `gemv_alloc_reducer` kernel, and expanded the benchmarking logic to compare both the SIMT and block reduction implementations when `do_bench` is set to `False`.

Testing improvements:
* Updated the test to call `main(do_bench=False)`, ensuring all kernel variants are tested without running the full benchmark suite.

Code cleanup and bug fixes:
* Updated the `main` function signature to accept a `do_bench` flag for more flexible execution.
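To make the block-reduction idea behind `gemv_alloc_reducer` concrete, here is a plain-Python model of the arithmetic. It is not TileLang code and not the kernel from this PR. It assumes y = A·x with A of shape N×K: a "block" owns `block_n` rows, each of `threads` lanes accumulates a private partial sum over a strided slice of K, and the partials are then combined with a tree reduction. A thread-level variant that walks each row in `tile_k` chunks stands in for the SIMT path, and a hypothetical `main(do_bench=...)` always checks both variants against a reference but only times them when `do_bench` is true. All function and parameter names here are illustrative.

```python
import math
import random
import time


def ref_gemv(A, x):
    # Reference result: y[n] = sum_k A[n][k] * x[k]
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def simt_gemv(A, x, tile_k=4):
    # Hypothetical thread-level model: one "thread" per output row walks K
    # in chunks of tile_k and keeps a single running sum.
    y = []
    for row in A:
        acc = 0.0
        for k0 in range(0, len(x), tile_k):
            for k in range(k0, min(k0 + tile_k, len(x))):
                acc += row[k] * x[k]
        y.append(acc)
    return y


def block_reduce_gemv(A, x, block_n=2, threads=8):
    # Hypothetical block-level model: each block owns block_n rows. For each
    # row, `threads` lanes accumulate private partial sums over a strided
    # slice of K, then the partials are combined by a tree reduction.
    assert threads > 0 and (threads & (threads - 1)) == 0, "tree reduction needs a power-of-two lane count"
    N, K = len(A), len(x)
    y = [0.0] * N
    for n0 in range(0, N, block_n):                # one iteration per block
        for n in range(n0, min(n0 + block_n, N)):  # rows owned by this block
            partial = [0.0] * threads
            for t in range(threads):               # lane t handles k = t, t + threads, ...
                for k in range(t, K, threads):
                    partial[t] += A[n][k] * x[k]
            stride = threads // 2
            while stride > 0:                      # log2(threads) reduction steps
                for t in range(stride):
                    partial[t] += partial[t + stride]
                stride //= 2
            y[n] = partial[0]
    return y


def main(N=64, K=256, do_bench=True):
    random.seed(0)
    A = [[random.uniform(-1.0, 1.0) for _ in range(K)] for _ in range(N)]
    x = [random.uniform(-1.0, 1.0) for _ in range(K)]
    ref = ref_gemv(A, x)
    for name, fn in {"simt": simt_gemv, "block_reduce": block_reduce_gemv}.items():
        out = fn(A, x)
        # Correctness is always checked; summation order differs from the
        # reference, so compare with a tolerance instead of exact equality.
        assert all(math.isclose(o, r, rel_tol=1e-9, abs_tol=1e-9) for o, r in zip(out, ref)), name
        if do_bench:
            start = time.perf_counter()
            fn(A, x)
            print(f"{name}: {(time.perf_counter() - start) * 1e3:.2f} ms")
    return True


if __name__ == "__main__":
    main()
```

On a GPU the lanes run concurrently and the tree reduction takes log2(threads) steps rather than a sequential loop; the Python loops only reproduce the order of operations. Calling `main(do_bench=False)` exercises every variant without any timing, which is the kind of lightweight check a unit test can run.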