refactor: cleanup codebase after tvm-ffi refactor #1795
yzh119 merged 18 commits into flashinfer-ai:main
Summary of Changes
Hello @yzh119, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request primarily focuses on refactoring and cleaning up the codebase following a previous …
Code Review
This pull request is a significant refactoring to unify the codegen logic for PyTorch and TVM by moving to tvm-ffi. This involves renaming many files from _ops.cu and _pybind.cu to _binding.cu, and removing the old tvm_binding directory. The changes in flashinfer/jit/cpp_ext.py correctly remove dependencies on PyTorch's C++ extension build system, which is a good step towards a more backend-agnostic FFI approach. The overall refactoring seems well-executed. However, I've found a systematic typo in the Python files that reference the newly renamed CUDA files: they use _bindings.cu (plural) while the renamed files are _binding.cu (singular). This will cause build failures and needs to be corrected across all affected files. I've added comments for each occurrence to help you fix them.
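A mismatch like the one described above (references to _bindings.cu when the files are named _binding.cu) can be caught before a build with a small check. The following is a minimal sketch, not part of this PR: the helper name find_missing_sources and the example file names are hypothetical, and it assumes source files are referenced as quoted string literals ending in .cu.

```python
import re

# Matches quoted string literals that end in ".cu", e.g. "norm_binding.cu".
CU_LITERAL = re.compile(r"""["']([^"']+\.cu)["']""")


def find_missing_sources(python_source: str, existing_files: set[str]) -> list[str]:
    """Return .cu names referenced in python_source that are not in existing_files.

    Hypothetical helper for illustration; it is not a FlashInfer API.
    """
    referenced = CU_LITERAL.findall(python_source)
    return sorted({name for name in referenced if name not in existing_files})


# Illustrative data only: these file names are assumptions, not taken from the PR.
example_source = '''
sources = [
    "norm.cu",
    "norm_bindings.cu",  # plural: does not match the renamed file
]
'''
existing = {"norm.cu", "norm_binding.cu"}

missing = find_missing_sources(example_source, existing)
print(missing)
```

In a real repository, existing_files would be built from the directory listing of the CUDA sources, and the check would be run over each Python file that declares JIT sources.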
All necessary changes on the flashinfer side to support tvm and mlc are included in #1836.
📌 Description
The codegen logic for pytorch and tvm should be unified after #1641, and this PR cleans up the related codegen functions in tvm_bindings.
Other changes:
- _ops.cu and _pybind.cu renamed to _binding.cu
- Removed use_torch_stream in unittests; it is no longer required after [CYTHON] Fix stream passing bug apache/tvm-ffi#68

🔍 Related Issues
#1641
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
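- I have installed pre-commit by running pip install pre-commit (or used your preferred method).
- I have installed the hooks with pre-commit install.
- I have run the hooks with pre-commit run --all-files and fixed any reported issues.

🧪 Tests
- All tests are passing (unittest, etc.).

Reviewer Notes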
cc @MasterJH5574 please let us know what changes we need to make to help you bump to the latest version of flashinfer in MLC.