Conversation
This PR fixes a bug in stream passing that broke the expected stream-passing behavior. It also adds a regression case via `load_inline_cuda` to guard against this issue (currently requires a CUDA environment to run).
## Summary of Changes (Gemini Code Assist)

This pull request addresses a critical bug in the Cython-based Foreign Function Interface (FFI) that affected the handling and passing of CUDA streams. The fix ensures that device contexts are correctly identified, preventing issues with stream management. To guard against future regressions, test cases have been introduced for CUDA stream passing, validating behavior with both default and explicitly managed PyTorch streams. The package version has also been updated to reflect these changes.
Code Review
This pull request addresses a bug in stream passing within the Cython FFI layer by switching from checking device_id to the more robust device_type to determine if a device context has been established. This is a good improvement for correctness. The PR also introduces a valuable regression test for CUDA stream handling, which validates behavior for both default and user-created streams. Additionally, the package version is updated. My review includes one suggestion to remove a redundant check in the new test code to improve its clarity.
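To make the nature of the fix concrete, here is a minimal, self-contained Python sketch of the failure mode. It does not reproduce the actual Cython FFI code; the names `DLDevice`, `kDLCUDA`, and the `kDLUndefined` sentinel are assumptions modeled loosely on DLPack conventions. The point it illustrates: `device_id == 0` is a perfectly valid CUDA device, so testing `device_id` to decide whether a device context has been established silently misclassifies `cuda:0`, whereas testing `device_type` does not.

```python
from dataclasses import dataclass

# DLPack-style device type codes (subset; kDLUndefined is a
# hypothetical "no context set" sentinel used for illustration).
kDLUndefined = 0
kDLCPU = 1
kDLCUDA = 2


@dataclass
class DLDevice:
    device_type: int
    device_id: int


def context_established_buggy(dev: DLDevice) -> bool:
    # Buggy check: device_id 0 is a valid CUDA device, yet this
    # treats it as "no context", so the stream for cuda:0 is lost.
    return dev.device_id != 0


def context_established_fixed(dev: DLDevice) -> bool:
    # Robust check: only the device_type can signal "no context".
    return dev.device_type != kDLUndefined


cuda0 = DLDevice(device_type=kDLCUDA, device_id=0)
no_ctx = DLDevice(device_type=kDLUndefined, device_id=0)

assert not context_established_buggy(cuda0)  # wrong: misses cuda:0
assert context_established_fixed(cuda0)      # correct
assert not context_established_fixed(no_ctx)
```

The sketch shows why the `device_type` check is the safer discriminator: device ids are zero-based, so zero carries no "unset" meaning.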
## 📌 Description

The codegen logic for PyTorch and TVM should unify after #1641, and this PR cleans up the related codegen functions in tvm_bindings. Other changes:

1. Update tvm-ffi to 0.1.0b11 to incorporate apache/tvm-ffi#67 and apache/tvm-ffi#68.
2. Rename source files: `_ops.cu` and `_pybind.cu` are renamed to `_binding.cu`.
3. Remove Torch-related header includes and library linking in ninja files (#1642 (comment)).
4. Remove the use of `use_torch_stream` in unittests; it is no longer required after apache/tvm-ffi#68.

## 🔍 Related Issues

#1641

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [ ] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

## Reviewer Notes

cc @MasterJH5574 please let us know what changes we need to make to help you bump to the latest version of flashinfer in MLC.