
ci/cd: bring up flashinfer-cubin package #1718

Merged
yzh119 merged 6 commits into flashinfer-ai:main from yzh119:flashinfer-cubin-package
Sep 19, 2025

yzh119 (Collaborator) commented Sep 18, 2025

📌 Description

Users prefer a standalone wheel for FlashInfer's cubin files. This PR implements that feature.

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

gemini-code-assist (Contributor) commented

Summary of Changes

Hello @yzh119, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new flashinfer-cubin package, designed to provide a standalone wheel for FlashInfer's CUDA cubin files. This change aims to modularize the distribution of pre-compiled kernels, allowing users to install them independently. The new package includes mechanisms for building the wheel, downloading cubins, and integrating seamlessly with the main FlashInfer library, offering greater flexibility and potentially reducing the footprint of the core FlashInfer installation.

Highlights

  • New flashinfer-cubin Package: Introduced a dedicated Python package, flashinfer-cubin, to distribute pre-compiled CUDA kernels (cubins) separately from the main FlashInfer library.
  • Automated Cubin Management: Added new scripts (build_wheel.py, download_cubins.py) and custom setuptools commands within setup.py to automate the downloading and packaging of cubin files directly into the new flashinfer-cubin wheel.
  • Prioritized Cubin Loading: Modified the main flashinfer library's cubin directory resolution logic to first check for cubins within the newly installed flashinfer-cubin package, then environment variables, before falling back to the default cache directory.
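
The lookup order described in the last highlight can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `resolve_cubin_dir`, `find_packaged_cubin_dir`, and the `DEFAULT_CACHE_DIR` value are hypothetical. Only the `flashinfer_cubin` package, its `cubins` subdirectory, and the `FLASHINFER_CUBIN_DIR` variable come from the PR itself.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Assumed placeholder: the real default cache location is not given in this thread.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "flashinfer" / "cubins"


def find_packaged_cubin_dir() -> Optional[Path]:
    """Return the cubins directory of an installed flashinfer-cubin wheel, if any."""
    try:
        import flashinfer_cubin  # optional companion package
    except ImportError:
        return None
    candidate = Path(flashinfer_cubin.__file__).parent / "cubins"
    return candidate if candidate.is_dir() else None


def resolve_cubin_dir(
    packaged_dir: Optional[Path],
    env: Mapping[str, str] = os.environ,
    default: Path = DEFAULT_CACHE_DIR,
) -> Path:
    """Pick the cubin directory: package first, then env var, then default cache."""
    if packaged_dir is not None:
        return Path(packaged_dir)
    env_dir = env.get("FLASHINFER_CUBIN_DIR")
    if env_dir:
        return Path(env_dir)
    return Path(default)
```

A caller would combine the two helpers as `resolve_cubin_dir(find_packaged_cubin_dir())`. Passing the environment and default in as parameters keeps the precedence rule easy to test in isolation.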

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a new flashinfer-cubin package to distribute pre-compiled CUDA kernels, which is a great addition for users who prefer a standalone wheel. The implementation is solid, including custom setuptools commands to download artifacts during the build process and logic to prioritize this package for locating cubins.

My review includes a few suggestions to improve maintainability and code style:

  • In flashinfer-cubin/setup.py, there's an opportunity to refactor duplicated code in the custom build commands into a shared helper function.
  • In flashinfer-cubin/flashinfer_cubin/__init__.py, the list_cubins function can be simplified using pathlib.
  • A minor style point in flashinfer-cubin/build_wheel.py regarding an inline import.

Overall, the changes are well-structured and address the feature request effectively. Addressing these points will make the new packaging code even cleaner and easier to maintain.

for dir_to_clean in [dist_dir, build_dir, egg_info_dir]:
    if dir_to_clean.exists():
        print(f"Cleaning {dir_to_clean}")
        import shutil

medium

For better code style and readability, it's recommended to move imports to the top of the file. Please move import shutil to the top-level imports section (e.g., after from pathlib import Path).
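
A sketch of the cleanup loop with the import hoisted to module level might look like this. The helper name `clean_dirs` and the use of `shutil.rmtree` are assumptions for illustration; the excerpt above shows only the import, not the call that follows it.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def clean_dirs(dirs: Iterable[Path]) -> List[Path]:
    """Remove each existing directory and return the ones that were deleted."""
    removed = []
    for dir_to_clean in dirs:
        if dir_to_clean.exists():
            print(f"Cleaning {dir_to_clean}")
            shutil.rmtree(dir_to_clean)  # assumed cleanup call
            removed.append(dir_to_clean)
    return removed
```

With the import at the top, the loop body contains only the work it performs, and a missing module would surface at import time rather than partway through a build.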

Comment on lines +29 to +40
def list_cubins():
    """List all available cubin files."""
    if not CUBIN_DIR.exists():
        return []

    cubins = []
    for root, _, files in os.walk(CUBIN_DIR):
        for file in files:
            if file.endswith(".cubin"):
                rel_path = os.path.relpath(os.path.join(root, file), CUBIN_DIR)
                cubins.append(rel_path)
    return sorted(cubins)

medium

The list_cubins function can be made more concise and idiomatic by using pathlib.Path.rglob to find all .cubin files recursively. This also improves consistency by using pathlib features instead of mixing with os.walk and os.path.

Suggested change
def list_cubins():
    """List all available cubin files."""
    if not CUBIN_DIR.exists():
        return []
    return sorted([str(p.relative_to(CUBIN_DIR)) for p in CUBIN_DIR.rglob("*.cubin")])
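
To check that the two approaches agree, both can be written as functions of a directory argument and compared on a small tree. The function names and the `cubin_dir` parameter below are illustrative only (the package uses a module-level `CUBIN_DIR`), and the added `is_file()` filter is an assumption that guards against a directory whose name happens to end in `.cubin`.

```python
import os
from pathlib import Path
from typing import List


def list_cubins_walk(cubin_dir: Path) -> List[str]:
    """os.walk version, mirroring the original implementation."""
    if not cubin_dir.exists():
        return []
    cubins = []
    for root, _, files in os.walk(cubin_dir):
        for file in files:
            if file.endswith(".cubin"):
                cubins.append(os.path.relpath(os.path.join(root, file), cubin_dir))
    return sorted(cubins)


def list_cubins_rglob(cubin_dir: Path) -> List[str]:
    """pathlib version; is_file() skips any directory named *.cubin."""
    if not cubin_dir.exists():
        return []
    return sorted(
        str(p.relative_to(cubin_dir))
        for p in cubin_dir.rglob("*.cubin")
        if p.is_file()
    )
```

For an ordinary directory tree without symlinks, both functions should return the same sorted list of relative paths.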

Comment on lines +38 to +119
class DownloadAndBuildPy(build_py):
    """Custom build command that downloads cubins before building."""

    def run(self):
        print("Downloading cubins from artifactory...")

        # Create a temporary directory for cubins within the package
        cubin_package_dir = Path(self.build_lib) / "flashinfer_cubin" / "cubins"
        cubin_package_dir.mkdir(parents=True, exist_ok=True)

        # Set environment variable to download to our package directory
        original_cubin_dir = os.environ.get("FLASHINFER_CUBIN_DIR")
        os.environ["FLASHINFER_CUBIN_DIR"] = str(cubin_package_dir)

        try:
            # Download all cubins using the existing download_artifacts function
            download_artifacts()
            print(f"Cubins downloaded to {cubin_package_dir}")

        finally:
            # Restore original environment variable
            if original_cubin_dir:
                os.environ["FLASHINFER_CUBIN_DIR"] = original_cubin_dir
            else:
                os.environ.pop("FLASHINFER_CUBIN_DIR", None)

        # Create build metadata file with version information
        package_dir = Path(self.build_lib) / "flashinfer_cubin"
        build_meta_file = package_dir / "_build_meta.py"
        version = get_version()

        with open(build_meta_file, "w") as f:
            f.write('"""Build metadata for flashinfer-cubin package."""\n')
            f.write(f'__version__ = "{version}"\n')

        print(f"Created build metadata file with version {version}")

        # Continue with normal build
        super().run()


class CustomSdist(sdist):
    """Custom sdist command that includes downloaded cubins."""

    def run(self):
        # Download cubins first
        print("Downloading cubins for source distribution...")

        cubin_package_dir = (
            Path(self.distribution.package_dir.get("", "."))
            / "flashinfer_cubin"
            / "cubins"
        )
        cubin_package_dir.mkdir(parents=True, exist_ok=True)

        # Set environment variable to download to our package directory
        original_cubin_dir = os.environ.get("FLASHINFER_CUBIN_DIR")
        os.environ["FLASHINFER_CUBIN_DIR"] = str(cubin_package_dir)

        try:
            download_artifacts()
        finally:
            if original_cubin_dir:
                os.environ["FLASHINFER_CUBIN_DIR"] = original_cubin_dir
            else:
                os.environ.pop("FLASHINFER_CUBIN_DIR", None)

        # Create build metadata file with version information for sdist
        package_dir = (
            Path(self.distribution.package_dir.get("", ".")) / "flashinfer_cubin"
        )
        build_meta_file = package_dir / "_build_meta.py"
        version = get_version()

        with open(build_meta_file, "w") as f:
            f.write('"""Build metadata for flashinfer-cubin package."""\n')
            f.write(f'__version__ = "{version}"\n')

        print(f"Created build metadata file with version {version}")

        # Continue with normal sdist
        super().run()

medium

There is significant code duplication between the run methods of DownloadAndBuildPy and CustomSdist. Both methods perform similar steps: setting up environment variables, downloading artifacts, and creating a _build_meta.py file. This duplicated logic can be extracted into a helper function to improve code maintainability and reduce redundancy.

Here's a suggestion on how you could refactor it:

from contextlib import contextmanager

@contextmanager
def _temp_env_var(key, value):
    original_value = os.environ.get(key)
    os.environ[key] = str(value)
    try:
        yield
    finally:
        if original_value is not None:
            os.environ[key] = original_value
        else:
            os.environ.pop(key, None)

def _prepare_package_data(base_path: Path):
    """Downloads cubins and creates metadata file."""
    print("Downloading cubins for packaging...")
    cubin_package_dir = base_path / "flashinfer_cubin" / "cubins"
    cubin_package_dir.mkdir(parents=True, exist_ok=True)

    with _temp_env_var("FLASHINFER_CUBIN_DIR", cubin_package_dir):
        download_artifacts()
    print(f"Cubins downloaded to {cubin_package_dir}")

    # Create build metadata file
    package_dir = base_path / "flashinfer_cubin"
    build_meta_file = package_dir / "_build_meta.py"
    version = get_version()

    with open(build_meta_file, "w") as f:
        f.write('"""Build metadata for flashinfer-cubin package."""\n')
        f.write(f'__version__ = "{version}"\n')
    print(f"Created build metadata file with version {version}")


class DownloadAndBuildPy(build_py):
    """Custom build command that downloads cubins before building."""
    def run(self):
        _prepare_package_data(Path(self.build_lib))
        super().run()


class CustomSdist(sdist):
    """Custom sdist command that includes downloaded cubins."""
    def run(self):
        _prepare_package_data(Path(self.distribution.package_dir.get("", ".")))
        super().run()

This refactoring would make the setup.py script cleaner and easier to maintain.
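
One behavioral detail of the suggested context manager is worth noting: it restores the variable with an `is not None` check, whereas the original `run` methods use a truthiness check (`if original_cubin_dir:`), which would remove a variable that had been set to an empty string. The standalone sketch below exercises that case on a POSIX system. `temp_env_var` copies the suggested helper, while `demo` and the `DEMO_CUBIN_DIR` variable are hypothetical and exist only for this illustration.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_env_var(key, value):
    """Set os.environ[key] for the duration of the block, then restore it."""
    original_value = os.environ.get(key)
    os.environ[key] = str(value)
    try:
        yield
    finally:
        if original_value is not None:
            os.environ[key] = original_value
        else:
            os.environ.pop(key, None)


def demo(initial):
    """Return the value of DEMO_CUBIN_DIR after the context manager exits."""
    key = "DEMO_CUBIN_DIR"  # hypothetical variable, used only for this demo
    os.environ.pop(key, None)
    if initial is not None:
        os.environ[key] = initial
    with temp_env_var(key, "/tmp/build/cubins"):
        # Inside the block the temporary value is visible.
        assert os.environ[key] == "/tmp/build/cubins"
    result = os.environ.get(key)
    os.environ.pop(key, None)  # leave the process environment clean
    return result
```

Calling `demo(None)`, `demo("/orig")`, and `demo("")` covers the unset, set, and empty-string cases; only the last one distinguishes the two restore checks.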

@yzh119 yzh119 merged commit 7ee54c7 into flashinfer-ai:main Sep 19, 2025
1 of 2 checks passed
yzh119 added a commit that referenced this pull request Sep 19, 2025
…#1737)

## 📌 Description

Follow-up to #1718: this PR adds the GitHub workflow to build the
flashinfer-cubin wheel and publish it to PyPI.

## 🔍 Related Issues


## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes
