bugfix: Fix flashinfer download-cubin #1729

Merged
yzh119 merged 2 commits into flashinfer-ai:main from tiran:fix-download-cubin
Sep 19, 2025
@tiran
Contributor

@tiran tiran commented Sep 19, 2025

📌 Description

Summary: fix flashinfer download-cubin and double download speed

The previous version attempted to call the tqdm module itself as if it were a callable. The new version replaces the custom logging handler with the tqdm.contrib.logging helper. Fixes: 'module' object is not callable. Did you mean: 'tqdm.tqdm(...)'?
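The error quoted above can be reproduced without tqdm installed. The following is a minimal sketch, assuming a stand-in module object named `tqdm` that exposes a callable attribute of the same name. The stand-in and the helper function names are hypothetical and are not taken from the FlashInfer code.

```python
import types

# Stand-in for the real package: a module named "tqdm" exposing a callable
# attribute that is also named "tqdm" (hypothetical, for illustration only).
tqdm = types.ModuleType("tqdm")
tqdm.tqdm = lambda iterable: iter(iterable)


def call_module_directly():
    """Reproduce the bug: call the module object instead of its attribute."""
    try:
        tqdm(range(3))
    except TypeError as exc:
        return str(exc)
    return None


def call_attribute():
    """The fix: call the callable that lives inside the module."""
    return list(tqdm.tqdm(range(3)))
```

With the real library, the equivalent fix is to import the class (`from tqdm import tqdm`) or to call `tqdm.tqdm(...)` explicitly.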

The ThreadPoolExecutor is now correctly wrapped in a context manager. The progress bar is updated by a future done callback.
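The executor pattern described here might look roughly like the sketch below. It is not the code from this PR: `Progress`, `fake_download`, and `download_all` are hypothetical names, and a simple thread-safe counter stands in for the tqdm progress bar so the example runs with the standard library alone.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Progress:
    """Thread-safe counter standing in for a progress bar (hypothetical)."""

    def __init__(self, total):
        self.total = total
        self.done = 0
        self._lock = threading.Lock()

    def update(self, n=1):
        with self._lock:
            self.done += n


def fake_download(name):
    """Placeholder for the real download work."""
    time.sleep(0.01)
    return name.upper()


def download_all(names, max_workers=4):
    progress = Progress(total=len(names))
    # The with-block waits for all workers and shuts the pool down on exit,
    # even if an exception is raised inside it.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fake_download, n) for n in names]
        for fut in futures:
            # The callback runs when the future finishes, or immediately if
            # it has already finished, so each task is counted exactly once.
            fut.add_done_callback(lambda _f: progress.update(1))
        results = [fut.result() for fut in futures]
    # Read the counter only after the pool has shut down, so every callback
    # has had a chance to run.
    return results, progress.done
```

Updating the bar from a done callback keeps the progress logic out of the worker function itself.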

requests.get creates a new session object for each GET request. This is inefficient, because each request has to perform a DNS lookup, a TCP handshake, and an HTTPS handshake including certificate validation.

A requests.Session can be shared between requests and across threads to make use of HTTP keep-alive. This change more than doubles the download speed and reduces the load on the server.
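As a rough illustration of the shared-session idea, the sketch below passes one session object to every worker thread. The names `fetch_status_codes`, `FakeSession`, and `main` are hypothetical. The fake session exists only so the sketch can be exercised offline, and `main` shows how a real `requests.Session` would be plugged in, assuming `requests` is installed.

```python
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def fetch_status_codes(urls, session, max_workers=4):
    """Fetch every URL through one shared session-like object.

    `session` only needs a `get(url, timeout=...)` method returning an object
    with a `status_code` attribute, which `requests.Session` provides.
    """
    def fetch(url):
        return session.get(url, timeout=10).status_code

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `urls` in its results.
        return list(pool.map(fetch, urls))


class FakeSession:
    """Offline stand-in for requests.Session (hypothetical test double)."""

    def __init__(self):
        self.calls = []

    def get(self, url, timeout=None):
        self.calls.append(url)  # list.append is atomic in CPython
        return SimpleNamespace(status_code=200)


def main(urls):
    # Imported lazily so the helper above has no hard dependency on requests.
    import requests

    # One Session for all requests: its connection pool keeps sockets open,
    # so later requests to the same host can skip the DNS/TCP/TLS setup.
    with requests.Session() as session:
        return fetch_status_codes(urls, session)
```

Accepting the session as a parameter mirrors the `session=session` argument seen in this PR and makes the download helper easy to test without network access.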

🔍 Related Issues

Fixes #1728

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

@gemini-code-assist
Contributor

Summary of Changes

Hello @tiran, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical issues and introduces significant performance enhancements for the flashinfer download-cubin utility. It fixes a bug related to tqdm logging, ensures correct context management for multithreaded downloads, and dramatically improves download efficiency by leveraging requests.Session for persistent HTTP connections, ultimately doubling download speed.

Highlights

  • Logging and Progress Bar Fix: The custom logging handler for tqdm was replaced with tqdm.contrib.logging.tqdm_logging_redirect, resolving an issue where tqdm was incorrectly used as a callable and ensuring proper progress bar updates during cubin downloads.
  • ThreadPoolExecutor Context Management: The ThreadPoolExecutor is now correctly managed using a with statement, ensuring proper resource handling, and progress bar updates are handled efficiently via future done callbacks.
  • Download Performance Improvement: requests.Session is now utilized for downloading cubin files, enabling HTTP keep-alive and connection reuse. This significantly improves download speed by reducing overhead from repeated DNS lookups, TCP handshakes, and HTTPS handshakes for multiple requests.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly improves the flashinfer download-cubin functionality. It correctly uses tqdm.contrib.logging to fix a logging issue, properly manages ThreadPoolExecutor resources with a context manager to prevent leaks, and introduces requests.Session to dramatically speed up downloads by reusing connections. These are excellent changes that enhance performance and robustness. My review includes a couple of suggestions for further improving code clarity and explicitness in control flow.

Comment on lines +157 to 160
download_file(uri, cubin_path, session=session)
return load_cubin(cubin_path, sha256)
Contributor

medium

The download_file function returns a boolean indicating success or failure. It's better to check this return value explicitly rather than relying on load_cubin to fail implicitly. This makes the control flow clearer and more robust, as download_file already handles logging on failure.

Suggested change
- download_file(uri, cubin_path, session=session)
- return load_cubin(cubin_path, sha256)
+ if download_file(uri, cubin_path, session=session):
+     return load_cubin(cubin_path, sha256)
+ return b""
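The control flow proposed in this suggestion can be sketched in isolation as follows. `fetch_cubin` is a hypothetical wrapper, and `download` and `load` are stand-ins for the project's `download_file` and `load_cubin`, whose real signatures may differ.

```python
def fetch_cubin(download, load, uri, path):
    """Return cubin bytes, or b"" when the download reports failure.

    `download` and `load` are hypothetical stand-ins for the project's
    `download_file` and `load_cubin`; their real signatures may differ.
    """
    # Only try to load the file if the download reported success.
    if download(uri, path):
        return load(path)
    # Explicit failure value instead of relying on the loader to fail.
    return b""
```

Checking the boolean explicitly makes the failure path visible at the call site.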

The previous version was attempting to use `tqdm` module as callable.
The new version replaces the custom logging handler with
`tqdm.contrib.logging` helper.

Fixes `'module' object is not callable. Did you mean: 'tqdm.tqdm(...)'?`

The `ThreadPoolExecutor` is now correctly wrapped in a context manager.
The progress bar is updated by a future done callback.

Signed-off-by: Christian Heimes <cheimes@redhat.com>
`requests.get` creates a new session object for each GET request. This
is pretty inefficient, because each request has to perform DNS lookup,
TCP handshake, and HTTPS handshake including certificate validation.

A `requests.Session` can be shared between requests and across threads
to make use of HTTP keep-alive. This change more than doubles the
download speed and reduces the load on the server.

Signed-off-by: Christian Heimes <cheimes@redhat.com>
@tiran tiran force-pushed the fix-download-cubin branch from 6a29b89 to 4de6fc3 Compare September 19, 2025 08:01
@EmilienM
Contributor

@yzh119 for review, please

Collaborator

@yzh119 yzh119 left a comment

LGTM. Also, heads up that we will provide the flashinfer-cubin package (see #1718)


- def download_file(source, local_path, retries=3, delay=5, timeout=10, lock_timeout=30):
+ def download_file(
+     source, local_path, retries=3, delay=5, timeout=10, lock_timeout=30, session=None
Collaborator

Better to also document the session parameter.

@yzh119 yzh119 changed the title Fix flashinfer download-cubin bugfix: Fix flashinfer download-cubin Sep 19, 2025
@yzh119 yzh119 merged commit 1e95001 into flashinfer-ai:main Sep 19, 2025
2 checks passed
@EmilienM
Contributor

@yzh119 when do you plan to tag a new release? Thank you for your reviews and help!

Development

Successfully merging this pull request may close these issues.

flashinfer download-cubin broken: 'module' object is not callable. Did you mean: 'tqdm.tqdm(...)'?

3 participants