
[CI] Fix LoRA downloading issues and respect offline flag#15813

Merged
mickqian merged 19 commits into sgl-project:main from Prozac614:fix/hf_download
Dec 30, 2025

Conversation

@Prozac614 Prozac614 (Contributor) commented Dec 25, 2025

Motivation

Currently, the maybe_download_model utility in hf_diffusers_utils.py fails when downloading LoRA adapters, because it enforces a strict verification check (expecting transformer/ and vae/ directories) that LoRA repositories do not satisfy. This produces false positives: the code assumes the download failed and triggers a forced re-download, which often causes ConnectionErrors in CI environments.

Modifications

CI fix: Resolved the pip/uv cache problem in CI.

Robust HF Downloads:

Increased the retry limit (from 3 to 5) and extended etag_timeout to better tolerate slow handshakes.

Refined snapshot_download to specifically target *.safetensors and *.json for LoRAs using allow_patterns, significantly reducing request volume.
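The allow_patterns idea can be sketched as below. The helper names are illustrative, not the exact ones in hf_diffusers_utils.py; only snapshot_download, allow_patterns, and etag_timeout come from huggingface_hub itself.

```python
def lora_allow_patterns(is_lora: bool):
    # LoRA repos ship flat safetensors weights plus JSON configs, so
    # everything else can be skipped, cutting the number of HTTP requests.
    return ["*.safetensors", "*.json"] if is_lora else None

def download_weights(repo_id: str, is_lora: bool = True) -> str:
    # Lazy import keeps this sketch importable without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id,
        allow_patterns=lora_allow_patterns(is_lora),  # None => fetch all files
        etag_timeout=30,  # extended from the 10 s default for slow handshakes
    )
```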

LoRA Logic Fix: Skipped strict structural verification (transformer/vae folder checks) for LoRA models, as they follow a different file structure.
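A minimal sketch of the verification skip, assuming the check boils down to the two folder tests named above (the real check in hf_diffusers_utils.py may inspect more; the function name here is hypothetical):

```python
from pathlib import Path

def model_layout_ok(model_dir: str, is_lora: bool = False) -> bool:
    # LoRA adapters are flat repositories (weights + configs), so the
    # diffusers pipeline layout check simply does not apply to them.
    if is_lora:
        return True
    root = Path(model_dir)
    # Full pipelines are expected to ship these component subfolders.
    return (root / "transformer").is_dir() and (root / "vae").is_dir()
```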

OOM Retry: Wrapped execution in a try/except for torch.cuda.OutOfMemoryError, allowing the system to attempt a retry (potentially after clearing the cache) instead of crashing immediately.
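The retry-on-OOM pattern might look like the sketch below. The exception type is a parameter so the snippet runs without a GPU; in the PR the caught type is torch.cuda.OutOfMemoryError, and the helper name is hypothetical.

```python
import gc

def run_with_oom_retry(fn, oom_exc, max_retries: int = 2):
    """Call fn(); on an OOM-style exception, free caches and try again.

    oom_exc is parameterized so the sketch is testable without CUDA; the
    PR catches torch.cuda.OutOfMemoryError directly.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except oom_exc:
            if attempt == max_retries:
                raise  # out of retries: surface the OOM instead of looping
            gc.collect()  # with torch available, also torch.cuda.empty_cache()
```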


@github-actions github-actions bot added the diffusion SGLang Diffusion label Dec 25, 2025
@gemini-code-assist (bot)

Summary of Changes

Hello @Prozac614, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the model downloading utility to be more robust and flexible, particularly for non-standard Hugging Face models like LoRA adapters. It addresses previous failures caused by overly strict verification checks and enhances network resilience through an improved retry mechanism, ensuring smoother model acquisition in various environments, including CI.

Highlights

  • LoRA Downloading Fix: The maybe_download_model utility no longer enforces strict directory verification (e.g., transformer/ or vae/), resolving issues with downloading LoRA adapters, ControlNets, and other Hugging Face components that do not conform to standard pipeline structures.
  • Improved Retry Mechanism: The model download process now includes an enhanced retry mechanism with exponential backoff (up to 3 attempts) to gracefully handle network flakiness and transient ConnectionErrors.
  • Cache-First Strategy: The download logic prioritizes checking the local Hugging Face cache first, using local_files_only=True, before attempting a network download, improving efficiency and reducing unnecessary network requests.
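The cache-first-then-retry flow described above might be sketched as follows. maybe_download and backoff_delays are hypothetical names; local_files_only and snapshot_download are real huggingface_hub API, and the exact exception types caught in the PR differ.

```python
import time

def backoff_delays(retries: int = 3, base: float = 1.0):
    # Exponential backoff schedule: 1 s, 2 s, 4 s for three attempts.
    return [base * (2 ** i) for i in range(retries)]

def maybe_download(repo_id: str) -> str:
    # Lazy import keeps this sketch importable without huggingface_hub.
    from huggingface_hub import snapshot_download

    try:
        # Cache-first: resolve entirely from the local HF cache, no network.
        return snapshot_download(repo_id, local_files_only=True)
    except Exception:
        pass  # not cached; fall through to a networked download

    last_err = None
    for delay in backoff_delays():
        try:
            return snapshot_download(repo_id)
        except OSError as e:  # requests' ConnectionError subclasses OSError
            last_err = e
            time.sleep(delay)
    raise last_err
```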


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the maybe_download_model function to enhance its robustness and error handling when loading models. The changes introduce a cache-first strategy to check for locally available Hugging Face models, followed by a download strategy that includes a retry mechanism for network-related failures. It also adds specific exception handling for RepositoryNotFoundError, RevisionNotFoundError, and RequestException during downloads, while removing the previous _verify_model_complete function. A review comment points out that a raise ValueError statement at the end of the function is unreachable, as the preceding retry loop will always exit either by returning a value or raising an exception.

f"Could not find model at {model_name_or_path} and failed to download from HF Hub: {e}"
) from e

raise ValueError(f"Failed to load model {model_name_or_path}")

Severity: medium

This raise ValueError statement appears to be unreachable. The preceding for loop will always exit via a return on success or a raise on failure within one of its iterations. The loop cannot complete normally for this line to be executed.

@Prozac614 (Author) replied:

Fixed

@mickqian (Collaborator):

/tag-and-rerun-ci

@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Dec 26, 2025
…to simplify local model verification and enhance error handling during download process.
…ng ValueError when model is not found in cache and download is disabled, improving robustness of the model downloading process in hf_diffusers_utils.py.
… support LoRA models by adding an is_lora parameter, enhancing local model verification and download process.
@Prozac614 Prozac614 changed the title [CI] Fix LoRA downloading issues and respect offline flag [WIP][CI] Fix LoRA downloading issues and respect offline flag Dec 28, 2025
…low_patterns, increasing MAX_RETRIES to 5, and improving download parameters for better performance.
@Prozac614 Prozac614 changed the title [WIP][CI] Fix LoRA downloading issues and respect offline flag [CI] Fix LoRA downloading issues and respect offline flag Dec 29, 2025
@mickqian mickqian merged commit f253f43 into sgl-project:main Dec 30, 2025
248 of 279 checks passed

Labels

dependencies (Pull requests that update a dependency file), diffusion (SGLang Diffusion), run-ci
