[CLI] Better CLI errors formatting #3889

Merged
Wauplin merged 5 commits into main from better-cli-errors on Mar 6, 2026
@Wauplin Wauplin commented Mar 5, 2026

Add repo_id, repo_type, bucket_id, etc. wherever possible when an error is raised in the CLI and printed gracefully.

Mostly human generated except the tests.


Note

Medium Risk
Touches hf_raise_for_status and error class definitions used across the library, so mistakes in URL parsing/attribute assignment could subtly change raised error details. The behavioral changes are mostly additive (better messages/metadata) and are covered by new unit tests.

Overview
Improves CLI-facing error output by replacing generic lambda formatters with dedicated helpers that include repo_type, repo_id, bucket_id, and (for missing entries) the request URL when available.

Enriches hf_raise_for_status by parsing repo/bucket identifiers from request URLs and attaching them to raised RepositoryNotFoundError, GatedRepoError, RevisionNotFoundError, RemoteEntryNotFoundError, and BucketNotFoundError instances; also adds these optional attributes to the corresponding error classes.

Adds focused tests for the new CLI formatters and for URL parsing helpers (_parse_repo_info_from_url, _parse_bucket_id_from_url) to validate the new messaging/context behavior.

Written by Cursor Bugbot for commit 9911947. This will update automatically on new commits.

@Wauplin Wauplin requested a review from hanouticelina March 5, 2026 16:27
@cursor cursor Bot left a comment
Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread src/huggingface_hub/cli/_errors.py
cursoragent and others added 3 commits March 5, 2026 18:23
In _format_entry_not_found, the label appeared mid-sentence ('File not
found in Dataset ...') but was capitalized. Apply the rule consistently:
capitalize when starting a sentence, lowercase otherwise.

Update tests to match the corrected casing.

Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
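The casing rule from this commit message can be sketched as follows. This is only an illustration: the helper names and the exact message wording are hypothetical and are not taken from `_errors.py`; only the rule (capitalize the label at the start of a sentence, lowercase it elsewhere) comes from the commit message.

```python
from typing import Optional


def _label(repo_type: Optional[str], sentence_start: bool) -> str:
    # Hypothetical helper: fall back to a generic word when the type is unknown.
    label = repo_type or "repository"
    # Capitalize only when the label opens the sentence.
    return label.capitalize() if sentence_start else label.lower()


def format_repo_not_found(repo_type: Optional[str], repo_id: str) -> str:
    # Label starts the sentence, so it is capitalized.
    return f"{_label(repo_type, True)} '{repo_id}' not found."


def format_entry_not_found(repo_type: Optional[str], repo_id: str) -> str:
    # Label appears mid-sentence, so it stays lowercase.
    return f"File not found in {_label(repo_type, False)} '{repo_id}'."
```

With these assumed formats, the mid-sentence case yields "File not found in dataset ..." rather than "File not found in Dataset ...".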
Use TypeVar to make _format() return the specific HfHubHTTPError
subclass type, so mypy recognizes attributes like repo_type, repo_id,
and bucket_id on the returned error objects.

Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
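A minimal sketch of the TypeVar approach described in this commit message. The classes below are stand-ins rather than the real `huggingface_hub` error classes, and `_format` here takes a simplified two-argument signature (the real helper also receives the response); the TypeVar name is an assumption.

```python
from typing import Optional, Type, TypeVar


class HfHubHTTPError(Exception):
    """Stand-in base class for illustration only."""


class RepositoryNotFoundError(HfHubHTTPError):
    """Stand-in subclass carrying optional context attributes."""

    repo_type: Optional[str] = None
    repo_id: Optional[str] = None


# Bound TypeVar: the return type follows whichever subclass is passed in.
HfHubHTTPErrorT = TypeVar("HfHubHTTPErrorT", bound=HfHubHTTPError)


def _format(error_type: Type[HfHubHTTPErrorT], message: str) -> HfHubHTTPErrorT:
    # Simplified: build the requested error type from the message alone.
    return error_type(message)


# A type checker sees `err` as RepositoryNotFoundError, so these
# attribute assignments are accepted without a cast.
err = _format(RepositoryNotFoundError, "Repository not found")
err.repo_type = "model"
err.repo_id = "user/repo"
```

If `_format` were annotated as returning the base `HfHubHTTPError`, a type checker would reject `err.repo_id` because the base class does not declare that attribute.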
…r_status

Mypy narrows the type of 'err' on the first assignment and then
rejects reassignments to different subclass types in later elif
branches. Use unique names (revision_err, entry_err, gated_err,
bucket_err, repo_err) to avoid cross-branch type conflicts.

Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
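The naming fix in this commit message can be illustrated with a small sketch. The classes and the `build_error` function are hypothetical stand-ins; only the idea of using one variable name per branch comes from the commit message.

```python
from typing import Optional


class HubError(Exception):
    """Stand-in base class for illustration only."""


class RevisionError(HubError):
    revision: Optional[str] = None


class BucketError(HubError):
    bucket_id: Optional[str] = None


def build_error(code: str) -> HubError:
    if code == "RevisionNotFound":
        # Reusing a single name such as `err` in every branch would give it
        # the type RevisionError here, and mypy would then reject assigning
        # a BucketError to the same name in the next branch.
        revision_err = RevisionError("Revision not found")
        revision_err.revision = "main"
        return revision_err
    elif code == "BucketNotFound":
        bucket_err = BucketError("Bucket not found")
        bucket_err.bucket_id = "namespace/name"
        return bucket_err
    return HubError("Unknown error")
```

Each branch gets its own variable, so each keeps its own subclass type and its subclass-specific attributes type-check.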
bot-ci-comment Bot commented Mar 5, 2026

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +174 to +180
_REPO_ID_FROM_URL_REGEX = re.compile(r"^https?://[^/]+/api/(models|datasets|spaces)/([^/]+)(?:/([^/]+))?")

# Regex to extract bucket_id (namespace/name) from bucket API URLs.
_BUCKET_ID_FROM_URL_REGEX = re.compile(r"^https?://[^/]+/api/buckets/([^/]+/[^/]+)")

# Sub-paths that follow a repo_id in API URLs (not part of the repo name).
_REPO_URL_SUBPATHS = {"resolve", "tree", "blob", "raw", "refs", "commit", "discussions", "settings", "revision"}
Contributor Author
Note: not 100% bullet-proof but parsing doesn't have to be perfect (it's just a convenience field for better errors)
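One way the regexes and sub-path set quoted above could be combined is sketched below. The function names, return shapes, and the plural-to-singular mapping are assumptions made for illustration; the PR's actual `_parse_repo_info_from_url` and `_parse_bucket_id_from_url` may behave differently.

```python
import re
from typing import Optional, Tuple

# Copied from the snippet under review.
_REPO_ID_FROM_URL_REGEX = re.compile(r"^https?://[^/]+/api/(models|datasets|spaces)/([^/]+)(?:/([^/]+))?")
_BUCKET_ID_FROM_URL_REGEX = re.compile(r"^https?://[^/]+/api/buckets/([^/]+/[^/]+)")
_REPO_URL_SUBPATHS = {"resolve", "tree", "blob", "raw", "refs", "commit", "discussions", "settings", "revision"}


def sketch_parse_repo_info(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (repo_type, repo_id), or (None, None) if no match."""
    match = _REPO_ID_FROM_URL_REGEX.match(url)
    if match is None:
        return None, None
    plural_type, first, second = match.groups()
    repo_type = plural_type[:-1]  # "models" -> "model", "datasets" -> "dataset"
    # A second segment that is a known sub-path is not part of the repo name.
    if second is None or second in _REPO_URL_SUBPATHS:
        return repo_type, first
    return repo_type, f"{first}/{second}"


def sketch_parse_bucket_id(url: str) -> Optional[str]:
    """Hypothetical helper: return 'namespace/name', or None if no match."""
    match = _BUCKET_ID_FROM_URL_REGEX.match(url)
    return match.group(1) if match else None
```

This also shows why the parsing is not bullet-proof: `[^/]+` does not stop at `?`, so a URL such as `.../api/models/gpt2?full=true` would yield `gpt2?full=true` as the repo ID in this sketch.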

@hanouticelina hanouticelina left a comment

Thank you! This is a real and very nice UX improvement.

@Wauplin Wauplin merged commit 098091f into main Mar 6, 2026
20 of 22 checks passed
@Wauplin Wauplin deleted the better-cli-errors branch March 6, 2026 13:30
yg7445 added a commit to yg7445/huggingface_hub that referenced this pull request Apr 10, 2026
Commit 098091f ("huggingface#3889") changed hf_raise_for_status() from inline
raises to storing exceptions in local variables before raising:

    entry_err = _format(RemoteEntryNotFoundError, message, response)
    entry_err.repo_type = repo_type
    raise entry_err from e

This creates a CPython reference cycle: entry_err.__cause__ -> e, and
e.__traceback__ -> frame -> f_locals['entry_err'] -> entry_err. The
cycle prevents the exception from being freed when except blocks exit.

When this exception propagates through callers (e.g. transformers'
cached_files -> LLM.__init__), the traceback chain holds a reference
to `self`, preventing refcount-based cleanup. In vllm, this means
`del llm` doesn't trigger the weakref finalizer that sends SIGTERM
to the EngineCore subprocess, so GPU memory is never released.

Fix by moving repo_type/repo_id/bucket_id assignment into helper
functions (_format_with_repo_info, _format_with_bucket_info) so the
exception is never stored as a local in hf_raise_for_status.

Co-authored-by: Claude <noreply@anthropic.com>
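The reference cycle described in this commit message can be reproduced in isolation. The sketch below uses a hypothetical stand-in error class and helper names (it is not the code from `hf_raise_for_status`) and relies on CPython's reference counting: with the cyclic garbage collector disabled, an exception that is part of a cycle stays alive after its `except` block exits, while one that is not in a cycle is freed immediately.

```python
import gc
import weakref


class DemoNotFoundError(Exception):
    """Hypothetical stand-in for a Hub error class (not the real one)."""


def _build_error(message: str) -> DemoNotFoundError:
    # `err` is a local of this helper; no exception propagates through this
    # frame, so no traceback keeps the frame (or its locals) alive.
    err = DemoNotFoundError(message)
    err.repo_type = "model"
    return err


def raise_with_local() -> None:
    try:
        raise ValueError("HTTP 404")
    except ValueError as e:
        err = DemoNotFoundError("Entry not found")
        err.repo_type = "model"
        # `err` remains a local of this frame, and the traceback attached to
        # the raised exception references this frame: a reference cycle.
        raise err from e


def raise_with_helper() -> None:
    try:
        raise ValueError("HTTP 404")
    except ValueError as e:
        # The new exception is never bound to a local of this frame.
        raise _build_error("Entry not found") from e


def survives_without_gc(raiser) -> bool:
    """Return True if the raised exception outlives its except block."""
    was_enabled = gc.isenabled()
    gc.disable()  # rely on reference counting only
    try:
        try:
            raiser()
        except DemoNotFoundError as exc:
            ref = weakref.ref(exc)
        # `exc` is unbound once the except block exits.
        return ref() is not None
    finally:
        gc.collect()  # clean up any cycle created by the experiment
        if was_enabled:
            gc.enable()
```

Under CPython, `survives_without_gc(raise_with_local)` is expected to report that the exception is still alive, while `survives_without_gc(raise_with_helper)` is expected to report that it was freed, which matches the motivation given for moving the attribute assignment into helper functions.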