-
Notifications
You must be signed in to change notification settings - Fork 580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
huggingface-cli upload - Validate README.md before file hashing #2452
Conversation
9784fbf
to
27e6483
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @hlky, thanks for the PR and detailed description. I've left a couple comments to address before merging. The two main things are:
- I'd prefer to separate the "yaml validation" from the "list and filter files" logic
- we have to validate the yaml in
create_commit
in any case (not a problem to do it twice ifupload_folder
has been called)
src/huggingface_hub/hf_api.py
Outdated
for relpath in filtered_repo_objects: | ||
if relpath == "README.md": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then, instead of the for ... in ...: if ... == "README.md": .... ... ... break
logic, could you use relpath_to_abspath
instead like this:
if "README.md" in filtered_repo_objects:
self._validate_yaml(
content=relpath_to_abspath["README.md"].read_text(),
repo_type=repo_type,
token=token,
)
This way, it's not even needed to build the CommitOperationAdd
just to read the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, much better to use read_text
from pathlib and avoid the for loop.
397f9d4
to
dfdf1e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the prompt changes @hlky! Except for the docstring, I think we are close to merge this PR :) I'll also trigger the CI to check nothing broke.
src/huggingface_hub/hf_api.py
Outdated
headers = self._build_hf_headers(token=token) | ||
|
||
response = get_session().post( | ||
f"{ENDPOINT}/api/validate-yaml", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f"{ENDPOINT}/api/validate-yaml", | |
f"{self.endpoint}/api/validate-yaml", |
I just realized we must use self.endpoint
(configurable by user) rather than ENDPOINT
here! It was a mistake in existing implementation but better to fix it as part of this PR :)
src/huggingface_hub/hf_api.py
Outdated
def _validate_yaml(self, content: str, *, repo_type: Optional[str] = None, token: Union[bool, str, None] = None): | ||
headers = self._build_hf_headers(token=token) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a small docstring would be great yes. Thanks in advance!
dfdf1e5
to
21af425
Compare
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
21af425
to
84921cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the clear docstring! Looks ready to be merged! I'll just wait for the CI to pass before doing so :)
The only other thing I can think of is that there will be duplicate calls to the endpoint with |
Yeah, I thought the same as well - thanks for raising the question. Honestly, I don't think the extra overhead is really problematic. We could also have a small |
The fastai integration test failure appears to be because import requests
content = '---\ntags:\n- fastai\n---\n\n# Amazing!\n\n🥳 Congratulations on hosting your fastai model on the Hugging Face Hub!\n\...n## Intended uses & limitations\nMore information needed\n\n## Training and evaluation data\nMore information needed\n'
url = "https://hub-ci.huggingface.co/api/validate-yaml"
r = requests.post(url, {"content": content, "repoType": None}) # {'error': '"value" does not match any of the allowed types'}
r = requests.post(url, {"content": content, "repoType": "model"}) # {'errors': [], 'warnings': []} |
84921cf
to
46d98b9
Compare
Thanks for looking into it. Yes, I would simply duplicate this line in repo_type = repo_type if repo_type is not None else REPO_TYPE_MODEL |
src/huggingface_hub/hf_api.py
Outdated
@@ -4849,6 +4833,7 @@ def upload_folder( | |||
|
|||
``` | |||
""" | |||
repo_type = repo_type if repo_type is not None else REPO_TYPE_MODEL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this to _validate_yaml
directly to prevent errors in the future in case we use it somewhere else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've moved it to _validate_yaml
.
46d98b9
to
641b393
Compare
Perfect! Thanks for the iterations on this @hlky! Failing tests are unrelated to this PR so we are good to merge now! 🎉 |
README.md
validation is moved fromcreate_commit
to_prepare_upload_folder_additions
._prepare_upload_folder_additions
is moved intoHfApi
class, this provides access to_build_hf_headers
.repo_type
andtoken
are added to_prepare_upload_folder_additions
's parameters, these are needed for the request to/api/validate-yaml
endpoint.filter_repo_objects
stored infiltered_repo_objects
(converted to a list, as it's a generator) to avoid recalculation as it is now used to findREADME.md
then later to return theCommitOperationAdd
s.CommitOperationAdd
is created forREADME.md
to provideas_file
.break
added afterREADME.md
validation to avoid iterating other files.Closes #2451