[RFC] Contrib test suite + tests for timm and sentence_transformers#1200
Conversation
osanseviero
left a comment
Very cool! I left some minor questions, but overall this looks to be going in the right direction! 🔥
@osanseviero thanks for your feedback, it helped a lot!
This way we don't have to handle conflicts between different dependencies/versions. It makes local tests a bit more complex as it requires 1 env per contrib lib. Hopefully not many people will actually need to run this stuff locally. What I plan to do is add a script (either separate or in the makefile) to handle all the virtualenv logic. WDYT in general?
Refactored the makefile commands a bit:

```
# Setup all virtualenvs
make contrib_setup

# Run all tests
make contrib_tests

# Setup and run all tests at once
make contrib

# Delete all virtual envs (if corrupted)
make contrib_clear
```

And for a specific lib:

```
# Setup timm tests
make contrib_setup_timm

# Run timm tests
make contrib_test_timm

# Setup and run timm tests at once
make contrib_timm

# Delete timm virtualenv
make contrib_clear_timm
```
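The per-lib targets above could be implemented with pattern rules; here is a minimal sketch (target and variable names are assumptions, not the actual makefile):

```make
CONTRIB_LIBS := sentence_transformers timm

# Create one virtualenv per contrib lib and install its pinned deps
contrib_setup_%:
	python3 -m venv contrib/$*/.venv
	contrib/$*/.venv/bin/pip install -r contrib/$*/requirements.txt

# Run one contrib lib's tests inside its own venv
contrib_test_%:
	contrib/$*/.venv/bin/pytest contrib/$*

# Setup + test in one go (shortest-stem matching keeps this from
# shadowing the contrib_setup_% / contrib_test_% rules)
contrib_%: contrib_setup_% contrib_test_%
	@true

# Remove a (possibly corrupted) venv
contrib_clear_%:
	rm -rf contrib/$*/.venv
```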
osanseviero
left a comment
Looking neat! I want to make a second pass through this PR and also let's see if others have any thoughts
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
```yaml
workflow_dispatch:

push:
  branches:
    - ci_contrib_*
```
should we run this for main as well?
I don't think so. If we want to trigger it manually from main, it's possible, but if we run it all the time, the main branch could end up in a ❌ status, as I expect contrib tests to fail (code can break in a downstream library without any change on our side).
```python
@contextlib.contextmanager
def production_endpoint() -> Generator:
```
Is this really needed? With HF_ENDPOINT you could change the endpoint you're using
Is this really needed?
I'd say yes; it's kinda the same as what already exists in the hfh `tests/`. The problem with `HF_ENDPOINT` is that it is evaluated only once at startup. What I want here is to make all calls go to the staging environment (especially pushing to repos) except some calls that have to be made to the production environment (especially loading models).
Another solution could be to upload test models to staging, but then we wouldn't notice if a model changed in production.
Thanks for the review @osanseviero. I made some changes and addressed all of your comments.
LysandreJik
left a comment
Looks great! I left a few comments
```yaml
contrib: [
  "sentence_transformers",
  "timm",
]
```
Great idea to use a matrix here!
Should this matrix be generated by an `ls` first, so that we have individual jobs for each folder under `contrib/` without the need to specify each folder here? (nitpick though, this should (or shouldn't) be done in a follow-up PR)
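One possible shape for that: a two-job workflow where a first job lists the `contrib/` folders and feeds them into the matrix. This is a hypothetical sketch (job and step names are assumptions), relying on GitHub Actions' `fromJson` helper:

```yaml
jobs:
  list-contribs:
    runs-on: ubuntu-latest
    outputs:
      contribs: ${{ steps.ls.outputs.contribs }}
    steps:
      - uses: actions/checkout@v3
      - id: ls
        # Turn the folder listing into a JSON array usable by the matrix
        run: echo "contribs=$(ls contrib | jq --raw-input --slurp --compact-output 'split("\n")[:-1]')" >> $GITHUB_OUTPUT
  contrib-tests:
    needs: list-contribs
    strategy:
      matrix:
        contrib: ${{ fromJson(needs.list-contribs.outputs.contribs) }}
```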
contrib/README.md
Outdated
4. Edit `makefile` to add the lib to the `CONTRIB_LIBS` variable. Example: `CONTRIB_LIBS := timm transformers`
5. Edit `.github/workflows/contrib-tests.yml` to add the lib to the `matrix.contrib` list. Example: `contrib: ["timm", "transformers"]`
I'd eventually look into automating these two so that it's slightly less error-prone, but as said above: nitpick
This is now done by `make style` and `make quality`. Good call to reduce contribution effort.
Contrib tests can be [manually triggered in GitHub](https://github.com/huggingface/huggingface_hub/actions) with the `Contrib tests` workflow.

Tests are not run in the default test suite (for each PR) as this would slow down the development process. The goal is to notice breaking changes, not to avoid them. In particular, it is interesting to trigger it before a release to make sure it will not cause too much friction.
We could also run them once a week just to check
Cron job added by d5949fa. It will run every week, on Saturday at midnight.
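For reference, a weekly trigger in the workflow file would look something like this (the exact cron expression is an assumption based on "Saturday midnight"):

```yaml
on:
  workflow_dispatch:
  schedule:
    # Every Saturday at 00:00 UTC
    - cron: "0 0 * * 6"
```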
contrib/requirements.txt
Outdated
```
pytest
pytest-env
```
imo we could just require the testing extra here instead of adding another requirements.txt file.
Good call. Made the change.
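After that change, the per-lib requirements file could pull huggingface_hub's own testing extra instead of duplicating pins. A hypothetical sketch (the URL, branch, and install paths are assumptions):

```
# contrib/timm/requirements.txt (hypothetical content)
# timm from the main branch
git+https://github.com/huggingface/pytorch-image-models.git@main
# test dependencies via huggingface_hub's "testing" extra
# (relative path resolved against the directory pip is run from, i.e. the repo root)
-e .[testing]
```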
```python
@pytest.mark.xfail(
    reason=(
        "Production endpoint is hardcoded in sentence_transformers when pushing to Hub."
    )
)
def test_push_to_hub(
    multi_qa_model: SentenceTransformer, repo_name: str, cleanup_repo: None
) -> None:
    multi_qa_model.save_to_hub(repo_name)
```
Would be nice to eventually have a test that doesn't fail to ensure that save to hub actually works :) we could have a specific org for that, like skops does, but I understand it's a bit complex to setup + very annoying to have testing artifacts on the actual hub
Would be nice to eventually have a test that doesn't fail to ensure that save to hub actually works :)
Yes, completely agree on that. I'd like to do that later. I am about to open an issue/PR on the sentence_transformers side to test that properly.
Worst-case scenario: I have set a reminder for myself in 10 days.
```python
def test_push_to_hub(repo_name: str, cleanup_repo: None) -> None:
    model = timm.create_model("resnet18")
    timm.models.hub.push_to_hf_hub(model, repo_name)
```
Should we also test that the model pushed is according to what we expect? For example, that we can redownload it and use it once again, as this would be the usual workflow?
(discussed offline) Decision has been taken that the contrib/ test suite purpose is only to test the deprecation warnings in downstream libraries. Testing the validity of a pushed/downloaded model is therefore out of scope here.
This can be reevaluated in the future :)
osanseviero
left a comment
Agree with @LysandreJik feedback, other than that it LGTM! 🔥 great work
My only concern is identifying issues in 3rd-party libraries close to a release rather than earlier in the process, so the more often we can run this (e.g. every week sounds great), the better.
Codecov Report
Base: 84.37% // Head: 84.33% // Decreases project coverage by -0.04%.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1200      +/-   ##
==========================================
- Coverage   84.37%   84.33%   -0.04%
==========================================
  Files          44       44
  Lines        4365     4355      -10
==========================================
- Hits         3683     3673      -10
  Misses        682      682
```

☔ View full report at Codecov.
@osanseviero @LysandreJik thanks for the last reviews :)
So I think we are now finally good to go 😄 🔥
First discussed in #1190. The goal is to proactively detect breaking changes and deprecation warnings in downstream libraries. This is a very first implementation, with tests for the timm and sentence_transformers libraries to validate the concept. I think for a start we can have a GitHub workflow only triggered manually or on `ci_contrib_*` branches. It doesn't really make sense to run it on each PR or on main.

How it works?

The contrib folder contains simple end-to-end scripts to test the integration of huggingface_hub in downstream libraries. The main goal is to proactively notice breaking changes and deprecation warnings. Each library is tested in its own virtualenv (with its own dependencies). Here is the workflow for the "timm" library:
1. Create a virtualenv in `./contrib/timm/.venv`
2. Install dependencies from `./contrib/timm/requirements.txt`: install timm from the `main` branch. Configurable for each lib.
3. Run `pytest ./contrib/timm`

See #1200 (comment) for more details.
How to add a new library?

To add another contrib lib, one must:
1. Create a new folder, e.g. `./contrib/transformers`
2. Add a `requirements.txt` file specific to this lib. Example: `./contrib/transformers/requirements.txt`
3. Add test files, e.g. `./contrib/transformers/test_push_to_hub.py`
4. Run `make style` to edit `makefile` and `.github/workflows/contrib-tests.yml` to add the new lib.

Run contrib tests in CI
Contrib tests can be manually triggered in GitHub with the `Contrib tests` workflow. CI is also triggered on branches starting with `ci_contrib_*`.

Tests are not run in the default test suite (for each PR) as this would slow down the development process. The goal is to notice breaking changes, not to avoid them. In particular, it is interesting to trigger it before a release to make sure it will not cause too much friction.
Run contrib tests locally
Tests are separated to avoid conflicts between dependency versions. Before running tests, a virtual env must be set up for each contrib library. To do so, run:
See #1200 (comment) for more details.
Todo: