[ci] Simplify tests, add CI, patch paraphrase_mining_embeddings#2350
Merged
tomaarsen merged 12 commits intohuggingface:masterfrom Dec 12, 2023
Merged
[ci] Simplify tests, add CI, patch paraphrase_mining_embeddings#2350tomaarsen merged 12 commits intohuggingface:masterfrom
[ci] Simplify tests, add CI, patch paraphrase_mining_embeddings#2350tomaarsen merged 12 commits intohuggingface:masterfrom
Conversation
I'll fully deprecate Python 3.7 soon too
Ensure that i < j always holds true
They're simply too big when combined.
But now based on 100 test samples instead
These kinds of tests can be a bit flimsy, this should help
nikitajz
reviewed
Nov 30, 2023
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hello!
Pull Request overview
paraphrase_mining_embeddingssuch thati < jis always true withiandjas paraphrase indices.Details
Tests
Before CI can be applied, we need to simplify the tests. In essence, this involved first verifying that the there is no regression on
masterby running the current (large) tests. Then, I decreased the number of testing samples to 100 across all tests, and the number of training samples down to 100-500, depending on the test setup.I then updated the expected scores across all pretrained models. Ideally, I think it would be best to test less of the pretrained models, as it allows us to actually cache the downloaded models, but alas, I think this should be fine too: the GitHub CI environment can usually download at ~250MB/s.
In a later PR I intend to improve the test suite in various ways. For example:
Test time seems to be between 6 and 15 minutes. The test time might improve further when #2345 is merged, in case we currently download unnecessary model files (e.g.
model.safetensorsandpytorch_model.bin).CI
I've added a simple continuous integration setup. CI is extremely important for large OS projects, and it should help simplify PRs etc. a ton. The CI caches Python dependencies, but not HF models. The reason is that the test suite simply uses too many large models that they cannot all be cached across all OS-es.
After v2.3.0 I intend to deprecate Python 3.6 and 3.7. I would have added Python 3.6 to the CI, but it did not work: The Ubuntu 22.04 GitHub CI image doesn't have Python 3.6 anymore, and for Windows it isn't possible to install a recent enough
tokenizersversion from scratch.Although 3.12 has been out for quite some time, the
torchsupport is still lacking, so that's why it was not included in the CI.Patch
paraphrase_mining_embeddingsWhen running the CI, I learned that for Python 3.10 and 3.11, on Linux only, the
paraphrase_mining_embeddingsfunction could return(score, i, j)triples wherei > j. This does not occur in other versions. I applied a simple patch to ensure thati < jalways holds, allowing the matching test to pass.Run the (fast) tests with
pytest, and the slow tests withpytest -m slow.