-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tests for auto-(de)compression #195
Conversation
Notes: - This is follow on work from #194 - Also added some integration tests - Note that not all stubs have been moved to this redirect path. Only the paths that could have compressed URLs in prod have been affected (Comps and Datasets). We can move all download paths to use this if we'd like. http://b/379756505
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move all download paths to use this if we'd like.
Given we have also integration tests that test it e2e. Probably for models to not have the GCS redirect in our unit tests given it never auto-compresses files.
def test_auto_decompress_file(self) -> None: | ||
with create_test_cache(): | ||
# sample_submission.csv is an auto-compressed CSV with the following columns | ||
expected_columns = ["TransactionId", "isFraud"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding integration tests with deeper assertion to ensure we don't regress on this!
def assert_files(test_case: unittest.TestCase, path: str, expected_files: list[str]) -> bool: | ||
def list_columns(path: str) -> list[str]: | ||
"""Assuming the path is a CSV, list all columns sorted lexicographically""" | ||
with open(path) as file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you can use csv.reader
from the Python standard library.
https://docs.python.org/3/library/csv.html#csv.reader
But given this is really simple, I will leave it up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think given the simplicity, I'll leave it for now. If we want more robust features later or run into issues that the library better handles, I'd be happy to switch it up.
Notes:
http://b/379756505