Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for auto-(de)compression #195

Merged
merged 5 commits into from
Dec 16, 2024
Merged

Conversation

goeffthomas
Copy link
Contributor

Notes:

  • This is follow on work from Fix auto-compressed dataset downloads #194
  • Also added some integration tests
  • Note that not all stubs have been moved to this redirect path. Only the paths that could have compressed URLs in prod have been affected (Comps and Datasets). We can move all download paths to use this if we'd like.

http://b/379756505

Notes:
- This is follow on work from #194
- Also added some integration tests
- Note that not all stubs have been moved to this redirect path. Only the paths that could have compressed URLs in prod have been affected (Comps and Datasets). We can move all download paths to use this if we'd like.

http://b/379756505
@goeffthomas goeffthomas requested review from rosbo and neshdev December 16, 2024 21:50
Copy link
Contributor

@rosbo rosbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move all download paths to use this if we'd like.

Given we have also integration tests that test it e2e. Probably for models to not have the GCS redirect in our unit tests given it never auto-compresses files.

def test_auto_decompress_file(self) -> None:
with create_test_cache():
# sample_submission.csv is an auto-compressed CSV with the following columns
expected_columns = ["TransactionId", "isFraud"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding integration tests with deeper assertion to ensure we don't regress on this!

def assert_files(test_case: unittest.TestCase, path: str, expected_files: list[str]) -> bool:
def list_columns(path: str) -> list[str]:
"""Assuming the path is a CSV, list all columns sorted lexicographically"""
with open(path) as file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you can use csv.reader from the Python standard library.

https://docs.python.org/3/library/csv.html#csv.reader

But given this is really simple, I will leave it up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think given the simplicity, I'll leave it for now. If we want more robust features later or run into issues that the library better handles, I'd be happy to switch it up.

@goeffthomas goeffthomas merged commit 5fdb159 into main Dec 16, 2024
6 checks passed
@goeffthomas goeffthomas deleted the add-more-decompression-tests branch December 16, 2024 22:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants