Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix auto-compressed dataset downloads #194

Merged
merged 4 commits into from
Dec 16, 2024

Conversation

goeffthomas
Copy link
Contributor

Notes and other fixes:

  • Previously, we needed to unzip the file in the unit test because it was actually fetching the archive. This is because our DatasetHttpResolver doesn't put the filename in the path (it passes it in a query string). Rather than change the implementation in the resolver, I moved filename handling to the shared stubbed route. cc @jeward414
  • Added mocking for downloads to get 302 redirected (as it does in prod to GCS). This was needed to get response.url to be accurate in clients.py, which drives the auto-decompression feature.
  • Added other files to support the auto-(de)compression logic.
  • In the interest of keeping this PR small, a follow up PR will add tests for the Models and Competitions HTTP tests. That may necessitate moving some routes/etc. around to be shared beyond Datasets.

Fixes #187
http://b/379756505

Notes and other fixes:
- Previously, we needed to unzip the file in the unit test because it was actually fetching the archive. This is because our `DatasetHttpResolver` doesn't put the filename in the path (it passes it in a query string). Rather than change the implementation in the resolver, I moved filename handling to the shared stubbed route.
- Added mocking for downloads to get 302 redirected (as it does in prod to GCS). This was needed to get `response.url` to be accurate in `clients.py`, which drives the auto-decompression feature.
- Added other files to support the auto-(de)compression logic.
- In the interest of keeping this PR small, a follow up PR will add tests for the Models and Competitions HTTP tests. That may necessitate moving some routes/etc. around to be shared beyond Datasets.

Fixes #187
http://b/379756505
@goeffthomas goeffthomas requested review from rosbo and neshdev December 16, 2024 19:23
Copy link
Contributor

@rosbo rosbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

@goeffthomas goeffthomas merged commit 1bc9814 into main Dec 16, 2024
6 checks passed
@goeffthomas goeffthomas deleted the fix-auto-compressed-dataset-files branch December 16, 2024 21:21
goeffthomas added a commit that referenced this pull request Dec 16, 2024
Notes:
- This is follow on work from #194
- Also added some integration tests
- Note that not all stubs have been moved to this redirect path. Only the paths that could have compressed URLs in prod have been affected (Comps and Datasets). We can move all download paths to use this if we'd like.

http://b/379756505
goeffthomas added a commit that referenced this pull request Dec 16, 2024
Notes:
- This is follow on work from #194
- Also added some integration tests
- Note that not all stubs have been moved to this redirect path. Only
the paths that could have compressed URLs in prod have been affected
(Comps and Datasets). We can move all download paths to use this if we'd
like.

http://b/379756505
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CSV files are downloaded as zip when downloading as single file
2 participants