Fix memory exhaustion when downloading large files #869

SiqiChen9 · 2025-11-24T14:40:31Z

Description

This PR fixes issue #754 where the Kaggle API would exhaust system memory when downloading large datasets.

Problem

The HTTP client was not using stream=True when making requests for file downloads. This caused the entire file content to be loaded into memory before being written to disk, making the system unstable when downloading large datasets.
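To illustrate the difference, here is a minimal sketch of chunked writing against a requests-style response. `save_response` and `FakeStreamedResponse` are hypothetical names used only for illustration and are not part of kagglesdk. The stand-in response lets the sketch run without a network connection.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB; an arbitrary choice for this sketch


def save_response(response, path, chunk_size=CHUNK_SIZE):
    """Write a requests-style response body to disk one chunk at a time.

    Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as out:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                out.write(chunk)
                written += len(chunk)
    return written


class FakeStreamedResponse:
    """Hypothetical stand-in for requests.Response (illustration only)."""

    def __init__(self, payload):
        self._payload = payload

    def iter_content(self, chunk_size=1):
        # Yield the payload in slices of at most chunk_size bytes.
        for start in range(0, len(self._payload), chunk_size):
            yield self._payload[start:start + chunk_size]


# With the real requests library, the call would look roughly like this:
#
#   with requests.get(url, stream=True) as response:
#       response.raise_for_status()
#       save_response(response, "data.zip")
#
# Without stream=True, requests reads the whole body into memory before
# returning, so iter_content() only iterates over data that is already loaded.
```

With streaming enabled, peak memory for the body stays close to the chunk size rather than the full file size.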

Solution

- Modified KaggleHttpClient.call() to detect file download response types (FileDownload and HttpRedirect)
- Automatically enable streaming (stream=True) for these response types
- The existing download_file() method already uses response.iter_content() for chunked reading, which now works properly with streaming enabled
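The detection step described above could be sketched roughly as follows. This is not the code from the PR: the classes below are empty stand-ins for the kagglesdk types, and `ApiListResponse`, `STREAMED_RESPONSE_TYPES`, and `build_request_settings` are hypothetical names introduced for illustration.

```python
class FileDownload:
    """Stand-in for the kagglesdk FileDownload type (illustration only)."""


class HttpRedirect:
    """Stand-in for the kagglesdk HttpRedirect type (illustration only)."""


class ApiListResponse:
    """Hypothetical non-download response type."""


# Response types whose bodies should be streamed rather than preloaded.
STREAMED_RESPONSE_TYPES = (FileDownload, HttpRedirect)


def build_request_settings(response_type, base_settings=None):
    """Return request settings, adding stream=True for download responses.

    The input dict is copied, so the caller's settings are not mutated.
    """
    settings = dict(base_settings or {})
    if response_type is not None and issubclass(response_type, STREAMED_RESPONSE_TYPES):
        settings["stream"] = True
    return settings


# Download types get streaming; other response types are left unchanged.
download_settings = build_request_settings(FileDownload, {"timeout": 30})
listing_settings = build_request_settings(ApiListResponse, {"timeout": 30})
```

Keeping the decision in one place means callers of the download methods do not need to change, which matches the PR's claim that the public API is unaffected.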

Changes

src/kagglesdk/kaggle_http_client.py
- Added imports for FileDownload and HttpRedirect types
- Added logic to set stream=True in request settings for file downloads
pyproject.toml
- Removed kagglesdk from dependencies list
- Reason: kagglesdk source code is in src/kagglesdk/ and should use local code during editable install, not pull from PyPI

Impact

This fix improves memory usage for:

- Competition file downloads (competition_download_file, competition_download_files)
- Dataset downloads (dataset_download_file, dataset_download_files)
- Model downloads (model_instance_version_download)
- Kernel output downloads (kernels_output)
- Leaderboard downloads (competition_leaderboard_download)

Testing

✅ Tested CLI: kaggle competitions list works correctly
✅ Verified streaming is enabled for file download requests
✅ Backward compatible - no changes to public API

Fixes #754 (Shouldn't API client pass stream = True to the requests when downloading datasets?)

google-cla · 2025-11-24T14:40:36Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

stevemessick · 2025-11-24T16:32:59Z

@SiqiChen9 Thanks for the PR! Could you sign the Google CLA so I can merge it?

BTW why did you remove kagglesdk from the .toml file?

SiqiChen9 · 2025-11-24T16:59:52Z

@stevemessick Hi! Thanks for the quick response! I've signed the Google CLA.

Regarding the kagglesdk removal from pyproject.toml - you're absolutely right, that change should not be included in this PR. I apologize for the confusion.

I encountered this during my local development setup:

- When running an editable install (pip install -e .), I got an import error because the kagglesdk version on PyPI was outdated compared to the local src/kagglesdk code.
- I removed it from the dependencies to force the use of local code, but this was only relevant to my development environment.

I've reverted the pyproject.toml change. The PR now only includes the streaming fix to src/kagglesdk/kaggle_http_client.py.

Updated!

stevemessick

Thanks!

SiqiChen9 mentioned this pull request Nov 24, 2025

Shouldn't API client pass stream = True to the requests when downloading datasets? #754

Closed

Fix memory exhaustion when downloading large files

26cbc1a

Enable streaming for file downloads by passing stream=True to requests. This prevents loading entire files into memory when downloading datasets, competitions, models, and kernel outputs. Fixes Kaggle#754

SiqiChen9 force-pushed the fix-stream-download branch from 15e2ab4 to 26cbc1a on November 24, 2025 16:58

stevemessick approved these changes Nov 24, 2025

stevemessick merged commit 6147a8e into Kaggle:main Nov 24, 2025
1 check passed
