Implement cache for downloaded models. #8

Merged
merged 5 commits into from
Oct 19, 2023

Contributor

@rosbo rosbo commented Oct 18, 2023

Default cache directory is ~/.cache/kagglehub.
It can be overridden globally with the `KAGGLEHUB_CACHE` environment variable, or for a single call with the `cache_dir` parameter.
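
The precedence described above can be sketched as follows. This is only an illustration, not the code from this PR: `resolve_cache_dir` and `CACHE_ENV_VAR` are hypothetical names, and the order (per-call argument, then environment variable, then default) is assumed from the description.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical constant name; the variable name itself comes from the PR description.
CACHE_ENV_VAR = "KAGGLEHUB_CACHE"
# Default location stated in the PR description.
DEFAULT_CACHE_FOLDER = os.path.join(Path.home(), ".cache", "kagglehub")


def resolve_cache_dir(cache_dir: Optional[str] = None) -> str:
    """Hypothetical helper: per-call argument > environment variable > default."""
    if cache_dir is not None:
        # A single call can override everything else.
        return cache_dir
    # Fall back to the global override, then to the default folder.
    return os.environ.get(CACHE_ENV_VAR, DEFAULT_CACHE_FOLDER)
```

Note that this sketch only computes a path; it does not create the directory.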

Also include logic for parsing the model handle.

Next: Implement downloading file on cache miss.

http://b/305947384

@rosbo rosbo requested a review from Philmod October 18, 2023 20:53
Contributor Author

rosbo commented Oct 18, 2023

It is failing because I am using `EnvironmentVarGuard` from the `test` package.

However, the test package is intended only for internal Python use and is stripped from the Docker python image we use:
docker-library/python#277 (comment)

I will find another way to set the environment variable cleanly in tests.
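
One option that avoids the `test` package is `unittest.mock.patch.dict`, which ships with the standard library. The sketch below is an assumption about how such a test could look, not necessarily the approach taken in this PR; the test class name and the cache path are made up for illustration.

```python
import os
import unittest
from unittest import mock


class CacheEnvVarTest(unittest.TestCase):
    """Hypothetical test case showing a temporary environment override."""

    def test_env_var_is_restored_after_the_block(self):
        before = os.environ.get("KAGGLEHUB_CACHE")
        # patch.dict sets the key inside the block and restores os.environ on exit.
        with mock.patch.dict(os.environ, {"KAGGLEHUB_CACHE": "/tmp/test_cache"}):
            self.assertEqual(os.environ["KAGGLEHUB_CACHE"], "/tmp/test_cache")
        # Outside the block the variable is back to whatever it was before.
        self.assertEqual(os.environ.get("KAGGLEHUB_CACHE"), before)
```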

rosbo added 3 commits October 18, 2023 21:18
Default cache directory is ~/.cache/kagglehub.
Can be overridden globally by the KAGGLEHUB_CACHE env variable.
Can be overridden for a single call using the `cache_dir` parameter.

Also include logic for parsing the model handle.

Next: Implement downloading file on cache miss.

http://b/305949898

def _install_resolvers():
Contributor Author

@rosbo rosbo Oct 18, 2023


Moved this logic directly to `__init__.py` to avoid cyclic dependencies.


DEFAULT_CACHE_FOLDER = os.path.join(Path.home(), ".cache", "kagglehub")
Contributor


Is it possible that this .cache directory doesn't exist? Should we check for it and create it if it is missing?

Contributor Author


Yes, it will likely not exist. My thinking was that the caller (here, the HttpResolver) would call `os.makedirs()` with the path to the model before copying the file, which will create all the intermediate folders.
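
A minimal sketch of that idea, assuming the caller creates the directory right before copying. `copy_into_cache` is a hypothetical helper, not the HttpResolver code, and `exist_ok=True` is added here so that a repeated call does not fail.

```python
import os
import shutil
import tempfile


def copy_into_cache(src_file: str, model_dir: str) -> str:
    """Hypothetical helper: create model_dir with all its parents, then copy src_file into it."""
    # makedirs creates every missing intermediate folder, including ~/.cache itself.
    os.makedirs(model_dir, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return shutil.copy(src_file, model_dir)
```

With this shape, the cache module never needs to check whether the default folder exists; the directory only appears once something is actually written to it.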

@rosbo rosbo merged commit 7eba9b8 into main Oct 19, 2023
@rosbo rosbo deleted the cache branch October 19, 2023 15:27