Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to improve the printing for cached downloads from kaggle/colab #142

Merged
merged 2 commits into from
Jul 16, 2024

Conversation

mattdangerw
Copy link
Contributor

@mattdangerw mattdangerw commented Jul 12, 2024

This is an attempt to fix this issue when running kagglehub from a cache.

image

I don't think we want to log as if we are attaching an asset every time we read something from the cache.

  • We have separate UI in Kaggle with a live progress indicate showing how mounting a model is going.
  • We might hit kagglehub for the same asset multiple times, it shouldn't "attach" each time (nor does it).
  • Ideally we don't log anything if we don't find the file being requested.

@mattdangerw
Copy link
Contributor Author

Alternately we could figure out a way to only log each asset once, if it is found in the cache. That seems slightly misleading, as IIUC the whole directory is just mounted once, not asset by asset. And it would involve some state to track which call is the first download call for each asset.

@@ -65,8 +65,6 @@ def __call__(self, h: ModelHandle, path: Optional[str] = None, *, force_download
if not os.path.exists(cached_path):
# Only print this if the model is not already mounted.
logger.info(f"Mounting files to {cached_path}...", extra={**EXTRA_CONSOLE_BLOCK})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neshdev, When you worked on reducing log spam, did you mean to add the EXTRA_CONSOLE_BLOCK to line #69 instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - It was only supposed to log if the cache is not found. I must have flipped the branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattdangerw Can you remove the extra={**EXTRA_CONSOLE_BLOCK} at line 67 and add it at line 69 instead. No need to delete line 69.

This means the log will be logged to a file but not shown to the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattdangerw, whenever you get a chance to implement the change ^^, we can include it in our next release: #143

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done! I think. take a look.

@rosbo
Copy link
Contributor

rosbo commented Jul 16, 2024

/gcbrun

@rosbo rosbo mentioned this pull request Jul 16, 2024
@rosbo
Copy link
Contributor

rosbo commented Jul 16, 2024

/gcbrun

@rosbo rosbo merged commit 92da387 into Kaggle:main Jul 16, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants