Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use version number in notebook cache directories #212

Merged
merged 1 commit into from
Jan 22, 2025

Conversation

dster2
Copy link
Contributor

@dster2 dster2 commented Jan 22, 2025

Despite the fact that neither Datasets nor Notebooks datasources currently support multi-versioning when attached to a Kaggle Notebook Session, our existing Dataset support in HTTP resolving does use a cache dir with the version number so that multiple versions are supported in that mode. Here we update the recent Notebook support to match that behavior.

Some minor tangentially related cleanup for consistency reasons too.

http://b/384702998

@dster2 dster2 requested a review from jeward414 January 22, 2025 02:28
@neshdev neshdev self-requested a review January 22, 2025 02:43
Copy link
Member

@neshdev neshdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you provide testing evidence? It seems that the integration should cover it. Normally, we at least have 1 notebook on kaggle to ensure everything is working as expected and we record it in a bug.

@dster2
Copy link
Contributor Author

dster2 commented Jan 22, 2025

@neshdev Thanks for the review, could you please clarify what you'd like added? I added an integration test which should cover the support added here, as I think you noted, but then I'm not sure what you're looking for?

Indeed I forgot the bug, added that.

@neshdev
Copy link
Member

neshdev commented Jan 22, 2025

You can take a look at this PR. Basically, just a kaggle notebook with kagglehub installed from from current branch.

This is an example: b/351825012, but maybe overkill.

@dster2
Copy link
Contributor Author

dster2 commented Jan 22, 2025

Ah very cool, thanks. Sorry for archiving that bug, I made some mistakes on Taskflow 😅

But unfortunately the functionality added here does NOT work within Kaggle, the Kaggle Cache does not support multiple simultaneous versions of the same Notebook (or Dataset) currently (only for Models) though we would like to add that in the future. Nonetheless we do support multi-versioning with the HTTP Resolver for Datasets, and I thought Notebooks should match that behavior (and it's a nice touch for Packages).

@neshdev
Copy link
Member

neshdev commented Jan 22, 2025

I see. We can review more during our knowledge sharing session. LGTM if its working as expected for you.

Copy link
Contributor

@jeward414 jeward414 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dster2 dster2 merged commit b7e3d58 into main Jan 22, 2025
6 checks passed
@dster2 dster2 deleted the notebook-version-cache branch January 22, 2025 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants