Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notebook Output Download versioning #206

Merged
merged 16 commits into from
Jan 14, 2025
Merged

Notebook Output Download versioning #206

merged 16 commits into from
Jan 14, 2025

Conversation

jeward414
Copy link
Contributor

@jeward414 jeward414 commented Jan 6, 2025

b/384702998

also enables the unit test checks for the kaggle cache resolver -> b/381119949

Copy link
Contributor

@dster2 dster2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far, thanks Jonathan!

@@ -330,6 +342,9 @@ def _build_list_model_instance_version_files_url_path(h: ModelHandle) -> str:
def _build_get_dataset_url_path(h: DatasetHandle) -> str:
return f"datasets/view/{h.owner}/{h.dataset}"

def _build_get_notebook_url_path(h: NotebookHandle) -> str:
return f"kernels/output/download/{h.owner}/{h.notebook}?version_number={h.version}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Ack that you're copying existing datasets code, but this is slightly fragile in that it's silently assuming the optional field h.version is set, which is true for now but could lead to tricky bugs later. We could put a check + raise here (and for datasets) to make that assumption explicit. Or make the code robust to handle unset h.version.

Copy link
Contributor Author

@jeward414 jeward414 Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is set whenever it checks what the current version is if a specific version isn't provided. However, I do understand that it could be trickier down the line

@jeward414 jeward414 marked this pull request as ready for review January 10, 2025 15:48
@jeward414 jeward414 requested a review from dster2 January 14, 2025 01:30
Copy link
Contributor

@dster2 dster2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add this code here

if h.is_versioned():
    kernel_ref["VersionNumber"] = str(h.version)

And you might need a few more unit / integration tests for that case? 😬 Hopefully just remove the skips here?

But overall LGTM pending the comments, thanks for your hard work on this!


total_size = 0
if not isinstance(resource_handle, NotebookHandle):
total_size = int(response.headers["Content-Length"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it seems to work, this approach feels a bit fragile to me. Also I think _download_file might behave strangely, e.g. rendering tqdm progress bar with total=0. IMO it'd be stronger to build more generalized handling of supporting both scenarios. So we'd skip the NotebookHandle check and set

total_size = int(response.headers["Content-Length"]) if "Content-Length" in response.headers else None

(or similar) Then we'd add a if _is_resumable(response) and total_size to current line 176 to skip the resuming logic which currently wants (though may not need) total_size (and the kernel output zip file fails _is_resumable check anyway). And then update _download_file to be a bit more graceful with the total_size is None case, probably just skipping tqdm entirely?

api_client.download_file(url_path, archive_path, h)

# Create the directory to extract the archive to.
os.makedirs(out_path, exist_ok=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Might be nicer if _extract_archive handled this internally... but again fine to punt :)

@jeward414 jeward414 changed the title Notebook Output Download versioning draft Notebook Output Download versioning Jan 14, 2025
@jeward414 jeward414 merged commit 9dffcc2 into main Jan 14, 2025
6 checks passed
@jeward414 jeward414 deleted the n_o_draft branch January 14, 2025 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants