
Datasets upload #149

Merged 9 commits into main on Jul 31, 2024
Conversation

rosbo (Contributor) commented on Jul 31, 2024:

Created from @jeward414's PR (#138) and added some changes.

b/348364344

rosbo requested a review from jeward414 on Jul 31, 2024 at 17:54.

try:
    api_client = KaggleApiV1Client()
    api_client.post(
        f"/dataset/{owner_slug}/{dataset_slug}/delete",
rosbo (Contributor, Author) commented:

There is a typo in the path for this handler: it uses dataset instead of datasets: https://github.com/Kaggle/kaggleazure/blob/d707a9773ff13d87e12400ed5f72ba0f8286f4e7/Kaggle.Sdk/datasets/dataset_api_service.proto#L136

This is unfortunate; we can't change it without breaking older libraries. But we could add an additional binding to support both the dataset/ and datasets/ paths. (See example: https://github.com/Kaggle/kaggleazure/blob/d707a9773ff13d87e12400ed5f72ba0f8286f4e7/Kaggle.Sdk/datasets/dataset_api_service.proto#L72-L74)
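For illustration only (this is not the actual kagglehub code, and `delete_dataset` plus the caught error type are hypothetical stand-ins): until the server exposes both bindings, a client could try the intended datasets/ path first and fall back to the legacy dataset/ path:

```python
def delete_dataset(api_client, owner_slug: str, dataset_slug: str) -> None:
    # Try the preferred "datasets/" path first, then the legacy typo'd
    # "dataset/" path that older servers expose.
    for prefix in ("datasets", "dataset"):
        try:
            api_client.post(f"/{prefix}/{owner_slug}/{dataset_slug}/delete")
            return
        except FileNotFoundError:  # stand-in for the client's 404 error type
            continue
    raise RuntimeError(f"Could not delete {owner_slug}/{dataset_slug}")
```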

jeward414 (Contributor) replied:

We can add the additional path binding after this goes in.

if not credentials:
    self.fail("Make sure to set your Kaggle credentials before running the tests")
self.owner_slug = credentials.username
self.dataset_slug = f"dataset-{self.dataset_uuid}"
rosbo (Contributor, Author) commented:

@jeward414 The issue was that the dataset_slug included an underscore, and underscores are not allowed in dataset slugs; they get normalized to a - instead. That is why the creation worked but the next version failed: the real slug had a - instead of a _.
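A minimal sketch of the server-side normalization described above (an assumption about its behavior, not the actual Kaggle implementation): disallowed characters such as underscores are replaced with hyphens, so the slug the client sends is not necessarily the slug that gets stored.

```python
import re

def normalize_slug(slug: str) -> str:
    # Replace any run of characters outside [a-z0-9-] with a single hyphen.
    return re.sub(r"[^a-z0-9-]+", "-", slug.lower())
```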

with TemporaryDirectory() as temp_dir:
    # Create new folder within temp_dir
    inner_folder_path = Path(temp_dir) / "inner_folder"
    inner_folder_path.mkdir()
rosbo (Contributor, Author) commented:

Fixed an issue where the directory was touched (.touch()) instead of created (.mkdir()).
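To illustrate the bug: Path.touch() creates an empty file, while Path.mkdir() creates a directory, so anything placed "inside" a touched path fails because it is not a directory at all.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as temp_dir:
    touched = Path(temp_dir) / "touched"
    touched.touch()   # creates an empty *file*
    made = Path(temp_dir) / "made"
    made.mkdir()      # creates a *directory*
    assert touched.is_file() and not touched.is_dir()
    assert made.is_dir()
```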

if not quiet:
    logger.info("Upload successful: " + file_path + " (" + File.get_size(content_length) + ")")
return token


def normalize_patterns(*, default: List[str], additional: Optional[Union[List[str], str]]) -> List[str]:
rosbo (Contributor, Author) commented:

I moved this from models_helpers so it can be reused in the dataset context.
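For context, here is a minimal sketch of what a helper with the signature shown above might do, assuming it merges caller-supplied patterns with the library defaults (the actual kagglehub implementation may differ):

```python
from typing import List, Optional, Union

def normalize_patterns(*, default: List[str], additional: Optional[Union[List[str], str]]) -> List[str]:
    """Merge caller-supplied ignore patterns with the defaults."""
    if additional is None:
        return list(default)
    if isinstance(additional, str):
        additional = [additional]
    return list(default) + list(additional)
```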

handle: (string) the dataset handle.
local_dataset_dir: (string) path to a file in a local directory.
version_notes: (string) Optional to write dataset versions.
ignore_patterns (str or List[str], optional):
rosbo (Contributor, Author) commented:

I added support for ignore_patterns, which Li added in #147.
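As an illustration only (not the actual kagglehub matching code, and `filter_ignored` is a hypothetical helper): ignore patterns of this shape are typically applied with fnmatch-style glob matching to drop files from the upload set.

```python
import fnmatch
from typing import List

def filter_ignored(paths: List[str], ignore_patterns: List[str]) -> List[str]:
    # Keep only paths that match none of the ignore patterns.
    return [
        p for p in paths
        if not any(fnmatch.fnmatch(p, pat) for pat in ignore_patterns)
    ]
```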

jeward414 (Contributor) left a comment:

Thank you!

jeward414 merged commit 0fd2c94 into main on Jul 31, 2024; 7 checks passed.
jeward414 deleted the datasets_upload branch on July 31, 2024 at 18:29.