feat: add transfer_manager.upload_chunks_concurrently using the XML MPU API #1115

Merged Sep 18, 2023 (12 commits)

@andrewsg commented Sep 2, 2023

This adds support for XML MPU multiprocess uploads to Transfer Manager.

@andrewsg added the "do not merge" label Sep 2, 2023
@andrewsg requested review from a team as code owners September 2, 2023 01:01
@product-auto-label bot added the "size: l" and "api: storage" labels Sep 2, 2023
@cojenco left a comment

Looking good 🎉 have a few questions

@andrewsg removed the "do not merge" label Sep 7, 2023
andrewsg commented Sep 7, 2023

Added integration tests. PTAL

@andrewsg

PTAL once more; this should be the final feature set I think

filename,
blob,
content_type=None,
chunk_size=TM_DEFAULT_CHUNK_SIZE,

This is technically the MPU part size? Are you using chunk_size for consistency?

Contributor Author

It is the part size. I am using chunk_size for consistency with the similar download method.
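
To make the chunk_size / part size relationship concrete, here is a minimal sketch of how a file of a given size could be divided into numbered parts. The helper name plan_parts is hypothetical and is not part of google-cloud-storage; the library's actual splitting logic may differ.

```python
import math


# Hypothetical helper for illustration only; not part of google-cloud-storage.
def plan_parts(file_size, chunk_size):
    """Return (part_number, start, end) tuples covering file_size bytes.

    Part numbers start at 1 and `end` is exclusive. The last part may be
    shorter than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    num_parts = math.ceil(file_size / chunk_size)
    parts = []
    for index in range(num_parts):
        start = index * chunk_size
        end = min(start + chunk_size, file_size)
        parts.append((index + 1, start, end))
    return parts
```

Under this sketch, a 10-byte file with chunk_size=4 yields three parts, the last of which holds only 2 bytes.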

the documentation at https://cloud.google.com/storage/docs/multipart-uploads
before using this feature.

The library will attempt to cancel uploads that fail due to an exception.

+1
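
The cancel-on-exception behavior described in the docstring above can be sketched roughly as follows. FakeUpload and upload_all are hypothetical stand-ins written for this illustration; they are not the library's classes, and the real implementation may handle cleanup differently.

```python
class FakeUpload:
    """Hypothetical stand-in for a multipart upload session."""

    def __init__(self):
        self.parts = {}
        self.state = "open"

    def upload_part(self, part_number, data):
        # Simulate a failure when a chunk has no data.
        if data is None:
            raise OSError(f"part {part_number} failed")
        self.parts[part_number] = data

    def finalize(self):
        self.state = "finalized"

    def cancel(self):
        # Discard any parts uploaded so far.
        self.parts.clear()
        self.state = "cancelled"


def upload_all(upload, chunks):
    """Upload every chunk, cancelling the session if anything raises."""
    try:
        for part_number, data in enumerate(chunks, start=1):
            upload.upload_part(part_number, data)
        upload.finalize()
    except Exception:
        # Best-effort cleanup, then let the caller see the original error.
        upload.cancel()
        raise
```

Re-raising after cancel() keeps the original error visible to the caller while still cleaning up the partial upload.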

@danielduhh left a comment

Are you planning to add the upload_sharded command name for metrics in another PR?

@andrewsg

@danielduhh Yes, actually that's already ready for review and I'll have it up as soon as this one is reviewed and merged.

@cojenco left a comment

LGTM, just have two questions

@@ -1697,7 +1697,7 @@ def _get_writable_metadata(self):

return object_metadata

-    def _get_upload_arguments(self, client, content_type):
+    def _get_upload_arguments(self, client, content_type, filename=None):
Contributor

LGTM, just a reminder that there are changes in the tm-metrics branch relevant to this private method as well.

Contributor Author

Yep, already resolved via merge in a different branch

headers = {}
# Handle standard writable metadata
for key, value in metadata.items():
    if key in METADATA_HEADER_TRANSLATION:
Contributor

QQ: are there any writable metadata fields that don't require translation?

Just wanted to make sure that they are all captured in _headers_from_metadata. If not, this works!

Contributor Author

I believe this is the complete list according to the documentation. Note that custom metadata and some other features like encryption are not supported through this translation dictionary but handled differently.
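
As a rough illustration of the translation-dictionary approach discussed here, the sketch below maps a few JSON-style metadata keys to HTTP headers and handles custom metadata separately. EXAMPLE_TRANSLATION is an illustrative subset chosen for this example, and headers_from_metadata_sketch is a hypothetical name; neither should be read as the library's actual dictionary or function. The x-goog-meta- prefix for custom metadata follows the Cloud Storage XML API convention, but treating it this way here is an assumption about how such a helper might look.

```python
# Illustrative subset only; the library's real METADATA_HEADER_TRANSLATION
# may contain different or additional entries.
EXAMPLE_TRANSLATION = {
    "cacheControl": "Cache-Control",
    "contentEncoding": "Content-Encoding",
    "contentLanguage": "Content-Language",
}


def headers_from_metadata_sketch(metadata):
    """Build request headers from a dict of writable object metadata."""
    headers = {}
    # Standard writable metadata: translate known keys, skip the rest.
    for key, value in metadata.items():
        if key in EXAMPLE_TRANSLATION:
            headers[EXAMPLE_TRANSLATION[key]] = value
    # Custom metadata is not in the translation table; each entry gets
    # its own prefixed header instead.
    for key, value in metadata.get("metadata", {}).items():
        headers["x-goog-meta-" + key] = value
    return headers
```

Keys that are absent from the table (for example "name") are silently skipped in this sketch, which is why the reviewer's question about untranslated writable fields matters.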

@andrewsg merged commit 56aeb87 into main Sep 18, 2023
5 checks passed
@andrewsg deleted the xml-mpu branch September 18, 2023 18:40