continue existing multi-part upload #4961

Closed
waynr opened this issue Oct 19, 2023 · 6 comments · Fixed by #4971
Labels
enhancement Any new improvement worthy of a entry in the changelog object-store Object Store Interface

Comments

@waynr

waynr commented Oct 19, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I am implementing an HTTP API that supports chunked uploads of data across multiple HTTP requests. In order to support this I need to be able to continue a pre-established multipart upload.

It's not clear to me either from the documentation or from looking at the AWS implementation of ObjectStore.put_multipart that this would be possible since it appears to create a new multipart upload each time. I am leaning toward "eh, probably not possible".

Describe the solution you'd like

I would like some API for an ObjectStore to get a PutPart for an existing multipart upload given that multipart upload's MultipartId.

The implementation could be as simple as providing a constructor method on S3MultiPartUpload. But that would break the trait-level API boundary, making this approach only applicable to the AWS implementation of ObjectStore so I imagine it would be best to add a new ObjectStore trait method.
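To make the request concrete, here is a minimal sketch of what a trait-level API keyed on an upload identifier could look like. Everything in it is hypothetical: `ResumableMultipart`, `InMemoryStore`, and the method names are invented for illustration and are not the `object_store` API. The point is only that each call names the upload explicitly, so no in-process writer has to survive between calls.

```rust
use std::collections::HashMap;

/// Hypothetical trait: every operation takes the upload id, so a caller that
/// only knows the id (e.g. a different process) can continue the upload.
pub trait ResumableMultipart {
    /// Start a new upload and return its identifier.
    fn create(&mut self, path: &str) -> String;
    /// Store one part at the given zero-based index.
    fn put_part(&mut self, id: &str, part_idx: usize, data: Vec<u8>) -> Result<(), String>;
    /// Concatenate all parts in index order and finish the upload.
    fn complete(&mut self, id: &str) -> Result<Vec<u8>, String>;
}

/// In-memory stand-in for a real object store, for illustration only.
#[derive(Default)]
pub struct InMemoryStore {
    next_id: u64,
    uploads: HashMap<String, Vec<Option<Vec<u8>>>>,
}

impl ResumableMultipart for InMemoryStore {
    fn create(&mut self, _path: &str) -> String {
        self.next_id += 1;
        let id = format!("upload-{}", self.next_id);
        self.uploads.insert(id.clone(), Vec::new());
        id
    }

    fn put_part(&mut self, id: &str, part_idx: usize, data: Vec<u8>) -> Result<(), String> {
        let parts = self
            .uploads
            .get_mut(id)
            .ok_or_else(|| format!("unknown upload {id}"))?;
        // Grow the slot list so parts may arrive in any order.
        if parts.len() <= part_idx {
            parts.resize(part_idx + 1, None);
        }
        parts[part_idx] = Some(data);
        Ok(())
    }

    fn complete(&mut self, id: &str) -> Result<Vec<u8>, String> {
        // In this sketch the upload is removed even if a part turns out to be missing.
        let parts = self
            .uploads
            .remove(id)
            .ok_or_else(|| format!("unknown upload {id}"))?;
        let mut out = Vec::new();
        for (i, part) in parts.into_iter().enumerate() {
            out.extend(part.ok_or_else(|| format!("missing part {i}"))?);
        }
        Ok(out)
    }
}
```

A caller would persist the string returned by `create` and pass it back to `put_part` on a later request. Whether a real store accepts parts out of order or with gaps is store-specific and not modeled here.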

Describe alternatives you've considered

Right now my project has its own ObjectStore trait that is a bit simpler and catered specifically to my needs (also a bit easier to understand since it uses AWS's own SDK library). I'll probably continue using this for the foreseeable future, but would prefer to use the established object_store crate if possible.

Additional context

@waynr waynr added the enhancement Any new improvement worthy of a entry in the changelog label Oct 19, 2023
@waynr waynr changed the title from "continue existing multi-part uploade" to "continue existing multi-part upload" Oct 19, 2023
@tustvold
Contributor

The challenge is many stores have restrictions on part sizes, with some forcing all but the last to be larger than a given size, and in some cases all having the same fixed size. This means you can't easily flush the writer and then pick it up again.

Is there a particular reason you can't cache the AsyncWrite in say a HashMap and use it across multiple requests that way?
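The part-size restrictions described above can be sketched as a small validation routine. This is an illustration under stated assumptions, not code from `object_store`: `PartSizePolicy` and `validate_parts` are hypothetical names, the concrete limits are placeholders, and the fixed-size variant assumes the final part may be shorter than the others.

```rust
/// Part-size rules a store might impose (illustrative; real limits vary by store).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PartSizePolicy {
    /// Every part except the last must be at least `min` bytes.
    MinExceptLast { min: usize },
    /// Every part except the last must be exactly `size` bytes
    /// (assumption: the last part may be shorter, but not longer).
    FixedExceptLast { size: usize },
}

/// Check a complete sequence of part sizes against a policy.
pub fn validate_parts(policy: PartSizePolicy, sizes: &[usize]) -> Result<(), String> {
    let Some((last, rest)) = sizes.split_last() else {
        return Err("an upload needs at least one part".to_string());
    };
    for (i, &len) in rest.iter().enumerate() {
        let ok = match policy {
            PartSizePolicy::MinExceptLast { min } => len >= min,
            PartSizePolicy::FixedExceptLast { size } => len == size,
        };
        if !ok {
            return Err(format!("part {i} has invalid size {len}"));
        }
    }
    // Only the fixed-size policy constrains the final part in this sketch.
    if let PartSizePolicy::FixedExceptLast { size } = policy {
        if *last > size {
            return Err(format!("last part is larger than the fixed size {size}"));
        }
    }
    Ok(())
}
```

The difficulty with flushing a writer mid-stream shows up directly: a flushed short part is only valid if it happens to be the last one.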

@waynr
Author

waynr commented Oct 19, 2023

> Is there a particular reason you can't cache the AsyncWrite in say a HashMap and use it across multiple requests that way?

Yeah, the service I'm writing is meant to be stateless, with multiple instances running behind load balancers and no guarantee of affinity between client and backend. With my current implementation I insert session information (including the upload id and most recent chunk number) into a database table at the end of a given request, reload it on subsequent requests from the same client, and continue the multipart upload using that.
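A rough sketch of that stateless flow, with a `HashMap` standing in for the database table and a closure standing in for the call that uploads a part. All names (`UploadSession`, `SessionTable`, `handle_chunk`) and the way the upload id is derived are hypothetical; the sketch stores the next part number rather than the most recent one, which is an equivalent bookkeeping choice.

```rust
use std::collections::HashMap;

/// What the service persists between requests (field names are hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct UploadSession {
    pub upload_id: String,
    pub next_part: usize,
}

/// Stand-in for the database table, keyed by a client session token.
#[derive(Default)]
pub struct SessionTable {
    rows: HashMap<String, UploadSession>,
}

impl SessionTable {
    pub fn load(&self, token: &str) -> Option<UploadSession> {
        self.rows.get(token).cloned()
    }

    pub fn save(&mut self, token: &str, session: UploadSession) {
        self.rows.insert(token.to_string(), session);
    }
}

/// Handle one chunk request: load (or start) the session, upload the chunk as
/// the next part, then persist the updated session for whichever instance
/// serves the next request.
pub fn handle_chunk(
    table: &mut SessionTable,
    token: &str,
    chunk: &[u8],
    mut put_part: impl FnMut(&str, usize, &[u8]),
) -> UploadSession {
    let mut session = table.load(token).unwrap_or_else(|| UploadSession {
        // A real service would obtain this id from the store when creating the upload.
        upload_id: format!("upload-for-{token}"),
        next_part: 0,
    });
    put_part(&session.upload_id, session.next_part, chunk);
    session.next_part += 1;
    table.save(token, session.clone());
    session
}
```

Nothing in `handle_chunk` depends on in-memory state from an earlier call, which is the property that caching an `AsyncWrite` in a per-process map cannot provide.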

> The challenge is many stores have restrictions on part sizes, with some forcing all but the last to be larger than a given size, and in some cases all having the same fixed size. This means you can't easily flush the writer and then pick it up again.

The spec I'm implementing allows the service to tell clients the minimum chunk size via an HTTP response header, which provides a reasonable level of assurance that all chunks will meet the minimum size requirement. So for my use case it's up to my code, which would be using object_store::ObjectStore, to reject requests that don't meet the minimum size requirement (likewise for anyone else using a hypothetical extension that allows continuation of a multipart upload).

I imagine this should be compatible with ObjectStore as long as implementations aren't eagerly breaking down the data they receive along the minimum part size and accidentally leaving an incomplete / too-small part on the object store they are targeting when there are potentially more parts to come. If they are doing that as an implementation detail of their AsyncWrite impls then maybe I'm just barking up the wrong tree here 🙃

@tustvold
Contributor

> If they are doing that as an implementation detail of their AsyncWrite impls then maybe I'm just barking up the wrong tree here

Unfortunately that is precisely what https://docs.rs/object_store/latest/object_store/multipart/struct.WriteMultiPart.html does...

I suppose we could expose https://docs.rs/object_store/latest/object_store/multipart/trait.PutPart.html, the use of AsyncWrite has always struck me as a bit peculiar 🤔
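To illustrate the kind of buffering being discussed, here is a simplified sketch of a writer that slices incoming bytes into fixed-size parts. It is not the crate's `WriteMultiPart` code; `PartBuffer` and its methods are hypothetical, and the `parts` vector stands in for parts already sent to the store.

```rust
/// Minimal sketch of a writer that buffers input and emits fixed-size parts.
pub struct PartBuffer {
    part_size: usize,
    /// Bytes received but not yet large enough to form a full part.
    buf: Vec<u8>,
    /// Stand-in for parts that have already been uploaded.
    parts: Vec<Vec<u8>>,
}

impl PartBuffer {
    pub fn new(part_size: usize) -> Self {
        assert!(part_size > 0, "part_size must be non-zero");
        Self { part_size, buf: Vec::new(), parts: Vec::new() }
    }

    /// Append data, emitting a part each time a full `part_size` is buffered.
    pub fn write(&mut self, data: &[u8]) {
        self.buf.extend_from_slice(data);
        while self.buf.len() >= self.part_size {
            let rest = self.buf.split_off(self.part_size);
            let full = std::mem::replace(&mut self.buf, rest);
            self.parts.push(full);
        }
    }

    /// Number of bytes still waiting in the buffer.
    pub fn buffered(&self) -> usize {
        self.buf.len()
    }

    /// Emit whatever is buffered as the final (possibly short) part.
    pub fn finish(mut self) -> Vec<Vec<u8>> {
        if !self.buf.is_empty() {
            self.parts.push(std::mem::take(&mut self.buf));
        }
        self.parts
    }
}
```

With a part size of 4, writing `abcdef` emits `abcd` and leaves `ef` buffered. Those leftover bytes exist only in the writer's memory, so a request handled by another instance cannot pick them up, and flushing them early would produce an undersized non-final part.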

@waynr
Author

waynr commented Oct 19, 2023

> // While S3 and Minio support variable part sizes, R2 requires they all be exactly the same size.

Ah, I see why it would have to be implemented that way... I guess R2 wouldn't work for implementing the spec I'm working on the way I want to, since there is no way to tell the client the exact size the parts need to be 🤔

@jacksonrnewhouse

This is very similar in use-case and ask to #4608, which I opened a few months ago to address this. Now that PutPart has been made public, that gives a slightly cleaner API. Currently upgrading my fork to 0.7, but just wanted to 👍 exposing direct access to multipart upload APIs.

tustvold added a commit to tustvold/arrow-rs that referenced this issue Oct 21, 2023
alamb pushed a commit that referenced this issue Oct 25, 2023
* Add MultiPartStore (#4961) (#4608)

* Parse CompleteMultipartUploadResult (#4965)

* More docs

* Add integration test

* Fix azure

* More docs

* Don't gate multipart behind feature flag
@tustvold tustvold added the object-store Object Store Interface label Nov 2, 2023
@tustvold
Contributor

tustvold commented Nov 2, 2023

label_issue.py automatically added labels {'object-store'} from #4971

3 participants