continue existing multi-part upload #4961

waynr · 2023-10-19T18:40:49Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I am implementing an HTTP API that supports chunked uploads of data across multiple HTTP requests. In order to support this I need to be able to continue a pre-established multipart upload.

It's not clear to me either from the documentation or from looking at the AWS implementation of ObjectStore.put_multipart that this would be possible since it appears to create a new multipart upload each time. I am leaning toward "eh, probably not possible".

Describe the solution you'd like

I would like some API for an ObjectStore to get a PutPart for an existing multipart upload given that multipart upload's MultipartId.

The implementation could be as simple as providing a constructor method on S3MultiPartUpload. But that would break the trait-level API boundary and apply only to the AWS implementation of ObjectStore, so I imagine it would be best to add a new ObjectStore trait method.
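To make the request concrete, here is a minimal sketch of the shape such a trait-level API might take. Everything in it is hypothetical and invented for illustration: the `ResumableMultipart` trait, its method names, and the `InMemoryStore` type are not part of object_store. It is synchronous and in-memory only so that it stays self-contained. The point is that every call is addressed by upload id, so a process that did not create the upload can still add parts to it.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical trait (not the object_store API): every operation takes the
/// upload id, so any process that knows the id can continue the upload.
pub trait ResumableMultipart {
    fn create_upload(&mut self, path: &str) -> String;
    fn put_part(&mut self, id: &str, part_idx: usize, data: Vec<u8>) -> Result<(), String>;
    fn complete_upload(&mut self, id: &str) -> Result<Vec<u8>, String>;
}

/// Toy in-memory implementation, used only to exercise the trait.
#[derive(Default)]
pub struct InMemoryStore {
    next_id: u64,
    uploads: HashMap<String, BTreeMap<usize, Vec<u8>>>,
}

impl ResumableMultipart for InMemoryStore {
    fn create_upload(&mut self, path: &str) -> String {
        self.next_id += 1;
        let id = format!("{}#{}", path, self.next_id);
        self.uploads.insert(id.clone(), BTreeMap::new());
        id
    }

    fn put_part(&mut self, id: &str, part_idx: usize, data: Vec<u8>) -> Result<(), String> {
        let parts = self
            .uploads
            .get_mut(id)
            .ok_or_else(|| format!("unknown upload id: {}", id))?;
        parts.insert(part_idx, data);
        Ok(())
    }

    fn complete_upload(&mut self, id: &str) -> Result<Vec<u8>, String> {
        let parts = self
            .uploads
            .remove(id)
            .ok_or_else(|| format!("unknown upload id: {}", id))?;
        // BTreeMap iterates in key order, so parts are joined by part index.
        Ok(parts.into_values().flatten().collect())
    }
}
```

In a real service the id returned by `create_upload` would be persisted between requests and passed back to `put_part` by whichever instance handles the next chunk.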

Describe alternatives you've considered

Right now my project has its own ObjectStore trait that is a bit simpler and catered specifically to my needs (also a bit easier to understand since it uses AWS's own SDK library). I'll probably continue using this for the foreseeable future, but would prefer to use the established object_store crate if possible.

Additional context


tustvold · 2023-10-19T20:12:38Z

The challenge is many stores have restrictions on part sizes, with some forcing all but the last to be larger than a given size, and in some cases all having the same fixed size. This means you can't easily flush the writer and then pick it up again.
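The two kinds of restriction described here can be modeled roughly as follows. This is a sketch under stated assumptions, not code from object_store: `PartSizePolicy` and `first_violation` are hypothetical names, and the rule that the final part may be smaller than the fixed size is an assumption of the model.

```rust
/// Hypothetical model of the two kinds of part-size rule.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartSizePolicy {
    /// Every part except the last must be at least this many bytes.
    MinExceptLast(usize),
    /// Every part except the last must be exactly this many bytes;
    /// the last part is assumed to be allowed to be smaller.
    FixedExceptLast(usize),
}

/// Returns the index of the first part that violates the policy, if any.
pub fn first_violation(policy: PartSizePolicy, part_sizes: &[usize]) -> Option<usize> {
    // An empty upload has no parts and therefore no violations.
    let last = part_sizes.len().checked_sub(1)?;
    part_sizes.iter().enumerate().position(|(i, &size)| match policy {
        PartSizePolicy::MinExceptLast(min) => i != last && size < min,
        PartSizePolicy::FixedExceptLast(fixed) => {
            if i == last {
                size > fixed
            } else {
                size != fixed
            }
        }
    })
}
```

With a minimum of 5, the sizes `[5, 3, 9]` fail at index 1: a short part in the middle is exactly what flushing a partially filled writer and resuming later would produce.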

Is there a particular reason you can't cache the AsyncWrite in say a HashMap and use it across multiple requests that way?

waynr · 2023-10-19T20:40:08Z

> Is there a particular reason you can't cache the AsyncWrite in say a HashMap and use it across multiple requests that way?

Yeah, the service I'm writing is meant to be stateless with multiple instances running behind loadbalancers and no guarantee of affinity between client and backend. With my current implementation I insert session information (including upload id and most recent chunk number) to a database table at the end of a given request, re-load it on subsequent requests from the same client, and continue the multipart upload using that.
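A minimal sketch of that save and reload cycle, with a `HashMap` standing in for the database table. All names here (`UploadSession`, `SessionTable`, `accept_chunk`) are hypothetical, and the check that chunks arrive strictly in order is an assumption made for the example rather than something stated above.

```rust
use std::collections::HashMap;

/// Hypothetical session row: the state persisted between requests.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UploadSession {
    pub upload_id: String,
    pub last_chunk: u32,
}

/// Stand-in for the database table, keyed by a client session key.
#[derive(Default)]
pub struct SessionTable {
    rows: HashMap<String, UploadSession>,
}

impl SessionTable {
    /// Called at the end of a request to record progress.
    pub fn save(&mut self, session_key: &str, session: UploadSession) {
        self.rows.insert(session_key.to_string(), session);
    }

    /// Called at the start of the next request, possibly on another instance.
    pub fn load(&self, session_key: &str) -> Option<UploadSession> {
        self.rows.get(session_key).cloned()
    }
}

/// Handles one chunk: reloads the session, checks ordering (an assumption of
/// this sketch), and records the new progress. Returns the accepted chunk number.
pub fn accept_chunk(
    table: &mut SessionTable,
    session_key: &str,
    chunk: u32,
) -> Result<u32, String> {
    let mut session = table.load(session_key).ok_or("unknown session")?;
    let expected = session.last_chunk + 1;
    if chunk != expected {
        return Err(format!("expected chunk {}, got {}", expected, chunk));
    }
    // A real service would upload the part here using `session.upload_id`.
    session.last_chunk = chunk;
    table.save(session_key, session);
    Ok(chunk)
}
```

No writer object survives between calls; the only state carried forward is the row in the table, which is why an in-process cache of the AsyncWrite does not fit this design.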

> The challenge is many stores have restrictions on part sizes, with some forcing all but the last to be larger than a given size, and in some cases all having the same fixed size. This means you can't easily flush the writer and then pick it up again.

The spec I'm implementing allows the service to inform clients, via an HTTP response header, what the minimum chunk size is, which provides a reasonable level of assurance that all chunks will meet the minimum size requirement. So for my use case it's up to my code that would be using object_store::ObjectStore to reject requests that don't meet the minimum size requirement (likewise with anyone else using a hypothetical extension that allows continuation of a multipart upload).

I imagine this should be compatible with ObjectStore as long as implementations aren't eagerly breaking down the data they receive along minimum part size boundaries and accidentally leaving an incomplete / too-small part on the object store they are targeting when there are potentially more parts to come. If they are doing that as an implementation detail of their AsyncWrite impls then maybe I'm just barking up the wrong tree here 🙃
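The concern can be illustrated with a simplified model of a writer that eagerly cuts its input into fixed-size parts. This is a hypothetical sketch (`ChunkingWriter` is an invented name) and not the actual implementation of any AsyncWrite in object_store; it only records part sizes instead of uploading anything.

```rust
/// Simplified model of a writer that eagerly cuts input into fixed-size parts.
/// Hypothetical; it records part sizes rather than uploading data.
pub struct ChunkingWriter {
    part_size: usize,
    buffer: Vec<u8>,
    /// Sizes of the parts "uploaded" so far, in order.
    pub uploaded: Vec<usize>,
}

impl ChunkingWriter {
    pub fn new(part_size: usize) -> Self {
        assert!(part_size > 0, "part_size must be non-zero");
        Self {
            part_size,
            buffer: Vec::new(),
            uploaded: Vec::new(),
        }
    }

    /// Buffers the data and "uploads" every complete part.
    pub fn write(&mut self, data: &[u8]) {
        self.buffer.extend_from_slice(data);
        while self.buffer.len() >= self.part_size {
            let part: Vec<u8> = self.buffer.drain(..self.part_size).collect();
            self.uploaded.push(part.len());
        }
    }

    /// Bytes held back because they do not yet fill a part.
    pub fn pending(&self) -> usize {
        self.buffer.len()
    }

    /// "Uploads" whatever is buffered, even if it is smaller than a full part.
    pub fn flush(&mut self) {
        if !self.buffer.is_empty() {
            self.uploaded.push(self.buffer.len());
            self.buffer.clear();
        }
    }
}
```

Writing 10 bytes with a part size of 4 uploads two full parts and leaves 2 bytes pending. Flushing at the end of the request turns those 2 bytes into an undersized part, and any later write then produces parts after it, so the short part ends up in the middle of the upload.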

tustvold · 2023-10-19T20:43:13Z

> If they are doing that as an implementation detail of their AsyncWrite impls then maybe I'm just barking up the wrong tree here

Unfortunately that is precisely what https://docs.rs/object_store/latest/object_store/multipart/struct.WriteMultiPart.html does...

I suppose we could expose https://docs.rs/object_store/latest/object_store/multipart/trait.PutPart.html, the use of AsyncWrite has always struck me as a bit peculiar 🤔

waynr · 2023-10-19T20:47:08Z

// While S3 and Minio support variable part sizes, R2 requires they all be exactly the same size.

Ah, I see why it would have to be implemented that way...I guess R2 wouldn't work for implementing the spec I'm working on the way I want to do it since there is no way to tell the client the exact size the parts need to be 🤔

jacksonrnewhouse · 2023-10-21T00:56:55Z

This is very similar in use-case and ask to #4608, which I opened a few months ago to address this. Now that PutPart has been made public, that gives a slightly cleaner API. Currently upgrading my fork to 0.7, but just wanted to 👍 exposing direct access to multipart upload APIs.

* Add MultiPartStore (#4961) (#4608)
* Parse CompleteMultipartUploadResult (#4965)
* More docs
* Add integration test
* Fix azure
* More docs
* Don't gate multipart behind feature flag

tustvold · 2023-11-02T10:34:36Z

label_issue.py automatically added labels {'object-store'} from #4971

waynr added the enhancement (Any new improvement worthy of a entry in the changelog) label Oct 19, 2023

waynr changed the title ~~continue existing multi-part uploade~~ continue existing multi-part upload Oct 19, 2023

tustvold added a commit to tustvold/arrow-rs that referenced this issue Oct 21, 2023

Add MultiPartStore (apache#4961) (apache#4608)

9f961f4

tustvold mentioned this issue Oct 21, 2023

Add MultiPartStore (#4961) (#4608) #4971

Merged

alamb closed this as completed in #4971 Oct 25, 2023

tustvold added the object-store (Object Store Interface) label Nov 2, 2023

This was referenced Feb 29, 2024

In Object Store, return version & etag on multipart put. #5443

Closed

Revisit Design of ObjectStore::put_multipart #5458

Closed
