Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aws::S3::Object.copy_from fails cross-regions when multipart_copy: true #1083

Closed
whazzmaster opened this issue Feb 4, 2016 · 6 comments
Closed
Assignees

Comments

@whazzmaster
Copy link

We're attempting to copy objects from a bucket in one region to a bucket in another region with the multipart_copy option set to true. We've found that the operation succeeds when multipart_copy is not specified in the Object.copy_from call, but as soon as we add it we instead receive an Aws::S3::Errors:Http301Error.

Digging in with the aws.rb client, we additionally found that when invoking copy_from with multipart_copy: true the HEAD call erroneously constructs an endpoint using the target region endpoint but the source region bucket, e.g.,

Aws> result = target_object.copy_from(source_object, multipart_copy: true)
opening connection to esp-dev-singapore.s3-us-west-2.amazonaws.com:443...
...
[Aws::S3::Client 301 0.474308 0 retries]

As opposed to doing the same call without multipart_copy: true...

Aws> result = target_object.copy_from(source_object)
opening connection to esp-dev-oregon.s3-us-west-2.amazonaws.com:443...
...
[Aws::S3::Client 200 3.406666 0 retries]

Is is a known restriction that you cannot copy using multipart across regions?

@awood45
Copy link
Member

awood45 commented Feb 8, 2016

There is no such restriction, so this is something we're going to investigate.

@whazzmaster
Copy link
Author

I'm checking in on this issue- it's holding up some major code cleanup we have staged. I'm also happy to assist; if there's a root cause identified I can work on PR. I did some debugging on the issue originally but things got confusing once it got down into the ObjectMultipartCopier-- I lost the thread.

@bigtiger
Copy link

👍

@whazzmaster
Copy link
Author

Update

Debugged through the multipart copy today and found the bug, as well as a workaround.

Bug

See aws-sdk-resources/lib/aws-sdk-resources/services/s3/object_multipart_copier.rb:118:

def source_size(options)
  if options[:content_length]
    options.delete(:content_length)
  else
    bucket, key = options[:copy_source].match(/([^\/]+?)\/(.+)/)[1,2]
    @client.head_object(bucket:bucket, key:key).content_length # Uses target client instead of source client
  end
end

The source_size method attempts to get the content_length of the source object, but uses the client supplied with the target object. Hence it queries the target region for the content length of the object that does not exist there, and then cannot find it.

I believe that what needs to happen is that, in ObjectCopier#copy_object needs to pass in the source client explicitly (if it exists) to the options passed into ObjectMultipartCopier#copy. Subsequently, in ObjectMultipartCopier#source_size, if source_client exists in the options then use that to fetch the content length of the source object (in the source region, via the source client). If the client is not in the options, fall back to using the target client as happens today.

Workaround

As you can see in the above code snippet, if :content_length is present in the options, it does not need to fetch the content length using a client. Hence, per the code snippet in the original issue submission, if you do the copy as such:

target_object.copy_from(source_object, multipart_copy: true, content_length: source_object.content_length)

...it short-circuits the fetch of the content_length and the copy proceeds correctly.

@trevorrowe
Copy link
Member

@whazzmaster Good insights.

I believe that what needs to happen is that, in ObjectCopier#copy_object needs to pass in the source client explicitly (if it exists) to the options passed into ObjectMultipartCopier#copy

You are correct. What is needed is a client configured for the region of the source bucket. It should be possible to automatically extract this from the source object if it has been passed in as an instance of Aws::S3::Object or Aws::S3::ObjectVersion.

For situations where an object is not given, we could add the following options:

  • :source_region - Use this to construct a new client, using general configuration from the current client, overriding only the region.
  • :source_client - Exactly what you describe above

I think what I'm still considering is should we continue attempting to copy from the same region when we are not certain? If so, I think we should also rescue the 301 error an raise a more helpful error, that we could not discover the size of the source object and that they will need to pass one of the above options.

Thoughts?

Source client, or possibly source region.

Alternatively it could pass in a source region. The SDK could construct a client for the appropriate region and then HEAD the object there. This has the additional benefit that it would allow the customer to pass in :source_region when the source object is a hash or string.

Another option is to support a HINT from the user about the size of the source object or the prefered

@whazzmaster
Copy link
Author

@trevorrowe Ack- just pushed a PR and didn't see your response (just happened to notice I hadn't started my mail client after reboot this morning 😿)

The PR extracts the client from the supplied source object passed to copy_from if it is an S3::Object. If none is found it falls back on the existing behavior.

Conceptually though I like your idea about being more explicit at the external API, and allowing for an explicit source client to be supplied. I think the specific option keys probably depend on whether region-to-region copying should be recognized as a 'non-normal' action that requires extra options to be passed. Considering the existing ability to pass in a variety of different option types, allowing source_client and source_region depending on user's needs feels like the way to go.

Ultimately I'd say that this PR fixes the bug, but the behavior can be robustified via more/better options to the copy_from call.

@trevorrowe trevorrowe self-assigned this Feb 29, 2016
awood45 added a commit that referenced this issue Mar 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants