[v2] Implement full object checksum validation for multipart downloads #9734
This PR implements full object checksum validation for multipart downloads when using the high-level `s3` command.

Currently, checksum validation is only done at the low-level S3 client level via Botocore. That validation only happens when the object is retrieved in a single GET request, or when the requested range of an object happens to fall on the same boundary as the part size (if the object was uploaded via MPU).
At a high level:
- `HeadObject` is called (existing behavior) to retrieve the stored checksum value and algorithm, if present in the response.
- If the checksum type is `FULL_OBJECT` and the checksum algorithm is CRC-based, the checksum value and algorithm are provided to the future meta.
- A `FullObjectChecksum` object is responsible for storing part checksums, later combining them into a full object checksum, and validating it.
- Each part is read through `PartStreamingChecksumBody`. This class calculates the checksum for each part, unless the underlying stream is already calculating the checksum, in which case the underlying stream's checksum is reused.
- `FullObjectChecksum` combines all part-level checksums and then validates the calculated checksum against the stored checksum.

Note that `s3transfer.checksums.combine_crc32` will only be added to the S3Transfer library, not to AWS CLI v2's vended library, because AWS CLI v2 can just use CRT's future bindings. It's included in this PR for now so reviewers can play around with it.

Calculating full object checksums for multipart uploads will reuse some code from this PR. To get a sense for what that may look like, refer to this POC PR: #9660
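
As a rough illustration of the combine-and-validate step described above, the Python sketch below merges per-part CRC32 values into a full object CRC32 using the GF(2) matrix technique known from zlib's `crc32_combine`, then compares the base64-encoded result with a stored value. The names `combine_crc32_sketch` and `FullObjectChecksumSketch` are hypothetical stand-ins and not the code in this PR; the real `s3transfer.checksums.combine_crc32` signature and the `FullObjectChecksum` interface may differ.

```python
import base64
import zlib


def _gf2_times(mat, vec):
    """Multiply a 32x32 GF(2) matrix (list of 32 ints) by a 32-bit vector."""
    total = 0
    i = 0
    while vec:
        if vec & 1:
            total ^= mat[i]
        vec >>= 1
        i += 1
    return total


def _gf2_square(mat):
    """Square a GF(2) matrix, doubling the number of zero bits it appends."""
    return [_gf2_times(mat, row) for row in mat]


def combine_crc32_sketch(crc1, crc2, len2):
    """Return the CRC32 of A+B given crc32(A), crc32(B) and len(B) in bytes.

    Hypothetical helper for illustration only.
    """
    if len2 <= 0:
        return crc1
    # Operator that advances a reflected CRC32 by one zero bit.
    mat = [0xEDB88320] + [1 << n for n in range(31)]
    # Three squarings turn it into the operator for one zero byte.
    mat = _gf2_square(_gf2_square(_gf2_square(mat)))
    # Apply len2 zero bytes to crc1 by binary exponentiation.
    while len2:
        if len2 & 1:
            crc1 = _gf2_times(mat, crc1)
        mat = _gf2_square(mat)
        len2 >>= 1
    return crc1 ^ crc2


class FullObjectChecksumSketch:
    """Hypothetical stand-in: stores part CRC32s, combines, validates."""

    def __init__(self, stored_checksum_b64):
        self._stored = stored_checksum_b64
        self._parts = {}

    def set_part_checksum(self, part_index, crc, length):
        # Parts may finish out of order, so key them by index.
        self._parts[part_index] = (crc, length)

    def calculated_checksum_b64(self):
        combined = 0  # CRC32 of the empty byte string
        for index in sorted(self._parts):
            crc, length = self._parts[index]
            combined = combine_crc32_sketch(combined, crc, length)
        # Assumes the stored value is the base64 of the big-endian CRC32.
        return base64.b64encode(combined.to_bytes(4, "big")).decode("ascii")

    def validate(self):
        calculated = self.calculated_checksum_b64()
        if calculated != self._stored:
            raise ValueError(
                f"Expected checksum {self._stored}, calculated {calculated}"
            )


# Usage: split some bytes into "parts" and record them in reverse order.
data = bytes(range(256)) * 1000
part_size = 100_000
stored = base64.b64encode(zlib.crc32(data).to_bytes(4, "big")).decode("ascii")
checker = FullObjectChecksumSketch(stored)
offsets = list(enumerate(range(0, len(data), part_size)))
for index, offset in reversed(offsets):
    part = data[offset:offset + part_size]
    checker.set_part_checksum(index, zlib.crc32(part), len(part))
checker.validate()  # raises ValueError on mismatch
```

Combining by checksum and length rather than re-reading bytes is what lets each ranged GET be checksummed independently while still validating against the single stored full object value.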
To manually test:
1. Upload an object with a CRC32 checksum using a single PUT request:
2. Ensure the checksum type is `FULL_OBJECT` and the object has a CRC32 checksum value:
3. Download the object using multipart ranged GETs:

In the debug logs you should see something like: