Skip to content

Adding MD5 from UploadFileOptions#20356

Closed
siminsavani-msft wants to merge 2 commits intomainfrom
siminsavani/UploadContentMD5
Closed

Adding MD5 from UploadFileOptions#20356
siminsavani-msft wants to merge 2 commits intomainfrom
siminsavani/UploadContentMD5

Conversation

@siminsavani-msft
Copy link
Copy Markdown
Contributor

@siminsavani-msft siminsavani-msft commented Mar 7, 2023

Fixes #20092

Adds MD5 from UploadFileOptions to UploadOptions and CommitBlockListOptions.

  • The purpose of this PR is explained in this or a referenced issue.
  • The PR does not update generated files.
  • Tests are included and/or updated for code changes.
  • Updates to CHANGELOG.md are included.
  • MIT license headers are included in each file.

@ghost ghost added the Storage Storage Service (Queues, Blobs, Files) label Mar 7, 2023
Copy link
Copy Markdown
Member

@jhendrixMSFT jhendrixMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth doing a scrub to see if there are other places where we missed propagating option values.


md5sum, resp, err := performUploadWithMD5Check(s.T(), _require, s.T().Name(), fileSize)

_require.ErrorContains(err, "400 The MD5 value specified in the request did not match with the MD5 value calculated by the server.")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we getting this error for large file uploads?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For larger files, the service won't be able to calculate the MD5 and this error is expected as the user provided MD5 and the response MD5 won't match. We don't get this error as of now because the user provided MD5 doesn't get passed at all.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a transactional md5 will always be calculated by the service. If the service is failing the transactional md5 check it's because the check is actually failing. I imagine this is because you've configured the transactional md5 on put blocklist. A transactional md5 for that API is literally checking the content integrity of the blocklist itself (it's one of the only places .NET SDK currently doesn't calculate one).

Transactional md5 validates individual requests and nothing more, meaning it would need to validate individual blocks. If one large md5 is being passed in for a partitioned upload, the client should fail fast since the checksum cannot be split.

Tags: o.Tags,
Metadata: o.Metadata,
Tier: o.AccessTier,
TransactionalContentMD5: o.TransactionalContentMD5,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two features, one is to just set the MD5 sum in the blob and other one is transactional MD5. For transactional options crc64 is also supported. Transactional fields are only for validation and will not set any properties in the blob. Just double check that we are good with all these combinations as I do not see transactional CRC here, just had doubts on we are still missing some things.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be careful about the way we chose to expose transactional MD-5 and CRC-64 to the customer. We will be implement end-to-end content validation in the Go SDK this fall, after the FE has implement trailing CRC-64 header.

This work will look something like this - Azure/azure-sdk-for-net#29778

We do not need to calculate CRC-64 or MD5 within the Go SDK at this time, but we need to design our public interfaces in a way that will support this option in the future.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Sean. In .NET we had to retire our byte[] contentHash arguments and replace them with a whole configuration setup. It would be best to avoid writing "instant legacy" APIs and this probably merits further offline discussion of how to balance immediate needs with medium-term plans for transactional content integrity.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In .NET, if the customer populates TransferValidationOptions.PrecalculatedChecksum and TransferValidationOptions.ChecksumAlgorithm, the SDK uses their pre-calculated checksum. If the customer only populates TransferValidationOptions.ChecksumAlgorithm, the SDK calculates the checksum for the user.

We don't want to implement the SDK calculating the checksum for the user until ~6 months from now when the service supports trailing checksum footers, but we need to design our public interface to be compatible with this functionality.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaschrep-msft, what should we do in in the case where the customer provides a precalculated checksum for a multi-part upload, before we implement calculating checksums in the SDK?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fail fast. If a customer provides a transactional checksum that the SDK cannot make use of, we shouldn't pretend an integrity check has occurred when it actually hasn't.

In .NET, before we had SDK calculation on caller behalf, we didn't accept the parameter on any API that had the possibility of splitting an upload, as we didn't want customers to write code that suddenly broke once they passed a file size threshold they were not aware of. Customers had to use direct to REST apis if they wanted this feature.

@bhansalivikas bhansalivikas added this to the azblob v1.0.1 milestone Mar 8, 2023
@seanmcc-msft seanmcc-msft mentioned this pull request Mar 21, 2023
5 tasks
@siminsavani-msft siminsavani-msft deleted the siminsavani/UploadContentMD5 branch April 4, 2023 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AzBlob Storage Storage Service (Queues, Blobs, Files)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[azblob] file upload validation

6 participants