Proposal: Add the ability to deduplicate uploads #236
Comments
One other thing: we should also allow the registry to respond with a "409 Conflict" if this repository already contains the blob and the user is authorized in this given repo to read that blob.
This only requires that the client possesses an unfinalized hash of the content, which is not really much different from possessing the output of a different hash function over the content. Most proof-of-data-possession algorithms will choose a random piece of the content (e.g. from a precomputed set), so the client has to actually have the whole content, or perform some other less deterministic computation.
@justincormack If we limit the approach to a Merkle–Damgård construction, I think that's much more digestible, since you don't have to do random seeks. TTFB is on the order of ~1 second, and the cache can only store on the order of ~1MB of data per blob. If it's a BLAKE3-style hash, you can keep the terminal leaves of the tree and rehydrate from that. Although, how common is it to have an unfinalized hash of the contents?
As Justin mentioned, in past discussions of this Stephen Day had begun an implementation of sha256 that exposed the unfinalized hash state, but that unfinalized state is then also exposed to being tampered with. The more probable option was to work from offsets of the original blob (like a challenge-response where the client is given a few offsets, hashes the tar at those points, and returns the checksum(s)).
@vbatts If we were to choose an offset-based algorithm, would it have something like:
The problem that I kind of have is that I have two kinds of storage: Redis and S3. S3 is very slow and API operations are "costly", so we have to be conservative. Redis falls over at around 1 MB objects. We can have a Merkle tree, which makes the above very doable: each (1MB) "chunk" turns into about ~1KB of state in the leaf of the Merkle tree, and these leaves can be combined if they're in the middle of the object. I guess my next question is: what level of security do people want, given that this would be an opt-in option?
So, what if the structure was something like:

```json
{
  "deduplicateUpload": {
    "generator": "blake3",
    "points": [
      {
        "length": 57,
        "seed": "foo",
        "offset": 0
      }
    ]
  }
}
```

We guarantee the points are non-overlapping. We may extend the length of the blob overall, and the client should be prepared for that. Although this would be difficult to make efficient in sha256, it is easy enough to do efficiently in Merkle constructs.
Based on the previous conversation, there are three security models:
I propose we solve for use case #1, as it does not require solving any cryptography problems. I suggest we also solve a subset of #2, where there is an access control system already in place in which the registry can validate that a given user has access to the given blob.

I propose we state: To obtain a session ID, perform a POST request to a URL in the following format: The client MAY specify a query parameter "digest", with a well-formed descriptor. If the blob already exists and the user has access to it, the registry MAY respond with a
Alternatively, if the registry does not blindly accept the dedupe / content, it may respond with a different response code in the future. If the client does not understand the response code, then it should fall back and retry without a digest.
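To make the proposed fallback concrete, here is a rough sketch of the exchange, assuming the standard upload-initiation endpoint; the idea that any non-202 code means "short-circuit" is only one reading, since the exact status code is left open above:

```
POST /v2/<name>/blobs/uploads/?digest=<digest>

202 Accepted     -> registry ignores or does not support the hint; the client
                    follows the returned Location header and uploads as usual
anything else    -> registry is short-circuiting the upload; a client that does
                    not understand the code retries without the "digest" parameter
```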
Yes, I think just solving for cases where leaking existence is not a problem is the best solution.
If I'm understanding @justincormack right, I agree.
The original reason we didn't include this was that an attacker could block another upload. We added something like this in the containerd content store API by allowing the client to set upload identifiers. I think this could be done safely on registries by having a client-provided identifier, namespaced by the repository, provided at upload creation time. While a cryptographic solution would be ideal, there would be some difficulties in choosing an algorithm that would be performant and meet all use cases. The endpoint should already be well protected by the uploader, preventing the blocking of uploads by an attacker; the blast radius there would be a single repository. This could likely be implemented as an additional parameter on the upload POST call. From there, you'd need something to inform clients that the upload is ongoing elsewhere. Since we already provide progress, we could use that endpoint to provide status as well.
How so? Isn't the Location header for uploads dynamically generated?
Yes, it is. The behavior we avoided was having user-generated IDs. User-generated IDs would solve this problem, as has been demonstrated in containerd's content store upload model. I'm saying that using a user-set ID is a decent solution, with appropriate ACLs in place. Requiring a complex hash algorithm isn't necessary on the server side; a dedupe key is sufficient, as long as the clients agree on the algorithm for calculating it.
Different registries handle security differently based on their use case, whether it's Docker Hub as a public registry, or MCR, NVIDIA, and Red Hat as software registries. Can we enable the handshake and let the specific registry decide how it handles what can be cross-mounted?
@SteveLasker I'm not sure what you're getting at. Wouldn't registries with a security model that's not aligned with this just respond with 202 Accepted and continue as normal with the upload? This has the ability to "fail okay" on old registries and registries that do not support this security model.
Here you referred to 3 models, scoping to #1 with a subset of #2.
@SteveLasker Wouldn't the algorithm be:
I'm not sure where / why a handshake needs to occur as opposed to this fallback behaviour?
I build I push app1 to I build At this point, As long as the token used to push has access to both repos ( So, you should be able to get all 3 scenarios if we define the semantics for how a client can identify the blob that's being uploaded (securely) and allow the registry to define whether it will support crossmount.
I think trying to build a secure protocol for cross-repo mounts (especially one that's replay-resistant) is out of scope of this proposal. I think the "try and fall back" approach is the upper bound of complexity in this proposal.
To capture some of the discussion from the call... There are two opportunities to do this without really changing the spec (much): as part of a HEAD request for a blob, or as part of cross-repo mounting.
Doing this as part of a HEAD request is problematic:
If we want to do this during cross-repo mounting, we can just make the `from` parameter optional.

One drawback of this approach is that it is susceptible to timing attacks. Assuming that an existence check is faster than an existence check + an auth check, a client can (probabilistically) detect the presence of a blob in a registry even if they don't have access to it by comparing the latency of 202 responses to mount attempts. Certain registries will almost certainly not want to implement this, but fallback behavior is already well-specified.

Common use cases for this would be automatically mounting publicly available blobs (I took some liberties and implemented this with GCR a while ago, using our (partial) Docker Hub mirror if you want to test against a registry) or mounting across somewhat trusted boundaries (e.g. an org-wide registry where you have read access to everything).
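For reference, a sketch of what that mount-based variant could look like, assuming the `from` parameter simply becomes optional; the status-code semantics here follow the existing cross-repo mount behavior:

```
POST /v2/<name>/blobs/uploads/?mount=<digest>

201 Created    -> the registry found the blob (and the client may read it),
                  so it is mounted directly and no upload happens
202 Accepted   -> fallback: the registry returns a Location header and the
                  client performs a normal upload
```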
I have 3 major concerns about your proposal:
@shizhMSFT Have you read the updated proposal in the PR?
Success: #275
Bit of a late follow-up, but I started testing this in my client-side code yesterday, pointing to registries that didn't support the feature yet. With the registry:2 image and Docker Hub, the behavior is the ideal fallback: both return a Location header when I call a "mount" without a "from".
Thanks @sudo-bmitch, that's great news.
Problem statement
We have many repositories, on the order of tens of thousands, all managed by independent internal teams. Some teams may be building from a common base layer like Ubuntu. When Ubuntu is revved, it forces a re-upload of that layer to every repository. The security requirement of this solution is that the distribution server must be able to prove that the user has the file they're referring to. The distribution server is NOT required to keep secret whether it has a given blob when the client attempts to prove possession of the file; only access to the contents themselves needs to be protected.
The current solution is to use "cross repository blob mounts", but that's non-trivial to implement for many users. Specifically, the:
Rule #3 is difficult in a couple of respects. (1) If a lineage of multiple images is used -- say ubuntu -> titusoss/ubuntu -> titusoss/ubuntu-ping -- it requires tracking the "original" repository a blob came from; otherwise a dangling blob uploaded to another repo may be garbage collected without an associated manifest if the upload isn't completed quickly enough. This is easy if you know all the manifests that reference the blobs, but that information is not in the image. (2) It also requires that any user can create a repository and read / write to any repo.
Proposal
When the user begins an upload session and calls `POST /v2/<name>/blobs/uploads/`, they can optionally supply an argument, `digest`, with a digest value of the file that comes out at the end. If the registry would like to allow the user to prove that they have this file, it will begin a protocol to allow the user to prove it. The client may choose not to opt into the deduplication method.

The registry will respond with some data that allows the user to generate a random string of bytes of a particular length and value. This length should be of a reasonable size. The user's job is then to append the generated value to the blob, run the given digest algorithm over the new value, and send the result to the server. This is made possible because many hash functions have a relatively small internal state that can be checkpointed and stored alongside the blob on the server. Client-side, the user may value CPU over bandwidth, or choose to store similar metadata themselves.
Properties
| Property | Type | Description |
| --- | --- | --- |
| `generator` | string | This must be an XOF, which is used to generate the data to append to the original value. We can come up with the allowed values later. |
| `length` | int | An integer greater than 0 which defines the output length to generate from the given XOF, in bytes. |
| `seed` | string | The input into the XOF. |
Example:
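What follows is only an illustrative sketch of such a request and challenge: the digest is the one discussed just below, while the `generator`, `seed`, and `length` values are invented for illustration (SHAKE128 is assumed as the XOF).

```
POST /v2/<name>/blobs/uploads/?digest=sha256:d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5

(registry response body)
{
  "generator": "shake128",
  "seed": "foo",
  "length": 4
}
```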
The above defined request is asking the user to prove that they have the contents described by `sha256:d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5`. In this case, that describes the string `Hello, world!\n`.
The user needs to do the following:

1. Generate the requested number of bytes from the given XOF, using the provided seed.
2. Append that output to the original contents, `Hello, world!\n`.
3. Run the digest algorithm over the combined value and send the result to the server.

That's roughly described in the following Python session:
This gave us the resultant hash `958864784c4661cd235c474a4105deedc00ba21ca372e39e03891a5c3d32696f`. It must be the same type as the original hash type.

To complete the upload, PUT with the `dedupe` header (a hypothetical request is sketched below). At this point, the registry will validate it, and accept it if deduped, or not.
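A sketch of the completion request, under the assumption that the proof travels in a header named `Dedupe`; whether it should be a header, a query parameter, or a JSON body is left open (see the FAQ below), and the session ID is a placeholder:

```
PUT /v2/<name>/blobs/uploads/<session_id>?digest=sha256:d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5
Dedupe: sha256:958864784c4661cd235c474a4105deedc00ba21ca372e39e03891a5c3d32696f
```

Under this reading, if the registry accepts the proof it can finalize the blob without the body being transferred; otherwise the client falls back to a normal upload.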
Security implications
FAQ
Should the PUT be a JSON document instead of the query parameter?
Maybe. I'm not sure.
Doesn't this require that more state is kept on the server, and what's the implication there?
This state can probably be passed back and forth in the state query parameter, or embedded in the location object in another way. Alternatively, this information is pretty tiny (above example ~4 bytes).
Won't rehashing be horribly expensive?
The SHA256 hasher can be "checkpointed", and the state can be saved to disk in most implementations.
How do we deal with time, and CPUs becoming faster or one of our hash functions being weakened? ("security")
You can keep cranking up the length output by the XOF to force more hashing. This also has no requirement to work with only one hash function.