Added the SnapshotMetadata service. #551
Conversation
What about this PR #522?
It has been closed.
// SNAPSHOT_METADATA_SERVICE indicates that the Plugin provides
// RPCs to retrieve metadata on the allocated blocks of a single
// snapshot, or the changed blocks between a pair of snapshots of
// the same block volume.
// The presence of this capability determines whether the CO will
// attempt to invoke the OPTIONAL SnapshotMetadata service RPCs.
SNAPSHOT_METADATA_SERVICE = 4 [(alpha_enum_value) = true];
I blindly copied this pattern for the SnapshotMetadata service as it is an optional service.
In the Kubernetes CO implementation proposal, the existence of this gRPC service would be advertised by the presence of a CR, so this capability is not strictly required.
It occurs to me that defining this capability implies that if an SP wanted to use a separate container for the SnapshotMetadata service that they'd have to also implement an Identity service in this container.
Feedback I received verbally on this is that every CSI server process must provide an Identity service.
(force-pushed from f4a058e to e1bb128)
cc @bswartz
cc @jdef
spec.md
Outdated
// allocated blocks of a snapshot: i.e. this identifies the data ranges
// that have valid data as they were the target of some previous write
// operation on the volume.
message GetAllocatedRequest {
Ceph-CSI would like to implement this functionality. In order to communicate with the Ceph cluster, the requests (this one and `GetDeltaRequest`) need to provide credentials like the `secrets` as in `CreateVolumeRequest`:
map<string, string> secrets = 5 [(csi_secret) = true];
I think that makes sense. Many providers might be using this for volume operations. @carlbraganza wdyt?
We also might want to add a way to pass the provider-specific params just like the way it's done with `CreateVolumeRequest`?
// Plugin specific creation-time parameters passed in as opaque
// key-value pairs. This field is OPTIONAL. The Plugin is responsible
// for parsing and validating these parameters. COs will treat
// these as opaque.
map<string, string> parameters = 4;
Parameters are probably not required, as the CSI-driver should be able to fetch those based on the `snapshot_id`.
@nixpanic In K8s, do the secret data come from either of:
- `csi.storage.k8s.io/snapshotter-secret-[name|namespace]` entries of the `VolumeSnapshotClass.Parameters`
- `csi.storage.k8s.io/provisioner-secret-[name|namespace]` entries of a `StorageClass.Parameters`
> Ceph-CSI would like to implement this functionality. In order to communicate with the Ceph cluster, the requests (this one and `GetDeltaRequest`) need to provide credentials like the `secrets` as in `CreateVolumeRequest`:
>
> `map<string, string> secrets = 5 [(csi_secret) = true];`
why aren't backend secrets configured at the plugin level? why do they need to be part of the RPC here? is this an exercise in future-proofing?
> why aren't backend secrets configured at the plugin level? why do they need to be part of the RPC here? is this an exercise in future-proofing?
Ceph-CSI for example has a single instance of the CSI-drivers, but (in Kubernetes) the StorageClass points to a particular cluster. Different StorageClasses can point to different clusters, each with their own secrets (also linked in the StorageClass). Because of the stateless nature, there is no guarantee that secrets can be (temporarily) stored for the plugin before APIs of this new SnapshotMetadata service are called.
> @nixpanic In K8s, do the secret data come from either of:
>
> - `csi.storage.k8s.io/snapshotter-secret-[name|namespace]` entries of the `VolumeSnapshotClass.Parameters`
> - `csi.storage.k8s.io/provisioner-secret-[name|namespace]` entries of a `StorageClass.Parameters`
Yes, that is right. The kubernetes-csi/external-snapshotter resolves those secrets and passes the contents in the CSI procedures.
@jdef, I would say yes, this is an exercise in future proofing. Every other CSI RPC that we've added without a secrets field has later needed a secrets field to be added. My stance is that the default should be to require the CO to provide per-RPC secrets and to figure out what those look like, unless we can prove it doesn't make sense to do so.
The underlying problem is the few CSI drivers that use per-volume secrets which the CO is responsible for managing. Those drivers find it impossible to implement new features until a secrets field is added and the CO is updated to provide the appropriate per-volume secrets.
// in this list is controlled by the max_results value in the
// GetAllocatedRequest message.
// This field is OPTIONAL.
repeated BlockMetadata block_metadata = 3;
repeated BlockMetadata block_metadata = 3;

// Indicates there are no more allocated blocks in the list.
// This field is REQUIRED.
bool end_of_list = 4;
As I understand it, a gRPC stream returns an EOF on proper termination and an error otherwise. Would that not suffice?
My concern here is when max_results is specified, and you get an EOF, how do you know if you need to call it one more time? Or even if max_results is set to zero, can the SP decide to send less than all the results in one stream?
During my review I was going to add like a whole paragraph explaining the relationship between the max_results field and the number of elements in the block_metadata field, and I decided it was too confusing and it would be better to make it explicit.
The core of the problem is that we've introduced BOTH pagination and streaming and this means the client can't rely on the stream ending to indicate that there are no more pages.
> The core of the problem is that we've introduced BOTH pagination and streaming and this means the client can't rely on the stream ending to indicate that there are no more pages.
Hmm ... I'm not sure I understand your confusion! Perhaps we can use this snippet of Go client prototype code (written by @PrasadG193) to dissect the problem?
```go
c.initGRPCClient(snapMetaSvc.Spec.CACert, snapMetaSvc.Spec.Address, saToken, snapNamespace)
stream, err := c.client.GetDelta(ctx, &pgrpc.GetDeltaRequest{
	BaseSnapshotId:   snapNamespace + "/" + baseVolumeSnapshot,
	TargetSnapshotId: snapNamespace + "/" + targetVolumeSnapshot,
	StartingOffset:   0,
	MaxResults:       uint32(256),
})
if err != nil {
	return err
}
done := make(chan bool)
fmt.Println("Resp received:")
go func() {
	for {
		resp, err := stream.Recv()
		if err == io.EOF {
			done <- true // means the stream is finished
			return
		}
		if err != nil {
			log.Fatalf("cannot receive %v", err)
		}
		respJson, _ := json.Marshal(resp)
		fmt.Println(string(respJson))
	}
}()
<-done // wait until all responses have been received
```
The example above doesn't really illustrate processing of the multiple entries in the returned `block_metadata` slice, as the response is simply dumped as a string in JSON format, but it is clear that instead of the `fmt.Println` a real client would process the `block_metadata` slice ("page") in the `resp`. The stream processing loop itself is exited only on a non-nil `err` value from `stream.Recv()`, where `io.EOF` is not really an error but the formal proper end-of-stream indicator.
The corresponding server (sidecar) side logic for this primarily consists of a loop doing this:
```go
func (s *Server) GetDelta(req *pgrpc.GetDeltaRequest, cbtClientStream pgrpc.SnapshotMetadata_GetDeltaServer) error {
	... // call the SP service after assembling the parameters
	done := make(chan bool)
	go func() {
		for {
			... // receive resp from the SP service; on its io.EOF, signal done and return
			log.Print("Received response from csi driver, proxying to client")
			if err := cbtClientStream.Send(resp); err != nil {
				log.Printf("cannot send %v", err)
				return
			}
		}
	}()
	<-done // wait until all responses have been proxied
	return nil
}
```
(The prototype sidecar has very little error handling.)
Oh I see. You're using an error to signal a normal condition. So you always take an extra trip through the loop, except when there are no records to return. My main concern was how we signal termination to the client, and the error semantics don't show up in the protobuf definitions so I overlooked it.
Yes, it appears to be the idiomatic way in Go.
I don't hate this way of doing it, but even if it's idiomatic in Go, I'm not sure we use errors in this way anywhere else in CSI. I just want to highlight that a boolean flag in the return message would achieve the same effect as using an error to signal the end of iteration.
Why not both? Conventionally, the client side should keep the connection alive until `io.EOF`, to avoid leaks. The `end_of_list` is a confirmation from the server to the client that everything has been sent. If there is an `io.EOF` but `end_of_list` is `false`, then the client knows it got an incomplete list, and will need to handle it.
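To make that combined convention concrete, here is a minimal sketch of the client loop from the prototype above, extended under the assumption that the proposed `end_of_list` field is adopted; the `EndOfList` field name and the `processBlockMetadata` helper are hypothetical:

```go
// Sketch only: assumes the generated stream type from the prototype above.
func consumeStream(stream pgrpc.SnapshotMetadata_GetDeltaClient) error {
	sawEndOfList := false
	for {
		resp, err := stream.Recv()
		if err == io.EOF {
			// Normal gRPC termination: trust it only if the server also
			// confirmed completion via end_of_list.
			if !sawEndOfList {
				return fmt.Errorf("stream ended before end_of_list was seen")
			}
			return nil
		}
		if err != nil {
			// Abnormal termination: the caller may resume by re-sending
			// the request with an adjusted starting_offset.
			return err
		}
		sawEndOfList = resp.EndOfList
		processBlockMetadata(resp.BlockMetadata) // hypothetical page handler
	}
}
```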
IMO, it's a bit paranoid!
I was thinking of a server-side graceful termination scenario, where the sidecar (or gRPC) had a chance to send an `io.EOF` before an unexpected termination, and not all the block metadata has been retrieved from the driver plugin. There is no way for the backup software to know the list it got is incomplete. Is that not a valid scenario?
spec.md
Outdated
// allocated blocks of a snapshot: i.e. this identifies the data ranges
// that have valid data as they were the target of some previous write
// operation on the volume.
message GetAllocatedRequest {
> Ceph-CSI would like to implement this functionality. In order to communicate with the Ceph cluster, the requests (this one and `GetDeltaRequest`) need to provide credentials like the `secrets` as in `CreateVolumeRequest`:
>
> `map<string, string> secrets = 5 [(csi_secret) = true];`
why aren't backend secrets configured at the plugin level? why do they need to be part of the RPC here? is this an exercise in future-proofing?
@@ -412,6 +412,16 @@ service GroupController {
}
}

service SnapshotMetadata {
i'm wondering about "stream" for the responses. we don't stream any other type of response. i notice that each of the response types already has `repeated` block metadata. in practice, how large do we expect these metadata chains to be? it also looks like there's a mechanism to resume from some offset (perhaps in case the stream is interrupted?) - which means that clients already have to keep track of what they've consumed.
why not just send a very large batch of metadata for each request? the golang default gRPC max message size is 4MiB. is that not good enough?
is the use of streaming an example of premature optimization?
This is definitely a new gRPC pattern for the CSI API, but I don't think this is a case of premature optimization.
The amount of changed block data would be proportional to the size of the volume, the amount of change, and the mechanism chosen to represent changed areas (FIXED_LENGTH or VARIABLE_LENGTH). For example, assuming a 1TiB volume (2^40 bytes) and 1MiB blocks, there would be 2^20 (about one million) such blocks; with 8KiB blocks there would be 2^27 (about 128 million) such blocks. In the worst case for FIXED_LENGTH all the blocks could have changed. In the worst case for VARIABLE_LENGTH every alternate block could have changed. The 16 bytes (+ some overhead?) used for each changed block datum should also be factored in, and the total can approach or exceed this 4MiB threshold.
Beyond the length of a single stream response, assuming a fixed slice length forces the client to support this size, which could break some memory conservative clients. (This is why there is an option for the client to request a `max_results` limit.)
Recall, too, that an earlier iteration of this KEP was criticized for passing all this metadata through the Kubernetes API server. The direct connection from the backup app to this new CSI service bypasses the K8s API server but cannot reduce the quantity of metadata!
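For concreteness, a small standalone sketch of that arithmetic; the 16-byte tuple size and the worst-case "every block changed" rate are the assumptions stated above, and the 4MiB figure is the Go gRPC default message-size limit mentioned earlier:

```go
package main

import "fmt"

func main() {
	const (
		volumeBytes = int64(1) << 40 // 1 TiB volume
		tupleBytes  = int64(16)      // assumed size of one BlockMetadata datum
		grpcMaxMsg  = int64(4) << 20 // default gRPC max message size (4 MiB)
	)
	for _, blockBytes := range []int64{1 << 20, 8 << 10} { // 1 MiB and 8 KiB blocks
		blocks := volumeBytes / blockBytes
		worstCase := blocks * tupleBytes // FIXED_LENGTH, every block changed
		fmt.Printf("%8d-byte blocks: %d blocks, %d MiB of metadata (%.0fx the gRPC limit)\n",
			blockBytes, blocks, worstCase>>20, float64(worstCase)/float64(grpcMaxMsg))
	}
}
```

Under these assumptions a fully-changed 1TiB volume yields 16MiB of metadata with 1MiB blocks, and 2GiB with 8KiB blocks, both past the 4MiB default.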
@jdef We discussed this in the SIG Data Protection WG meeting. Some relevant observations:
- We're dealing with metadata access here, and not a control path. There is no existing precedent for such operations in the current CSI spec.
- The performance of a gRPC stream would be better than the comparative use of a classic pagination API.
- A classic pagination API would involve multiple round-trip calls, versus a single call with the stream API.
- A classic pagination API requires the SP to fill the next page of metadata to return. Depending on the underlying APIs, this may require the SP to reissue the original allocated/diff call and skip the metadata records already returned in previous calls. In contrast, an SP server processing a stream API can send multiple responses without reissuing the allocated/diff call.
the last bullet point is probably the most convincing here
@carlbraganza can you make sure you've signed the CLA. See https://github.com/container-storage-interface/spec/blob/master/CONTRIBUTING.md#licence-and-cla
Thanks. Looking into getting this signed.
spec.md
Outdated
// The FIXED_LENGTH value indicates that data ranges are
// returned in fixed size blocks.
FIXED_LENGTH=1;
nit: consistent spacing around the `=` sign, here and below
spec.md
Outdated
```

#### GetMetadataAllocated RPC
nit: other RPC section headers don't end w/ `RPC`. i haven't scanned the entire doc, but whatever we do here should be consistent w/ the rest (or, at least, the majority of the rest)
Looks like other RPC sections have the name in the form: `RPC Name` (within backquotes). The errors subsection has no special formatting. Changing...
- The **FIXED_LENGTH** style requires a fixed value for the `size_bytes` field in all tuples in a given sequence.
- The **VARIABLE_LENGTH** style does not constrain the value of the `size_bytes` field in a sequence.

The Snapshot Metadata service permits either style at the discretion of the plugin as long as the style does not change mid-stream in any given RPC.
this seems to imply that subsequent invocations of the same RPC may specify different length styles - even for the same snapshot. is this intended? i hope so, because otherwise the plugin would need to maintain state to enforce consistent styles across multiple RPCs - and we probably don't want to force plugins to do that
Yes. The spec leaves it to the SP to choose the style on a per-RPC basis; consistency is required only within each RPC.
The Snapshot Metadata service is an optional service that is used to retrieve metadata on the allocated blocks of a single snapshot, or the changed blocks between a pair of snapshots of the same block volume.
Retrieval of the data blocks of a snapshot is not addressed by this service and it is assumed that existing mechanisms to read snapshot data will suffice.
There is general language around concurrency of RPCs issued by CO to a plugin for some volume - some recommendation meant to ease the burden on plugins. I don't see the same wording applied for snapshots (and we should clarify that, in general). But my hunch is that the spirit of such calls would be the same, limiting concurrency w/ respect to calls that pertain to a particular snapshot (or snapshot group). Has thought been given to the concurrency of these new RPCs? i.e. is the intent to align w/ the current spirit of the spec w/ respect to concurrency, or is a different concurrency model needed/wanted?
I had not looked at that in detail previously.
Looking at the `Concurrency` and `Error Scheme` sections, it appears that returning an `ABORTED` error code would suffice if the SP decides to limit concurrent operations. The existing recommendation that the caller retry with an exponential back-off is also sufficient.
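As an illustration, a hedged sketch of such a caller-side retry loop; the helper name and back-off parameters are illustrative assumptions, not spec requirements:

```go
import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// retryOnAborted retries a call with exponential back-off while the
// SP reports ABORTED, per the spec's concurrency/error-scheme guidance.
func retryOnAborted(ctx context.Context, call func(context.Context) error) error {
	backoff := time.Second
	for {
		err := call(ctx)
		if status.Code(err) != codes.Aborted {
			return err // success, or an error that should not be retried
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
		if backoff < 32*time.Second {
			backoff *= 2 // cap the exponential back-off
		}
	}
}
```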
thanks for being patient. completed another round of review
spec.md
Outdated
// The plugin will determine an appropriate value if 0, and is
// always free to send less than the requested value.
// This field is OPTIONAL.
uint32 max_results = 3;
another "max" field uses int32
instead, another int64
. maybe be consistent and not specify unsigned?
Okay. I'll use the `int32` data type used in `ListVolumesRequest` and `ListSnapshotsRequest`. The odd one out is `NodeGetInfoResponse`, which uses `int64`.
spec.md
Outdated
map<string, string> secrets = 4 [(csi_secret) = true];
}

// GetMetadataAllocatedResponse messages are returned in gRPC stream.
grammar: ".. are returned in streaming fashion." or something
spec.md
Outdated
// This value must be the same in all such messages returned by
// the stream.
// This field is REQUIRED.
uint64 volume_capacity_bytes = 2;
this is `int64` elsewhere...
spec.md
Outdated
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataAllocatedRequest
// message is non-zero, then the number of entries in this list
is greater than zero (if we drop the unsigned)
spec.md
Outdated
// MUST be less than or equal to that value.
// The byte_offset field of each message MUST be strictly increasing.
// The SP MUST NOT return a BlockMetadata message with a byte_offset
// less than the value of starting_offset in the
this is inconsistent w/ the downward rounding option granted to plugins as previously described - please reconcile
for example, if the CO requests a starting offset of 600 and the plugin realigns/rounds to boundary starting at offset 512, then the first block metadata response could be reported w/ an initial offset of 512 - which is not strictly increasing from the CO-specified starting offset.
or am I missing something?
No, I think you're correct here - this is not quite right. Rewording to:
// The data range of the first BlockMetadata message in the list MUST
// include the starting_offset requested in the GetMetadataAllocatedRequest
// message.
spec.md
Outdated
map<string, string> secrets = 5 [(csi_secret) = true];
}

// GetMetadataDeltaResponse messages are returned in gRPC stream.
same grammar nit
spec.md
Outdated
// This value must be the same in all such messages returned by
// the stream.
// This field is REQUIRED.
uint64 volume_capacity_bytes = 2;
`int64`
spec.md
Outdated
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataDeltaRequest message
// is non-zero, then the number of entries in this list MUST be less
greater than zero
spec.md
Outdated
// than or equal to that value.
// The byte_offset field of each message MUST be strictly increasing.
// The SP MUST NOT return a BlockMetadata message with a byte_offset
// less than the value of starting_offset in the
same conflict w/ earlier, optional rounding-down by plugin
Changed to:
// The data range of the first BlockMetadata message in the list MUST
// include the starting_offset requested in the GetMetadataDeltaRequest
// message.
@jdef This was subsequently modified in response to feedback from @bswartz. Please see this comment below: #551 (comment)
##### GetMetadataDelta Errors
If the plugin is unable to complete the `GetMetadataDelta` call successfully it must return a non-OK gRPC code in the gRPC status.

The following conditions are well defined:
similar concerns w/ snapshots in not-ready states: does NOT_FOUND work or some other code needed?
Please see my earlier response.
@jdef This comment was in an outdated part of the document so I'm not sure if it was addressed:
This was brought to our attention by @nixpanic, because the Ceph CSI plugin supports multiple sets of secrets (per Pool, I believe).
Thanks @jdef. I tried to address all the items you surfaced. PTAL.
csi.proto
Outdated
// The plugins may round down this offset to the nearest alignment
// boundary based on the BlockMetadataType used.
Sorry to be pedantic about this but the explanation isn't clear enough that someone could write a test to verify this behavior. Is the SP only allowed to round down if the type is FIXED_LENGTH? And is it only allowed to round by the exactly the size of one block? Would an SP that rounds down by more than one block (because internally it operates on groups of blocks) be in violation of the spec? If an SP uses VARIABLE_LENGTH blocks, how much can it round down by?
My worry is that if we leave it up to SPs and COs to figure out who should discard the "repeated" metadata blocks that might result from an interrupted transmission, they might make incompatible assumptions such that in cases of actual interruption, one or more changed blocks would get missed, but only on certain SP+CO combinations, and the bug is unlikely to get caught because it's a rare thing for these connections to get interrupted unless you specifically test for it. This bug would result in very subtle data corruption, which wouldn't even have a chance of breaking anything until a corrupted backup is restored, and even then the corruption might not get noticed, but it would be there.
I want to be crystal clear about what the SP is allowed to do with this field and what assumptions the CO is allowed to make, so we can write automated tests that ensure there is no chance of a misunderstanding by a SP or a CO implementer resulting in data corruption. I realize that the earlier wording of this section (which I had recommended) no longer made sense after the additional changes we agreed to, but we still need more clarity here.
csi.proto
Outdated
// The data range of the first BlockMetadata message in the list MUST
// include the starting_offset requested in the
// GetMetadataAllocatedRequest message.
Oh wait, this is the language I was looking for in my earlier comment. I think this may be clear enough because it's precise. Do we need to add any clarification in the case that this message is streamed and there are multiple responses for a single request? In that case the restriction mentioned only applies to the first response. The second and later responses must follow after the last block of the previous response (which is hopefully obvious).
Whew!
Okay. I'll clarify that it is the first tuple in the stream that must have this property. Will now read as:
// The data range of the first BlockMetadata message in the stream
// of GetMetadataAllocatedResponse messages returned MUST include the
// starting_offset requested in the GetMetadataAllocatedRequest
// message.
The language is likewise modified in the description of the `GetMetadataDeltaResponse` message.
Oops! @bswartz please see the follow up comment below: #551 (comment)
Thanks for pointing out the case about the block containing `starting_offset` not containing any changes -- I had missed this. I'm convinced we can be more precise -- I'm going to take a swing at recommending some new language which is impossible to misunderstand.
After responding to @bswartz's previous comment, it occurred to me that requiring the first BlockMetadata message to actually include the starting_offset is not correct, because the block containing the starting_offset may contain no changed data.
The language is likewise modified in the description of the GetMetadataDeltaResponse message. I don't know if this negatively impacts @bswartz's request for precise language that can be tested, but it is more accurate.
@bswartz wrote:
I think this is a great idea. I'm adding the following section between the description of the metadata format and the RPC definitions:
(Updated after @jdef's feedback and formatted for display above.)
playing catch-up here, trying to review individual commits. additional word-smithing may be needed.
spec.md
Outdated
The SP must always examine the starting_offset and ensure that metadata sent in
it MAY attempt to **continue** the operation by re-sending its request message but with a
starting_offset adjusted to the NEXT byte position beyond that of any metadata already received.
The SP MUST always examine the starting_offset and ensure that metadata sent in
the gRPC stream logically includes the starting_offset but not any byte position prior
"but not any byte position prior" seems to contradict the gRPC method docs, which seem to indicate that starting_offset
can live in the window defined by (offset, offset+size). or am I reading this wrong?
Yes, you're correct! I think I got stuck in the previous thread on continuation. Changing to:

> The SP MUST always ensure that the `starting_offset` requested be considered in the computation of the data range for the first message in the returned gRPC stream, though the data range of the first message is not required to actually include the `starting_offset` if there is no applicable data between the `starting_offset` and the start of the data range returned by the first message.
This took quite a few rewrites to be unambiguous and yet semi-readable; please feel free to offer alternatives.
/lgtm
spec.md
Outdated
// The SP MUST NOT return any data ranges for which the byte_offset
// plus the size_bytes is less than starting_offset.
// The SP MAY return data ranges with byte_offset less than
// starting_offset if starting_offset is less than byte_offset plus
"... byte_offset less than starting_offset if starting_offset is less than byte_offset ..."
sorry, i'm still confused by this phrasing. what am i missing?
This is language I recommended, and what I was trying to achieve is to constrain the SP's behavior in case of restarting an abnormally terminated stream. Basically I want to be very precise about what starting_offset means, so that SPs don't take liberties and start somewhere other than where the CO expects. Unfortunately, the only way to do that is to describe what mathematical relationships must hold between the values in the request and response.
Given a requested starting_offset S, and the first block_metadata B0 of the first SP response in a stream, the SP MUST ensure that B0.byte_offset <= S < B0.byte_offset + B0.size_bytes.
.. is this what you're getting at? if so, can we represent the mathematical relationship using language like this instead of the two sentences that attempt to model this in English? I see where I went wrong reading it the first time, and I probably would have not had the same trouble w/ the same in equation form.
Yeah I'm perfectly fine using language like that. I had worried that introducing new variables like S and B0 might be extra confusing, but this seems like a case where it will be a little confusing either way, and I do like the clarity of mathematical symbols to express what is really a mathematical relationship.
@carlbraganza do you think you can reword using the language that @jdef recommends?
I replaced the 2 sentences above with the following:
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that start before but not overlap
// the requested starting_offset: i.e. if S is the requested
// starting_offset, and B0 is block_metadata[0] of the first message
// in the response stream, then the following must always be true:
// B0.byte_offset > S || S < B0.byte_offset + B0.size_bytes.
spec.md
Outdated
// The SP MUST NOT return any data ranges for which the byte_offset
// plus the size_bytes is less than starting_offset.
// The SP MAY return data ranges with byte_offset less than
// starting_offset if starting_offset is less than byte_offset plus
ditto
Change occurrences of "0 based" -> "zero based" to maintain consistency as suggested by jdef
spec.md
Outdated
// the requested starting_offset: i.e. if S is the requested
// starting_offset, and B0 is block_metadata[0] of the first message
// in the response stream, then the following must always be true:
// B0.byte_offset > S || S < B0.byte_offset + B0.size_bytes.
// S < B0.byte_offset or-else S < B0.byte_offset + B0.size_bytes.
^ this seems more readable? (for me, anyway); otherwise 👍
No problem. How about if I change it to:
...
in the response stream, then either (S < B0.byte_offset) must be
true or else (S < B0.byte_offset + B0.size_bytes) must be true.
I think this accurately replaces `||` in my original sentence.
LGTM mod nit
I found some more typos, and rather than just recommend the minimal fix, I am recommending a significant simplification of the language, because using mathematical expressions allows us to be more precise with less verbiage.
csi.proto
Outdated
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that start before but not overlap
// the requested starting_offset: i.e. if S is the requested
// starting_offset, and B0 is block_metadata[0] of the first message
// in the response stream, then either (S < B0.byte_offset) must be
// true or else (S < B0.byte_offset + B0.size_bytes) must be true.
Now that we have precise math expressions, we can make the English verbiage a bit more terse. Note that I'm not nitpicking here, the previous language had typos that needed fixing anyways, so I'm going to recommend a wholesale simplification of this block. Also, the two inequalities expressed were redundant, because the second implies the first as long as size_bytes is non-negative (and we actually require it to be positive).
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that end before the requested
// starting_offset: i.e. if S is the requested starting_offset, and B0 is
// block_metadata[0] of the first message in the response stream,
// then S < B0.byte_offset + B0.size_bytes must be true.
Thanks, Ben. That is simpler. Will now read as:
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that end before the requested
// starting_offset: i.e. if S is the requested starting_offset, and
// B0 is block_metadata[0] of the first message in the response
// stream, then (S < B0.byte_offset + B0.size_bytes) must be true.
csi.proto
Outdated
int64 byte_offset = 1;

// This is the size of the data range.
// size_bytes MUST be greater than 0.
// size_bytes MUST be greater than zero.
spec.md
Outdated
int64 byte_offset = 1;

// This is the size of the data range.
// size_bytes MUST be greater than 0.
// size_bytes MUST be greater than zero.
csi.proto
Outdated
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataAllocatedRequest
// message is greater than zero, then the number of entries in this
// list MUST be less than or equal to that value.
// The byte_offset field of each message MUST be strictly increasing.
// The computation of the data range in the first BlockMetadata
// message in the stream of GetMetadataAllocatedResponse messages
// returned MUST consider the starting_offset requested in the
// GetMetadataAllocatedRequest message.
// The SP MUST NOT return data ranges for which byte_offset plus
// size_bytes is less than or equal to starting_offset.
// Allocated data ranges which overlap the starting_offset
// or for which byte_offset plus size_bytes is greater than
// starting_offset MUST be returned by the SP.
// BlockMetadata messages MUST NOT overlap for a given snapshot, i.e.
// for any two BlockMetadata messages, A and B, if
// A.byte_offset < B.byte_offset then
// A.byte_offset + A.size_bytes <= B.byte_offset MUST be true.
Another simplification of the English since we can lean on mathematical expressions for precision. Also I think that we don't need to repeat the language about starting_offset here. A simple reference to the other section should be enough.
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataAllocatedRequest
// message is greater than zero, then the number of entries in this
// list MUST be less than or equal to that value.
// The SP MUST respect the value of starting_offset in the request.
// The byte_offset field of each BlockMetadata message MUST be
// strictly increasing and BlockMetadata messages MUST NOT overlap:
// i.e. for any two BlockMetadata messages, A and B, if A is returned
// before B, then A.byte_offset + A.size_bytes <= B.byte_offset MUST
// be true.
Changed to read:
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataAllocatedRequest
// message is greater than zero, then the number of entries in this
// list MUST be less than or equal to that value.
// The SP MUST respect the value of starting_offset in the request.
// The byte_offset fields of adjacent BlockMetadata messages
// MUST be strictly increasing and messages MUST NOT overlap:
// i.e. for any two BlockMetadata messages, A and B, if A is returned
// before B, then (A.byte_offset + A.size_bytes <= B.byte_offset)
// MUST be true.
// This MUST also be true if A and B are from block_metadata lists in
// different GetMetadataAllocatedResponse messages in the gRPC stream.
// This field is OPTIONAL.
I added an additional sentence to extend the notion of "adjacent" to include the entire gRPC stream.
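Since the earlier review feedback asked for language precise enough to drive automated tests, here is a hedged Go sketch of a checker for these invariants; the `BlockMetadata` struct and `validateStream` helper below are hypothetical stand-ins for the generated proto type, not part of this PR:

```go
// BlockMetadata is a hypothetical stand-in for the generated proto type.
type BlockMetadata struct{ ByteOffset, SizeBytes int64 }

// validateStream checks the invariants above across all block_metadata
// lists ("pages") of a single gRPC response stream.
func validateStream(startingOffset int64, pages [][]BlockMetadata) error {
	prevEnd := int64(-1)
	first := true
	for _, page := range pages { // one page per response message in the stream
		for _, b := range page {
			if b.SizeBytes <= 0 {
				return fmt.Errorf("size_bytes must be greater than zero")
			}
			if first {
				// (S < B0.byte_offset + B0.size_bytes) from the earlier thread.
				if startingOffset >= b.ByteOffset+b.SizeBytes {
					return fmt.Errorf("first tuple ends at or before starting_offset")
				}
				first = false
			} else if b.ByteOffset < prevEnd {
				// (A.byte_offset + A.size_bytes <= B.byte_offset) must hold.
				return fmt.Errorf("tuples overlap or are not strictly increasing")
			}
			prevEnd = b.ByteOffset + b.SizeBytes
		}
	}
	return nil
}
```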
csi.proto
Outdated
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that start before but not overlap
// the requested starting_offset: i.e. if S is the requested
// starting_offset, and B0 is block_metadata[0] of the first message
// in the response stream, then either (S < B0.byte_offset) must be
// true or else (S < B0.byte_offset + B0.size_bytes) must be true.
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that end before the requested
// starting_offset: i.e. if S is the requested starting_offset, and B0 is
// block_metadata[0] of the first message in the response stream,
// then S < B0.byte_offset + B0.size_bytes must be true.
Handled like the previous case.
csi.proto
Outdated
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataDeltaRequest message
// is greater than zero, then the number of entries in this list MUST
// be less than or equal to that value.
// The byte_offset field of each message MUST be strictly increasing.
// The computation of the data range in the first BlockMetadata
// message in the stream of GetMetadataDeltaResponse messages
// returned MUST consider the starting_offset requested in the
// GetMetadataDeltaRequest message.
// The SP MUST NOT return data ranges for which byte_offset plus
// size_bytes is less than or equal to starting_offset.
// Changed data ranges which overlap the starting_offset
// or for which byte_offset plus size_bytes is greater than
// starting_offset MUST be returned by the SP.
// BlockMetadata messages MUST NOT overlap for a given snapshot, i.e.
// for any two BlockMetadata messages, A and B, if
// A.byte_offset < B.byte_offset then
// A.byte_offset + A.size_bytes <= B.byte_offset MUST be true.
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataDeltaRequest
// message is greater than zero, then the number of entries in this
// list MUST be less than or equal to that value.
// The SP MUST respect the value of starting_offset in the request.
// The byte_offset field of each BlockMetadata message MUST be
// strictly increasing and BlockMetadata messages MUST NOT overlap:
// i.e. for any two BlockMetadata messages, A and B, if A is returned
// before B, then A.byte_offset + A.size_bytes <= B.byte_offset MUST
// be true.
Handled like the previous case.
/lgtm
skimmed through the changes yesterday, LGTM. Only remaining nit is the needless newline whitespace that pads the new protobuf blocks.
@carlbraganza Can you squash the commits?
@carlbraganza I can't find evidence of a signed CLA. Did that end up happening?
BTW I can squash the commits too, but we should add some more detail to the squashed commit message.
(force-pushed from f1eae06 to afc847a)
Rebased on master and squashed commits.
As soon as we get a signed CLA from the author(s) we can merge this.
I can confirm we've received a CLA from Veeam, including Carl Braganza.
I looked at the build test failure - my guess is that I used the wrong version of protoc.
You probably need to rebase on HEAD and rebuild. We merged a patch that changed the Makefile significantly and also updated the protoc compiler version.
(force-pushed from afc847a to aa0e11d)
(Referenced commit: squashed draft of the CSI CBT KEP in kubernetes/enhancements, authored by Ivan Sim, Prasad Ghangal, and Carl Braganza.)
**What type of PR is this?**
This is a proposed addition to the CSI specification.

**What this PR does / why we need it:**
It adds a SnapshotMetadata service through which one can obtain allocated or changed block metadata for snapshots.

**Which issue(s) this PR fixes:**
n/a

**Special notes for your reviewer:**
Derived from kubernetes/enhancements#4082

**Does this PR introduce an API-breaking change?:**