Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added the SnapshotMetadata service. #551

Conversation

carlbraganza
Copy link
Contributor

@carlbraganza carlbraganza commented Aug 18, 2023

What type of PR is this?
This is a proposed addition to the CSI specification.

What this PR does / why we need it:
It adds a SnapshotMetadata service through which one can obtain allocated or changed block metadata for snapshots.

Which issue(s) this PR fixes:
n/a

Special notes for your reviewer:
Derived from kubernetes/enhancements#4082

Does this PR introduce an API-breaking change?:

NONE

carlbraganza added a commit to ihcsim/enhancements that referenced this pull request Aug 18, 2023
@xing-yang
Copy link
Contributor

What about this PR #522?

@carlbraganza
Copy link
Contributor Author

What about this PR #522?

It has been closed.

Comment on lines +643 to +650
// SNAPSHOT_METADATA_SERVICE indicates that the Plugin provides
// RPCs to retrieve metadata on the allocated blocks of a single
// snapshot, or the changed blocks between a pair of snapshots of
// the same block volume.
// The presence of this capability determines whether the CO will
// attempt to invoke the OPTIONAL SnapshotMetadata service RPCs.
SNAPSHOT_METADATA_SERVICE = 4 [(alpha_enum_value) = true];
Copy link
Contributor Author

@carlbraganza carlbraganza Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I blindly copied this pattern for the SnapshotMetadata service as it is an optional service.
In the Kubernetes CO implementation proposal, the existence of this gRPC service would be advertised by the presence of a CR, so this capability is not strictly required.

It occurs to me that defining this capability implies that if an SP wanted to use a separate container for the SnapshotMetadata service that they'd have to also implement an Identity service in this container.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback I received verbally on this is that every CSI server process must provide an Identity service.

carlbraganza added a commit to ihcsim/enhancements that referenced this pull request Aug 18, 2023
@carlbraganza carlbraganza force-pushed the snapshot-metadata-service branch from f4a058e to e1bb128 Compare August 18, 2023 20:44
@xing-yang
Copy link
Contributor

cc @bswartz

@bswartz bswartz self-requested a review August 23, 2023 16:08
@xing-yang
Copy link
Contributor

cc @jdef

spec.md Outdated
// allocated blocks of a snapshot: i.e. this identifies the data ranges
// that have valid data as they were the target of some previous write
// operation on the volume.
message GetAllocatedRequest {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ceph-CSI would like to implement this functionality. In order to communicate with the Ceph cluster, the requests (this one and GetDeltaRequest) need to provide credentials like the secrets as in CreateVolumeRequest:

map<string, string> secrets = 5 [(csi_secret) = true];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense. Many providers might be using this for volume operations @carlbraganza wdyt?
We also might want to add a way to pass the provider-specific params just like the way it's done with CreateVolumeRequest?

  // Plugin specific creation-time parameters passed in as opaque 
  // key-value pairs. This field is OPTIONAL. The Plugin is responsible 
  // for parsing and validating these parameters. COs will treat 
  // these as opaque.
  map<string, string> parameters = 4;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameters are probably not required, as the CSI-driver should be able to fetch those based on the snapshot_id.

Copy link
Contributor Author

@carlbraganza carlbraganza Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nixpanic In K8s, do the secret data come from either of:

  • csi.storage.k8s.io/snapshotter-secret-[name|namespace] entries of the VolumeSnapshotClass.Parameters
  • csi.storage.k8s.io/provisioner-secret-[name|namespace] entries of a StorageClass.Parameters

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ceph-CSI would like to implement this functionality. In order to communicate with the Ceph cluster, the requests (this one and GetDeltaRequest) need to provide credentials like the secrets as in CreateVolumeRequest:

map<string, string> secrets = 5 [(csi_secret) = true];

why aren't backend secrets configured at the plugin level? why do they need to be part of the RPC here? is this an exercise in future-proofing?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why aren't backend secrets configured at the plugin level? why do they need to be part of the RPC here? is this an exercise in future-proofing?

Ceph-CSI for example has a single instance for the CSI-drivers, but (in Kubernetes) the StorageClass points to a particular cluster. Different StorageClasses can point to different clusters, each with their own secrets (also linked in the StorageClass). Because of the stateless nature, there is no guarantee that secrets can be (temporarily) stored for plugin before APIs of this new SnapshotMetadata service are called.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nixpanic In K8s, do the secret data come from either of:

* `csi.storage.k8s.io/snapshotter-secret-[name|namespace]` entries of the `VolumeSnapshotClass.Parameters`

* `csi.storage.k8s.io/provisioner-secret-[name|namespace]` entries of a `StorageClass.Parameters`

Yes, that is right. The kubernetes-csi/external-snapshotter resolves those secrets and passes the contents in the CSI procedures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdef, I would say yes, this is an exercise in future proofing. Every other CSI RPC that we've added without a secrets field has later needed a secrets field to be added. My stance is that the default should be to require the CO to provide pre-RPC secrets and to figure out what those look like, unless we can prove it doesn't make sense to do so.

The underlying problem are the few CSI drivers that use per-volume secrets which the CO is responsible for managing. Those drivers find it impossible to implement new features until a secrets field is added and the CO is updated to provide the appropriate per-volume secrets.

spec.md Show resolved Hide resolved
spec.md Show resolved Hide resolved
spec.md Show resolved Hide resolved
spec.md Show resolved Hide resolved
spec.md Show resolved Hide resolved
csi.proto Outdated Show resolved Hide resolved
csi.proto Show resolved Hide resolved
// in this list is controlled by the max_results value in the
// GetAllocatedRequest message.
// This field is OPTIONAL.
repeated BlockMetadata block_metadata = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
repeated BlockMetadata block_metadata = 3;
repeated BlockMetadata block_metadata = 3;
// Indicates there are no more allocated blocks in the list.
// This field is REQUIRED.
bool end_of_list = 4;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, a gRPC stream returns an EOF on proper termination and an error otherwise. Would that not suffice?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern here is when max_results is specified, and you get an EOF, how do you know if you need to call it one more time? Or even if max_results is set to zero, can the SP decide to send less than all the results in one stream?

During my review I was going to add like a whole paragraph explaining the relationship between the max_results field and the number of elements in the block_metadata field, and I decided it was too confusing and it would be better to make it explicit.

The core of the problem is that we've introduced BOTH pagination and streaming and this means the client can't rely on the stream ending to indicate that there are no more pages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core of the problem is that we've introduced BOTH pagination and streaming and this means the client can't rely on the stream ending to indicate that there are no more pages.

Hmm ... I'm not sure I understand your confusion! Perhaps we can use this snippet of Go client prototype code (written by @PrasadG193) to dissect the problem?

	c.initGRPCClient(snapMetaSvc.Spec.CACert, snapMetaSvc.Spec.Address, saToken, snapNamespace)
	stream, err := c.client.GetDelta(ctx, &pgrpc.GetDeltaRequest{
		BaseSnapshotId:   snapNamespace + "/" + baseVolumeSnapshot,
		TargetSnapshotId: snapNamespace + "/" + targetVolumeSnapshot,
		StartingOffset:   0,
		MaxResults:       uint32(256),
	})
	if err != nil {
		return err
	}
	done := make(chan bool)
	fmt.Println("Resp received:")
	go func() {
		for {
			resp, err := stream.Recv()
			if err == io.EOF {
				done <- true //means stream is finished
				return
			}
			if err != nil {
				log.Fatalf("cannot receive %v", err)
			}
			respJson, _ := json.Marshal(resp)
			fmt.Println(string(respJson))
		}
	}()

	<-done //we will wait until all response is received

The example above doesn't really illustrate processing of the multiple entries in the returned block_metadata slice, as the response is simply dumped as a string in JSON format, but it is clear that instead of the fmt.Println a real client would process the block_metadata slice ("page") in the resp. The stream processing loop itself is exited only on a non-nil err value from stream.Recv(), where io.EOF is not really an error but the formal proper end-of-stream indicator.

The corresponding server (sidecar) side logic for this primarily consists of a loop doing this:

func (s *Server) GetDelta(req *pgrpc.GetDeltaRequest, cbtClientStream pgrpc.SnapshotMetadata_GetDeltaServer) error {
	... // call the SP service after assembling the parameters
	done := make(chan bool)
	go func() {
		for {
			... // receive resp from the SP service
			log.Print("Received response from csi driver, proxying to client")
			if err := cbtClientStream.Send(resp); err != nil {
				log.Printf(fmt.Sprintf("cannot send %v", err))
				return
			}
		}
	}()
	<-done //we will wait until all response is received
	return nil
}

(The prototype sidecar has very little error handling)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. You're using an error to signal a normal condition. So you always take an extra trip through the loop, except when there are are no records to return. My main concern was how we signal termination to the client, and the error semantics don't show up in the protobuf definitions so I overlooked it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it appears to be the idiomatic way in Go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't hate this way of doing it, but even if it's idiomatic in Go, I'm not sure we use errors in this way anywhere else in CSI. I just want to highlight that a boolean flag in the return message would achieve the same effect as using an error to signal the end of iteration.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not both? Conventionally, client side should keep the connection alive until io.EOF, to avoid leaks. The end_of_list is a confirmation from the server to the client that everything has been sent. If there is an io.EOF but end_of_list is false, then the client knows it got an incomplete list, and will need to handle it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, its a bit paranoic!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of a server-side graceful termination scenario, where the sidecar (or gRPC) had a chance to send an io.EOF before an unexpected termination, and not all the block metadata has been retrieved from the driver plugin. There is no way for the backup software to know the list it got is incomplete. Is there not a valid scenario?

csi.proto Show resolved Hide resolved
csi.proto Show resolved Hide resolved
csi.proto Outdated Show resolved Hide resolved
spec.md Outdated Show resolved Hide resolved
spec.md Outdated Show resolved Hide resolved
spec.md Outdated Show resolved Hide resolved
spec.md Outdated Show resolved Hide resolved
spec.md Outdated Show resolved Hide resolved
spec.md Outdated Show resolved Hide resolved
spec.md Outdated
// allocated blocks of a snapshot: i.e. this identifies the data ranges
// that have valid data as they were the target of some previous write
// operation on the volume.
message GetAllocatedRequest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ceph-CSI would like to implement this functionality. In order to communicate with the Ceph cluster, the requests (this one and GetDeltaRequest) need to provide credentials like the secrets as in CreateVolumeRequest:

map<string, string> secrets = 5 [(csi_secret) = true];

why aren't backend secrets configured at the plugin level? why do they need to be part of the RPC here? is this an exercise in future-proofing?

@@ -412,6 +412,16 @@ service GroupController {
}
}

service SnapshotMetadata {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm wondering about "stream" for the responses. we don't stream any other type of response. i notice that each of the response types already has repeated block metadata. in practice, how large do we expect these metadata chains to be? it also looks like there's a mechanism to resume from some offset (perhaps in case the stream is interrupted?) - which means that clients already have to keep track of what they've consumed.

why not just send a very large batch of metadata for each request? the golang default gRPC max message size is 4MiB. is that not good enough?

is the use of streaming an example of premature optimization?

Copy link
Contributor Author

@carlbraganza carlbraganza Sep 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a new gRPC pattern for the CSI API, but I don't think this is a case of premature optimization.

The amount of changed block data would be proportional to the size of the volume, the amount of change, and the mechanism chosen to represent changed areas (FIXED_LENGH or VARIABLE_LENGTH). For example, assuming a 1TiB volume (2^40) and 1MiB blocks, there would be 1MiB such blocks; with 8KiB blocks there would 128MiB such blocks. In the worst case for FIXED_LENGTH all the blocks could have changed. In the worst case for VARIABLE_LENGTH every alternative block could have changed. 16 bytes (+ some overhead?) are used for each changed block datum, which should also be factored in, and can approach or exceed this 4MiB threshold.

Beyond the length of a single stream response, assuming a fixed slice length forces the client to support this size, which could break some memory conservative clients. (This is why there is an option for the client to request a max_results limit).

Recall too, that an earlier iteration of this KEP was criticized for passing all this metadata through the Kubernetes API server. The direct connection from the backup app to this new CSI service bypasses the K8s API server but cannot reduce the quantity of metadata!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdef We discussed this in the SIG Data Protection WG meeting. Some relevant observations:

  • We're dealing with metadata access here, and not a control path. There is no existing precedent for such operations in the current CSI spec.
  • The performance of a gRPC stream would better than the comparative use of a classic pagination API.
    • A classic pagination API would involve multiple round-trip calls, versus a single call with the stream API.
    • A classic pagination API requires the SP to fill the next page of metadata to return. Depending on the underlying APIs, this may require the SP to reissue the original allocated/diff call and skip the metadata records already returned in previous calls. In contrast, an SP server processing a stream API can send multiple responses without reissuing the allocated/diff call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the last bullet point is probably the most convincing here

@saad-ali
Copy link
Member

@carlbraganza
Copy link
Contributor Author

@carlbraganza can you make sure you've signed the CLA. See https://github.com/container-storage-interface/spec/blob/master/CONTRIBUTING.md#licence-and-cla

Thanks. Looking into getting this signed.

spec.md Show resolved Hide resolved
spec.md Outdated

// The FIXED_LENGTH value indicates that data ranges are
// returned in fixed size blocks.
FIXED_LENGTH=1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consistent spacing around = sign, here and below

spec.md Show resolved Hide resolved
spec.md Outdated
```


#### GetMetadataAllocated RPC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: other RPC section headers don't end w/ RPC. i haven't scanned the entire doc, but whatever we do here should be consistent w/ the rest (or, at least, the majority of the rest)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like other RPC sections have the name in the form: RPC Name (within backquotes). The errors subsection has no special formatting. Changing...

- The **FIXED_LENGTH** style requires a fixed value for the `size_bytes` field in all tuples in a given sequence.
- The **VARIABLE_LENGTH** style does not constrain the value of the `size_bytes` field in a sequence.

The Snapshot Metadata service permits either style at the discretion of the plugin as long as the style does not change mid-stream in any given RPC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to imply that subsequent invocations of the same RPC may specify different length styles - even for the same snapshot. is this intended? i hope so, because otherwise the plugin would need to maintain state to enforce consistent styles across multiple RPCs - and we probably don't want to force plugins to do that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The spec leaves it to the SP to chose the style on a per RPC basis; consistency is required only within each RPC.


The Snapshot Metadata service is an optional service that is used to retrieve metadata on the allocated blocks of a single snapshot, or the changed blocks between a pair of snapshots of the same block volume.
Retrieval of the data blocks of a snapshot is not addressed by this service and it is assumed that existing mechanisms to read snapshot data will suffice.

Copy link
Member

@jdef jdef Nov 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is general language around concurrency of RPCs issued by CO to a plugin for some volume - some recommendation meant to ease the burden on plugins. I don't see the same wording applied for snapshots (and we should clarify that, in general). But my hunch is that the spirit of such calls would be the same, limiting concurrency w/ respect to calls that pertain to a particular snapshot (or snapshot group). Has thought been given to the concurrency of these new RPCs? i.e. is the intent to align w/ the current spirit of the spec w/ respect to concurrency, or is a different concurrency model needed/wanted?

Copy link
Contributor Author

@carlbraganza carlbraganza Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had not looked at that in detail previously.

Looking at the Concurrency and Error Scheme sections it appears that returning an ABORTED error code would suffice if the SP decides to limit concurrent operations. The existing recommendation that the caller retry with an exponential back off is also sufficient.

Copy link
Member

@jdef jdef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for being patient. completed another round of review

spec.md Outdated
// The plugin will determine an appropriate value if 0, and is
// always free to send less than the requested value.
// This field is OPTIONAL.
uint32 max_results = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another "max" field uses int32 instead, another int64. maybe be consistent and not specify unsigned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I'll use the int32 data type used in ListVolumesRequest and ListSnapshotsRequest. The odd one out is NodeGetInfoResponse which uses int64.

spec.md Outdated
map<string, string> secrets = 4 [(csi_secret) = true];
}

// GetMetadataAllocatedResponse messages are returned in gRPC stream.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grammar: ".. are returned in streaming fashion." or something

spec.md Outdated
// This value must be the same in all such messages returned by
// the stream.
// This field is REQUIRED.
uint64 volume_capacity_bytes = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is int64 elsewhere...

spec.md Outdated

// This is a list of data range tuples.
// If the value of max_results in the GetMetadataAllocatedRequest
// message is non-zero, then the number of entries in this list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is greater than zero (if we drop the unsigned)

spec.md Outdated
// MUST be less than or equal to that value.
// The byte_offset field of each message MUST be strictly increasing.
// The SP MUST NOT return a BlockMetadata message with a byte_offset
// less than than the value of starting_offset in the
Copy link
Member

@jdef jdef Nov 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is inconsistent w/ the downward rounding option granted to plugins as previously described - please reconcile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example, if the CO requests a starting offset of 600 and the plugin realigns/rounds to boundary starting at offset 512, then the first block metadata response could be reported w/ an initial offset of 512 - which is not strictly increasing from the CO-specified starting offset.

or am I missing something?

Copy link
Contributor Author

@carlbraganza carlbraganza Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think you're correct here - this is not quite right. Rewording to:

// The data range of the first BlockMetadata message in the list MUST
// include the starting_offset requested in the GetMetadataAllocatedRequest
// message.

spec.md Outdated
map<string, string> secrets = 5 [(csi_secret) = true];
}

// GetMetadataDeltaResponse messages are returned in gRPC stream.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same grammar nit

spec.md Outdated
// This value must be the same in all such messages returned by
// the stream.
// This field is REQUIRED.
uint64 volume_capacity_bytes = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int64

spec.md Outdated

// This is a list of data range tuples.
// If the value of max_results in the GetMetadataDeltaRequest message
// is non-zero, then the number of entries in this list MUST be less
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

greater than zero

spec.md Outdated
// than or equal to that value.
// The byte_offset field of each message MUST be strictly increasing.
// The SP MUST NOT return a BlockMetadata message with a byte_offset
// less than than the value of starting_offset in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same conflict w/ earlier, optional rounding-down by plugin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to:

// The data range of the first BlockMetadata message in the list MUST
// include the starting_offset requested in the GetMetadataDeltaRequest
// message.

Copy link
Contributor Author

@carlbraganza carlbraganza Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdef This was subsequently modified in response to feedback from @bswartz. Please see this comment below: #551 (comment)

##### GetMetadataDelta Errors
If the plugin is unable to complete the `GetMetadataDelta` call successfully it must return a non-OK gRPC code in the gRPC status.

The following conditions are well defined:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar concerns w/ snapshots in not-ready states: does NOT_FOUND work or some other code needed?

Copy link
Contributor Author

@carlbraganza carlbraganza Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my earlier response.

@carlbraganza
Copy link
Contributor Author

carlbraganza commented Nov 20, 2023

@jdef This comment was in an outdated part of the document so I'm not sure if it was addressed:

why aren't backend secrets configured at the plugin level? why do they need to be part of the RPC here? is this an exercise in future-proofing?

This was brought to our attention by @nixpanic, because the Ceph CSI plugin supports multiple sets of secrets (per Pool, I believe).

@carlbraganza
Copy link
Contributor Author

thanks for being patient. completed another round of review

Thanks @jdef. I tried to address all the items you surfaced. PTAL.

csi.proto Outdated
Comment on lines 2040 to 2041
// The plugins may round down this offset to the nearest alignment
// boundary based on the BlockMetadataType used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be pedantic about this but the explanation isn't clear enough that someone could write a test to verify this behavior. Is the SP only allowed to round down if the type is FIXED_LENGTH? And is it only allowed to round by the exactly the size of one block? Would an SP that rounds down by more than one block (because internally it operates on groups of blocks) be in violation of the spec? If an SP uses VARIABLE_LENGTH blocks, how much can it round down by?

My worry is that if we leave it up to SPs and COs to figure out who should discard the "repeated" metadata blocks that might result from an interrupted transmission, they might make incompatible assumptions such that in cases of actual interruption, one or more changed blocks would get missed, but only on certain SP+CO combinations, and the bug is unlikely to get caught because it's a rare thing for these connections to get interrupted unless you specifically test for it. This bug would result in very subtle data corruption, which wouldn't even have a chance of breaking anything until a corrupted backup is restored, and even then the corruption might not get noticed, but it would be there.

I want to be crystal clear about what the SP is allowed to do with this field and what assumptions the CO is allowed to make, so we can write automated tests that ensure there is no chance of a misunderstanding by a SP or a CO implementer resulting in data corruption. I realize that the earlier wording of this section (which I had recommended) no longer made sense after the additional changes we agreed to, but we still need more clarity here.

csi.proto Outdated
Comment on lines 2013 to 2015
// The data range of the first BlockMetadata message in the list MUST
// include the starting_offset requested in the
// GetMetadataAllocatedRequest message.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, this is the language I was looking for in my earlier comment. I think this may be clear enough because it's precise. Do we need to add any clarification in the case that this message is streamed and there are multiple responses for a single request? In that case the restriction mentioned only applies to the first response. The second and later responses must follow after the last block of the previous response (which is hopefully obvious).

Copy link
Contributor Author

@carlbraganza carlbraganza Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whew!

Okay. I'll clarify that it is the first tuple in the stream that must have this property. Will now read as:

// The data range of the first BlockMetadata message in the stream
// of GetMetadataAllocatedResponse messages returned MUST include the
// starting_offset requested in the GetMetadataAllocatedRequest
// message.

The language is likewise modified in the description of the GetMetadataDeltaResponse message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! @bswartz please see the follow up comment below: #551 (comment)

Copy link
Contributor

@bswartz bswartz Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out the case about the block containing starting_offset not containing any changes -- I had missed this. I'm convinced we can be more precise -- I'm going to take a swing a recommending some new language which is impossible to misunderstand.

@carlbraganza carlbraganza requested a review from bswartz November 28, 2023 00:07
@carlbraganza
Copy link
Contributor Author

carlbraganza commented Nov 28, 2023

After responding to @bswartz's previous comment, it occurred to me that requiring the starting_offset to be present in the first tuple range is not correct, because there may not be allocated/changed blocks in the range including that offset. What we really want is that the SP should consider this starting_offset in its computation when generating the data stream. Accordingly, I changed the text to the following:

// The computation of the data range in the first BlockMetadata
// message in the stream of GetMetadataAllocatedResponse messages
// returned MUST consider the starting_offset requested in the
// GetMetadataAllocatedRequest message, though it is not required
// that the first data range tuple returned actually include the
// starting_offset, as the first allocated block range may start
// after this offset.

The language is likewise modified in the description of the GetMetadataDeltaResponse message.

I don't know if this negatively impact's @bswartz's request for precise language that can be tested, but it is more accurate.

@carlbraganza
Copy link
Contributor Author

carlbraganza commented Nov 29, 2023

@bswartz wrote:

I'm thinking somewhere in the spec.md file (not the proto file) in the section about change-block tracking we need to mention in plain English how the SP is responsible for ensuring EVERY changed block gets communicated to the CO before reporting the end of stream, and that the CO is responsible for picking the correct starting_offset when restarting an interrupted connection, in order to ensure that it doesn't miss any blocks. The stakes are high for not screwing up this metadata transmission, so it's worth recommending caution to implementers not to be careless.

I think this is a great idea. I'm adding the following section between the description of the metadata format and the RPC definitions:

### Special considerations for processing a metadata gRPC stream

The remote procedure calls of this service return snapshot metadata within a gRPC stream of response messages.
There are some important things for the SP and CO to consider here:

- In normal operation an SP MUST terminate the stream only after ALL metadata is transmitted, 
  using the language specific idiom for terminating a gRPC stream source.
  This results in the CO receiving its language specific notification idiom for the end of a gRPC 
  stream, which provides a **definitive indication that all available metadata has been received**.

- If the SP encounters an error while recovering the metadata it MUST abort transmission of the 
  stream with its language specific error idiom.
  This results in the CO receiving the error in its language specific idiom, which will be different 
  from the language specific idiom for the end of a gRPC stream.

- It is possible that the gRPC stream gets interrupted for arbitrary reasons beyond the control 
  of either the SP or the CO.
  The SP will get an error when writing to the stream and MUST abort its transmission.
  The CO will receive an error in its language specific idiom, which will be different from the 
  language specific idiom for the end of a gRPC stream.

- In all circumstances where the CO receives an error when reading from the gRPC stream it 
  MAY attempt to **continue** the operation by re-sending its request message but with a 
 `starting_offset` adjusted to the NEXT byte position beyond that of any metadata already received.
  The SP MUST always ensure that the `starting_offset` requested be considered in the 
  computation of the data range for the first message in the returned gRPC stream, though the 
  data range of the first message is not required to actually include the `starting_offset` if there 
  is no applicable data between the `starting_offset` and the start of the data range returned by the first message.

(Updated after @jdef's feedback and formatted for display above).

@carlbraganza carlbraganza requested a review from bswartz November 29, 2023 04:21
Copy link
Member

@jdef jdef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

playing catch-up here, trying to review individual commits. additional word-smithing may be needed.

spec.md Outdated Show resolved Hide resolved
spec.md Outdated Show resolved Hide resolved
spec.md Outdated
The SP must always examine the starting_offset and ensure that metadata sent in
it MAY attempt to **continue** the operation by re-sending its request message but with a
starting_offset adjusted to the NEXT byte position beyond that of any metadata already received.
The SP MUST always examine the starting_offset and ensure that metadata sent in
the gRPC stream logically includes the starting_offset but not any byte position prior
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"but not any byte position prior" seems to contradict the gRPC method docs, which seem to indicate that starting_offset can live in the window defined by (offset, offset+size). or am I reading this wrong?

Copy link
Contributor Author

@carlbraganza carlbraganza Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're correct! I think I got stuck in the previous thread on continuation. Changing to:

The SP MUST always ensure that the starting_offset requested be considered in the computation of the data range for the first message in the returned gRPC stream, though the data range of the first message is not required to actually include the starting_offset if there is no applicable data between the starting_offset and the start of the data range returned by the first message.

This took quite a few rewrites to be unambiguous and yet semi-readable; please feel free to offer alternatives.

@carlbraganza carlbraganza requested a review from jdef December 15, 2023 18:54
Copy link
Contributor

@bswartz bswartz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

spec.md Outdated Show resolved Hide resolved
spec.md Outdated
// The SP MUST NOT return any data ranges for which the byte_offset
// plus the size_bytes is less than starting_offset.
// The SP MAY return data ranges with byte_offset less than
// starting_offset if starting_offset is less than byte_offset plus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"... byte_offset less than starting_offset if starting_offset is less than byte_offset ..."

sorry, i'm still confused by this phrasing. what am i missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is language I recommended, and what I was trying to achieve is to constrain the SP's behavior in case of restarting an abnormally terminated stream. Basically I want to be very precise about what starting_offset means, so that SPs don't take liberties and start somewhere other than where the CO expects. Unfortunately, the only way to do that is to describe what mathematical relationships must hold between the values in the request and response.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given a requested starting_offset S, and the first block_metadata B0 of the first SP response in a stream, the SP MUST ensure that B0.byte_offset <= S < B0.byte_offset + B0.size_bytes.

.. is this what you're getting at? if so, can we represent the mathematical relationship using language like this instead of the two sentences that attempt to model this in English? I see where I went wrong reading it the first time, and I probably would have not had the same trouble w/ the same in equation form.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm perfectly fine using language like that. I had worried that introducing new variables like S and B0 might be extra confusing, but this seems like a case where it will be a little confusing either way, and I do like the clarity of mathematical symbols to express what is really a mathematical relationship.

@carlbraganza do you think you can reword using the language that @jdef recommends?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the 2 sentences above with the following:

// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that start before but not overlap
// the requested starting_offset: i.e. if S is the requested
// starting_offset, and B0 is block_metadata[0] of the first message
// in the response stream, then the following must always be true:
// B0.byte_offset > S || S < B0.byte_offset + B0.size_bytes.

spec.md Outdated
// The SP MUST NOT return any data ranges for which the byte_offset
// plus the size_bytes is less than starting_offset.
// The SP MAY return data ranges with byte_offset less than
// starting_offset if starting_offset is less than byte_offset plus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link

@PrasadG193 PrasadG193 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change occurences of "0 based" -> "zero based" to maintain consistency as suggested by jdef

csi.proto Outdated Show resolved Hide resolved
spec.md Outdated Show resolved Hide resolved
@carlbraganza carlbraganza requested a review from jdef February 21, 2024 20:55
spec.md Outdated
// the requested starting_offset: i.e. if S is the requested
// starting_offset, and B0 is block_metadata[0] of the first message
// in the response stream, then the following must always be true:
// B0.byte_offset > S || S < B0.byte_offset + B0.size_bytes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// B0.byte_offset > S || S < B0.byte_offset + B0.size_bytes.
// S < B0.byte_offset or-else S < B0.byte_offset + B0.size_bytes.

^ this seems more readable? (for me, anyway); otherwise 👍

Copy link
Contributor Author

@carlbraganza carlbraganza Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. How about if I change it to:

...
in the response stream, then either (S < B0.byte_offset) must be
true or else (S < B0.byte_offset + B0.size_bytes) must be true.

I think this accurately replaces || in my original sentence.

Copy link
Member

@jdef jdef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM mod nit

Copy link
Contributor

@bswartz bswartz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some more typos, and rather than just recommend the minimal fix, I am recommending a significant simplification of the language, because using mathematical expressions allows us to be more precise with less verbiage.

csi.proto Outdated
Comment on lines 1975 to 1980
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that start before but not overlap
// the requested starting_offset: i.e. if S is the requested
// starting_offset, and B0 is block_metadata[0] of the first message
// in the response stream, then either (S < B0.byte_offset) must be
// true or else (S < B0.byte_offset + B0.size_bytes) must be true.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have precise math expressions, we can make the English verbiage a bit more terse. Note that I'm not nitpicking here, the previous language had typos that needed fixing anyways, so I'm going to recommend a wholesale simplification of this block. Also, the two inequalities expressed were redundant, because the second implies the first as long as size_bytes is non-negative (and we actually require it to be positive).

Suggested change
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that start before but not overlap
// the requested starting_offset: i.e. if S is the requested
// starting_offset, and B0 is block_metadata[0] of the first message
// in the response stream, then either (S < B0.byte_offset) must be
// true or else (S < B0.byte_offset + B0.size_bytes) must be true.
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that end before the requested
// starting_offset: i.e. if S is the requested starting_offset, and B0 is
// block_metadata[0] of the first message in the response stream,
// then S < B0.byte_offset + B0.size_bytes must be true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Ben. That is simpler. Will now read as:

// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that end before the requested
// starting_offset: i.e. if S is the requested starting_offset, and
// B0 is block_metadata[0] of the first message in the response
// stream, then (S < B0.byte_offset + B0.size_bytes) must be true.

csi.proto Outdated
int64 byte_offset = 1;

// This is the size of the data range.
// size_bytes MUST be greater than 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// size_bytes MUST be greater than 0.
// size_bytes MUST be greater than zero.

spec.md Outdated
int64 byte_offset = 1;

// This is the size of the data range.
// size_bytes MUST be greater than 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// size_bytes MUST be greater than 0.
// size_bytes MUST be greater than zero.

csi.proto Outdated
Comment on lines 2016 to 2033
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataAllocatedRequest
// message is greater than zero, then the number of entries in this
// list MUST be less than or equal to that value.
// The byte_offset field of each message MUST be strictly increasing.
// The computation of the data range in the first BlockMetadata
// message in the stream of GetMetadataAllocatedResponse messages
// returned MUST consider the starting_offset requested in the
// GetMetadataAllocatedRequest message.
// The SP MUST NOT return data ranges for which byte_offset plus
// size_bytes is less than or equal to starting_offset.
// Allocated data ranges which overlap the starting_offset
// or for which byte_offset plus size_bytes is greater than
// starting_offset MUST be returned by the SP.
// BlockMetadata messages MUST NOT overlap for a given snapshot, i.e.
// for any two BlockMetadata messages, A and B, if
// A.byte_offset < B.byte_offset then
// A.byte_offset + A.size_bytes <= B.byte_offset MUST be true.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another simplification of the English since we can lean on mathematical expressions for precision. Also I think that we don't need to repeat the language about starting_offset here. A simple reference to the other section should be enough.

Suggested change
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataAllocatedRequest
// message is greater than zero, then the number of entries in this
// list MUST be less than or equal to that value.
// The byte_offset field of each message MUST be strictly increasing.
// The computation of the data range in the first BlockMetadata
// message in the stream of GetMetadataAllocatedResponse messages
// returned MUST consider the starting_offset requested in the
// GetMetadataAllocatedRequest message.
// The SP MUST NOT return data ranges for which byte_offset plus
// size_bytes is less than or equal to starting_offset.
// Allocated data ranges which overlap the starting_offset
// or for which byte_offset plus size_bytes is greater than
// starting_offset MUST be returned by the SP.
// BlockMetadata messages MUST NOT overlap for a given snapshot, i.e.
// for any two BlockMetadata messages, A and B, if
// A.byte_offset < B.byte_offset then
// A.byte_offset + A.size_bytes <= B.byte_offset MUST be true.
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataAllocatedRequest
// message is greater than zero, then the number of entries in this
// list MUST be less than or equal to that value.
// The SP MUST respect the value of starting_offset in the request.
// The byte_offset field of each BlockMetadata message MUST be
// strictly increasing and BlockMetadata messages MUST NOT overlap:
// i.e. for any two BlockMetadata messages, A and B, if A is returned
// before B, then A.byte_offset + A.size_bytes <= B.byte_offset MUST
// be true.

Copy link
Contributor Author

@carlbraganza carlbraganza Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to read:

// This is a list of data range tuples.
// If the value of max_results in the GetMetadataAllocatedRequest
// message is greater than zero, then the number of entries in this
// list MUST be less than or equal to that value.
// The SP MUST respect the value of starting_offset in the request.
// The byte_offset fields of adjacent BlockMetadata messages
// MUST be strictly increasing and messages MUST NOT overlap:
// i.e. for any two BlockMetadata messages, A and B, if A is returned
// before B, then (A.byte_offset + A.size_bytes <= B.byte_offset)
// MUST be true.
// This MUST also be true if A and B are from block_metadata lists in
// different GetMetadataAllocatedResponse messages in the gRPC stream.
// This field is OPTIONAL.

I added an additional sentence to extend the notion of "adjacent" to include the entire gRPC stream.

csi.proto Outdated
Comment on lines 2058 to 2063
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that start before but not overlap
// the requested starting_offset: i.e. if S is the requested
// starting_offset, and B0 is block_metadata[0] of the first message
// in the response stream, then either (S < B0.byte_offset) must be
// true or else (S < B0.byte_offset + B0.size_bytes) must be true.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that start before but not overlap
// the requested starting_offset: i.e. if S is the requested
// starting_offset, and B0 is block_metadata[0] of the first message
// in the response stream, then either (S < B0.byte_offset) must be
// true or else (S < B0.byte_offset + B0.size_bytes) must be true.
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that end before the requested
// starting_offset: i.e. if S is the requested starting_offset, and B0 is
// block_metadata[0] of the first message in the response stream,
// then S < B0.byte_offset + B0.size_bytes must be true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled like the previous case.

csi.proto Outdated
Comment on lines 2100 to 2117
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataDeltaRequest message
// is greater than zero, then the number of entries in this list MUST
// be less than or equal to that value.
// The byte_offset field of each message MUST be strictly increasing.
// The computation of the data range in the first BlockMetadata
// message in the stream of GetMetadataDeltaResponse messages
// returned MUST consider the starting_offset requested in the
// GetMetadataDeltaRequest message.
// The SP MUST NOT return data ranges for which byte_offset plus
// size_bytes is less than or equal to starting_offset.
// Changed data ranges which overlap the starting_offset
// or for which byte_offset plus size_bytes is greater than
// starting_offset MUST be returned by the SP.
// BlockMetadata messages MUST NOT overlap for a given snapshot, i.e.
// for any two BlockMetadata messages, A and B, if
// A.byte_offset < B.byte_offset then
// A.byte_offset + A.size_bytes <= B.byte_offset MUST be true.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataDeltaRequest message
// is greater than zero, then the number of entries in this list MUST
// be less than or equal to that value.
// The byte_offset field of each message MUST be strictly increasing.
// The computation of the data range in the first BlockMetadata
// message in the stream of GetMetadataDeltaResponse messages
// returned MUST consider the starting_offset requested in the
// GetMetadataDeltaRequest message.
// The SP MUST NOT return data ranges for which byte_offset plus
// size_bytes is less than or equal to starting_offset.
// Changed data ranges which overlap the starting_offset
// or for which byte_offset plus size_bytes is greater than
// starting_offset MUST be returned by the SP.
// BlockMetadata messages MUST NOT overlap for a given snapshot, i.e.
// for any two BlockMetadata messages, A and B, if
// A.byte_offset < B.byte_offset then
// A.byte_offset + A.size_bytes <= B.byte_offset MUST be true.
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataAllocatedRequest
// message is greater than zero, then the number of entries in this
// list MUST be less than or equal to that value.
// The SP MUST respect the value of starting_offset in the request.
// The byte_offset field of each BlockMetadata message MUST be
// strictly increasing and BlockMetadata messages MUST NOT overlap:
// i.e. for any two BlockMetadata messages, A and B, if A is returned
// before B, then A.byte_offset + A.size_bytes <= B.byte_offset MUST
// be true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled like the previous case.

@carlbraganza carlbraganza requested a review from bswartz February 23, 2024 21:28
Copy link
Contributor

@bswartz bswartz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@jdef
Copy link
Member

jdef commented Feb 28, 2024

skimmed through the changes yesterday, LGTM

only remaining nit is the needless newline whitespaces that pad the new protobuf blocks

@xing-yang
Copy link
Contributor

@carlbraganza Can you squash the commits?

@bswartz
Copy link
Contributor

bswartz commented Mar 7, 2024

@carlbraganza I can't find evidence of a signed CLA. Did that end up happening?

@bswartz
Copy link
Contributor

bswartz commented Mar 7, 2024

BTW I can squash the commits too, but we should add some more detail to the squashed commit message.

@carlbraganza carlbraganza force-pushed the snapshot-metadata-service branch 2 times, most recently from f1eae06 to afc847a Compare March 7, 2024 20:37
@carlbraganza
Copy link
Contributor Author

Rebased on master and squashed comments.

@bswartz
Copy link
Contributor

bswartz commented Mar 7, 2024

As soon as we get a signed CLA from the author(s) we can merge this.

@saad-ali
Copy link
Member

I can confirm we've received CLA from Veeam including Carl Braganza

@carlbraganza
Copy link
Contributor Author

I looked at the build test failure - my guess is that I used the wrong version of protoc.

@bswartz
Copy link
Contributor

bswartz commented Mar 12, 2024

You probably need to rebase on HEAD and rebuild. We merged a patch that changed the Makefile significantly and also updated the protoc compiler version

@carlbraganza carlbraganza force-pushed the snapshot-metadata-service branch from afc847a to aa0e11d Compare March 12, 2024 19:34
@bswartz bswartz merged commit 2696773 into container-storage-interface:master Mar 14, 2024
2 checks passed
@carlbraganza carlbraganza deleted the snapshot-metadata-service branch March 15, 2024 00:09
k8s-ci-robot pushed a commit to kubernetes/enhancements that referenced this pull request Jun 12, 2024
* Add draft of CSI CBT KEP

Signed-off-by: Ivan Sim <[email protected]>

* Update KEP status

Signed-off-by: Ivan Sim <[email protected]>

* Initial structure.
Filled in the Proposal, Caveats and Risks.
Put in the CSI spec in the Details section.

* Removed distracting links to common K8s definitions.
Clarified the proposal.

* More caveats.  Better grammar.

* Use "snapshot access session".

* addressed most of the feedback in the PR.

* Updated role figure.

* More refinements.

* Session figure. Renamed figure files.

* Fix background of session figure.

* Updated figures and roles.

* Propose a new role for session data.

* GRPC spec

* Don't propose roles.

* Add user stories in the proposal (#2)

* Add user stories in the proposal

Signed-off-by: Prasad Ghangal <[email protected]>

* Remove acceptance criteria for the user stories

* Make changes suggested by Carl

---------

Signed-off-by: Prasad Ghangal <[email protected]>

* Added details to the manager, sidecar and SP service sections.
Fixed session figure errors and rewrote the client gRPC
description in the risks section.

* Called out UNRESOLVED issues.
More on the SP service and sidecar.

* Resolved issues with expiry and advertising.

* Updated TOC

* Fixed typo and svg space rendering.

* Fixed typo in perms figure.

* Typo in session figure. More detail in user stories.

* Add SnapshotSession CRDs (#5)

* Add SnapshotSession CRDs

* Add CR descriptions

* Address review comments

* Address review comments

* Remove typo

* Remove unnecessary new line

* Added image of the flow when the TokenRequest and TokenReview APIs are used.

* Fixed figure spacing

* Updated permissions svg; removed session.

* Updated figures. Removed session figure.

* Added explanation of permissions.

* Updated overview and risks.

* Updated RPC and components.

* Completed remaining rewrite.

* Updated to CSI spec to reflect container-storage-interface/spec#551

* Removed the security_token and namespace from the gRPC spec.
Pass the security token via the metadata authorization key.
Pass the namespace as part of the K8s snapshot id string.

* Update sections on test plan, PRR and graduation criteria

Signed-off-by: Ivan Sim <[email protected]>

* More neutral language on passing the auth token.

* Updated to reflect changes in the CSI spec PR.

* Use a separate gRPC API for the sidecar.

* Replaced authorization gRPC metadata with a security_token field in request messages.

* Fixed typo.

* Updated CSI spec; downplayed similarity between the K8s and CSI gRPC services.

* Add beta and GA graduation criteria

Signed-off-by: Ivan Sim <[email protected]>

* Updated CSI spec again - no unsigned numbers used.

* Update KEP milestone to v1.30

Signed-off-by: Ivan Sim <[email protected]>

* Update 'Scalability' section

Signed-off-by: Ivan Sim <[email protected]>

* Add sig-auth as participating sigs

Signed-off-by: Ivan Sim <[email protected]>

* Require that the CR be named for the driver.

* Removed the label requirement for the CR.

* Replaced johnbelamaric with soltysh for PRR approver.

* Bump up milestone to v1.31

* Change KEP status to implementable

---------

Signed-off-by: Ivan Sim <[email protected]>
Signed-off-by: Prasad Ghangal <[email protected]>
Signed-off-by: Ivan Sim <[email protected]>
Co-authored-by: Carl Braganza <[email protected]>
Co-authored-by: Prasad Ghangal <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants