
Signature/Digest Algorithm Forward Compatibility #124

Open
stevetodd opened this issue Apr 12, 2021 · 20 comments

Comments

@stevetodd
Contributor

The current encoding has to fail the connection when an unrecognized signature or digest material code is encountered. When such codes appear in attachments, the protocol decoder has no way to continue processing because the length of the material is unknown. Without that length, the decoder cannot know how far to advance past the unrecognized material, so the connection enters a failed state.

To further illustrate: validator v1 provides a watcher with a receipt using a signature algorithm newly standardized in the KERI community. The watcher recognizes the code and successfully verifies the receipt. The watcher is then contacted by another validator, v2, that hasn't been updated yet, and the watcher provides v1's receipt to v2. v2 encounters the signature with the new algorithm, doesn't recognize it, and doesn't know how long the signature is. v2 has to stop processing the stream. The watcher has no indication of why the connection failed. This can also happen with witnesses, though the validator case is interesting since it is not under the control or influence of the controller.

This is mitigated if the material is located in an attachment group whose counter carries a length in 4-character Base64 quadlets, though the unrecognized code still renders that group unreadable. If material with an unknown code appears outside one of these groups, however, the failure is unrecoverable.
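To make the failure mode concrete, here is a minimal Python sketch of a parser hitting an unrecognized code. The size table and codes are illustrative stand-ins, not keripy's actual tables:

# Hypothetical code -> total qb64 length table; real tables live in the
# implementation's derivation code registry.
SIZES = {"0B": 88, "E": 44}

def advance(stream: str) -> str:
    """Consume the next attachment primitive and return the remainder."""
    for code, size in SIZES.items():
        if stream.startswith(code):
            return stream[size:]  # known code: skip the whole primitive
    # Unknown code: its length is unknowable, so the parser cannot advance
    # and, outside a counted group, must fail the connection.
    raise ValueError("unrecognized material code; cannot resync stream")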

@SmithSamuelM
Contributor

SmithSamuelM commented Apr 12, 2021

This is handled in the latest Python code in the msg processor logic in Kevery.msgProcessor(). Note this uses the new tritet sniff defined in KID001 to determine if subsequent data in the stream is either 1) another message or 2) a counter, and if 2), whether the counter is in qb64 or qb2. This resolves all the issues you mention above, as per the design outlined in KID001. Knowing whether a counter is qb64 or qb2 allows the processor to count either qb2 triplets (3 bytes) or qb64 quadlets (4 bytes). The counter's count is the same in either case by design (hence not counting bytes but counting quadlets/triplets as appropriate): when in qb2 the counter is counting triplets, and in qb64 it is counting quadlets, as per the design described in KID001.

This new design as outlined in KID001 requires that all attachments be in a counted group: no bare primitives. But even with bare primitives, each primitive is self-framing, so it is never the case that the stream parser does not know how to parse a primitive; it may merely not know how many primitives to parse. Without a counter there is no way to know if the stream cold-start state has changed, because primitives do not indicate stream state but counters do. So as per the revised requirement in KID001, all attachment primitives are in a counted group and may be in nested counted groups.
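For illustration, a rough sketch of the cold-start sniff; the byte tests are simplified stand-ins for the exact tritet assignments in KID001:

# Illustrative cold-start sniff. A real parser examines the first tritet
# (top three bits of the first byte) per KID001; these byte tests are
# simplified stand-ins.
def sniff(stream: bytes) -> str:
    first = stream[0]
    if first == ord('{'):      # JSON message body starts with '{'
        return "msg"
    if first == ord('-'):      # qb64 count codes start with '-'
        return "counter-qb64"
    # Otherwise assume the binary domain (qb2 counter, or a CBOR/MGPK
    # message body), which a real parser distinguishes by tritet.
    return "counter-qb2"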

Multiple hetero attachments may benefit from a prepended pipeline counter, when provided, that counts the total quadlets/triplets in the attachments, but this is not required given the sniffer that is looking for either the next counted group or the next message.

A set of attachments may be terminated in one of four ways:

  1. A pipeline framing counter explicitly counts the number of quadlets/triplets, as determined by the form of the counter (qb2 or qb64). As per the rule, if a group counter is in qb64/qb2, then all the groups it counts must be in qb64/qb2 respectively.

  2. No pipeline framing counter, but the msg processor is running in framed mode, where it depends on the stream always supplying the full set of attachments. An empty stream indicates the end of the attachments, unless another message is part of the stream, in which case the message's presence terminates the attachments. But if the stream is empty without another message, then the attachments are terminated.

  3. No pipeline framing counter and the msg processor is not running in framed mode, so it must wait for the next msg to arrive to know that the previous message's set of attachments has completed.

  4. Not yet implemented is support for a non-pipeline group counter that counts the total number of groups in the attachments. This allows the message processor to know how many heterogeneous attachment groups to process without either being in framed mode or waiting for the next message. This is similar to 1) but does not allow pipelining, because pipelining requires being able to extract the full set of attachments from the stream without parsing into the attachments. 4) requires parsing, but with a known end-of-attachments condition.

Cases 1) and 4) can be nested, i.e. 4) can come after 1).

The most performant and reliable approach is to use 1) or 4) or both.
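As a sketch of case 1), a pipeline counter lets the parser frame the whole attachment group by size alone, without understanding every code inside. Names here are illustrative, not keripy's API:

# Termination case 1): frame a counted group by its pipeline count.
def frame_attachments(stream: bytes, count: int, qb64: bool) -> tuple[bytes, bytes]:
    size = count * (4 if qb64 else 3)  # quadlets in qb64, triplets in qb2
    if len(stream) < size:
        raise ValueError("incomplete frame; wait for more bytes")
    return stream[:size], stream[size:]  # (framed attachments, remainder)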

Case 3) has the problem that the delay until the next message may be long, and the processor cannot complete the attachments until it gets that next message, but 3) is safe.

Case 2) has a corner condition where the buffer on either side of the TCP connection (source/destination) may not accommodate all the attachments from one message and the end of the buffer exactly coincides with the end of an attachment group (ending in the middle of a group is handled, as that is detectable while parsing). In this rare corner case where the buffer fills on exactly a group boundary, one or more following attachment groups would be discarded when they show up later, once the TCP buffer has emptied enough to accommodate bytes from the next group. But as long as one byte makes it in, the condition is detectable. This may be extremely unlikely due to how TCP works under the hood: normally if the destination buffer is full, TCP keeps the buffered attachments in its source buffer and delivers them on the next iteration. Alternatively, if the source buffer fills and not the destination buffer, then the attachments may be delivered incomplete, with enough elapsed time on the destination end for it to believe that the stream is empty. So 2) is not completely safe; it is usable in testing but not recommended in production. In production one should use 1) and/or 4) when provided and fall back to 3) when not.

That combination is guaranteed to be safe as long as the TCP connection does not close mid-attachments. When a TCP connection closes prematurely for whatever reason, it fails catastrophically: any in-transit bytes are lost with no way of recovery. An external framing mechanism that wraps bare TCP is required for recovery.

Given that ensuring totally reliable TCP connections requires adding a reliable framing wrapper to TCP, UDP becomes the better choice. To restate: ultimately the final solution for maximum scalability and reliability in KERI on bare metal is to use UDP with its own framing mechanism wrapper that is not susceptible to TCP failure. If one has to wrap TCP with a reliable framing wrapper to account for unrecoverable TCP connection closure, then one is better off applying that wrapper to UDP instead, because then one also benefits from UDP's better scalability with respect to TCP. Ultimately this is where KERI will end up, with UDP as the performant protocol and TCP used for testing and less demanding applications. We started with TCP because it was the easiest place to start to work out the kinks, and it is good enough given the rarity of premature closure in lightly loaded applications of TCP. But at scale, UDP with a fragmentation framing wrapper will be better in every way than TCP without such a wrapper. With UDP datagrams we are always running in framed mode, but with a framing wrapper whose frames are always reliable and safer than unwrapped TCP.

See the design of RAET, https://github.com/RaetProtocol/raet, for an example of a reliable framing (fragmentation) wrapper for UDP that is both more reliable and more performant than unwrapped TCP, or even a partially wrapped TCP using ZeroMQ for pub/sub.

My plan is to adapt a simple but composable (qb2/qb64) version of RAET using counters to enable a framing wrapper in UDP for KERI.

@stevetodd
Contributor Author

Ok, so the blast radius is limited to the group. From my reading of the code, keripy currently wraps all the attachments in a single group, which means that at best only the items before the unknown code can be processed. Do you have plans to limit that further? It doesn't seem like you are aiming at wrapping each item in a counted group, or are you? What am I missing that makes it so that we don't lose some of the attachments when an unknown code appears?

@pfeairheller
Contributor

@stevetodd I'm confused by what you're asking here. You ask about how keripy wraps outgoing messages, and your concern is about someone else receiving a bad message code. I assume that the sender of a message intends to send only valid attachment codes, so it is safe to wrap the attachments as it sees fit (optimizing as much as possible, I would imagine) and not worry about how the receiver might handle a bad code that the sender never means to send.

I do get your concern about bad codes and ran into that implementing the parser in Go. The "blast radius" (love the name) is the group, and we drop the connection because I couldn't figure out how to recover at that point. My reasoning here was that updates to the log are idempotent, so if something just got garbled in the connection, we'll reconnect and I'll be able to catch up by starting after the last event I fully received. In the meantime, that event will end up in an escrow if needed and be dealt with in due time. If someone is just broken and sending garbage codes, they won't get processed.

@stevetodd
Contributor Author

Sorry, I could have phrased the original description of the issue better.

The issue I'm raising here is about the future time when additional signature and digest algorithms are standardized to be used with KERI. When that happens, additional codes will be added to the table of standard signature and digest algorithms. Not all nodes will be updated at the same time and some operators will choose to use the new algorithms before all nodes have been updated. As a result, there will be some nodes that receive signatures and digests with codes they don't recognize. When these are encountered by the decoder, it doesn't have the code in a table to look up the length, and, as a result, it must abort decoding the rest of the group. If the material is outside of a counted group, the decoder doesn't know how to advance beyond it and must terminate the connection.

Of course, a node can't validate a signature with a code it doesn't recognize and shouldn't propagate material that it can't validate. But an updated node may propagate a receipt from another updated node, using the new algorithm, in the same counted group as the controller signatures and witness receipts, which, depending on the ordering of the material in the group and how the implementation is written, could delay the node "seeing"/accepting the event. Sam would have a better idea of whether that matters from a security standpoint.

@stevetodd
Contributor Author

I should add that security isn't my main concern; it's more that things shouldn't break when an expected future enhancement arrives, a.k.a. forward compatibility.

@pfeairheller
Contributor

The message version is the first thing processed and should address your concerns. A given implementation of the KERI protocol should reject a message that is for a version it does not yet support and thus it will never get to a signature code it does not yet recognize.
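For illustration, a hedged sketch of that first step: peeking the version string before any attachment codes are touched. The field widths follow the "KERI10JSON00011c_" example later in this thread (protocol, hex major/minor, serialization kind, hex size) and are assumptions for the sketch:

import re

# Version string pattern modeled on the "KERI10JSON00011c_" example.
VEREX = re.compile(rb'KERI(?P<major>[0-9a-f])(?P<minor>[0-9a-f])'
                   rb'(?P<kind>JSON|CBOR|MGPK)(?P<size>[0-9a-f]{6})_')

def peek_version(raw: bytes) -> tuple[int, int]:
    match = VEREX.search(raw[:64])  # the version string sits near the front
    if not match:
        raise ValueError("no version string found")
    return int(match.group("major"), 16), int(match.group("minor"), 16)

# A node would reject the message here if (major, minor) is unsupported,
# before ever reaching a signature code it does not recognize.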

@stevetodd
Contributor Author

stevetodd commented Apr 15, 2021

Yep, that works for controller signatures and witness receipts, since they will never use an algorithm beyond the version number in the event. VRCs, however, can, and that has now been taken care of by the -F code @SmithSamuelM added in #130.

@stevetodd
Contributor Author

We can close this as resolved for now, since the -F code takes care of the most proximate needs.

I don't know if it is worthwhile to consider at this point, but I would expect that at some point an organization is going to want to privately support a non-standard algorithm or configuration for their internal use. I'd be interested sometime to have a discussion about what kind of effort would be required on their part to do that.

@SmithSamuelM
Contributor

SmithSamuelM commented Apr 15, 2021

To clarify, the relevant version number is not the version number of the event but the version number in the message itself. When the message is an event message they are the same, but when the message is a rct, vrc, or ksn, the message may have a version number that is different from the version number of the associated event. What matters to the stream parser is the version number in the msg it is parsing (not the associated event), and that version determines what attachments it recognizes on that message. So a stream parser will never see an attachment on a message that is not supported by the version in the message to which the attachment is attached. If it does, it's an error, and the stream parser should flush the stream (or, if it's pipelined, flush the pipeline count of bytes). So the new -F## code didn't fix anything with respect to the core issue raised here of how to deal with unsupported attachment codes. It does simplify the ksn and will enable us to deprecate the VRC at some point.

@stevetodd
Contributor Author

I read Sam's description wrong in #130. Repeated here:

The new attachment group is a counter with code -F## where ## is replaced by the two character Base64 count.

I read that as the count of base64 characters. It isn't.

@SmithSamuelM
Contributor

SmithSamuelM commented May 11, 2021

As far as I can tell, all of @stevetodd's concerns were addressed separately above.

  • backward compatibility (version string in event) for unknown attachment types
  • blast radius allows restart without dropping the connection. The -V## counter allows one to "frame" the attachments, putting them in the same class as framed protocols.
  • in general with TCP connections, if the parser for any reason loses sync with the stream, it may drop the connection, forcing a clean restart to resync. This is only a problem for bare TCP. Framed protocols such as UDP and HTTP merely require dropping the packet, not dropping the connection.

The python code implements the new logic.
@stevetodd please describe exactly what is not addressed in the python code.

@chunningham
Contributor

Unless I'm misunderstanding, in any versioned protocol, breaking versions by definition are not traversable forwards by deployments which are not up to date. The version string, which can be peeked at, allows nodes to determine whether they can process a packet, but conceptually a version-lagging node can only escrow or drop it.

@stevetodd
Contributor Author

The issue concerns VRCs attached to events. A watcher, for example, may be at a newer version, using a signature algorithm that has a code in v2 and not v1. A watcher sending that event to a validator does not know the version the validator is running. If the validator is running v1, and the watcher running v2 attaches its receipt with the new signature code in a -F group, then the validator's parser will have to abort/restart. It would be possible to keep the attachments processed before the receipt, but any after it wouldn't be reachable. keripy currently drops all the attachments.

Some options for dealing with this are:

  1. Require that a node never appends attachments with codes that are newer than the version written in the message they are attached to. A node running the newer version would need to take care, when propagating an event, to add only attachments that are defined within the version indicated in the event. Any receipts with codes defined in newer versions would need to travel separately, in a separate receipt message with the newer version number indicated.
  2. Wrap each receipt's signatures from a transferable identifier that has signature codes from a version newer than the message's version in a -V## frame (see the sketch after this list). Nodes running versions that don't understand the codes can drop the frame/receipt without negative impact, and ones that do understand the codes can process them. We would probably need to wrap all the -V## frames with another frame that carries the count of -V## frames; I'm not sure if any of the already specified codes cover that.
  3. Prefix all signatures, digests, and basic identifiers in attachments with lengths. Put identifiers/codes for algorithms in a registry that can be added to without having to rev the version.

IMHO, Option 1 is the most straightforward, though over time I would expect that we would almost always have to send two messages to convey receipts: one for the event itself and a second to attach all the receipts to. This is based on my expectation that, over the long term, new signature algorithms will be invented that become more popular than the ones chosen today. Maybe I'm wrong on that. Older identifiers will have events on older versions being receipted by watchers on newer versions, and those receipts, under this option, will need to be delivered attached to a separate message. The additional bytes of an additional message aren't necessarily egregious, so it's not a bad option.

Option 2 is more efficient from a size perspective, depending on the number of attached receipts. I'm not sure what the vision is for how many receipts we're going to be ferrying around with each event.

Option 3 is probably too radical at this point but would allow for more flexibility around algorithms. It would also allow version numbers to be a little more stable over time. It would probably invite unwanted requests for additional algorithms, though.
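For concreteness, a rough sketch of Option 2's recovery behavior: because -V## carries a quadlet count, a node that cannot interpret the frame's contents could still skip it cleanly and keep parsing. Whether the count includes the counter itself is a spec detail; this sketch assumes it covers only the material following the counter:

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

def b64_to_int(chars: str) -> int:
    value = 0
    for c in chars:
        value = value * 64 + B64.index(c)
    return value

def skip_unknown_frame(stream: str) -> str:
    """Drop a -V## framed group whose contents use unrecognized codes."""
    assert stream.startswith("-V")
    count = b64_to_int(stream[2:4])  # two Base64 chars of quadlet count
    return stream[4 + count * 4:]    # skip counter plus counted quadlets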

Perhaps it's not worthwhile to consider this yet, but how do people see the rollout of a new version working, whether it is for new algorithms or something else?

@SmithSamuelM
Contributor

SmithSamuelM commented May 13, 2021

I don't think there has ever been any contention that by default option 1 is how things should be. In any security protocol, allowing mixed versions is always problematic. Using a different version for attachments than for the message body they are attached to should not be allowed in general; it's just a bad idea. So @stevetodd, if your concern is that we have not explicitly made that a requirement, then we definitely should, as it has always been assumed. Early in the protocol development when we discussed versioning, that was the posture we assumed, if not explicitly stated. Clearly, now that we have more complex hetero attachments, the possibility for someone to do it wrong has increased, but the policy has not changed. So attachments with a different version from the message body to which they are attached are not allowed by KERI. I don't see any use case where that would be needed.

Disjoint replay already allows signatures that sign one message payload to be attached to a different message payload, such as a receipt, so we already account for that in the protocol. As explained above, disjoint replay allows the version string in the message body to which the attachments are attached to be different from the version string in the original message body that the attachments actually sign. The normative version of both the message body and all its attachments is the version string of that message body, not the version of the message that the signatures may be signing. This has never been at issue, if maybe not clearly understood.

The distinction between message body and attachments is not based on the attachments being somehow separable from the message body. They are both part of the message and not separable. Clearly the KEL stores attachments as a requirement for verification; without stored signatures and receipted signatures the KEL is not verifiable. So the attachments are every bit as much a part of the "message" as the message body. The only distinction between the two is that the message body may be signed and the attachments may be the signatures.

But disjoint replay of receipts already allows attachments that are signatures on a different message body to be attached to some other message body. That other message body, to which the signatures are attached, may have a different version string than the original message body that the signatures sign. The message body to which the attachments are attached determines the version of the attachments. This is unambiguous. Given that we allow disjoint replay of attachments, there is no reason to have separate versioning of attachments from the message body to which they are attached. One can use upgraded disjoint message bodies to convey newer versions of attachments than the original messages. A host that does not support the newer versions would just drop the disjoint messages with newer versions. If the host supports the newer versions, then it will accept them. A conjoint replay would only be able to include attachments that are of the same version, but so far we have not encountered a case where that prevents full conjoint replay. For example, in the keripy code, an event receipt message with receipt couples attached, where those couples belong to witnesses of the associated event, is processed and stored the same as if the attachments were indexed witness receipt signatures. This was done so as not to break existing implementations that do not support the new indexed witness receipt counter type.

Because of the asynchronous nature of KERI, it is problematic to attach signatures or receipts via an envelope. But the lack of an envelope does not change the fact that the signatures are part and parcel of the "message" and are essential to correct replay of a KEL. Thus it would not make sense at all to allow attachments to have different versions from the message to which they are attached. They are one combined whole.

I have yet to see a suggestion for other enveloping protocols, where signatures are included in an envelope, that allows the signatures to be a different version than the message envelope they are contained in, unless it is a specific feature of the protocol that each enveloped attachment may provide its own separate versioning system. But this feature is roundly criticized as opening up security vulnerabilities. It complicates things for little apparent gain. It also only makes sense in a verbose enveloped protocol, because the envelope synchronizes the attachments and the verbosity allows multiple fields for each attachment.

In contrast, in a compact, performant, asynchronous protocol like KERI, where attachments may be collected independently, there has never been an allowance for versioning separate from the version of the message to which the attachments are immediately attached. Attempting to add versioning separately to attachments is antithetical to the performant nature of KERI streaming, not to mention an unnecessary complication and a security vulnerability. Doing so on an attachment-by-attachment basis defeats the compactness advantages of not enveloping, not just the asynchronous advantages. I hope this clarifies your concerns.

@SmithSamuelM
Contributor

SmithSamuelM commented May 13, 2021

In general we are quite loose in our use of terminology, but the normative definition in the KERI white paper is not loose.

Section 7.11 defines a message thusly:
A message as used by the protocol is a serialized data structure with one or more attached signatures. The data structure must be serialized in order to be digitally signed. A message is a well defined type of signed or verifiable statement.
Section 7.12 defines an event thusly:
An event may be represented or conveyed by a message associated with the protocol, typically an operation associated with an identifier. Event messages are expressed as well defined verifiable statements. The protocol is primarily concerned with creating, transmitting, validating, and logging events as conveyed by messages. The event data may be represented by a serializable data structure that is serialized and then signed to create the event message. Given the one-to-one relationship between an event and its message they may be referred to interchangeably in context.

Notice that the term event data is used to refer to the serialized signed portion of an event message.

To clarify, the term event data, message data, or message payload refers to the serialized portion of the message. This is signed if it's an event message payload and not signed if it's a receipt message payload. We loosely refer to the payload as the message, but the message is incomplete without its attachments. And for event messages, at least one verifiable signature is required as an attachment for the message to be accepted, into escrow at the least. Otherwise it is dropped.

To restate: a message is normatively composed of the serialized data payload plus any attachments. Thus the normative version for all the attachments is conveyed by the version string in the payload of the message. This means there is no ambiguity as to the version of the attachments and no allowance for the version of the attachments to ever differ from the version string in the message. But given how loosely we use the term message in the KIDs, it's no wonder such confusion has arisen.

@SmithSamuelM
Contributor

This issue IMHO is looking at the version future-compatibility question from the wrong perspective. A more helpful perspective is to think of versioning as happening at the connection level. The appropriate place to address this is to use a query message to query the supported versions of the other side of any connection. For example:

{
  "v" : "KERI10JSON00011c_",  
  "t" : "req",  
  "r" : "vers"
}

It would respond with a version message that included the supported versions of that responder. Then the requester would only use messages with included attachments that were compliant with a single one of the supported versions, i.e. the latest version supported by both the requester and the responder (see #109).
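A minimal sketch of that negotiation, assuming a transport that can send the query above and collect the reply:

# Illustrative: pick the latest version supported by both sides.
OUR_VERSIONS = [(1, 0), (1, 1)]  # hypothetical supported (major, minor) pairs

def pick_common_version(peer_versions: list[tuple[int, int]]) -> tuple[int, int]:
    common = set(OUR_VERSIONS) & set(peer_versions)
    if not common:
        raise ValueError("no mutually supported version")
    return max(common)  # then use only attachments valid in this version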

In contrast, the parsing logic and rules for stream parsing are not meant to support any combination of versions from message to message, but to allow a parser to operate robustly. And when an unsupported version of anything is received, the most robust response is to drop the connection. This signals to the other side of the connection, usually a stupid or malicious party, that something is wrong. Doing more than that is attacking the problem from the wrong angle.

@stevetodd
Contributor Author

Ok, so the answer is that in the future, when a new signature algorithm is added to the spec, the version must be bumped. Conveying receipts of events from older versions by transferable identifiers using the new version will require two messages: (1) the event with its controller signatures and witness receipts/endorsements, and (2) a receipt message with the transferable identifier's receipt of the event. They cannot be combined. The version in the event data is the version for the entire message, attachments and all. Have I summarized that correctly?

@SmithSamuelM
Contributor

SmithSamuelM commented May 18, 2021

Yes. The corner case of a new receipt on an old event means the new receipt may use a different version number and new signature codes from that version. In general this would be a corner case.

Recall that for an indirect mode relationship validators and watchers are not sending receipts to the witnesses.

Communication between a controller and its witnesses is under the control of the controller so it can manage its versioning appropriately.

A watcher that is not updated won't be able to verify new events from an updated witness, but that does not impede the controller from upgrading. It just means the watcher won't see any new events until it updates. It couldn't verify them anyway until it updated, so splitting the version of the attachments from the version of the event provides no advantage.

Communication between a validator and its watchers is under the control of the validator, so it can manage its versioning appropriately. Likewise, there is no advantage to splitting the version of the event from that of the attachments.

The case of unmatched versions is somewhat relevant to direct mode communications.

But recall that there is no reason for new keys after a rotation on a transferable identifier to result in a new receipt for an already receipted event. It only applies to new events it has never receipted. Direct mode transactions pause until a receipt is returned, so it would be odd for one side to create a bunch of events that were not receipted as they were created. Recall that the keys used to sign the original receipt are still used to validate the original receipt, not the current keys. So all past events are good; only new events may be a problem. And splitting the event version from the signature version provides no advantage in verifying new events.

For example, should one side in a direct mode relationship update to a new version of signatures but the other does not, then the first one will not be able to use the new signature types for that relationship until the other side updates as well. But the pair-wise nature of direct mode makes it straightforward to negotiate upgrades by informing the other side of a planned upgrade. Because the relationship has dedicated identifiers, any version slippage is limited to specific relationships.

The corner case where unmatched versions with later receipts is most likely to be relevant is the global watcher network, where various watcher services might first see old KELs with older versions, and a watcher endorses such an old KEL by receipting it to some validator that is subscribed to the watcher service. But the watcher endorsement is not authoritative for that event; it's just to prevent DDOS. Recall that duplicity is provable without any watcher signature or receipt. So watcher services can easily provide endorsements using old signatures for backwards compatibility without materially hurting their duplicity detection fidelity.

I would be interested in a use case where it is necessary, for security reasons and not merely DDOS protection, to split the version of the attachments from the event itself, particularly on new events, or even on old events.

The current keripy code does not enforce indirect vs direct mode when deciding whether or not to send receipts; it just always receipts. But this mode differentiation will be added soon, now that support for witnesses has been added.

@stevetodd
Contributor Author

I have no objections to closing this issue.

@SmithSamuelM
Contributor

I will keep the issue open until we move the normative language about versioning for the message body and message foot (attachments) into the spec, as well as the query version request message.
