Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

README and test vectors should reflect new 32 byte hex key expectations from branca-js #37

Open
grempe opened this issue Jun 22, 2021 · 18 comments

Comments

@grempe
Copy link

grempe commented Jun 22, 2021

The Javascript reference implementation has recently changed to enforce the use of more secure 32 byte hex keys, or a 32 byte Buffer. This change is to require the use of cryptographic keys of the proper length and construction as expected by the underlying encryption algorithm.

https://github.com/tuupola/branca-js

The spec should be modified to:

  • add additional context to the spec README about why this change was made
  • note that it is likely a breaking change and thus might require version bumping the spec
  • modify the test vectors so they will work with the new key mechanism

Other downstream implementations should also be encouraged to adopt the new spec to maintain cross library interoperability.

@brycx
Copy link
Contributor

brycx commented Jun 22, 2021

I think these are all good points, and I agree these are useful to make clear. I, however, do not agree that this repository is the right place to do so.

The changes listed are all related to a single implementation of this spec. A spec, that to my knowledge, does not, and IMO shold not, track changes in a specific implementation. That belongs to the changelog of that specific library. I see nowhere that the spec requires that secret key to be an ASCII string, which IIRC was the issue brought up with branca-js. Perhaps, I'm overlooking something?

There are other implementations that accept raw byte-arrays, for secret keys, for instance.

Perhaps, it's worth looking in to how the secret key is currently described in the spec instead?

When you mention the test vectors should be modified, do you mean the ones in this repository or that of branca-js?

@grempe
Copy link
Author

grempe commented Jun 23, 2021

Hi @brycx and thanks for the thoughtful comments.

For those who may not be aware, I filed an issue and a PR was accepted to resolve what I saw as a serious security flaw in the implied and tested methods of providing security keys to Branca clients. This is not specific to a single implementation. I'll provide some additional info to support that assertion.

The original PR and some of my rationale can be found here:

tuupola/branca-js#12

Fundamentally, this comes down to the unsafe use of non-cryptographically random values for encryption keys. This flaw is inherent in the spec and the test vectors that the spec provides, and that all implementations are expected to adhere to.

32 character strings are not safe encryption keys

A 32 character string has far less entropy than a 32 Byte value. The string is further weakened by the fact that it is informally implied in the spec (via the examples and test vectors) that the use of strings like supersecretkeyyoushouldnotcommit might be acceptable. This is found in the library implementation as well as the spec test vectors.

The main website, and the spec vectors, all make use of plain text "passwords" which are not suitable as encryption keys.

https://branca.io/

Libsodium, the underlying encryption library for Branca is clear on what makes an acceptable key:

tuupola/branca-js#12 (comment)

It is never recommended in the cryptography community to use plain text strings as encryption keys. In cases where user memorability of the key might be a desirable feature the use of key stretching algorithms like PBKDF2 or Argon2 are required to derive a strong encryption key of at least 32 bytes. These are intentionally slow hashing algorithms, and by performing this "key spreading" they help prevent against brute force attacks.

Branca libraries/spec that use passwords as keys (and possibly user memorable passwords?) are much more susceptible to brute force attacks.

The spec is silent on how implementations should handle keys, but the vectors are not

While the text of the spec itself seems to be mostly silent on how to handle encryption keys, the test vectors are not. To my knowledge the vectors require conformant libraries to be able to accept a 32 character string, which they pass directly to the encryption library as a key. Even if one passed in a hex key (even if output from a cryptographic secure random number generator), it could only be equivalent to a 16 byte random key.

Since, to my knowledge, all of the library implementations follow this spec and its vectors, they are all flawed in this regard.

Clarifying that libraries should require 32 byte hex strings (or 32 byte byte arrays) as keys is a breaking change

If libraries do as they should, and require 32 byte keys, this represents a breaking change to the spec and potentially all conformant libraries.

Elixir:

32 character string keys.

https://github.com/tuupola/branca-elixir/blob/master/lib/branca.ex#L14

Go:

32 character string keys.

https://github.com/hako/branca/blob/master/branca.go#L42
https://github.com/hako/branca/blob/master/branca.go#L95

JS:

32 Byte (64 char) hex string, or 32 Byte Buffer.

Now fixed, but now also a breaking change with keys generated from older version of lib, and now incompatible with other implementations. New version was not published with SemVer major release bump.

https://github.com/tuupola/branca-js

PHP:

32 Byte (64 char) hex string, or 32 Byte Array.

I believe @tuupola has also just modified this lib to work the way that the JS lib works now.

tuupola/branca-php@10ca91a

Python:

Hybrid

Seems to straddle the line. If bytes are passed as a key they will be used, if not a string conversion to bytes will be done.

https://github.com/tuupola/pybranca/blob/master/branca.py#L42

The test cases for the Python lib only ever exercises the string password (test vector) case.

https://github.com/tuupola/pybranca/blob/master/branca_test.py

Rust:

The same. Uses the provided 32 character string directly.

https://github.com/return/branca/blob/master/src/lib.rs#L181

Summary

  • There is already an interoperability issue. Different libs are now doing things different ways. Some in conformance with the test vectors, some straddling the line, and some now safer but non-conformant with the vectors or tokens generated with previous versions. Solving this with clarity in a new version of the spec is the only way to resolve this.
  • The spec, by remaining silent on the need for secure keys and implicitly encouraging the use of password style strings in the text of the spec and website, the test vectors, and the docs of pretty much every lib, is encouraging unsafe use.
  • If libraries implement the safer alternative (an easy change) they are no longer compatible with the spec, or with tokens generated by earlier versions of the library, or with other libraries using the old style.
  • These facts would imply to me that the best way to move forward is to declare a new version of the spec, that is very explicit about what a safe key means, and which provides updated vectors.
  • All library implementations should be encouraged to adopt the new spec, fix their own test vectors, and publish SemVer breaking change releases.

This is hard. I understand that. If standard practice had been adhered to in the creation of the spec, and in the later development of reference and community clients, this would have been a non issue. I also think that fixing it the right way is the right thing to do. Some inspiration might be taken from how Paseto, a project with similar goals, handles these issues. They support the current version of the spec, and the one prior. If a new version comes out, support for the oldest drops.

https://github.com/paragonie/paseto/tree/master/docs/01-Protocol-Versions#rules-for-current-and-future-protocol-versions

This can also be a time to handle any other interoperability issues if they exist.

Cheers.

@grempe
Copy link
Author

grempe commented Jun 23, 2021

cc: @tuupola @hako @return , the library authors.

@brycx
Copy link
Contributor

brycx commented Jun 23, 2021

@grempe Thanks for taking the time to improve Branca. I do have some additional comments.

The main website, and the spec vectors, all make use of plain text "passwords" which are not suitable as encryption keys.

Indeed they are not. I don't believe the intention is to demonstrate otherwise either. This tells me that the website and examples, should use randomly generated keys, instead of strings, in order to correctly convey what sort of source we expect the keys to come from.

Libsodium, the underlying encryption library for Branca is clear on what makes an acceptable key:

Yes, this is true. That is why I previously mentioned looking into how the secret key is described in the spec right now. E.g:

"Encrypt the user given payload with IETF XChaCha20-Poly1305 AEAD with user provided secret key. Use header as the additional data for AEAD."

The part with "user provided" can have two meanings. I believe the user referred to here is the user of a Branca implementation. I see how this could be misinterpreted.

Branca does not handle password-stretching. Indeed, if the source of the secret key is a user-password in form of something directly from a keyboard, then this key should be stretched. This is not within the scope of Branca to support though, so the user of a Branca library must ensure the key used is sufficiently strong. Nonetheless, it seems something worth mentioning in the spec.

While the text of the spec itself seems to be mostly silent on how to handle encryption keys, the test vectors are not. To my knowledge the vectors require conformant libraries to be able to accept a 32 character string, which they pass directly to the encryption library as a key.

The keys used in the test vectors in this repository do not require one to pass them as strings directly. Converting the key-strings to byte-arrays yield arrays of length 32 (unless it's a too-(short/long) key test).

Even if one passed in a hex key (even if output from a cryptographic secure random number generator), it could only be equivalent to a 16 byte random key.

Since, to my knowledge, all of the library implementations follow this spec and its vectors, they are all flawed in this regard.

If one did use hex keys to test, that person would have to convert the hex to byte-arrays first, and then pass it, unless the library provided a method to do so automatically.

What if the the secret keys used in the test vectors were hex-encoded? Would that make it clearer, that one should be able to pass raw bytes?

(I haven't looked at the other implementations listed, since I'm not very familiar with them)

(Rust) The same. Uses the provided 32 character string directly.

I'm not sure what is meant here. key is a reference to a byte-slice (&[u8]). This can be made from a string, CSPRNG, KDF, etc. In the doc example of the new() function, the string b"supersecretkeyyoushouldnotcommit" is converted to a byte-array with the b before, so the key variable there is actually a &[u8].

There is already an interoperability issue. Different libs are now doing things different ways. Some in conformance with the test vectors, some straddling the line, and some now safer but non-conformant with the vectors or tokens generated with previous versions. Solving this with clarity in a new version of the spec is the only way to resolve this.

I think adding a section to the spec about what the key should be is a good idea. There we can clarify that an implementing library must at least accept raw bytes. Interoperability across ecosystems is important.

The spec, by remaining silent on the need for secure keys and implicitly encouraging the use of password style strings in the text of the spec and website, the test vectors, and the docs of pretty much every lib, is encouraging unsafe use.

I can get behind this. As mentioned above, let examples be using CSPRNG and mention this in spec along with references to KDFs.

If libraries implement the safer alternative (an easy change) they are no longer compatible with the spec, or with tokens generated by earlier versions of the library, or with other libraries using the old style.

I don't agree here. They are currently conforming to the spec, arguably due to its vagueness of what the key should be. Libraries accepting raw bytes can produce tokens with string-only passwords too. Libraries generating tokens, without accepting raw bytes, are simply not able to re-produce tokens made by implementations that use e.g. CSPRNG generated-keys.

I still think that, ultimately, the libraries only accepting strings are erroneous in this case. Though, again, the spec can be improved by defining the secret key clearer.

These facts would imply to me that the best way to move forward is to declare a new version of the spec, that is very explicit about what a safe key means, and which provides updated vectors.

So convert the current keys in test vectors to hex-encoded?

Some inspiration might be taken from how Paseto,

Indeed PASETO has some versioning support defined. That aside, I don't see any documentation on PASETO about expectations of the secret key. The only difference with their test vectors, compared to Branca, is that they have a hard-coded hex string used as key, which is decoded, instead of a string.

@tuupola, please correct me if I make any false assumptions.

@brycx
Copy link
Contributor

brycx commented Jun 23, 2021

@grempe

Take a look at #38. Is this along the lines of what you had in mind?

@tuupola
Copy link
Owner

tuupola commented Jun 23, 2021

This spec is only about the token format. This spec is not about which data format implementation should use nor how people should implement the spec.

Fundamentally, this comes down to the unsafe use of non-cryptographically random values for encryption keys. This flaw is inherent in the spec ...

Spec has never said that password style secret keys should be used.

Using the "supersecretkeyyoushouldnotcommit" as the key in the code examples of JavaScript and PHP implementations was an oversight on my side. I was trying to make the documentation easier to read and understand.

However like I have explained to you before ASCII secret key has never been a requirement of the spec. Also with the said implementations it has always been possible to use random byte keys.

If libraries implement the safer alternative (an easy change) they are no longer compatible with the spec, or with tokens generated by earlier versions of the library, or with other libraries using the old style.

Not true. Spec only says key should be 32 bytes. That said spec could better describe how generate a random byte key. #38 takes care of this already.

Spec itself is not broken. This is a documentation issue.

To my knowledge the vectors require conformant libraries to be able to accept a 32 character string, which they pass directly to the encryption library as a key.

Not true. The key just happens to be a string in the test vector file. This does not imply spec requires cleartext keys.

To avoid confusion the test vector file should be changed to use 73757065727365637265746b6579796f7573686f756c646e6f74636f6d6d6974 as the secret key and the nudge maintainers of the implementations to update the unit tests and also not to use cleartext secret keys in their code examples.

Spec itself does not change.

@tuupola
Copy link
Owner

tuupola commented Jun 23, 2021

There is already an interoperability issue.

You mean there is a library which cannot consume token generated by another library. Which library combination?

@tuupola
Copy link
Owner

tuupola commented Jun 23, 2021

PHP: I believe @tuupola has also just modified this lib to work the way that the JS lib works now.
tuupola/branca-php@10ca91a

Nope, I just changed the code examples in README. Library itself was not changed. It was a documentation issue.

@tuupola
Copy link
Owner

tuupola commented Jun 23, 2021

JS: Now fixed, but now also a breaking change with keys generated from older version of lib, and now incompatible with other implementations.

Just tested and I am able consume tokens generated with JavaScript version with PHP version and vice versa. I do not know which implementation you found to be incompatible but could you open a bug report in the corresponding repository.

New version was not published with SemVer major release bump.

There is no need to do a major version bump when breaking BC before 1.0.0.

@grempe
Copy link
Author

grempe commented Jun 23, 2021

Thanks @tuupola for engaging in the conversation. My only intention is to try to remedy a problem I see.

I will take a moment to digest your comments and your commits. I did want to clarify two things though:

KDF

I don't mean to imply that Branca, in spec or implementations, should make mention of or encourage use of a KDF to stretch passwords into suitable key material. I would like to suggest that it be recommended that any key provided to Branca be the output of a suitable CSPRNG as 32 fully random bytes in hex or Buffer style array of bytes form. It should be suggested that the "best" way to serialize/deserialize this 32 byte key is as a 64 char hex string. e.g. for use when you are storing an encrypted token encryption key in your app for use in creating Branca tokens. The idea of using a "password/passphrase" as the key should never be mentioned, or even implied.

Interop

On the topic of interoperability. I have not done any testing of the interop between libraries and my code links were only meant to demonstrate how what was published in the spec/vectors heavily influenced the implementations.

I did do some testing of what developers will experience when they upgrade from JS 0.3.0 to 0.4.0, which is breakage. It is possible for them to fix that breakage and continue to accept tokens generated by 0.3.0 but that is not documented for them. Here is a set of tests your can run that demonstrate this:

https://runkit.com/grempe/60d355d6a6c6ff001ac5ecdd#

The key piece of knowledge is to know that you must convert your v0.3.0 key using:

Buffer.from(keyString, 'utf8').toString('hex')

I think the documented goal should be that a 32 byte hex string (derived from a CSPRNG), or the conversion of that string to the equivalent of a JS Buffer (a byte array), should just work as a key for any conformant library on any platform. And tokens generated with that key should be readable cross library. This requires that libraries do no manipulate the keys in any way (e.g. KDF) in order for the same raw key to always be passed down to libSodium. I think the JS lib handles this properly now (although arguably it could provide additional info about the need for key conversion when a non-hex/non-buffer string is provided).

As an aside, I ran into this article this morning which is on topic. The author also ran into issues related to key management with Branca. Moving to this new method universally might help resolve these types of issues:

Then I looked at using Google’s Tink which has pure-Java implementation of the required algorithms. It is also opinionated about key management. The opinions seem very smart, but it also makes it hard to just pass in a string as an encryption key. Passing in a string seems like not so great idea, but I’d like my library to be interoperable with the other Branca libraries that do exactly that.

https://quanttype.net/posts/2020-10-04-branca-and-yak-shaving.html

@brycx
Copy link
Contributor

brycx commented Jun 23, 2021

It should be suggested that the "best" way to serialize/deserialize this 32 byte key is as a 64 char hex string

This makes little sense to me. Any other serialization is just as applicable, since the keys should be passed as raw bytes in the end. Also, still not in scope IMO.

@grempe
Copy link
Author

grempe commented Jun 23, 2021

I think #37 and #39 do help to clarify.

I am glad to see the Rust lib passing with the new vector keys.

@brycx

This makes little sense to me. Any other serialization is just as applicable, since the keys should be passed as raw bytes in the end. Also, still not in scope IMO.

I agree that getting the raw bytes of the key is what matters most. Requiring hex ser/deser was not my intent, only an approach that would perhaps be shown in the example docs since handling string serialized keys is obviously more convenient in most cases. If that is the case though, and the spec is agnostic about ser/deser, then perhaps the libraries should be encouraged in the spec to require acceptance of the platform specific method of passing in a byte array as the key (e.g. Buffer in Node). Acceptance of any other serialization (e.g. hex, Base64, etc) is an implementation specific convenience. IIRC the JS lib used to only document and test passing in a string. I am not sure if there are still implementations that expect only a string since that's what the vectors used. Is it known if all libs work with raw byte arrays?

One idea would be to handle some of this in the spec in the form of a section related to Guidelines for Implementors which would encourage (though not require) certain expectations for interfaces. The simpler these rules are, the better.

e.g. (with a few extra just to provide an example)

As a library implementer your library should follow suggested interface guidelines:

  • Keys MUST be able to be provided as a raw byte array of 32 bytes.
  • Keys MAY be provided in additional serialization forms (e.g. Hex, Base64) as a convenience for developers.
  • Token decode operations MUST be able to return the raw bytes of the payload.
  • Token decode operations MAY provide alternative output forms of the raw payload (e.g. JSON, string) as a convenience for developers.
  • (insert other recommended guidelines for the shape of the interface for implementors here)...

These types of guidelines (not requirements) make it much easier for library developers to remain conformant IMHO. Since the API for Branca is generally pretty simple (init, encode, decode) there would not be many of these suggestions.

@brycx
Copy link
Contributor

brycx commented Jun 24, 2021

I am not sure if there are still implementations that expect only a string since that's what the vectors used. Is it known if all libs work with raw byte arrays?

No, this is not known to me. The libraries listed are not vetted before being put there, and some have much more serious issues.

Keys MUST be able to be provided as a raw byte array of 32 bytes.

The spec now states the key is a sequence of 32 arbitrary bytes (was also the case before, though not as clearly stated). Which means, like before, to adhere to the spec, you'd have to be able to create tokens with any byte-sequence.

Payload is already defined as arbitrary in the spec, so the same applies here as with the key.

I just tried to write alternative guidelines (e.g without "MUST" which seems out-of-place in guidelines compared to requirements), but since everything is listed as arbitrary bytes, I'm having a hard time putting something together that doesn't seem obvious. To me it's obvious that you can't make assumptions on what these parameters contain, except for bytes.

The problem seems to me that branca-js didn't accept raw bytes, when it should have.

The rest of the guidelines you've listed are mostly on API design and usability, which I don't believe belong here.

@grempe Do you have an example for where such guidelines, or similar ones, are used in other similar projects?

@tuupola
Copy link
Owner

tuupola commented Jun 24, 2021

I did do some testing of what developers will experience when they upgrade from JS 0.3.0 to 0.4.0, which is breakage. It is possible for them to fix that breakage and continue to accept tokens generated by 0.3.0.

This has nothing to do with the spec. It is an API change in the JavaScript implementation.

I think the documented goal should be that a 32 byte hex string (derived from a CSPRNG), or the conversion of that string to the equivalent of a JS Buffer (a byte array), should just work as a key for any conformant library on any platform.

While hex is a convenient and probably most universal way to serialise it should not be a requirement. This spec is only about token format, not how to implement a library.

And tokens generated with that key should be readable cross library.

Yes, this is obvious and this is why the test vectors exist.

This requires that libraries do no manipulate the keys in any way (e.g. KDF) in order for the same raw key to always be passed down to libSodium.

It they would it would be against the spec and the unit tests would fail.

Requiring hex ser/deser was not my intent, only an approach that would perhaps be shown in the example docs since handling string serialized keys is obviously more convenient in most cases.

Which should be done in the library documentation. More docs is always a good idea.

On the topic of interoperability. I have not done any testing of the interop between libraries and my code links were only meant to demonstrate how what was published in the spec/vectors heavily influenced the implementations.

You said there are interoperability issues which is why I asked.

@tuupola
Copy link
Owner

tuupola commented Jun 24, 2021

The problem seems to me that branca-js didn't accept raw bytes, when it should have.

Actually it did. You could pass the key either as an instance of a Buffer or a string. To make it more difficult to shoot yourself in a foot a change was made where strings are not used directly anymore. If user passes a string as the key it is assumed to be hex encoded key.

@brycx
Copy link
Contributor

brycx commented Jun 24, 2021

Actually it did.

Apologies, that's my misunderstanding then.

@tuupola
Copy link
Owner

tuupola commented Jun 24, 2021

One idea would be to handle some of this in the spec in the form of a section related to Guidelines for Implementors which would encourage (though not require) certain expectations for interfaces.

I have too been thinking about a secondary doc (not the spec itself) or like you said guidelines which would contain tidbits which do not belong to the directly to the spec. Stuff like the fact that Branca tokens are sortable by their creation date. Also comment about timestamp is not part of the spec, instead it is a guideline.

Will think about this a bit.

@grempe
Copy link
Author

grempe commented Jun 24, 2021

OK, @tuupola and @brycx.

Actually it did.

It did. But it was undocumented, untested, and this capability was not referenced in the docs. It was also not intuitive to figure out even if you read the code since you would be parsing a Buffer from a Buffer. My original issue, and the pull requests I provided, and subsequent doc changes, made the ability to pass in a Buffer or a hex string explicit.

No, this is not known to me. The libraries listed are not vetted before being put there, and some have much more serious issues.

I'd suggest this is an area of improvement for anywhere you link to external projects. If the lib doesn't demonstrate conformance with the spec it should probably be noted how it is, or is not, conformant.

An example of this would be on the Paseto project. In their case they are indicating conformance with the various versions of the spec.

https://paseto.io/

JWT.io also provides great detail about what their clients support, and what they don't.

https://jwt.io/

I think I have made any points I wanted to make. I look forward to seeing any additions or clarifications you may make.

Cheers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants