Design decisions related to ML-KEM and ML-DSA keys. #26652
Considerations of the very long-term support guarantees the project has are relevant here. It would be less than ideal to support an effectively dead format for half a decade or more.
The counter-argument is that if this is helpful to users, the "support" in question amounts to one line in a table. And we still don't know what LAMPS will come up with...
If there is yet another format spec after we release, then that will still require changes regardless of whether everything is data driven or not.
A further option could be having a minimal table of formats and letting the user specify an additional one (or more). Compatibility with everything weird and wonderful would be maintained without polluting the code with "unused" things.
First of all: what's the raw format? I missed that one being discussed or mentioned.

Second of all, I'm rather conflicted: I wish there were One True Format that we had all agreed on by now, but I don't see that happening very quickly at IETF, let alone it being frozen in an RFC. So I think we need to support all the different formats: that will lead to a much better user experience.

I need to put some blame on us, as we do ship oqsprovider in CentOS 10 Stream that will output PKCS#8 files with NIST OIDs but in OQS provider format. Sorry about it! (We just tried to fix it, so that it produces files in one of the older formats specified in the IETF draft, but that breaks some pretty core functionality: open-quantum-safe/oqs-provider#637). We will also extensively document that those are not final formats.

One saving grace here is that we're talking about the private key format, so unlike with values that we need to assume come from an attacker (public keys, TLS, etc.), I think we can apply Postel's law and be liberal in what we accept. So:
I haven't grokked ML-DSA yet (I've been focusing on side-channel analysis of ML-KEM), so I may be missing something, but it seems to me that the private and public parts of ML-DSA have slightly different variables: (ρ, ..., t_0) vs (ρ, t_1). Can we derive t_1 from the private key parameters? For ML-KEM, conversion of the private key to the public key is trivial, as the private key has the full public key embedded inside. Haven't looked at SLH-DSA at all... (I think it's just that the operation is computationally expensive?)
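To make the ML-KEM point above concrete, here is a rough Python sketch of slicing the embedded public (encapsulation) key out of an expanded ML-KEM-768 private key, following the component layout in FIPS 203 (dk = dk_PKE ‖ ek ‖ H(ek) ‖ z). This is an illustration of the layout only, not OpenSSL's decoder, and the function name is made up:

```python
import hashlib

# ML-KEM-768 component sizes per FIPS 203 (k = 3):
DK_PKE_LEN = 1152   # 384 * k
EK_LEN = 1184       # 384 * k + 32
H_LEN = 32          # SHA3-256 digest of ek
Z_LEN = 32          # implicit-rejection secret
DK_LEN = DK_PKE_LEN + EK_LEN + H_LEN + Z_LEN  # 2400

def extract_public_key(dk: bytes) -> bytes:
    """Slice the embedded encapsulation key out of an expanded
    ML-KEM-768 decapsulation key and verify its hash field."""
    assert len(dk) == DK_LEN, "not an expanded ML-KEM-768 private key"
    ek = dk[DK_PKE_LEN:DK_PKE_LEN + EK_LEN]
    h_ek = dk[DK_PKE_LEN + EK_LEN:DK_PKE_LEN + EK_LEN + H_LEN]
    # A mismatch here is exactly the kind of pub/priv inconsistency
    # that a pairwise check on import would catch.
    if hashlib.sha3_256(ek).digest() != h_ek:
        raise ValueError("embedded public key hash mismatch")
    return ek
```

Since the H(ek) field is part of the private key, a decoder can cheaply cross-check that the embedded public key is at least internally consistent.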
We default to retaining the seed value. If provided on input, or generated, it is included in the output (unless the user explicitly chooses not to retain or output the seed). I don't see the value of excluding some (non-default) formats from being written. Users may legitimately need them for interop. They're unlikely to output these by accident.

In ML-DSA, the public key can be computed from just the private parts of the key, and this is what happens on import. So there is no possibility of a mismatch in ML-DSA. By way of contrast, in ML-KEM the embedded public key could actually be incompatible with the enclosing private key, but (pending PR) we'll be doing a PCT that will exclude that possibility with high probability. [It would be good to have Paul comment on why the entropy in the PCT is not randomly generated; is there a good reason why high-quality random data might not yet be available during the PCT? If there isn't, I should use random data rather than the full private + public key material.]

As for "raw" keys, that's an EVP concept, to distinguish e.g. internal EC point encoding from wire EC point encoding. To hedge the bets a bit, the format table also supports bare non-ASN.1 forms of each of the public and private keys with no ASN.1 OCTET STRING wrapping. I hope those won't be the final formats, but we'll see.
For the most part, these PCTs are pointless. On key gen they are a pure waste of effort. On import there might be some minimal benefit for some algorithms in detecting mismatched or corrupted keys, but the chance of that is really small and any error will appear later. It isn't worth the performance hit of making the check for the benefit gained. For the PQ algorithms, the benefits are further reduced. Consequently, it is desirable to get past them as fast as possible. Using high-quality entropy is not required by the FIPS 140-3 IGs, so it's better and easier to avoid it.
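As a sketch of what such a pairwise consistency test amounts to, and why the entropy quality doesn't matter for it, here is a toy KEM in Python. The "KEM" is a hash-based stand-in, not ML-KEM, and all names are made up; the point is only that the PCT checks that encapsulation and decapsulation agree, for which any fixed or cheap entropy value suffices:

```python
import hashlib
import secrets

# Toy stand-in KEM (NOT ML-KEM): just enough structure to show the
# shape of a pairwise consistency test (PCT) after key generation.
def toy_keygen():
    sk = secrets.token_bytes(32)
    pk = hashlib.sha3_256(b"pk" + sk).digest()
    return sk, pk

def toy_encaps(pk, entropy):
    ct = hashlib.sha3_256(b"ct" + entropy).digest()
    ss = hashlib.sha3_256(pk + ct).digest()
    return ct, ss

def toy_decaps(sk, ct):
    pk = hashlib.sha3_256(b"pk" + sk).digest()
    return hashlib.sha3_256(pk + ct).digest()

def pairwise_consistency_test(sk, pk, entropy=b"\x00" * 32):
    # The PCT only needs encaps and decaps to agree; the entropy merely
    # has to be some value, so no DRBG output is required, which is why
    # high-quality randomness can be (cheaply) avoided here.
    ct, ss_enc = toy_encaps(pk, entropy)
    return toy_decaps(sk, ct) == ss_enc
```

A mismatched key pair fails the check regardless of what entropy was used, which is the only property the test needs.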
Excellent write-up @slontis. Right now there is a range of incompatible formats out there in current usage. There is a split among the early adopters in terms of what should be supported, there is no clear "one true format", and there may never be.

We need to make a pragmatic decision here, and it isn't a technical one. We can review the technical implementation to get a sense of the cost of maintaining it, and with the way it has been done, I for one think the cost is sufficiently low that the user benefit outweighs it. I've chatted with quite a few folks and no one is particularly happy with the state of things in terms of the mess, but we also need to keep in mind that things will not settle down any time soon.

We will need to be able to generate multiple output formats; that is very clear. There is not a single output format that is universally acceptable. Once we accept that there isn't a single one, it is a matter of accepting that we have a range, and the incremental cost of adding additional items to a table is small enough to be negligible in my view. Note I don't hold that view generically: if the code weren't clean and straightforward, it would tip the balance. And if there were a reasonable universal output format, that would also change my view.

@vdukhovni and I have been following all the developments on the LAMPS mailing list and participating in lots of on-list and off-list discussions, all aimed at seeing if we could get to a single format that works for everyone, and that is clearly not going to happen any time soon. We don't have the luxury of waiting a year or so to see how things pan out with deployments.
And once we are multi-format, I also don't see any benefit in not supporting the oqs-provider formats, as that eases things for the early adopters who have been using that code to prototype. We (OpenSSL) pointed people there prior to having our in-tree solutions, and when the effort to support it is minimal, I don't see a justification for not doing so.

We also already added the oqs-provider names for algorithms, which went in without any controversy, meaning those names that do not match the standard will get used and will be around "forever" as such. I'm much less inclined to propagate algorithm names that users will use going forward that do not match a specification document than I am to support file formats that are in use. But the aliases are in, and I'm not suggesting removing them. To me, compatibility with a user base that we also encouraged isn't a bad thing. It does come at a cost, but I don't see that cost as sufficiently high to justify removing working code (names or format handling).

I would also pick our default output format for maximum interoperability, and that remains a challenge, as the IETF format may end up not actually being maximally interoperable; it may lead to a balkanised user base. Maximum interoperability should remain our driver here in my view, with interoperability being more important than strict IETF specification compliance.
Given we already have the code, I am also inclined to allow all the possible implemented variants on input (assuming all are unambiguous and can be safely distinguished by ASN.1 tags or lengths) and to support full configurability of the output format. I am not so sure about the need to make the supported input formats configurable instead of just importing everything that we can import. One could argue it makes the potential attack surface smaller, but then I would say that should be build-time configurable.
The input formats are unambiguous. They are matched by both ASN.1 structure and length, except for the two "bare" formats of just the seed or just the key with no OCTET STRING wrapping, which are matched by length alone. The order of input formats is documented as irrelevant for that reason, so a decision not to support disabling some of them is possible. But then LAMPS purists couldn't turn off the "both" format if they hate it so much they can't abide supporting reading it, and the "bare" key format, which might not be in use anywhere and is perhaps more a proof of concept than something known to be used, couldn't be turned off (we could of course drop that one before the release, if we're sure it won't be one of the final choices forced by the LAMPS ASN.1 haters). Bottom line, I rather think that configurability serves us more in the current uncertain climate than any concern about purity or hypothetical long-term support cost. The situation is regrettable (in an ideal world LAMPS would not have gone rogue), but something we can easily deal with.
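The matching described here (ASN.1 plus length for wrapped forms, length alone for the bare forms) can be sketched roughly as follows. The DER handling is deliberately minimal, and the format names and ML-KEM-768 sizes are illustrative assumptions, not OpenSSL's actual identifiers:

```python
SEED_LEN = 64    # ML-KEM seed: d || z
PRIV_LEN = 2400  # expanded ML-KEM-768 private key

def _der_octet_payload(blob: bytes):
    """Return the payload if blob is exactly one DER OCTET STRING, else None."""
    if len(blob) < 2 or blob[0] != 0x04:
        return None
    if blob[1] < 0x80:               # short-form length
        n, off = blob[1], 2
    else:                            # long-form length
        k = blob[1] & 0x7F
        if len(blob) < 2 + k:
            return None
        n, off = int.from_bytes(blob[2:2 + k], "big"), 2 + k
    return blob[off:] if len(blob) == off + n else None

def match_format(blob: bytes) -> str:
    # ASN.1-wrapped forms are matched by tag and length; the "bare"
    # forms, having no structure at all, can only be matched by length.
    payload = _der_octet_payload(blob)
    if payload is not None and len(payload) == SEED_LEN:
        return "seed-only"
    if payload is not None and len(payload) == PRIV_LEN:
        return "priv-only"
    if len(blob) == SEED_LEN:
        return "bare-seed"
    if len(blob) == PRIV_LEN:
        return "bare-priv"
    raise ValueError("unrecognised private key encoding")
```

Because every format resolves to a distinct (tag, length) shape, the order of the table entries genuinely doesn't matter, which is the property the comment above relies on.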
As a person from the team having "helped" create this mess I probably should keep silent -- but just can't :) FWIW, I conceptually completely agree with the approach "accept any input, make output run-time configurable" to cater to all possible futures (standards), but I have to ask this: if it is a tenet that OpenSSL only implements finalized standards (?), would it be justifiable to have no (at least private key) encoders active by default in 3.5? A purist position would even be to not include any encoders, as there's no standard one can refer to -- but as the code/options are there, this default setting at least could make the unwary aware (of the problem of no commonly agreed standard being available) and force the people shipping distros to make a conscious decision (maybe/hopefully activating by config the standard finalized at time of packaging). For me at least this worked reasonably well: encoders for KEMs are by default not available in
No, it is not a tenet, and shipping without working encoders is not acceptable. Some users are already creating ML-KEM certificates, and these require being able to encode the public key, and to be able to load and use the private key in protocols that use static KEM keys (these are already in use). We would not ship if the algorithm or public-key encoding were ill-defined, because there would be no chance of interoperability, but private keys are a different matter. If we can handle a superset of the specification, and with luck have a better sense of the right default output format before long, we're solid, even if our default format is outside the spec; ideally we won't need to do that.
Thanks for the explanation/background. Makes sense.
I don't doubt that. My sole concern is how this "situation" is best represented (beyond documentation) towards all types of users so they're not caught off-guard. |
Added another question to the end of the top section.
It is a lot more than one line in a table. It needs unit and interop tests. Sure, they mightn't change much over time but they are a cost. |
I'm really glad someone brought this up. The policy for inclusion is a national or international standard. OQS certainly doesn't qualify as either. I'm not advocating waiting until the IETF finishes its lengthy processes, since that would be counterproductive. For ML-DSA we're going to have to make a stab at something if we want to include it; ML-DSA is kind of pointless without being able to encode and decode keys. I think we should be cautious about including too much here nonetheless.

For ML-KEM we could take a more conservative approach. ML-KEM's primary, and by far largest, use case is TLS. In this role encoders and decoders are not required. That is, we could ship ML-KEM without encoders & decoders and wait for the standards to settle before adding them. If there is a published standard for ML-KEM certs, this is moot; we should support them.

I really hate introducing new features with built-in legacy. Supporting OQS formats does exactly this.
One reason why I would really like to see the ability to load ML-KEM keys from a file (even if they are just raw byte strings) is side-channel testing. To be able to check that there is no leakage related to private key values, I need to know what key was actually used for the operation...
That is a valid use case arguing for an option to enable encoders/decoders, but IMO it does not demand having this enabled by default. Also, it is not an argument for supporting all kinds of formats.
As I've noted a number of times before, that policy is only for cryptographic algorithms and was put in place when we were getting requests for random-algorithm-of-the-week to be added. We made a rule that if the algorithm did not have at least national-level standard documentation we would not add it. This has not applied (and is not meant to apply) to anything else. We implement what we believe represents what our communities want to see in place, and generally that tends towards interoperable code; interoperable is more important than "correct".
I don't see it that way at all: we can make it more painful for the early adopters to move across, or we can make it easier. For the PQC work, with the PKCS#8 formats within the IETF changing over the last 6 months between various incompatible versions, and the latest details showing things are still not quite settled down, @vdukhovni implemented a logical range of formats, and having OQS support was effectively very low cost. As OQS was what we had been recommending for those that needed to experiment until such time as we had our built-in implementations, it would leave those users high and dry if we don't at least read the format that was written. And when the variations between the formats are such that they can easily be table-driven, it makes the cost/benefit decision lean a particular direction.
Correct, I just need a way to import a key; it doesn't have to be the standardised PKCS#8 format.
@tomato42 Do you need to 'only' know the key, or do you need to control the key? If it's the former, at least for TLS, you might be able to leverage Haven't tried it out, but something like below would be my starting point for such a callback function (example for the client side):
Apart from PKCS#8, EVP layer provides And |
Since leakage from private key operations is basically relevant only in the context of CMS or similar protocols, I'd rather not do it inside the TLS context at all. Also, I'd rather have the test harness as simple as possible, something like this, but one that loads a new key for every operation. I plan to also be able to test the seed format for private keys, so the amount of control over the actually used key (in terms of numerical values) is quite limited...
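The kind of minimal harness described here might look like the following sketch, where the private key operation is just a placeholder workload and keys would really be read from per-operation key files; all names are illustrative:

```python
import hashlib
import time

# Skeleton of a simple side-channel harness: for every measured
# operation we load a fresh, known private key, so the recorded
# samples can later be correlated against the key's actual values.
def private_key_operation(key: bytes, ct: bytes) -> bytes:
    # Placeholder standing in for e.g. ML-KEM decapsulation.
    return hashlib.sha3_256(key + ct).digest()

def measure(keys, ct):
    samples = []
    for key in keys:  # in a real harness, read each key file from disk
        t0 = time.perf_counter_ns()
        private_key_operation(key, ct)
        samples.append(time.perf_counter_ns() - t0)
    return samples
```

The essential property is only that each timing sample is paired with a key whose exact byte values are known to the analyst, which is why file-based key import (in whatever format) is the requirement here.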
yes, those should be sufficient for side-channel testing |
FWIW, this format (privkey+pubkey, no seed) is also what GnuTLS 3.8.9 reads and writes, exclusively. |
Correct, but this seems not to be a deliberate GnuTLS design decision but a mere limitation/consequence of it currently still using
GnuTLS 3.8.9 switched to leancrypto for PQC support, that's why I mentioned the version. ;-) I just wanted to mention it as another implementation that may benefit from OpenSSL's input/output flexibility. Which works great btw, well done! |
Before merging the ML-KEM or ML-DSA feature branches to master, the following design decisions should be agreed on as the correct approach.
Background:
ML-KEM and ML-DSA both generate their key pair components from an input seed.
As part of the FIPS standards, the public and private keys are encoded into formats that can be serialized.
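A minimal sketch of the seed-based generation mentioned above: the seed alone deterministically determines the whole key pair, which is why retaining it gives a compact, complete private-key representation. The SHAKE-256 expansion below is a stand-in for the actual FIPS 203/204 derivations, not the real algorithm:

```python
import hashlib

def expand_seed(seed: bytes, out_len: int) -> bytes:
    """Deterministically expand a short seed into key material.

    Stand-in for the real ML-KEM/ML-DSA key expansion, which derives
    every component of the key pair from the seed (d || z, 64 bytes,
    for ML-KEM; a 32-byte xi for ML-DSA)."""
    assert len(seed) in (32, 64)
    return hashlib.shake_256(seed).digest(out_len)
```

Because the expansion is deterministic, two parties holding the same seed always reconstruct identical key material, and a stored seed can always be cross-checked against a stored expanded key.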
There is also currently a raw format.
Because of these different requirements, OpenSSL is stuck in a difficult position as to what the format should be while IETF decides on the final format.
Because of these issues, the OpenSSL design allows many format choices.
To support this, OpenSSL currently provides provider config options for the input and output choices. If a format is omitted from the configured choices, attempting to load that type will fail with an error.
So this raises the following questions:
(1) Which options should be supported for input? Should we just always allow all formats to load and not use input format choices?
(2) Which options should be supported for output (should OQS and raw formats be allowed, for example)? Does it need a prioritized list?
When loading, there are cases where the seed, the private encoding and the public key could all be present. So what should happen in these cases?
(3) When importing, if both the seed and the private encoding are present, should we generate from the seed and make sure it matches the private encoding, or should we decode the private part and just store the seed?
(4) Since the public key can be derived from the private key, the same questions also apply to the public key.
(5) If either the seed or the private key is present when exporting, should the public key be exported as well?
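One possible shape for the import logic behind questions (3) to (5), sketched with toy hash-based derivations standing in for the real seed-to-private and private-to-public expansions (all names here are made up for the sketch):

```python
import hashlib

def derive_priv(seed: bytes) -> bytes:
    # Placeholder for the FIPS 203/204 seed-to-key expansion.
    return hashlib.shake_256(b"priv" + seed).digest(64)

def derive_pub(priv: bytes) -> bytes:
    # Placeholder for the private-to-public derivation.
    return hashlib.sha3_256(b"pub" + priv).digest()

def import_key(seed=None, priv=None, pub=None):
    """Prefer the seed; cross-check any components supplied alongside it."""
    if seed is not None:
        regenerated = derive_priv(seed)
        if priv is not None and priv != regenerated:
            raise ValueError("seed and private encoding disagree")
        priv = regenerated
    if priv is None:
        raise ValueError("no private key material")
    derived_pub = derive_pub(priv)
    if pub is not None and pub != derived_pub:
        raise ValueError("public key does not match private key")
    return {"seed": seed, "priv": priv, "pub": derived_pub}
```

This is the "regenerate and verify" answer to (3) and (4): anything derivable is recomputed, and any supplied copy is treated as a cross-check rather than trusted as-is.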
The following PR adds a pairwise test to the import: `openssl/providers/implementations/keymgmt/ml_kem_kmgmt.c`, line 456 (commit 8a14495).
The pairwise test checks that the public key matches the private key. Given that the private key is trusted, does that also imply that we trust the public key?