Generate right-length node name #30491
Comments
/cc @mmusgrov (narayana)
Wondering if we should just SHA-hash any value longer than 28(?) bytes? What could go wrong :)
@maxandersen Wouldn't truncating the hash increase the probability of a collision? If someone here with a background in hash functions can provide an estimate of the two probabilities (the truncated hash versus the truncated hashes of any of the other pods), then we can decide how exposed we'd be if we go with truncating the hash.
But if the hashes for two different nodes collide, then you could end up with two transaction recovery managers independently recovering the same in-doubt transaction, probably resulting in a heuristic outcome. That would be wrong ;-)
So your concern is that a SHA-256 hash is 32 bytes and we only have room for 28? Where does that size of 28 come from anyway? :)
Just realized that SHA-224 exists, which would fit into 28 bytes and thus not need to be truncated. My understanding is that collisions on small values are still very unlikely: one could potentially get a hit by generating a very long string that coincidentally has the same hash, but that is not bound to happen in any realistic usage of these keys to signify identity. I don't know the exact numbers, but they should be possible to find.
Okay, so late-night reading. The hash collision question is the same as the birthday problem. This article explains it pretty well: https://kevingal.com/blog/collisions.html, with a calculator at https://kevingal.com/apps/collision.html. Let's say we expect there to be on average 4(?) nodes participating. How likely is it that their values will have hash collisions, given that their values are most likely unique (i.e. host names are user-entered keys with the intention of being unique)? The simplified calculation is k^2 / (2N) = 4^2 / (2 * 2^224) = 1 / 2^221, which is approximately 3 * 10^-67, or about 1 in 3.4 * 10^66. It's way more likely that an asteroid hits and eliminates the planet in our lifetime than that there will be collisions, afaics :)
Ps. Check my math - I simply based it on the article explaining the approximation for the birthday problem. |
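As a quick check of the arithmetic, here is a minimal Java sketch of the birthday-problem approximation p ≈ k^2 / (2N) with N = 2^bits. The class and method names are hypothetical and only serve this illustration. It works in log2 space so that 2^224 never has to be materialised.

```java
final class BirthdayBound {

    /**
     * Approximate probability of at least one collision among k uniformly
     * random values of the given bit length: p ~ k^2 / (2 * 2^bits).
     * The approximation is only meaningful while the result is far below 1.
     */
    static double collisionProbability(long k, int bits) {
        // log2(p) = 2 * log2(k) - 1 - bits
        double log2k = Math.log(k) / Math.log(2);
        double log2p = 2 * log2k - 1 - bits;
        return Math.pow(2, log2p);
    }

    public static void main(String[] args) {
        // 4 nodes, 224-bit hash (SHA-224): on the order of 1e-67
        System.out.println(collisionProbability(4, 224));
        // Even a million nodes stay far away from any practical risk
        System.out.println(collisionProbability(1_000_000, 224));
    }
}
```

For 4 nodes and a 224-bit hash this gives a probability of roughly 3 * 10^-67, i.e. 1 / 2^221.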
Interesting, this is a useful analysis. The limitation on the size of the node name is because we include it in the (X/Open) Xid, which is a fixed-size data structure. But I think you have kick-started another round of discussions about how to work around the issues ...
Is there a technical reason for needing the reversal? If yes, then we should consider using compression instead. If it's just to locate the node, doesn't the protocol allow annotating/tagging the log to get more context?
@mmusgrov Why is that? I know that there is a limit on route URLs (I think 63 characters). I definitely know that we have pod names with more than 23 and 28 characters in production in an OpenShift cluster.
@maxandersen while this is true, not all of the generated bytes represent printable characters. So |
Not sure the math is correct, but the order of magnitude seems sensible. So basically
should be the probability to have Conversely,
is the probability that at least one collision occurs ( Wolframalpha refuses to calculate the value If we let wolframalpha calculate so yeah... |
Okay, I used the approximation used by the link provided by @maxandersen and calculated the number of nodes necessary to reach a probability of |
bump so... do we wanna do this? |
I'm not sure what to say here other than to state our needs:
So our key requirement is uniqueness and I don't feel sufficiently qualified to comment on the solutions that Marco and Max have presented here. So to answer your question, we would like to do this but I'm uncomfortable with the solutions presented so far. Can we not detect the problem, of someone setting too long a node name, at build time? |
@mmusgrov The node identifier is something one would set at run time. Think, for example, the pod name of a pod in a stateful set. We cannot detect this value at build time. Also: where are the 23 bytes coming from if the error message says 28 bytes? |
Thanks. The 23-byte node identifier limit is something that was originally imposed when running WildFly on OpenShift and subsequently got added as a WildFly restriction too. There are a number of JIRAs for it, none of which say why. I've asked around a few times without success, and the two people who I think could know the answer no longer work in this area (and no longer work for Red Hat). Perhaps it's just one of those things that got "lost in the mists of time" and is no longer relevant; I know that OpenShift has changed a lot since the restriction was added. So let's just assume this isn't applicable to Quarkus, I'm comfortable doing that. Note that the 28 bytes for the datum is still a hard requirement, since we inherit this limitation from the XA specification, which constrains the size of an Xid.
28 bytes is fine, we have hash-algorithms that generate 28-byte hashes. But there is nothing that generates 23-byte hashes. |
So would an identifier like |
That would be fine with me. I know it was said elsewhere that this would make debugging a pain but that complaint can be side-stepped if the boot log can report the original value prior to hashing? |
Printing the initial value is no problem. So the last question... do we want to always hash, or only if the name is too long (and print information accordingly)? |
I don't know. But if you only hash when necessary then if someone complained about it then we could tell them to use a shorter node name so personally I'd go for only hash if necessary. |
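The "only hash if necessary" approach discussed above could look roughly like the following Java sketch. It is not taken from the PR: the class name, method name, and the choice of URL-safe Base64 are assumptions made for illustration. Because raw SHA-224 bytes are not all printable, the sketch Base64-encodes the 28-byte digest (38 characters without padding) and keeps the first 28 characters, which still carries 168 bits of the hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class NodeNameShortener {

    // Limit inherited from the XA Xid, as stated in the thread.
    static final int MAX_BYTES = 28;

    /** Returns the name unchanged if it fits; otherwise a 28-character hash-derived name. */
    static String shortenIfNecessary(String nodeName) {
        byte[] raw = nodeName.getBytes(StandardCharsets.UTF_8);
        if (raw.length <= MAX_BYTES) {
            return nodeName; // short enough: keep the readable value
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-224").digest(raw);
            // Assumption: URL-safe Base64 keeps the result printable ASCII.
            String encoded = Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
            return encoded.substring(0, MAX_BYTES);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-224 is not available", e);
        }
    }

    public static void main(String[] args) {
        String original = "my-stateful-application-with-a-long-pod-name-0";
        String shortened = shortenIfNecessary(original);
        // Report the original value so the hashed name can be traced while debugging.
        System.out.println("node name '" + original + "' shortened to '" + shortened + "'");
    }
}
```

Printing the original value next to the shortened one at boot covers the debugging concern raised earlier in the thread.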
Oh, and thanks very much for sorting through the issues and coming up with a good solution. |
@maxandersen @mmusgrov PR is open. You are welcome to review it 🙂 |
Generate right-length nodeIdentifier. If nodeIdentifier is longer than 28 bytes, the XID generated by Narayana is too long, and it breaks. This shortens it, while nodeIdentifier remains unique. Fix ported from Quarkus: quarkusio/quarkus#30491
* Update NarayanaPropertiesInitializer.java: Generate right-length nodeIdentifier. If nodeIdentifier is longer than 28 bytes, the XID generated by Narayana is too long, and it breaks. This shortens it, while nodeIdentifier remains unique. Fix ported from Quarkus: quarkusio/quarkus#30491
* Replace deprecated API usage
* Compute XA recovery nodes from node identifier by default
* Upgrade versions
Co-authored-by: deepred-dev
Co-authored-by: Benjamin Graf
Background
Currently, when we pass a value to quarkus.transaction-manager.node-name that is too long, the application fails to start with an exception.
It would be a better developer experience if quarkus-narayana-jta transformed the value into a valid one.
Story
AS a developer
WHEN I provide a value for quarkus.transaction-manager.node-name
THEN the extension quarkus-narayana-jta guarantees that the value is transformed to a valid node name if necessary.
Implementation ideas
No response