Generate random DNS id's #371
Pinging @Claudenw.
So this isn't entirely clear to me, because I have seen other implementations of DNS resolvers in other software (e.g. systemd-resolved) and they just use a standard random generator, which is also prone to collisions.

I don't think this is an issue: the entire implementation is private and I don't think it can be called from multiple threads at once. Even if it can, it's not a problem; you might get a slightly "wrong" result, but it doesn't matter because we just want random numbers.
The old code wasn't thread safe either, so why do we need to worry about this code being thread safe? To make the old code thread safe, we'd need to use an AtomicInteger or something like that.
I'm not an expert, but any idea why the value is a `Short`?
It has to be according to the standard, see https://datatracker.ietf.org/doc/html/rfc5452#section-4.3. The generated ids fill a two-octet (i.e. 2-byte) range, which in Java/Scala is a `Short`.

Actually this new implementation is strictly better than the current one, because the current one only used a 1-byte range since it just incremented from a fixed starting value.
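For illustration (this snippet is not from the PR, and the values are made up), the 16-bit relationship looks like this in Scala:

```scala
// A Scala Short is 16 bits, matching the DNS message ID field exactly.
// A negative Short still carries a valid 16-bit pattern; masking with
// 0xFFFF recovers the unsigned value that actually goes on the wire.
val id: Short = -12345
val onTheWire: Int = id & 0xFFFF // 53191
```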
Collisions can happen in DnsClient.scala if there are two in-flight queries with the same id. I submitted a PR in Akka for this, but it can also be done with an FSM-like approach.
If this is regarding it not being thread safe, we can solve this separately. Let's fix one problem at a time; the RC is on our doorstep. On another note, CI is failing because the tests need to be updated. I'll do this in a bit.
Do we know what happens if we change
Off the top of my head this uses a completely separate implementation from the Java stdlib, so it shouldn't be an issue.
The Java-lib-based one may be slower and blocking?
From what @raboof said, that wasn't the main reason; it being slower due to blocking is likely a secondary reason.
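For reference, the two stdlib generators being compared here (and benchmarked further down) are used roughly like this; a minimal illustration, not the PR's code:

```scala
import java.security.SecureRandom
import java.util.concurrent.ThreadLocalRandom

// SecureRandom is cryptographically strong but may block while the OS
// gathers entropy; ThreadLocalRandom never blocks and avoids contention.
val secure = new SecureRandom()
val secureId: Short = secure.nextInt(65536).toShort
val fastId: Short = ThreadLocalRandom.current().nextInt(65536).toShort
```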
Force-pushed from 9efc07b to 8c346e2.
I raised #373 for the lower-priority issue of tracking ids.
So I have just updated the tests and rebased/force-pushed the changes. The reason why the tests were failing is that they assumed the ids took specific hardcoded values. Instead we now capture the first id:

```scala
val firstId = dnsClient1.expectMsgPF() {
  case q4: Question4 if q4.name == "cats.com" =>
    q4.id
}
```

and then assert that the second id differs from it:

```scala
val secondId = dnsClient2.expectMsgPF() {
  case q4: Question4 if q4.name == "cats.com" && q4.id != firstId =>
    q4.id
}
```

i.e. in this specific case we can't test for hardcoded ids, but we can still make sure that the generated ids are distinct.
I am going to confirm whether collisions are an issue. Note that one advantage of the algorithm that @Claudenw provided is that it's deliberately designed to minimize the number of collisions; see https://en.wikipedia.org/wiki/Double_hashing#Enhanced_double_hashing
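For readers unfamiliar with the technique: enhanced double hashing advances an index by an increment that itself grows each step, which is what keeps early collisions rare. A minimal sketch of that recurrence as an id generator, with assumed names (the PR's actual fields differ):

```scala
// Sketch of enhanced double hashing as a 16-bit id sequence:
//   x(i) = x(i-1) + y(i-1)   and   y(i) = y(i-1) + i
// with arithmetic implicitly mod 2^64 and the result truncated to 16 bits.
final class EnhancedDoubleHashIdGenerator(seed1: Long, seed2: Long) {
  private var index: Long = seed1
  private var increment: Long = seed2
  private var count: Long = 1L

  def nextId(): Short = synchronized {
    val result = index.toShort // keep only the low 16 bits for the DNS id
    index += increment         // x(i) = x(i-1) + y(i-1)
    increment += count         // y(i) = y(i-1) + i  (the "enhanced" step)
    count += 1
    result
  }
}
```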
So I just checked systemd-resolved and it does appear to handle id collisions; it does this by tracking the state of current transactions (see https://github.com/systemd/systemd/blob/main/src/resolve/resolved-dns-transaction.c#L195-L196). Will work on this now.
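The systemd-resolved approach amounts to checking a candidate id against the set of in-flight transactions and regenerating on a hit. A hedged Scala sketch of that idea (names and structure are assumptions, not Pekko or systemd code):

```scala
import scala.collection.mutable
import scala.util.Random

// Track ids of in-flight queries; retry generation on a collision.
// Assumes far fewer than 65536 concurrent queries, so the loop terminates.
final class CollisionAwareIdAllocator {
  private val inFlight = mutable.Set.empty[Short]

  def allocate(): Short = synchronized {
    var id = Random.nextInt(65536).toShort
    while (inFlight.contains(id)) id = Random.nextInt(65536).toShort
    inFlight += id
    id
  }

  def release(id: Short): Unit = synchronized { inFlight -= id }
}
```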
```scala
idIncrement -= {
  idCount += 1; idCount - 1
}
result
```
nextId can be called concurrently, so I think two threads can get the same result. idIncrement and idIndex should be protected with an AtomicLong.
I was using #288
pjfanning left a comment:
lgtm - we can continue to look at fully collision-proof implementations, but this PR is probably enough to unblock an RC
Let's wait a bit before merging this; I think I was wrong in what I said before about collisions, and we do need to handle this.
So, having gone through the codebase, did I miss something?
Unfortunately, the 3 numbers in the state are all important, and they need to be updated together, so I find it hard to see any solution based on AtomicLongs or AtomicReferences. Using the Semaphore locking has the best result in my testing. Could we just go with that for RC1? We are very likely to need an RC2, so there is time for someone to find a better solution.
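To illustrate why a single AtomicLong doesn't fit: all three numbers must move in one step. The closest lock-free shape would be a CAS loop over an immutable triple in an AtomicReference; a hypothetical sketch with made-up names (whether it actually beats the semaphore is exactly what's in question here):

```scala
import java.util.concurrent.atomic.AtomicReference

// The three values advance together or not at all: pack them into one
// immutable state and swap it atomically.
final case class GenState(index: Long, increment: Long, count: Long)

final class CasIdGenerator(seed1: Long, seed2: Long) {
  private val state = new AtomicReference(GenState(seed1, seed2, 1L))

  @annotation.tailrec
  def nextId(): Short = {
    val s = state.get()
    val next = GenState(s.index + s.increment, s.increment + s.count, s.count + 1)
    if (state.compareAndSet(s, next)) s.index.toShort
    else nextId() // another thread won the race; retry with fresh state
  }
}
```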
Ordinarily yes, but we aren't looking for complete correctness here, and I can't think of any way to abuse this atm.
Fine with me unless there are some other real objections. Will wait till EOD.
Force-pushed from 307bd25 to 6ae1d29.
So I have just pushed the branch using the simple version of enhanced double hashing.

Note that I have modified the benchmarks to be more indicative of what we are actually testing in reality; i.e. the original benchmarks designated 8 threads, however it's unlikely in reality that we will have more than 8 threads hitting the id generator at the same time.

@pjfanning @IainHull Let me know if I may have missed something.
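For context, a JMH benchmark of the kind described would look roughly like this; the class and method names are assumptions that merely mirror the output shown below, and `EnhancedDoubleHashIdGenerator` refers to the sketch earlier in this thread, not the PR's actual class:

```scala
import org.openjdk.jmh.annotations._

// Shared generator so all benchmark threads contend on the same instance,
// mirroring how the resolver's threads would share one id source.
@State(Scope.Benchmark)
class IdGeneratorBenchmarkSketch {
  val generator = new EnhancedDoubleHashIdGenerator(seed1 = 1L, seed2 = 3L)

  @Benchmark
  def measureEnhancedDoubleHash(): Short = generator.nextId()

  @Benchmark
  @Threads(8) // the contended variant in the original benchmarks used 8 threads
  def multipleThreadsMeasureEnhancedDoubleHash(): Short = generator.nextId()
}
```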
Force-pushed from 6ae1d29 to b0bc719.
(Review thread on actor/src/main/scala/org/apache/pekko/util/RandomShortProvider.scala, since resolved.)
Force-pushed from b0bc719 to b0e3ac1.
```scala
private var count = 1L

def nextId(): Short = synchronized {
  val result = (0xFFFFFFFF & index).asInstanceOf[Short]
```
Would it be possible to consider https://github.com/mdedetrich/incubator-pekko/blob/514ac493c3d530aea16f86d702b7d2ed358229e1/actor/src/main/scala/org/apache/pekko/util/UniqueRandomShortProvider.scala? @alexandru suggested that synchronized is a bad solution; the semaphore can time out, while synchronized can just hang.
Sure, I'll look at it tomorrow; will be busy tonight.
So I have just updated the version to use a semaphore, as well as making the lock timeout configurable, plus I did some other improvements to make the config reading/parsing consistent with the rest of Pekko.
I used the exact same version that you suggested (with a minor performance improvement to avoid creating a new class in the failure case of acquiring a lock) and the performance nosedived:
```
[info] IdGeneratorBanchmark.measureEnhancedDoubleHash                 thrpt 3  ≈ 10⁻¹⁰        ops/ns
[info] IdGeneratorBanchmark.measureSecureRandom                       thrpt 3    0,001 ± 0,001 ops/ns
[info] IdGeneratorBanchmark.measureThreadLocalRandom                  thrpt 3    0,311 ± 0,031 ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureEnhancedDoubleHash  thrpt 3  ≈ 10⁻¹⁰        ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureSecureRandom        thrpt 3    0,001 ± 0,001 ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureThreadLocalRandom   thrpt 3    0,608 ± 0,036 ops/ns
```
So using a much shorter timeout the results are better (i.e. these are the results with a 10 nanosecond timeout):
```
[info] IdGeneratorBanchmark.measureEnhancedDoubleHash                 thrpt 3    0,001 ± 0,001 ops/ns
[info] IdGeneratorBanchmark.measureSecureRandom                       thrpt 3    0,001 ± 0,001 ops/ns
[info] IdGeneratorBanchmark.measureThreadLocalRandom                  thrpt 3    0,312 ± 0,074 ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureEnhancedDoubleHash  thrpt 3    0,001 ± 0,001 ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureSecureRandom        thrpt 3    0,001 ± 0,001 ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureThreadLocalRandom   thrpt 3    0,622 ± 0,018 ops/ns
```
I am not sure what problem we are trying to solve with a semaphore though; in such a hot loop, with only a single permit it's actually slower than synchronized, which the JVM hyper-optimizes for this effectively single-threaded (i.e. permit of 1) case.
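For context, the semaphore-guarded shape under discussion looks roughly like the following; a hedged sketch with assumed names, falling back to a plain random id when the permit can't be acquired in time, not the PR's actual code:

```scala
import java.util.concurrent.{ Semaphore, TimeUnit }
import scala.util.Random

final class SemaphoreIdGenerator(timeoutNanos: Long) {
  private val lock = new Semaphore(1) // single permit: one writer at a time
  private var index: Long = Random.nextLong()
  private var increment: Long = Random.nextLong()
  private var count: Long = 1L

  def nextId(): Short =
    if (lock.tryAcquire(timeoutNanos, TimeUnit.NANOSECONDS)) {
      try {
        val result = index.toShort
        index += increment; increment += count; count += 1
        result
      } finally lock.release()
    } else Random.nextInt(65536).toShort // timed out: fall back, don't block
}
```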
@pjfanning You just approved the PR; did you read this?
lgtm; it's CPU-bound, so it's OK to use synchronized.
And there is already a util; let's make use of that one. And to be virtual-thread friendly, use a Lock instead, which is fast too.
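A hedged sketch of the Lock-based variant suggested above, with made-up field names; a ReentrantLock parks a virtual thread cleanly where a synchronized block can pin its carrier thread on older JDKs:

```scala
import java.util.concurrent.locks.ReentrantLock

final class LockedIdGenerator(seed1: Long, seed2: Long) {
  private val lock = new ReentrantLock()
  private var index: Long = seed1
  private var increment: Long = seed2
  private var count: Long = 1L

  def nextId(): Short = {
    lock.lock()
    try {
      val result = index.toShort
      index += increment; increment += count; count += 1
      result
    } finally lock.unlock()
  }
}
```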
Force-pushed from e015bbc to 2500db1.
pjfanning left a comment:
lgtm
```scala
private val onAcquireFailure = IdGenerator.random(seed)

override final def nextId(): Short = {
  if (lock.tryAcquire(semaphoreTimeout.length, semaphoreTimeout.unit)) {
```
revert to synchronized if you find it is working better for you
Okay I will do so
btw I just merged a PR fixing a small issue in the negative ID fix - I have resolved the conflict in your PR/branch - so you might want to pull the latest code
Force-pushed from 8513519 to 045df1b.
@mdedetrich this PR is showing a merge conflict in GitHub.
Co-authored-by: Claude Warren <[email protected]>
Co-authored-by: PJ Fanning <[email protected]>
Force-pushed from 045df1b to e9be00d.
Fixed.
@mdedetrich Sorry for the delay getting back, this looks good.
Generated DNS ids should be random, see https://datatracker.ietf.org/doc/html/rfc5452#section-4.3. One additional thing that needs to be confirmed is whether we need to handle collisions.