
Conversation

@mdedetrich
Contributor

The generated DNS id should be random, see https://datatracker.ietf.org/doc/html/rfc5452#section-4.3. One additional thing that needs to be confirmed is whether we need to handle collisions.

@He-Pin
Member

He-Pin commented Jun 7, 2023

  1. Collisions should be handled too.
  2. The call to nextId is not thread safe; does this PR handle this?

@mdedetrich
Contributor Author

@Claudenw Pinging

@mdedetrich
Contributor Author

Collisions should be handled too.

So this isn't entirely clear, because I have seen other DNS resolver implementations (e.g. systemd-resolved) that just use a standard random generator, which is also prone to collisions.

The call to nextId is not thread safe; does this PR handle this?

I don't think this is an issue: the entire implementation is private and I don't think it can be called from multiple threads at once. Even if it can, it's not a problem; you might get a slightly "wrong" result, but that doesn't matter because we just want random numbers.

@pjfanning
Member

The old code wasn't thread safe either - why do we need to worry about this code being thread safe?

To make the old code thread safe, we'd need to use an AtomicInteger or something like that.
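For illustration, a minimal sketch of what an AtomicInteger-based counter could look like — this is an assumption for discussion, not the code in this PR:

```scala
import java.util.concurrent.atomic.AtomicInteger

object AtomicIdSketch {
  private val counter = new AtomicInteger(0)

  // getAndIncrement is atomic, and masking to 16 bits makes the id wrap
  // across the full unsigned range instead of growing without bound.
  def nextId(): Short = (counter.getAndIncrement() & 0xFFFF).toShort
}
```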

@pjfanning
Member

I'm not an expert, but any idea why the value is a short? If we used a type like int or long, the risk of collisions would be reduced.

@mdedetrich
Contributor Author

I'm not an expert, but any idea why the value is a short? If we used a type like int or long, the risk of collisions would be reduced.

It has to be according to the standard, see https://datatracker.ietf.org/doc/html/rfc5452#section-4.3. The generated IDs fill a 2-octet (i.e. 2-byte, 16-bit) range, which in Java/Scala is a Short.

Actually this new implementation is strictly better than the current one, because the current one effectively only used the non-negative half of the Short range, since it just incremented from 0 and Short.MaxValue + 1 doesn't overflow into Short.MinValue.
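A small illustration of the point about the range (not code from the PR):

```scala
// Illustration only: a DNS message id is a 16-bit field, so all 65536 values
// map onto Scala's signed Short once masked and cast.
val idFromLowHalf: Short = (0x0001 & 0xFFFF).toShort  // 1
val idFromHighHalf: Short = (0xFFFF & 0xFFFF).toShort // -1, still a valid id on the wire
// A counter that only ever counts from 0 up to Short.MaxValue never produces
// the "negative" Shorts above, i.e. it only uses half of the id space.
```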

@He-Pin
Member

He-Pin commented Jun 7, 2023

A collision can happen in DnsClient.scala if there are two in-flight queries with the same id.

I submitted a PR in Akka for this, but it can also be done with an FSM-like approach.

@mdedetrich
Contributor Author

A collision can happen in DnsClient.scala if there are two in-flight queries with the same id.

I submitted a PR in Akka for this, but it can also be done with an FSM-like approach.

If this is regarding it not being thread safe, we can solve that separately. Let's fix one problem at a time; the RC is on our doorstep.

On another note, CI is failing because the tests need to be updated. I'll do this in a bit.

@pjfanning
Member

pjfanning commented Jun 7, 2023

Do we know what happens if we change pekko.io.dns.resolver to use inet-address instead of async-dns?

@mdedetrich
Contributor Author

Do we know what happens if we change pekko.io.dns.resolver to use inet-address instead of async-dns?

Off the top of my head, this uses a completely separate implementation from the Java stdlib, so it shouldn't be an issue.

@He-Pin
Member

He-Pin commented Jun 7, 2023

Do we know what happens if we change pekko.io.dns.resolver to use inet-address instead of async-dns?

The Java lib based one may be slower and blocking?

@mdedetrich
Contributor Author

mdedetrich commented Jun 7, 2023

Do we know what happens if we change pekko.io.dns.resolver to use inet-address instead of async-dns?

The Java lib based one may be slower and blocking?

From what @raboof said, the reason why async-dns was implemented in the first place is that there are caching issues with the Java implementation.

It being slower due to blocking is likely a secondary reason.

@mdedetrich force-pushed the generate-random-dns-ids branch from 9efc07b to 8c346e2 on June 7, 2023 12:06
@pjfanning
Member

I raised #373 for the lower priority issue of tracking ids

@mdedetrich
Contributor Author

So I have just updated the tests and rebased/force-pushed the changes. The reason the tests were failing is that they assumed nextId was incrementally increasing and hence checked against hardcoded ids. I have updated the tests so that we don't test the raw ids but rather their properties, i.e. the test first expects an initial id from dnsClient1:

```scala
val firstId = dnsClient1.expectMsgPF() {
  case q4: Question4 if q4.name == "cats.com" =>
    q4.id
}
```

and then captures a second id later on and checks that it differs, such as

```scala
val secondId = dnsClient2.expectMsgPF() {
  case q4: Question4 if q4.name == "cats.com" && q4.id != firstId =>
    q4.id
}
```

i.e. in this specific case we can't test for hardcoded ids, but we can still make sure that secondId is not the same as firstId.

@mdedetrich
Contributor Author

I raised #373 for the lower priority issue of tracking ids

I am going to confirm whether collisions are an issue. Note that one advantage of the algorithm that @Claudenw provided is that it's deliberately designed to minimize the number of collisions; from https://en.wikipedia.org/wiki/Double_hashing#Enhanced_double_hashing:

the interval depends on the data, so that values mapping to the same location have different bucket sequences; this minimizes repeated collisions and the effects of clustering.
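In rough terms, the generator advances the id by an increment that itself changes on every call. A minimal sketch of that recurrence, assuming illustrative variable names rather than the ones in this PR:

```scala
final class EnhancedDoubleHashSketch(seed: Long, seed2: Long) {
  private var index: Long = seed
  private var increment: Long = seed2
  private var count: Long = 1L

  def nextId(): Short = synchronized {
    val result = (index & 0xFFFF).toShort
    // Enhanced double hashing: the step itself changes on every call, which
    // spreads successive ids and reduces repeated collisions/clustering.
    index -= increment
    increment -= count
    count += 1
    result
  }
}
```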

@mdedetrich
Contributor Author

So I just checked systemd-resolved and it does appear to handle id collisions; it does this by tracking the state of current transactions (see https://github.com/systemd/systemd/blob/main/src/resolve/resolved-dns-transaction.c#L195-L196). Will work on this now.
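Conceptually, that "track in-flight transactions" approach amounts to something like the following sketch (illustrative only, names are hypothetical, not code from systemd-resolved or this PR):

```scala
import scala.collection.mutable
import scala.util.Random

final class InFlightAwareIdGenerator(random: Random = new Random()) {
  private val inFlight = mutable.Set.empty[Short]

  // Keep drawing random 16-bit ids until one is found that is not already used
  // by an in-flight query, then reserve it (assumes far fewer than 65536
  // queries are in flight at once).
  def acquireId(): Short = synchronized {
    var id = random.nextInt(0x10000).toShort
    while (inFlight.contains(id)) id = random.nextInt(0x10000).toShort
    inFlight += id
    id
  }

  // Release the id once the query completes or times out.
  def releaseId(id: Short): Unit = synchronized { inFlight -= id }
}
```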

```scala
idIncrement -= {
  idCount += 1; idCount - 1
}
result
```
Member

@He-Pin Jun 7, 2023


nextId can be called concurrently, so I think two threads can get the same result.
idIncrement and idIndex should be protected with an AtomicLong.

Member


I was using #288

@pjfanning previously approved these changes Jun 7, 2023
Member

@pjfanning left a comment


lgtm - we can continue to look at fully collision-proof implementations but this PR is probably enough to unblock an RC

@pjfanning added the bug ("Something isn't working") label Jun 7, 2023
@mdedetrich
Contributor Author

Let's wait a bit before merging this; I think I was wrong in what I said before about collisions and we do need to handle this.

@pjfanning self-requested a review June 7, 2023 13:00
@mdedetrich
Contributor Author

mdedetrich commented Jun 7, 2023

nextId can be called concurrently, so I think two threads can get the same result. idIncrement and idIndex should be protected with an AtomicLong.

So having gone through the codebase, nextId() is only called (indirectly) inside a receive block, which means it's only going to be called in a single-threaded context (this is something the actor model guarantees).

Did I miss something?
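For context, a minimal sketch of why a call made from inside receive is effectively single-threaded (an illustrative actor, not the real DnsClient):

```scala
import org.apache.pekko.actor.Actor

class DnsClientLikeActor(nextId: () => Short) extends Actor {
  // An actor processes one message at a time, so anything called from inside
  // receive never runs concurrently with itself for this actor instance.
  def receive: Receive = {
    case name: String =>
      val id = nextId() // no extra locking needed from this call site
      sender() ! (id -> name)
  }
}
```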

@pjfanning
Member

I tried a solution with an AtomicReference around the state but its performance was slower than the synchronization and Semaphore locking variants. Approximately in SecureRandom territory in the 8-thread microbenchmark.

This isn't too surprising; I suspect that if you wanted to get any tangible benefit from AtomicLong/volatile you would need to use it on the individual variables (i.e. index/counter etc.) where it's needed, which is hard to do correctly.

Unfortunately, the 3 numbers in the state are all important. And they should be updated together, so I find it hard to see any solution based on AtomicLongs or AtomicReferences.

Using the Semaphore locking has the best result in my testing. Could we just go with that for RC1? We are very likely to need an RC2 so there is time for someone to find a better solution.
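For reference, a whole-state CAS approach like the one tried above could look roughly like the sketch below (illustrative names; not the variant that was actually benchmarked). All three numbers advance together in one compare-and-set, at the cost of retries and allocation under contention:

```scala
import java.util.concurrent.atomic.AtomicReference

final case class IdState(index: Long, increment: Long, count: Long)

final class CasIdGenerator(initial: IdState) {
  private val state = new AtomicReference(initial)

  @annotation.tailrec
  def nextId(): Short = {
    val current = state.get()
    val next = IdState(
      index = current.index - current.increment,
      increment = current.increment - current.count,
      count = current.count + 1)
    if (state.compareAndSet(current, next)) (current.index & 0xFFFF).toShort
    else nextId() // retry if another thread won the CAS race
  }
}
```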

@mdedetrich
Contributor Author

mdedetrich commented Jun 15, 2023

Unfortunately, the 3 numbers in the state are all important. And they should be updated together, so I find it hard to see any solution based on AtomicLongs or AtomicReferences.

Ordinarily yes, but we aren't looking for complete correctness here, and I can't think of any way to abuse this atm.

Using the Semaphore locking has the best result in my testing. Could we just go with that for RC1? We are very likely to need an RC2 so there is time for someone to find a better solution.

Fine with me unless there are some other real objections. Will wait till EOD.

@mdedetrich force-pushed the generate-random-dns-ids branch 2 times, most recently from 307bd25 to 6ae1d29 on June 15, 2023 19:21
@mdedetrich
Contributor Author

So I have just pushed the branch using the simple version of enhanced double hashing with the synchronized keyword. Here are the benchmarks:

```
[info] IdGeneratorBanchmark.measureEnhancedDoubleHash                 thrpt    3  0,377 ±  0,032  ops/ns
[info] IdGeneratorBanchmark.measureSecureRandom                       thrpt    3  0,001 ±  0,001  ops/ns
[info] IdGeneratorBanchmark.measureThreadLocalRandom                  thrpt    3  0,308 ±  0,057  ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureEnhancedDoubleHash  thrpt    3  0,025 ±  0,013  ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureSecureRandom        thrpt    3  0,001 ±  0,001  ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureThreadLocalRandom   thrpt    3  0,609 ±  0,014  ops/ns
```

Note that I have modified the benchmarks to be more indicative of what we are actually testing in reality: the original benchmarks used 8 threads, however it's unlikely in reality that we will have that many threads hitting the nextId() function at once with such high contention, so instead I set the default to just a single thread but added another set of benchmarks for multiple threads.

@pjfanning @IainHull Let me know if I may have missed something

@mdedetrich force-pushed the generate-random-dns-ids branch from 6ae1d29 to b0bc719 on June 15, 2023 19:28
@mdedetrich force-pushed the generate-random-dns-ids branch from b0bc719 to b0e3ac1 on June 15, 2023 19:37
```scala
private var count = 1L

def nextId(): Short = synchronized {
  val result = (0xFFFFFFFF & index).asInstanceOf[Short]
```
Member

@pjfanning Jun 15, 2023


Would it be possible to consider https://github.com/mdedetrich/incubator-pekko/blob/514ac493c3d530aea16f86d702b7d2ed358229e1/actor/src/main/scala/org/apache/pekko/util/UniqueRandomShortProvider.scala ? @alexandru suggested that 'synchronized' is a bad solution and the semaphore can time out (while synchronized can just hang).

Contributor Author


Sure, I'll look at it tomorrow; will be busy tonight.

Contributor Author

@mdedetrich Jun 16, 2023


So I have just updated the version to use a semaphore as well as making the lock timeout configurable, plus I did some other improvements to make the config reading/parsing consistent with the rest of Pekko.

I used the exact same version that you suggested (with a minor performance improvement to avoid creating a new class in the failure case of acquiring a lock) and the performance nosedived.

```
[info] IdGeneratorBanchmark.measureEnhancedDoubleHash                 thrpt    3  ≈ 10⁻¹⁰            ops/ns
[info] IdGeneratorBanchmark.measureSecureRandom                       thrpt    3    0,001 ±   0,001  ops/ns
[info] IdGeneratorBanchmark.measureThreadLocalRandom                  thrpt    3    0,311 ±   0,031  ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureEnhancedDoubleHash  thrpt    3  ≈ 10⁻¹⁰            ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureSecureRandom        thrpt    3    0,001 ±   0,001  ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureThreadLocalRandom   thrpt    3    0,608 ±   0,036  ops/ns
```

Contributor Author


So using a much shorter timeout the results are better (i.e. these are the results with 10 nanos):

```
[info] IdGeneratorBanchmark.measureEnhancedDoubleHash                 thrpt    3  0,001 ±  0,001  ops/ns
[info] IdGeneratorBanchmark.measureSecureRandom                       thrpt    3  0,001 ±  0,001  ops/ns
[info] IdGeneratorBanchmark.measureThreadLocalRandom                  thrpt    3  0,312 ±  0,074  ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureEnhancedDoubleHash  thrpt    3  0,001 ±  0,001  ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureSecureRandom        thrpt    3  0,001 ±  0,001  ops/ns
[info] IdGeneratorBanchmark.multipleThreadsMeasureThreadLocalRandom   thrpt    3  0,622 ±  0,018  ops/ns
```

I am not sure what problem we are trying to solve with a semaphore though; in such a hot loop with only a single permit it's actually slower than synchronized, which the JVM heavily optimizes for this effectively single-threaded (i.e. permit of 1) case.

Contributor Author


@pjfanning You just approved the PR, did you read this?

Member

@He-Pin left a comment


lgtm, and it's CPU bound, so it's ok to use synchronized.

There is already a util for this; let's make use of that one.

And to be virtual-thread friendly, use a lock instead, which is fast too.
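A sketch of the lock-based variant being suggested (illustrative only; the step shown is a placeholder, not the enhanced double hashing used in the PR). ReentrantLock avoids the virtual-thread pinning that synchronized blocks can cause on current JDKs:

```scala
import java.util.concurrent.locks.ReentrantLock

final class LockBasedIdGenerator(initial: Long) {
  private val lock = new ReentrantLock()
  private var index: Long = initial

  def nextId(): Short = {
    lock.lock()
    try {
      val result = (index & 0xFFFF).toShort
      index += 1 // placeholder step; the real generator advances differently
      result
    } finally lock.unlock()
  }
}
```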

@mdedetrich force-pushed the generate-random-dns-ids branch 2 times, most recently from e015bbc to 2500db1 on June 16, 2023 07:38
Member

@pjfanning left a comment


lgtm

```scala
private val onAcquireFailure = IdGenerator.random(seed)

override final def nextId(): Short = {
  if (lock.tryAcquire(semaphoreTimeout.length, semaphoreTimeout.unit)) {
```
Member


revert to synchronized if you find it is working better for you

Contributor Author


Okay I will do so

Member


By the way, I just merged a PR fixing a small issue in the negative ID fix. I have resolved the conflict in your PR/branch, so you might want to pull the latest code.

@mdedetrich force-pushed the generate-random-dns-ids branch 4 times, most recently from 8513519 to 045df1b on June 16, 2023 09:07
@pjfanning
Member

@mdedetrich this PR is showing a Merge Conflict in GitHub

@mdedetrich force-pushed the generate-random-dns-ids branch from 045df1b to e9be00d on June 16, 2023 09:09
@mdedetrich
Contributor Author

@mdedetrich this PR is showing a Merge Conflict in GitHub

Fixed

@IainHull
Contributor

@mdedetrich Sorry for the delay getting back, this looks good.

@mdedetrich
Contributor Author

@IainHull Note that I made a ticket for anyone who wants to further improve the performance of the enhanced double hasher: #411
