Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dns: fix constant 30s backoff for re-resolution #7262

Merged
merged 7 commits into from
Jun 6, 2024

Conversation

purnesh42H
Copy link
Contributor

@purnesh42H purnesh42H commented May 23, 2024

Fixes #7186 with a test

RELEASE NOTES:

  • dns: fix bug that was causing constant 30s backoff between every resolution request

@purnesh42H
Copy link
Contributor Author

purnesh42H commented May 23, 2024

Without Fix

1st resolve sent 2024-05-23 22:01:10.566367 +0530 IST m=+0.046273376
    /Users/purneshdixit/Google/Dev/grpc-go/internal/resolver/dns/resolver.go:49: testutils.ResolverClientConn: UpdateState({
          "Addresses": [
            {
              "Addr": "1.2.3.4:443",
              "ServerName": "",
              "Attributes": null,
              "BalancerAttributes": null,
              "Metadata": null
            }
          ],
          "Endpoints": null,
          "ServiceConfig": {
            "Config": {
              "Config": null,
              "Methods": {}
            },
            "Err": null
          },
          "Attributes": null
        })
1st resolve completed 2024-05-23 22:01:15.568635 +0530 IST m=+5.048577209
2nd resolve sent 2024-05-23 22:01:20.569472 +0530 IST m=+10.049450126
    /Users/purneshdixit/Google/Dev/grpc-go/internal/resolver/dns/resolver.go:49: testutils.ResolverClientConn: UpdateState({
          "Addresses": [],
          "Endpoints": null,
          "ServiceConfig": {
            "Config": {
              "Config": null,
              "Methods": {}
            },
            "Err": null
          },
          "Attributes": null
        })
2nd resolve completed 2024-05-23 22:01:25.571348 +0530 IST m=+15.051360709

With Fix

1st resolve sent 2024-05-23 22:09:39.675797 +0530 IST m=+0.048168126
    /Users/purneshdixit/Google/Dev/grpc-go/internal/resolver/dns/resolver.go:49: testutils.ResolverClientConn: UpdateState({
          "Addresses": [
            {
              "Addr": "1.2.3.4:443",
              "ServerName": "",
              "Attributes": null,
              "BalancerAttributes": null,
              "Metadata": null
            }
          ],
          "Endpoints": null,
          "ServiceConfig": {
            "Config": {
              "Config": null,
              "Methods": {}
            },
            "Err": null
          },
          "Attributes": null
        })
1st resolve completed 2024-05-23 22:09:44.677399 +0530 IST m=+5.049806168
2nd resolve sent 2024-05-23 22:09:49.678512 +0530 IST m=+10.050954668
    /Users/purneshdixit/Google/Dev/grpc-go/internal/resolver/dns/resolver.go:49: testutils.ResolverClientConn: UpdateState({
          "Addresses": [],
          "Endpoints": null,
          "ServiceConfig": {
            "Config": {
              "Config": null,
              "Methods": {}
            },
            "Err": null
          },
          "Attributes": null
        })
2nd resolve completed 2024-05-23 22:09:49.679501 +0530 IST m=+10.051943126

@purnesh42H purnesh42H force-pushed the dns-resolver-backoff-fix branch from efa9843 to 074c279 Compare May 23, 2024 16:47
@purnesh42H purnesh42H modified the milestone: 1.65 Release May 27, 2024
@purnesh42H purnesh42H marked this pull request as ready for review May 27, 2024 20:20
@purnesh42H purnesh42H force-pushed the dns-resolver-backoff-fix branch 5 times, most recently from dbbbea6 to 7484d21 Compare May 28, 2024 04:08
@purnesh42H purnesh42H requested review from arjan-bal and aranjans May 28, 2024 04:08
@aranjans aranjans added this to the 1.65 Release milestone May 28, 2024
@sebastianjonasson
Copy link

I've tested this and can confirm that it resolves the issue described in #7186.

@purnesh42H
Copy link
Contributor Author

purnesh42H commented May 28, 2024

TestMinResolutionInterval_NoExtraDelay fails for main branch code but passes for the change. No failures in 100K forge

@easwars
Copy link
Contributor

easwars commented May 28, 2024

@purnesh42H : This PR definitely needs a release note because we are fixing a bug that we introduced.

internal/resolver/dns/dns_resolver_test.go Outdated Show resolved Hide resolved
// the likelihood of flakiness due to timing issues.
for i := 0; i < 4; i++ {
verifyUpdateFromResolver(ctx, t, stateCh, wantAddrs, nil, wantSC)
time.Sleep(1 * time.Second) // respect resolution rate of 1 second for re-resolve
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to avoid sleeping in our tests as much as possible. Sleeping causes the tests to unnecessarily flaky, and run for longer than actually required.

Instead, if you override internal.TimeAfterFunc, you can control exactly when this function pushes something on the channel read by the code. This way, you can avoid sleeping as well.

There is currently an overrideTimeAfterFunc helper function, but this sets the after func to be invoked after a set duration. Instead you can override internal.TimeAfterFunc to return a channel that is controlled by the test, and have it push to it whenever it wants the code under test to be unblocked.

Copy link
Contributor Author

@purnesh42H purnesh42H May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should overriding internal.TimeAfterFunc because we want to test the actual implementation work correctly with MinResolutionInterval. Here, we just want to wait outside for 1 second before making another request so I just replaced time.Sleep with another channel using TimeAfterFunc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't want this test to run for 4+ seconds.

Instead I suggest the following:

  • Define a variable for internal.TimeNowFunc, just like how it is done for internal.TimeAfterFunc
    • set this variable to time.Now in the init function
  • Override this variable to return a time value that you control in your test
  • Override internal.TimeAfterFunc in your test and verify that it is called with the expected duration

Let me know if this doesn't work.

Copy link
Contributor Author

@purnesh42H purnesh42H Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Modified the test to override time.Until to calculate remaining wait to allow resolution after 1s i.e. equivalent of making re-resolution after 1s and verify the expected wait duration is very small.

internal/resolver/dns/dns_resolver_test.go Outdated Show resolved Hide resolved
@easwars easwars assigned purnesh42H and unassigned easwars May 28, 2024
@easwars easwars removed the request for review from aranjans May 28, 2024 18:18
@purnesh42H purnesh42H force-pushed the dns-resolver-backoff-fix branch 2 times, most recently from ce604c2 to e5f053b Compare May 29, 2024 09:57
@purnesh42H purnesh42H requested review from easwars and arjan-bal May 29, 2024 09:57
@purnesh42H purnesh42H assigned easwars and unassigned purnesh42H May 29, 2024
@purnesh42H purnesh42H force-pushed the dns-resolver-backoff-fix branch from e5f053b to 78a490c Compare May 30, 2024 05:13
@purnesh42H purnesh42H force-pushed the dns-resolver-backoff-fix branch from bec24bd to 23dd2e8 Compare June 4, 2024 08:22
@purnesh42H purnesh42H assigned easwars and unassigned purnesh42H Jun 4, 2024
@purnesh42H purnesh42H requested a review from easwars June 4, 2024 08:25
@purnesh42H purnesh42H force-pushed the dns-resolver-backoff-fix branch 4 times, most recently from e6304b1 to 41e46e4 Compare June 4, 2024 11:36
@purnesh42H purnesh42H force-pushed the dns-resolver-backoff-fix branch from 41e46e4 to a775921 Compare June 4, 2024 11:44
@@ -102,6 +102,15 @@ func overrideTimeAfterFuncWithChannel(t *testing.T) (durChan chan time.Duration,
return durChan, timeChan
}

// Override the remaining time used by the DNS resolver to allow resolution
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// between two resolution requests apart from [minResolutionInterval].
// It sets the minResolutionInterval 1s and overrides timeUntilFunc to
// calculate remaining time to allow resolution after 1s and verifies that
// remaining time to allow resolution is very small
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Same here. Please terminate with a period.


// TestMinResolutionInterval_NoExtraDelay verifies that there is no extra delay
// between two resolution requests apart from [minResolutionInterval].
// It sets the minResolutionInterval 1s and overrides timeUntilFunc to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Switch It sets the minResolutionInterval 1s with either It sets the minResolutionInterval to 1s or It sets a minResolutionInterval of 1s

Comment on lines 1339 to 1340
ctx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout)
defer cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't recreate the test context. Use the one created earlier. Using the same top-level test context will ensure that all operations in the test complete within defaultTestTimeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Missed removing it after getting the test work

},
}
overrideNetResolver(t, tr)
var minResolutionInterval = 1 * time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use const instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

internal/resolver/dns/dns_resolver_test.go Show resolved Hide resolved
@easwars easwars assigned purnesh42H and unassigned easwars Jun 4, 2024
@purnesh42H purnesh42H force-pushed the dns-resolver-backoff-fix branch from 23d7528 to 6d8cc1b Compare June 5, 2024 19:12
@purnesh42H purnesh42H requested a review from easwars June 5, 2024 19:12
@purnesh42H purnesh42H assigned easwars and unassigned purnesh42H Jun 5, 2024
Copy link

codecov bot commented Jun 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.49%. Comparing base (adf976b) to head (6d8cc1b).
Report is 86 commits behind head on master.

Current head 6d8cc1b differs from pull request most recent head deff579

Please upload reports for the commit deff579 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7262      +/-   ##
==========================================
- Coverage   81.24%   80.49%   -0.75%     
==========================================
  Files         345      348       +3     
  Lines       33941    33985      +44     
==========================================
- Hits        27574    27357     -217     
- Misses       5202     5432     +230     
- Partials     1165     1196      +31     
Files Coverage Δ
internal/resolver/dns/dns_resolver.go 89.39% <100.00%> (+0.08%) ⬆️

... and 49 files with indirect coverage changes

@purnesh42H purnesh42H force-pushed the dns-resolver-backoff-fix branch from 6d8cc1b to deff579 Compare June 5, 2024 19:19
@arvindbr8 arvindbr8 modified the milestones: 1.65 Release, 1.66 Release Jun 6, 2024
@easwars easwars merged commit 5a289d9 into grpc:master Jun 6, 2024
11 checks passed
purnesh42H added a commit to purnesh42H/grpc-go that referenced this pull request Jun 7, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Delayed dns resolve after upgrade to v1.60.0
6 participants