Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make throttling nmagent fetches for nodesubnet more dynamic #3023

Merged

Conversation

santhoshmprabhu
Copy link
Contributor

Reason for Change:
We previously added fetch of secondary IPs to the NMAgent client in CNS. This PR makes the fetch asynchronous, with the ability to adjust the frequency dynamically. Specifically,

  1. Min and max interval between fetches can be configured
  2. Frequency is cut in half (interval doubled) if no diff is seen, up to the max interval.
  3. Min interval is used as soon as a diff is seen

This is implemented using a time Ticker, but we wrap it with a wrapper so as to mock it for tests. This PR also includes some additional checks for empty responses from the nmagent (this was raised in this comment). Also have UTs.

Issue Fixed:
NA

Requirements:

Notes:

@santhoshmprabhu santhoshmprabhu added cns Related to CNS. go Pull requests that update Go code labels Sep 19, 2024
@santhoshmprabhu santhoshmprabhu self-assigned this Sep 19, 2024
@santhoshmprabhu santhoshmprabhu requested a review from a team as a code owner September 19, 2024 21:28
@santhoshmprabhu santhoshmprabhu changed the title Sanprabhu/cilium node subnet intelligent refresh Make throttling nmagent fetches for nodesubnet more dynamic Sep 19, 2024
@santhoshmprabhu
Copy link
Contributor Author

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@santhoshmprabhu
Copy link
Contributor Author

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@timraymond timraymond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm going to recant some of what I said in the last PR now that I understand why we're doing this debouncing. As I mentioned during the design review, rate limits and wireserver are a fact of life, so it now feels weird to have arbitrary logic treating the nmagent client carefully because it might blow our rate limit. The design intention of the NMAgent client was always that users can do nmagent.DoSomethingForMe() and it will block while it deals with all of the weirdness that happens with NMAgent and Wireserver. This includes async response handling... and I'm thinking it should also include respecting rate limits. If the request is going to blow the rate limit, the WireserverTransport in the NMAgent client needs to block that request until the request will succeed. Otherwise, we're just setting the user up for failure.

Doing this with tickers is definitely the right way (in particular the whole ticker resetting thing is how I've done respecting rate limits in the past). However, it should be done in https://github.com/Azure/azure-container-networking/blob/master/nmagent/internal/wireserver.go so that it's shared across the client.

cns/nodesubnet/ip_fetcher_test.go Outdated Show resolved Hide resolved
cns/nodesubnet/ip_fetcher.go Outdated Show resolved Hide resolved
cns/nodesubnet/ip_fetcher.go Outdated Show resolved Hide resolved
cns/nodesubnet/refreshticker.go Outdated Show resolved Hide resolved
@santhoshmprabhu santhoshmprabhu requested a review from a team as a code owner September 24, 2024 18:50
@santhoshmprabhu
Copy link
Contributor Author

/azp run Azure Container Networking PR

@santhoshmprabhu
Copy link
Contributor Author

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@santhoshmprabhu
Copy link
Contributor Author

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@santhoshmprabhu
Copy link
Contributor Author

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@santhoshmprabhu
Copy link
Contributor Author

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

timraymond
timraymond previously approved these changes Oct 9, 2024
@santhoshmprabhu santhoshmprabhu added this pull request to the merge queue Oct 9, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 9, 2024
@santhoshmprabhu santhoshmprabhu added this pull request to the merge queue Oct 14, 2024
@santhoshmprabhu santhoshmprabhu removed this pull request from the merge queue due to a manual request Oct 14, 2024
@santhoshmprabhu
Copy link
Contributor Author

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@santhoshmprabhu santhoshmprabhu added this pull request to the merge queue Oct 14, 2024
Merged via the queue into master with commit b5046a0 Oct 14, 2024
14 checks passed
@santhoshmprabhu santhoshmprabhu deleted the sanprabhu/cilium-node-subnet-intelligent-refresh branch October 14, 2024 23:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cns Related to CNS. go Pull requests that update Go code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants