-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add provider cache #4597
Merged
k8s-ci-robot
merged 14 commits into
kubernetes-sigs:master
from
tjamet:add-provider-cache
Aug 14, 2024
Merged
Add provider cache #4597
Changes from all commits
Commits
Show all changes
14 commits
Select commit
Hold shift + click to select a range
089744c
Add cache at provider level
tjamet 29191e2
Skip apply empty changes in the cache provider
tjamet eb07eb9
Add licence headers
tjamet 82c6983
Add a log line when no changes on cache provider
tjamet b2ff161
Add Domain filter interface
tjamet b2d678f
Run go imports -local sigs.k8s.io/external-dns
tjamet 9b759f0
Update changes to match latest state of external-dns code
tjamet 2955e5d
Apply suggestions from code review
tjamet 309f34f
Add an advanced topic about rate limits
tjamet 1736048
Rename Advanced topics title to match the content
tjamet ef0dd29
Apply suggestions from code review
tjamet 6c5faaf
Dynamically register cache provider metrics
tjamet 8d2f1c0
Fix invalid private variable reference
tjamet a6ab2ba
Update docs/tutorials/aws.md
tjamet File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
DNS provider API rate limits considerations | ||
=========================================== | ||
|
||
## Introduction | ||
|
||
By design, external-dns refreshes all the records of a zone using API calls. | ||
This refresh may happen peridically and upon any changed object if the flag `--events` is enabled. | ||
|
||
Depending on the size of the zone and the infrastructure deployment, this may lead to external-dns | ||
hitting the DNS provider's rate-limits more easily. | ||
|
||
In particular, it has been found that with 200k records in an AWS Route53 zone, each refresh triggers around | ||
70 API calls to retrieve all the records, making it more likely to hit the AWS Route53 API rate limits. | ||
|
||
To prevent this problem from happening, external-dns has implemented a cache to reduce the pressure on the DNS | ||
provider APIs. | ||
|
||
This cache is optional and systematically invalidated when DNS records have been changed in the cluster | ||
(new or deleted domains or changed target). | ||
|
||
## Trade-offs | ||
|
||
The major trade-off of this setting relies in the ability to recover from a deleted record on the DNS provider side. | ||
As the DNS records are cached in memory, external-dns will not be made aware of the missing records and will hence | ||
take a longer time to restore the deleted or modified record on the provider side. | ||
|
||
This option is enabled using the `--provider-cache-time=15m` command line argument, and turned off when `--provider-cache-time=0m` | ||
|
||
## Monitoring | ||
|
||
You can evaluate the behaviour of the cache thanks to the built-in metrics | ||
|
||
* `external_dns_provider_cache_records_calls` | ||
* The number of calls to the provider cache Records list. | ||
* The label `from_cache=true` indicates that the records were retrieved from memory and the DNS provider was not reached | ||
* The label `from_cache=false` indicates that the cache was not used and the records were retrieved from the provider | ||
* `external_dns_provider_cache_apply_changes_calls` | ||
* The number of calls to the provider cache ApplyChanges. | ||
* Each ApplyChange systematically invalidates the cache and makes subsequent Records list to be retrieved from the provider without cache. | ||
|
||
## Related options | ||
|
||
This global option is available for all providers and can be used in pair with other global | ||
or provider-specific options to fine-tune the behaviour of external-dns | ||
to match the specific needs of your deployments, with the goal to reduce the number of API calls to your DNS provider. | ||
|
||
* `--google-batch-change-interval=1s` When using the Google provider, set the interval between batch changes. ($EXTERNAL_DNS_GOOGLE_BATCH_CHANGE_INTERVAL) | ||
* `--google-batch-change-size=1000` When using the Google provider, set the maximum number of changes that will be applied in each batch. | ||
* AWS | ||
* `--aws-batch-change-interval=1s` When using the AWS provider, set the interval between batch changes. | ||
* `--aws-batch-change-size=1000` When using the AWS provider, set the maximum number of changes that will be applied in each batch. | ||
* `--aws-batch-change-size-bytes=32000` When using the AWS provider, set the maximum byte size that will be applied in each batch. | ||
* `--aws-batch-change-size-values=1000` When using the AWS provider, set the maximum total record values that will be applied in each batch. | ||
* `--aws-zones-cache-duration=0s` When using the AWS provider, set the zones list cache TTL (0s to disable). | ||
* `--[no-]aws-zone-match-parent` Expand limit possible target by sub-domains | ||
* Cloudflare | ||
* `--cloudflare-dns-records-per-page=100` When using the Cloudflare provider, specify how many DNS records listed per page, max possible 5,000 (default: 100) | ||
* OVH | ||
* `--ovh-api-rate-limit=20` When using the OVH provider, specify the API request rate limit, X operations by seconds (default: 20) | ||
|
||
* Global | ||
* `--registry=txt` The registry implementation to use to keep track of DNS record ownership (default: txt, options: txt, noop, dynamodb, aws-sd) | ||
* `--txt-cache-interval=0s` The interval between cache synchronizations in duration format (default: disabled) | ||
* `--interval=1m0s` The interval between two consecutive synchronizations in duration format (default: 1m) | ||
* `--min-event-sync-interval=5s` The minimum interval between two consecutive synchronizations triggered from kubernetes events in duration format (default: 5s) | ||
* `--[no-]events` When enabled, in addition to running every interval, the reconciliation loop will get triggered when supported sources change (default: disabled) | ||
|
||
A general recommendation is to enable `--events` and keep `--min-event-sync-interval` relatively low to have a better responsiveness when records are | ||
created or updated inside the cluster. | ||
This should represent an acceptable propagation time between the creation of your k8s resources and the time they become registered in your DNS server. | ||
|
||
On a general manner, the higher the `--provider-cache-time`, the lower the impact on the rate limits, but also, the slower the recovery in case of a deletion. | ||
The `--provider-cache-time` value should hence be set to an acceptable time to automatically recover restore deleted records. | ||
|
||
✍️ Note that caching is done within the external-dns controller memory. You can invalidate the cache at any point in time by restarting it (for example doing a rolling update). |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
/* | ||
Copyright 2017 The Kubernetes Authors. | ||
Licensed under the Apache License, Version 2.0 (the "License"); | ||
you may not use this file except in compliance with the License. | ||
You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
*/ | ||
package provider | ||
|
||
import ( | ||
"context" | ||
"sync" | ||
"time" | ||
|
||
"github.com/prometheus/client_golang/prometheus" | ||
log "github.com/sirupsen/logrus" | ||
|
||
"sigs.k8s.io/external-dns/endpoint" | ||
"sigs.k8s.io/external-dns/plan" | ||
) | ||
|
||
var ( | ||
cachedRecordsCallsTotal = prometheus.NewCounterVec( | ||
prometheus.CounterOpts{ | ||
Namespace: "external_dns", | ||
Subsystem: "provider", | ||
Name: "cache_records_calls", | ||
Help: "Number of calls to the provider cache Records list.", | ||
}, | ||
[]string{ | ||
"from_cache", | ||
}, | ||
) | ||
cachedApplyChangesCallsTotal = prometheus.NewCounter( | ||
prometheus.CounterOpts{ | ||
Namespace: "external_dns", | ||
Subsystem: "provider", | ||
Name: "cache_apply_changes_calls", | ||
Help: "Number of calls to the provider cache ApplyChanges.", | ||
}, | ||
) | ||
|
||
registerCacheProviderMetrics = sync.Once{} | ||
) | ||
|
||
type CachedProvider struct { | ||
Provider | ||
RefreshDelay time.Duration | ||
lastRead time.Time | ||
cache []*endpoint.Endpoint | ||
} | ||
|
||
func NewCachedProvider(provider Provider, refreshDelay time.Duration) *CachedProvider { | ||
registerCacheProviderMetrics.Do(func() { | ||
prometheus.MustRegister(cachedRecordsCallsTotal) | ||
}) | ||
return &CachedProvider{ | ||
Provider: provider, | ||
RefreshDelay: refreshDelay, | ||
} | ||
} | ||
|
||
func (c *CachedProvider) Records(ctx context.Context) ([]*endpoint.Endpoint, error) { | ||
if c.needRefresh() { | ||
log.Info("Records cache provider: refreshing records list cache") | ||
records, err := c.Provider.Records(ctx) | ||
if err != nil { | ||
c.cache = nil | ||
return nil, err | ||
} | ||
c.cache = records | ||
c.lastRead = time.Now() | ||
cachedRecordsCallsTotal.WithLabelValues("false").Inc() | ||
} else { | ||
log.Debug("Records cache provider: using records list from cache") | ||
cachedRecordsCallsTotal.WithLabelValues("true").Inc() | ||
} | ||
return c.cache, nil | ||
} | ||
func (c *CachedProvider) ApplyChanges(ctx context.Context, changes *plan.Changes) error { | ||
if !changes.HasChanges() { | ||
log.Info("Records cache provider: no changes to be applied") | ||
return nil | ||
} | ||
c.Reset() | ||
cachedApplyChangesCallsTotal.Inc() | ||
return c.Provider.ApplyChanges(ctx, changes) | ||
} | ||
|
||
func (c *CachedProvider) Reset() { | ||
c.cache = nil | ||
c.lastRead = time.Time{} | ||
} | ||
|
||
func (c *CachedProvider) needRefresh() bool { | ||
if c.cache == nil { | ||
log.Debug("Records cache provider is not initialized") | ||
return true | ||
} | ||
log.Debug("Records cache last Read: ", c.lastRead, "expiration: ", c.RefreshDelay, " provider expiration:", c.lastRead.Add(c.RefreshDelay), "expired: ", time.Now().After(c.lastRead.Add(c.RefreshDelay))) | ||
return time.Now().After(c.lastRead.Add(c.RefreshDelay)) | ||
} |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could someone explain why we changed the type here?
Edit: Meant to comment on the provider type here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the time of implementing, there was a conflict with some providers.
It seems that this is not the case any longer and that this change was kept over rebases.
I tried reverting b2ff1619 and it seems to work after some minor fixes.
What problems do you encounter because of this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It introduced a breaking change for some of the webhook providers as we used the DomainFilter type to simply Marshal the filter list for negotiation. We now just take the filter json and marshal it ourselves which is no big issue. It's just a loss of functionality. (writing this at 2am, might have a better way to explain it later)
Reference