-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #4597 from tjamet/add-provider-cache
Add provider cache
- Loading branch information
Showing
21 changed files
with
429 additions
and
18 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
DNS provider API rate limits considerations | ||
=========================================== | ||
|
||
## Introduction | ||
|
||
By design, external-dns refreshes all the records of a zone using API calls. | ||
This refresh may happen peridically and upon any changed object if the flag `--events` is enabled. | ||
|
||
Depending on the size of the zone and the infrastructure deployment, this may lead to external-dns | ||
hitting the DNS provider's rate-limits more easily. | ||
|
||
In particular, it has been found that with 200k records in an AWS Route53 zone, each refresh triggers around | ||
70 API calls to retrieve all the records, making it more likely to hit the AWS Route53 API rate limits. | ||
|
||
To prevent this problem from happening, external-dns has implemented a cache to reduce the pressure on the DNS | ||
provider APIs. | ||
|
||
This cache is optional and systematically invalidated when DNS records have been changed in the cluster | ||
(new or deleted domains or changed target). | ||
|
||
## Trade-offs | ||
|
||
The major trade-off of this setting relies in the ability to recover from a deleted record on the DNS provider side. | ||
As the DNS records are cached in memory, external-dns will not be made aware of the missing records and will hence | ||
take a longer time to restore the deleted or modified record on the provider side. | ||
|
||
This option is enabled using the `--provider-cache-time=15m` command line argument, and turned off when `--provider-cache-time=0m` | ||
|
||
## Monitoring | ||
|
||
You can evaluate the behaviour of the cache thanks to the built-in metrics | ||
|
||
* `external_dns_provider_cache_records_calls` | ||
* The number of calls to the provider cache Records list. | ||
* The label `from_cache=true` indicates that the records were retrieved from memory and the DNS provider was not reached | ||
* The label `from_cache=false` indicates that the cache was not used and the records were retrieved from the provider | ||
* `external_dns_provider_cache_apply_changes_calls` | ||
* The number of calls to the provider cache ApplyChanges. | ||
* Each ApplyChange systematically invalidates the cache and makes subsequent Records list to be retrieved from the provider without cache. | ||
|
||
## Related options | ||
|
||
This global option is available for all providers and can be used in pair with other global | ||
or provider-specific options to fine-tune the behaviour of external-dns | ||
to match the specific needs of your deployments, with the goal to reduce the number of API calls to your DNS provider. | ||
|
||
* `--google-batch-change-interval=1s` When using the Google provider, set the interval between batch changes. ($EXTERNAL_DNS_GOOGLE_BATCH_CHANGE_INTERVAL) | ||
* `--google-batch-change-size=1000` When using the Google provider, set the maximum number of changes that will be applied in each batch. | ||
* AWS | ||
* `--aws-batch-change-interval=1s` When using the AWS provider, set the interval between batch changes. | ||
* `--aws-batch-change-size=1000` When using the AWS provider, set the maximum number of changes that will be applied in each batch. | ||
* `--aws-batch-change-size-bytes=32000` When using the AWS provider, set the maximum byte size that will be applied in each batch. | ||
* `--aws-batch-change-size-values=1000` When using the AWS provider, set the maximum total record values that will be applied in each batch. | ||
* `--aws-zones-cache-duration=0s` When using the AWS provider, set the zones list cache TTL (0s to disable). | ||
* `--[no-]aws-zone-match-parent` Expand limit possible target by sub-domains | ||
* Cloudflare | ||
* `--cloudflare-dns-records-per-page=100` When using the Cloudflare provider, specify how many DNS records listed per page, max possible 5,000 (default: 100) | ||
* OVH | ||
* `--ovh-api-rate-limit=20` When using the OVH provider, specify the API request rate limit, X operations by seconds (default: 20) | ||
|
||
* Global | ||
* `--registry=txt` The registry implementation to use to keep track of DNS record ownership (default: txt, options: txt, noop, dynamodb, aws-sd) | ||
* `--txt-cache-interval=0s` The interval between cache synchronizations in duration format (default: disabled) | ||
* `--interval=1m0s` The interval between two consecutive synchronizations in duration format (default: 1m) | ||
* `--min-event-sync-interval=5s` The minimum interval between two consecutive synchronizations triggered from kubernetes events in duration format (default: 5s) | ||
* `--[no-]events` When enabled, in addition to running every interval, the reconciliation loop will get triggered when supported sources change (default: disabled) | ||
|
||
A general recommendation is to enable `--events` and keep `--min-event-sync-interval` relatively low to have a better responsiveness when records are | ||
created or updated inside the cluster. | ||
This should represent an acceptable propagation time between the creation of your k8s resources and the time they become registered in your DNS server. | ||
|
||
On a general manner, the higher the `--provider-cache-time`, the lower the impact on the rate limits, but also, the slower the recovery in case of a deletion. | ||
The `--provider-cache-time` value should hence be set to an acceptable time to automatically recover restore deleted records. | ||
|
||
✍️ Note that caching is done within the external-dns controller memory. You can invalidate the cache at any point in time by restarting it (for example doing a rolling update). |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
/* | ||
Copyright 2017 The Kubernetes Authors. | ||
Licensed under the Apache License, Version 2.0 (the "License"); | ||
you may not use this file except in compliance with the License. | ||
You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
*/ | ||
package provider | ||
|
||
import ( | ||
"context" | ||
"sync" | ||
"time" | ||
|
||
"github.com/prometheus/client_golang/prometheus" | ||
log "github.com/sirupsen/logrus" | ||
|
||
"sigs.k8s.io/external-dns/endpoint" | ||
"sigs.k8s.io/external-dns/plan" | ||
) | ||
|
||
var ( | ||
cachedRecordsCallsTotal = prometheus.NewCounterVec( | ||
prometheus.CounterOpts{ | ||
Namespace: "external_dns", | ||
Subsystem: "provider", | ||
Name: "cache_records_calls", | ||
Help: "Number of calls to the provider cache Records list.", | ||
}, | ||
[]string{ | ||
"from_cache", | ||
}, | ||
) | ||
cachedApplyChangesCallsTotal = prometheus.NewCounter( | ||
prometheus.CounterOpts{ | ||
Namespace: "external_dns", | ||
Subsystem: "provider", | ||
Name: "cache_apply_changes_calls", | ||
Help: "Number of calls to the provider cache ApplyChanges.", | ||
}, | ||
) | ||
|
||
registerCacheProviderMetrics = sync.Once{} | ||
) | ||
|
||
type CachedProvider struct { | ||
Provider | ||
RefreshDelay time.Duration | ||
lastRead time.Time | ||
cache []*endpoint.Endpoint | ||
} | ||
|
||
func NewCachedProvider(provider Provider, refreshDelay time.Duration) *CachedProvider { | ||
registerCacheProviderMetrics.Do(func() { | ||
prometheus.MustRegister(cachedRecordsCallsTotal) | ||
}) | ||
return &CachedProvider{ | ||
Provider: provider, | ||
RefreshDelay: refreshDelay, | ||
} | ||
} | ||
|
||
func (c *CachedProvider) Records(ctx context.Context) ([]*endpoint.Endpoint, error) { | ||
if c.needRefresh() { | ||
log.Info("Records cache provider: refreshing records list cache") | ||
records, err := c.Provider.Records(ctx) | ||
if err != nil { | ||
c.cache = nil | ||
return nil, err | ||
} | ||
c.cache = records | ||
c.lastRead = time.Now() | ||
cachedRecordsCallsTotal.WithLabelValues("false").Inc() | ||
} else { | ||
log.Debug("Records cache provider: using records list from cache") | ||
cachedRecordsCallsTotal.WithLabelValues("true").Inc() | ||
} | ||
return c.cache, nil | ||
} | ||
func (c *CachedProvider) ApplyChanges(ctx context.Context, changes *plan.Changes) error { | ||
if !changes.HasChanges() { | ||
log.Info("Records cache provider: no changes to be applied") | ||
return nil | ||
} | ||
c.Reset() | ||
cachedApplyChangesCallsTotal.Inc() | ||
return c.Provider.ApplyChanges(ctx, changes) | ||
} | ||
|
||
func (c *CachedProvider) Reset() { | ||
c.cache = nil | ||
c.lastRead = time.Time{} | ||
} | ||
|
||
func (c *CachedProvider) needRefresh() bool { | ||
if c.cache == nil { | ||
log.Debug("Records cache provider is not initialized") | ||
return true | ||
} | ||
log.Debug("Records cache last Read: ", c.lastRead, "expiration: ", c.RefreshDelay, " provider expiration:", c.lastRead.Add(c.RefreshDelay), "expired: ", time.Now().After(c.lastRead.Add(c.RefreshDelay))) | ||
return time.Now().After(c.lastRead.Add(c.RefreshDelay)) | ||
} |
Oops, something went wrong.