Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
247 changes: 240 additions & 7 deletions docs/contributing/sources-and-providers.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,12 @@ tags:

ExternalDNS supports swapping out endpoint **sources** and DNS **providers** and both sides are pluggable. There currently exist multiple sources for different provider implementations.

**Usage**

You can choose any combination of sources and providers on the command line.
Given a cluster on AWS you would most likely want to use the Service and Ingress Source in combination with the AWS provider.
`Service` + `InMemory` is useful for testing your service collecting functionality, whereas `Fake` + `Google` is useful for testing that the Google provider behaves correctly, etc.

## Sources

Sources are an abstraction over any kind of source of desired Endpoints, e.g.:
Expand All @@ -21,7 +27,7 @@ The `Source` interface has a single method called `Endpoints` that should return

```go
type Source interface {
Endpoints() ([]*endpoint.Endpoint, error)
Endpoints() ([]*endpoint.Endpoint, error)
}
```

Expand Down Expand Up @@ -83,8 +89,8 @@ Upon receiving a change set (via an object of `plan.Changes`), `ApplyChanges` sh

```go
type Provider interface {
Records() ([]*endpoint.Endpoint, error)
ApplyChanges(changes *plan.Changes) error
Records() ([]*endpoint.Endpoint, error)
ApplyChanges(changes *plan.Changes) error
}
```

Expand All @@ -100,8 +106,235 @@ All providers live in package `provider`.
* `AzureProvider`: returns and creates DNS records in Azure DNS
* `InMemoryProvider`: Keeps a list of records in local memory

## Usage
### Implementing GetDomainFilter

You can choose any combination of sources and providers on the command line.
Given a cluster on AWS you would most likely want to use the Service and Ingress Source in combination with the AWS provider.
`Service` + `InMemory` is useful for testing your service collecting functionality, whereas `Fake` + `Google` is useful for testing that the Google provider behaves correctly, etc.
`GetDomainFilter()` is a method on the `Provider` interface. The default implementation in
`BaseProvider` returns an empty filter with no effect. Providers can override it to
contribute an additional domain constraint to the reconcile plan, on top of whatever the
user configured via `--domain-filter`.

#### How the controller uses it

Each reconcile cycle, the controller builds a plan combining two filters:

```go
DomainFilter: endpoint.MatchAllDomainFilters{c.DomainFilter, registryFilter}
```

* `c.DomainFilter` — from the `--domain-filter` CLI flag (user-supplied)
* `registryFilter` — the value returned by `provider.GetDomainFilter()`

`MatchAllDomainFilters` is a logical AND: a record must satisfy both to be included in the
plan. The provider filter acts as an additional, provider-side constraint on top of whatever
the user configured.

#### When to leave the default

If your provider has no concept of zones, domains, or hosted zones — for example, a
provider backed by flat storage like etcd — the `BaseProvider` default is fine. Do not
override it just to echo `config.DomainFilter` back. For example, if the user runs with
`--domain-filter=example.com` and the provider returns the same value, the plan sees:

```go
MatchAllDomainFilters{example.com, example.com} // same filter twice, no added value
```

This is functionally identical to the default and adds no protection.

#### When and how to override — the dynamic pattern

Override `GetDomainFilter()` when your provider has an authoritative list of zones,
domains, or hosted zones it manages — regardless of what the DNS provider calls them —
and can narrow the scope independently of what the user configured. Two concrete
benefits make this worthwhile:

**Protection without user configuration** — when no `--domain-filter` is set,
`BaseProvider` returns an empty filter and the controller has no domain constraint at all.
A dynamic override builds the constraint from zones the provider actually manages, so the
controller is scoped correctly even if the operator never sets a flag.

**The filter reflects reality, not intent** — `--domain-filter` expresses what the
operator wants to manage. `GetDomainFilter()` expresses what the provider actually manages
at runtime — zones that exist and are accessible with the current credentials. The
intersection of the two is tighter and safer than either alone.

For example, if `--domain-filter=example.com` is set but the provider only has access to
`api.example.com` and `prod.example.com`, a dynamic implementation scopes the plan to
exactly those two zones rather than anything under `example.com`.

The correct approach is to query your zone API at runtime and build the filter from the
zones your provider actually controls. `AWSProvider.GetDomainFilter()` is the canonical
example:

```go
func (p *MyProvider) GetDomainFilter() endpoint.DomainFilterInterface {
zones, err := p.zones()
if err != nil {
return &endpoint.DomainFilter{}
}
// Apply your own configured filter to keep only zones this provider manages.
filteredZones := applyDomainFilter(zones)

names := make([]string, 0, len(zones))
for _, z := range filteredZones {
names = append(names, z.Name, "."+z.Name)
}
return endpoint.NewDomainFilter(names)
}
```

Each zone name is added twice — as a bare domain (`example.com`) and with a leading dot
(`.example.com`) — so the filter matches both exact records and subdomains.

For example, suppose the provider manages four zones:

```sh
api.example.com
prod.myapp.io
staging.myapp.io
legacy.internal.net
```

**Without `--domain-filter`** — the provider filter alone constrains the plan:

```go
MatchAllDomainFilters{
<empty>, // no CLI flag, matches everything
[api.example.com, .api.example.com,
prod.myapp.io, .prod.myapp.io,
staging.myapp.io, .staging.myapp.io,
legacy.internal.net, .legacy.internal.net], // only provider-managed zones
}
```

The controller will only touch records in those four zones. Any other zone in the cluster
is left untouched, even if records pointing to it appear in sources.

**With `--domain-filter=myapp.io`** — the two filters intersect:

```go
MatchAllDomainFilters{
myapp.io, // CLI flag
[api.example.com, .api.example.com,
prod.myapp.io, .prod.myapp.io,
staging.myapp.io, .staging.myapp.io,
legacy.internal.net, .legacy.internal.net],
}
```

Only `prod.myapp.io` and `staging.myapp.io` satisfy both filters and are in scope.
`api.example.com` and `legacy.internal.net` are excluded by the CLI filter.

On error, return an empty `&endpoint.DomainFilter{}`. This has the same effect as the
`BaseProvider` default — the CLI filter becomes the sole authority. If the user specifies
a domain the provider does not manage, reconciliation will proceed against it. This is a
deliberate tradeoff: a temporary API failure should not block all reconciliation.

For example, if the provider manages `a.com` and `b.com` but the user sets
`--domain-filter=c.com`, a dynamic implementation produces an empty intersection —
the controller does nothing:

```go
MatchAllDomainFilters{
c.com, // CLI flag
[a.com, .a.com, // provider zones — no overlap with c.com
b.com, .b.com],
}
```

With an empty `GetDomainFilter()` (default or error), only the CLI filter applies and
the controller attempts to reconcile `c.com` against a provider that does not manage it.

#### Zone name formatting

Check the format your provider's API returns for zone names before passing them to
`endpoint.NewDomainFilter`. Some APIs include a trailing dot (`"example.com."`), which
must be stripped first:

```go
// API returns: "foo.example.com."
// Filter needs: "foo.example.com"
name := strings.TrimSuffix(z.Name, ".")
names = append(names, name, "."+name)
```

#### Summary

| Implementation | `--domain-filter` unset | `--domain-filter` set |
|---------------------------------------|--------------------------------------------|----------------------------------------------|
| `BaseProvider` default | No additional constraint | User filter applied |
| Static (echoes `config.DomainFilter`) | No additional constraint (same as default) | Same filter applied twice — redundant |
| Dynamic (`ListZones` + filter) | Provider-managed zones constrain the plan | Intersection of user filter + provider zones |

The dynamic approach is what gives `GetDomainFilter()` its value: when no `--domain-filter`
is set, it prevents the controller from touching records in zones the provider does not
manage.

#### Testing

`GetDomainFilter()` must have a unit test. See `TestAWSProvider_GetDomainFilter` for a
reference. At minimum, test that:

* Zone names are correctly mapped to filter entries (including the leading-dot variant)
* An error from `ListZones` returns an empty `DomainFilter` gracefully

## Provider Blueprints

The `provider/blueprint` package contains reusable building blocks for provider
implementations. Using them keeps providers consistent and avoids reimplementing
solved problems.

### ZoneCache

`ZoneCache[T]` is a generic, thread-safe TTL cache for zone, domain, or hosted zone data.
See `provider/blueprint/zone_cache.go` for the full API and godoc.

**Reduced API pressure** — listing zones, domains, or hosted zones is called on every
reconcile cycle, but they are rarely created or deleted. Caching the result for a
configurable TTL means the provider only hits the API when the cache has expired, rather
than on every loop.

**Consistent behaviour across providers** — thread safety, TTL logic, and the
disable-via-zero behaviour are implemented and tested once in `blueprint`. Providers that
use `ZoneCache` behave the same way, reducing drift between implementations over time.

The typical usage pattern — taken from `AWSProvider.zones()` — is:

```go
// On the provider struct:
zonesCache *blueprint.ZoneCache[map[string]*MyZone]

// In the constructor:
zonesCache: blueprint.NewZoneCache[map[string]*MyZone](config.ZoneCacheDuration),

// In the zone/domain-listing method:
func (p *MyProvider) zones() (map[string]*MyZone, error) {
if !p.zonesCache.Expired() {
return p.zonesCache.Get(), nil
}

zones, err := p.client.ListZones()
if err != nil {
return nil, err
}

p.zonesCache.Reset(zones)
return zones, nil
}
```

Full behaviour is documented in the `ZoneCache` godoc. The key contract to keep in mind
when implementing the pattern: `Get()` returns stale data after expiry rather than a zero
value — callers must check `Expired()` first and decide whether to refresh.

### Configuration flag

`ZoneCache` is controlled by a single shared flag:

| Flag | Default | Description |
|--------------------------|---------|----------------------------------------------|
| `--zones-cache-duration` | `0s` | Zone list cache TTL. Set to `0s` to disable. |

Add a `ZoneCacheDuration time.Duration` field to your provider config struct, wire it to
this flag in `pkg/apis/externaldns/types.go`, and pass it to `NewZoneCache` in the
constructor.
7 changes: 5 additions & 2 deletions provider/blueprint/zone_cache.go
Original file line number Diff line number Diff line change
Expand Up @@ -33,11 +33,13 @@ type ZoneCache[T any] struct {
}

// NewZoneCache creates a new ZoneCache with the specified TTL duration.
// A duration of 0 or less disables caching: Reset becomes a no-op and Expired always returns true.
func NewZoneCache[T any](duration time.Duration) *ZoneCache[T] {
return &ZoneCache[T]{duration: duration}
}

// Get returns the cached data. Returns the zero value if cache is empty.
// Get returns the cached data. Returns the zero value if the cache has never been populated.
// Data is not cleared on expiration; Get returns the last known value until Reset is called again.
func (c *ZoneCache[T]) Get() T {
c.mu.RLock()
defer c.mu.RUnlock()
Expand All @@ -57,7 +59,8 @@ func (c *ZoneCache[T]) Reset(data T) {
log.WithField("duration", c.duration).Debug("zone cache reset")
}

// Expired returns true if the cache has expired or is empty.
// Expired returns true if the cache is empty (never populated) or the TTL has elapsed since
// the last Reset. When caching is disabled (duration <= 0), always returns true.
func (c *ZoneCache[T]) Expired() bool {
c.mu.RLock()
defer c.mu.RUnlock()
Expand Down
Loading