fix(aws): incorrect behavior for non-aliasable record types#6017

Open
u-kai wants to merge 11 commits into kubernetes-sigs:master from u-kai:fix/aws-alias-logic

Conversation

@u-kai
Member

@u-kai u-kai commented Dec 12, 2025

What does it do?

This PR fixes incorrect behavior for record types that do not support AWS Alias records (e.g., MX, TXT, SRV, etc.).

Motivation

This work originated from the discussion here:
#5997 (comment)

When a ProviderSpecific property alias=true exists on record types that don't support alias records
(like MX records):

  • Alias processing continues even after alias property deletion
    • The alias property is deleted in aws.go, but the alias variable remains true
    • This results in unnecessary evaluateTargetHealth=false being added
  • TTL value gets unintentionally modified
    • When RecordTTL is configured, it is overwritten with 300

Reproduction

ep := &endpoint.Endpoint{
    RecordType: endpoint.RecordTypeMX,
    RecordTTL:  600,
    ProviderSpecific: endpoint.ProviderSpecific{
        {Name: "alias", Value: "true"},
    },
}
// Expected: TTL=600, ProviderSpecific=empty
// But result is: TTL=300, evaluateTargetHealth=false gets added
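
The failure mode described above can be pictured with a minimal, self-contained sketch. It is not the actual aws.go code: `Endpoint`, `supportsAlias`, `adjustBuggy`, and `adjustFixed` are hypothetical stand-ins, and the set of alias-capable types is an assumption made only for illustration.

```go
package main

import "fmt"

// Endpoint is a simplified, hypothetical stand-in for endpoint.Endpoint.
type Endpoint struct {
	RecordType string
	TTL        int64
	Props      map[string]string // stand-in for ProviderSpecific
}

// supportsAlias is an assumption for this sketch, not the provider's real rule.
func supportsAlias(recordType string) bool {
	return recordType == "A" || recordType == "AAAA" || recordType == "CNAME"
}

// adjustBuggy deletes the alias property for unsupported types but keeps
// acting on the flag that was read before the deletion.
func adjustBuggy(ep *Endpoint) {
	alias := ep.Props["alias"] == "true"
	if alias && !supportsAlias(ep.RecordType) {
		delete(ep.Props, "alias") // property is gone, flag is still true
	}
	if alias {
		ep.TTL = 300
		ep.Props["evaluateTargetHealth"] = "false"
	}
}

// adjustFixed clears the flag together with the property.
func adjustFixed(ep *Endpoint) {
	alias := ep.Props["alias"] == "true"
	if alias && !supportsAlias(ep.RecordType) {
		delete(ep.Props, "alias")
		alias = false
	}
	if alias {
		ep.TTL = 300
		ep.Props["evaluateTargetHealth"] = "false"
	}
}

// newMX builds the MX endpoint from the reproduction above.
func newMX() *Endpoint {
	return &Endpoint{RecordType: "MX", TTL: 600, Props: map[string]string{"alias": "true"}}
}

func main() {
	buggy, fixed := newMX(), newMX()
	adjustBuggy(buggy)
	adjustFixed(fixed)
	fmt.Println("buggy:", buggy.TTL, buggy.Props)
	fmt.Println("fixed:", fixed.TTL, fixed.Props)
}
```

In the buggy variant the MX endpoint ends up with the alias TTL and an extra evaluateTargetHealth property; in the fixed variant it keeps its configured TTL and has no provider-specific properties left.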

While this affects cases with incorrectly configured ProviderSpecific properties, I believe the behavior should

More

  • Yes, this PR title follows Conventional Commits
  • Yes, I added unit tests
  • Yes, I updated end user documentation accordingly

Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
@k8s-ci-robot k8s-ci-robot added the internal Issues or PRs related to internal code label Dec 12, 2025
@k8s-ci-robot k8s-ci-robot added provider Issues or PRs related to a provider needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 12, 2025
@k8s-ci-robot
Contributor

Hi @u-kai. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 12, 2025
@u-kai u-kai changed the title from "Refactor alias-handling logic and fix incorrect behavior for non-aliasable record types" to "fix(aws): refactor alias-handling logic and fix incorrect behavior for non-aliasable record types" Dec 12, 2025
@ivankatliarchuk
Member

Refactoring and a bug fix in a single PR is too risky. Also, for the bug, manifests should be provided to reproduce it.

@u-kai
Member Author

u-kai commented Dec 12, 2025

@ivankatliarchuk
Sorry about that — I’ll be more careful next time.

To make the existing issue easier to fix, I’ve opened a separate PR with the refactoring work.
Once that PR is merged, I will rebase this PR and continue the bug fix on top of it.

It would be great if you could review the refactoring PR first. Thanks!

@u-kai u-kai changed the title from "fix(aws): refactor alias-handling logic and fix incorrect behavior for non-aliasable record types" to "fix(aws): rfix incorrect behavior for non-aliasable record types" Dec 12, 2025
@u-kai u-kai changed the title from "fix(aws): rfix incorrect behavior for non-aliasable record types" to "fix(aws): incorrect behavior for non-aliasable record types" Dec 12, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 27, 2026
u-kai added 2 commits January 28, 2026 21:55
Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 28, 2026
@u-kai
Member Author

u-kai commented Jan 28, 2026

@ivankatliarchuk
Now that the refactoring PR has been merged, I’ve rebased and updated this PR so it’s ready for review again.

@ivankatliarchuk
Member

Missing links to docs in description with facts about this behaviour. Missing end to end tests with the proof.

I'll review only after other reviewers share their review. I could not keep approving on my own.

@ivankatliarchuk
Member

There’s no example showing the current behaviour, so it’s unclear whether this is actually a problem or not.

@u-kai
Member Author

u-kai commented Jan 28, 2026

I’ve updated the PR description.

Previously, even for record types that do not support alias (such as MX records), ExternalDNS was applying alias-specific behavior — updating TTL and adding the evaluateTargetHealth property — whenever alias was set to true.

This was incorrect and also hurt code readability.
With this change, MX records no longer have their TTL modified even if alias is true, and alias-related properties are removed for record types that do not support alias.

@ivankatliarchuk
Member

Missing before and after real example. And missing links to official documentation describing AWS alias record behavior

Comment thread provider/aws/aws.go Outdated
if isAlias, _ := ep.GetBoolProviderSpecificProperty(providerSpecificAlias); isAlias {
p.adjustAliasRecord(ep)
// NS and SOA records do not support alias
if ep.RecordType == endpoint.RecordTypeNS || ep.RecordType == "SOA" {
Member

not sure about ep.RecordType == "SOA"

@coveralls

coveralls commented Feb 3, 2026

Pull Request Test Coverage Report for Build 23925061512

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 198 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.04%) to 80.517%

Files with Coverage Reduction | New Missed Lines | %
aws/aws.go                    | 23               | 89.07%
store.go                      | 37               | 65.08%
azure/azure.go                | 58               | 76.0%
gateway.go                    | 80               | 86.53%

Totals
Change from base Build 23734434195: 0.04%
Covered Lines: 17147
Relevant Lines: 21296

💛 - Coveralls

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ivankatliarchuk. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
@u-kai u-kai force-pushed the fix/aws-alias-logic branch from 73b7319 to ef7b98a Compare February 7, 2026 06:56
@u-kai
Member Author

u-kai commented Feb 7, 2026

As noted in the code comments, AWS Route53 allows Alias records for record types other than NS and SOA.
However, the ExternalDNS documentation mainly describes replacing CNAME records with Alias records.

From a practical perspective, there is little real-world value in using Alias records for types other than A and AAAA. In fact, the current code already does not accept Alias records for MX during validation. While the concept of “Alias” exists across multiple providers, its semantics differ significantly.

For example, GCP Cloud DNS ALIAS records are specifically designed to synthesize A/AAAA records at the zone apex and are not intended for use with MX or other non-address record types.

To keep the Alias behavior simple and consistent across providers, this change removes the Alias property
when it is set on record types other than A and AAAA.

Previously, when non-CNAME records had the Alias property set, ExternalDNS applied Alias-specific settings
(such as default TTL and evaluateTargetHealth) and then removed the Alias property.
With this change, we instead remove the Alias property and related fields (including evaluateTargetHealth)
entirely.

This helps avoid ambiguous behavior and keeps the implementation aligned with practical use cases.
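
As a rough illustration of the cleanup described in this comment, the sketch below filters alias-related properties off any endpoint whose type is not A or AAAA. The `Endpoint` and `Property` types and the `stripAliasProps` helper are hypothetical simplifications, not the code in this PR, and the property list only covers the two names mentioned in this thread.

```go
package main

import "fmt"

// Property and Endpoint are simplified, hypothetical stand-ins for the
// ExternalDNS endpoint types.
type Property struct{ Name, Value string }

type Endpoint struct {
	RecordType string
	Props      []Property
}

// aliasProps lists the alias-related property names mentioned in this thread.
var aliasProps = map[string]bool{"alias": true, "evaluateTargetHealth": true}

// stripAliasProps removes alias-related properties from endpoints whose type
// is not A or AAAA, following the wording of the comment above.
func stripAliasProps(ep *Endpoint) {
	if ep.RecordType == "A" || ep.RecordType == "AAAA" {
		return
	}
	kept := ep.Props[:0] // filter in place, reusing the backing array
	for _, p := range ep.Props {
		if !aliasProps[p.Name] {
			kept = append(kept, p)
		}
	}
	ep.Props = kept
}

func main() {
	mx := &Endpoint{RecordType: "MX", Props: []Property{
		{"alias", "true"}, {"evaluateTargetHealth", "false"}, {"other", "kept"},
	}}
	stripAliasProps(mx)
	fmt.Println(mx.Props)
}
```

Unrelated properties survive the filter, so only the alias semantics are dropped for the unsupported type.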

@u-kai
Member Author

u-kai commented Feb 7, 2026

Here is a concrete example to illustrate the behavior.

Given the following DNSEndpoint:

apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: alias
  namespace: default
spec:
  endpoints:
    - dnsName: mail.example.com
      recordType: MX
      recordTTL: 600
      targets:
        - "10 mail1.example.com."
      providerSpecific:
        - name: "alias"
          value: "true"

With the previous behavior, when this was applied:

  • The TTL was automatically rewritten to 300 (the default for Alias records).
  • The evaluateTargetHealth property was added to the endpoint.

As a result, ExternalDNS detected a diff on every sync and continuously issued UPDATE requests, as shown in the logs, because alias-related properties were dropped during request generation for MX records.

INFO[0075] Desired change: UPSERT mx-mail.example.com TXT  profile=default zoneID=/hostedzone/ZXXXXXXXXXXXXX zoneName=example.com.
INFO[0075] Desired change: UPSERT test.mail.example.com MX  profile=default zoneID=/hostedzone/ZXXXXXXXXXXXXX zoneName=example.com.
INFO[0076] 2 record(s) were successfully updated         profile=default zoneID=/hostedzone/ZXXXXXXXXXXXXX zoneName=example.com.
...
INFO[0086] Desired change: UPSERT mx-mail.example.com TXT  profile=default zoneID=/hostedzone/ZXXXXXXXXXXXXX zoneName=example.com.
INFO[0086] Desired change: UPSERT mail.example.com MX  profile=default zoneID=/hostedzone/ZXXXXXXXXXXXXX zoneName=example.com.
INFO[0087] 2 record(s) were successfully updated         profile=default zoneID=/hostedzone/ZXXXXXXXXXXXXX zoneName=example.com.
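
One way to picture why the loop above never settles is the toy model below. It assumes that the desired state still carries the injected evaluateTargetHealth property while the written record does not, and that the comparison includes provider-specific properties. `Record`, `write`, and `countUpserts` are hypothetical names; the real plan calculation in ExternalDNS is more involved.

```go
package main

import (
	"fmt"
	"reflect"
)

// Record is a hypothetical, reduced view of a DNS record.
type Record struct {
	TTL   int64
	Props map[string]string
}

// write models request generation for a non-alias type: alias-related
// properties are dropped before the record is stored.
func write(desired Record) Record {
	out := Record{TTL: desired.TTL, Props: map[string]string{}}
	for k, v := range desired.Props {
		if k != "alias" && k != "evaluateTargetHealth" {
			out.Props[k] = v
		}
	}
	return out
}

// countUpserts runs the given number of sync rounds and counts how many of
// them had to write the record because current and desired state differed.
func countUpserts(desired Record, syncs int) int {
	current := Record{}
	upserts := 0
	for i := 0; i < syncs; i++ {
		if !reflect.DeepEqual(current, desired) {
			current = write(desired)
			upserts++
		}
	}
	return upserts
}

func main() {
	injected := Record{TTL: 300, Props: map[string]string{"evaluateTargetHealth": "false"}}
	clean := Record{TTL: 600, Props: map[string]string{}}
	fmt.Println("with injected property:", countUpserts(injected, 3))
	fmt.Println("without injected property:", countUpserts(clean, 3))
}
```

In this model the desired state with the injected property can never match what was written, so every round produces an UPSERT, while the clean desired state converges after the first write.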

With the current change:

  • The TTL remains as the user-specified value (600 in this example).
  • No unnecessary alias-related properties are injected.

As a result, after the initial sync, no further UPDATE operations are triggered, and the record remains stable.

This avoids unnecessary reconciliation loops and keeps the behavior consistent with what the provider actually supports.

INFO[0001] Desired change: UPSERT mx-mail.example.com TXT  profile=default zoneID=/hostedzone/ZXXXXXXXXXXXXXX zoneName=example.com.
INFO[0001] Desired change: UPSERT mail.example.com MX  profile=default zoneID=/hostedzone/ZXXXXXXXXXXXXXX zoneName=example.com.
INFO[0002] 2 record(s) were successfully updated         profile=default zoneID=/hostedzone/ZXXXXXXXXXXXXXX zoneName=example.com.
...
INFO[0011] All records are already up to date

sample-ex-dns

@u-kai u-kai requested a review from ivankatliarchuk February 7, 2026 07:47
@ivankatliarchuk
Member

I gotcha now.

What’s bothering me isn’t the mechanics of the fix, it’s where the responsibility boundary is being moved. Right now this PR quietly shifts user configuration error handling into the AWS provider, and that’s a design smell.

An MX record with alias=true is invalid in ExternalDNS or DNS terms, regardless of what Route53 might technically support.

If the input is invalid:

Silently “fixing” it in the provider:

  • hides user mistakes
  • creates provider-specific behavior
  • makes debugging harder
  • encourages cargo-cult annotations

ExternalDNS already has clear layering:

Sources → Endpoints → Wrappers (Validation/Normalization) → Providers

The provider’s job is:

  • translate valid, normalized endpoints into provider API calls

Not:

  • reinterpret semantics
  • drop fields conditionally
  • "guess" user intent

This PR does exactly that:

  • it accepts invalid input
  • then mutates it deep in the AWS provider
  • instead of rejecting it earlier

That’s backwards.

@ivankatliarchuk
Member

Now:

  • AWS provider silently “fixes” invalid alias usage
  • Other providers may not
  • Behavior diverges across providers

ExternalDNS should enforce provider-agnostic semantics before provider code runs.

@ivankatliarchuk
Member

Ideally, what we should have

Reject invalid combinations at middleware layer:

  • alias=true only allowed for A / AAAA
  • fail endpoint validation
  • surface a clear debug msg

This is the cleanest contract.
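
A minimal sketch of such a contract might look like the following. It is an illustration only: `Endpoint`, `validateAlias`, and `ErrAliasUnsupported` are hypothetical names rather than existing ExternalDNS APIs, and the allowed set follows the list above (A and AAAA).

```go
package main

import (
	"errors"
	"fmt"
)

// Endpoint is a simplified, hypothetical stand-in for endpoint.Endpoint.
type Endpoint struct {
	DNSName    string
	RecordType string
	Props      map[string]string
}

// ErrAliasUnsupported is a hypothetical sentinel error for this sketch.
var ErrAliasUnsupported = errors.New("alias=true is not supported for this record type")

// validateAlias rejects alias=true on anything other than A or AAAA, so the
// provider never sees the invalid combination.
func validateAlias(ep Endpoint) error {
	if ep.Props["alias"] != "true" {
		return nil
	}
	switch ep.RecordType {
	case "A", "AAAA":
		return nil
	}
	return fmt.Errorf("%s %s: %w", ep.RecordType, ep.DNSName, ErrAliasUnsupported)
}

func main() {
	mx := Endpoint{DNSName: "mail.example.com", RecordType: "MX", Props: map[string]string{"alias": "true"}}
	if err := validateAlias(mx); err != nil {
		// In a real middleware this would be a debug log plus skipping the endpoint.
		fmt.Println("rejected:", err)
	}
}
```

Wrapping a sentinel error keeps the message specific to the endpoint while still letting callers test for the category with errors.Is.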

@u-kai
Member Author

u-kai commented Feb 11, 2026

Totally agree — the issue here is the responsibility boundary.

To move in that direction, I opened a follow-up PR that makes endpoint validation fail when alias=true is used for record types other than A, AAAA, and CNAME. Once that PR is merged, this PR can be simplified further.

I’ll update this PR accordingly once the validation change lands.

u-kai added 2 commits March 1, 2026 10:33
Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
@ivankatliarchuk
Member

Is this resolved or still a problem?

@u-kai
Member Author

u-kai commented Mar 4, 2026

Sorry, it has already been resolved.

@ivankatliarchuk
Member

Sorry, it has already been resolved.

/close

@k8s-ci-robot
Contributor

@ivankatliarchuk: Closed this PR.

Details

In response to this:

Sorry, it has already been resolved.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@u-kai
Member Author

u-kai commented Mar 31, 2026

Sorry for the confusion. "Resolved" was referring to the code review comment being addressed, not the PR itself being resolved. I'll reopen this.

@u-kai
Member Author

u-kai commented Mar 31, 2026

/reopen

@k8s-ci-robot
Contributor

@u-kai: Reopened this PR.

Details

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot reopened this Mar 31, 2026
@ivankatliarchuk
Member

Ok

Member

@ivankatliarchuk ivankatliarchuk left a comment

I think the change is missing some links in the PR title, plus debug logs or something.

High level: we have a gap. There is no end-to-end test that stitches together CheckEndpoint → dedupSource → AdjustEndpoints to show the full rejection path. The two unit tests exist, but their composition is not visible in unit tests.
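
The shape of such a composed check could be sketched as below. The three stages are hypothetical stand-ins named after the steps listed above; they do not call the real CheckEndpoint, dedupSource, or AdjustEndpoints, and the allowed set (A, AAAA, CNAME) is an assumption taken from the follow-up validation PR described earlier in this thread.

```go
package main

import "fmt"

// Endpoint is a reduced, hypothetical endpoint for this sketch.
type Endpoint struct {
	DNSName    string
	RecordType string
	Alias      bool
}

// checkEndpoint stands in for the validation step.
func checkEndpoint(ep Endpoint) bool {
	return !ep.Alias || ep.RecordType == "A" || ep.RecordType == "AAAA" || ep.RecordType == "CNAME"
}

// dedup stands in for the dedup source: it drops invalid endpoints and
// duplicates with the same name and type.
func dedup(eps []Endpoint) []Endpoint {
	seen := map[string]bool{}
	var out []Endpoint
	for _, ep := range eps {
		key := ep.DNSName + "/" + ep.RecordType
		if !checkEndpoint(ep) || seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, ep)
	}
	return out
}

// adjust stands in for the provider step and reports what it was handed.
func adjust(eps []Endpoint) []string {
	var handled []string
	for _, ep := range eps {
		handled = append(handled, ep.RecordType+" "+ep.DNSName)
	}
	return handled
}

// pipeline composes the stages in the order named in the review comment.
func pipeline(eps []Endpoint) []string {
	return adjust(dedup(eps))
}

func main() {
	in := []Endpoint{
		{DNSName: "mail.example.com", RecordType: "MX", Alias: true},
		{DNSName: "app.example.com", RecordType: "A", Alias: true},
		{DNSName: "app.example.com", RecordType: "A", Alias: true},
	}
	fmt.Println(pipeline(in))
}
```

A test built this way asserts on what reaches the final stage, which makes the rejection path visible as a whole rather than per function.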

Comment thread provider/aws/aws.go
Member

Are we cleaning up the providerSpecificAlias property in wrapper layer? This was just added in recent PR https://github.com/kubernetes-sigs/external-dns/pull/6021/changes

Member Author

I've separated this into the PR below:
#6342

Comment thread provider/aws/aws_test.go
expectedAaaa: nil,
},
// TODO: fix For records other than A, AAAA, and CNAME, if an alias record is set, the alias record processing is not performed. This will be fixed in another PR.
{
Member

I'm not sure. Maybe instead of removing the test, keep it under a changed name so that it documents the current code behaviour?

Member Author

You're right that this case can still exist from a unit test perspective.
However, in the current design, it's difficult for an alias record to be used with MX, since it's prevented at another layer, so I removed the test.

What kind of test name would you suggest if we keep it?

Comment thread provider/aws/aws_test.go Outdated
Co-authored-by: Ivan Ka <5395690+ivankatliarchuk@users.noreply.github.com>
Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. internal Issues or PRs related to internal code ok-to-test Indicates a non-member PR verified by an org member that is safe to test. provider Issues or PRs related to a provider size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

4 participants