v0.13.5 trying to create already existing record #3754

Open
dmitriishaburov opened this issue Jun 29, 2023 · 18 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@dmitriishaburov

What happened:

After updating to v0.13.5, external-dns goes into CrashLoopBackOff while trying to create an already existing DNS record in Route53 that it already manages.

What you expected to happen:

Not crashing.

How to reproduce it (as minimally and precisely as possible):

Probably:

  • Create a DNS record for a LoadBalancer in Route53 via v0.13.4, then update to v0.13.5

Anything else we need to know?:

We have the following DNS records for an EKS LoadBalancer, which were created by external-dns v0.13.4:

cname-tempo-distributor.prelive.domain TXT Simple - No "heritage=external-dns,external-dns/owner=default,external-dns/resource=service/tempo/tempo-distributed-distributor"
tempo-distributor.prelive.domain A Simple - Yes k8s-tempo-tempodis-xxxx.elb.eu-west-1.amazonaws.com.
tempo-distributor.prelive.domain TXT Simple - No "heritage=external-dns,external-dns/owner=default,external-dns/resource=service/tempo/tempo-distributed-distributor"

After updating to v0.13.5, external-dns tries to recreate them and fails:

time="2023-06-29T11:27:47Z" level=info msg="Desired change: CREATE a-tempo-distributor.prelive.domain TXT [Id: /hostedzone/ID]"
time="2023-06-29T11:27:47Z" level=info msg="Desired change: CREATE tempo-distributor.prelive.domain  A [Id: /hostedzone/ID]"
time="2023-06-29T11:27:47Z" level=info msg="Desired change: CREATE tempo-distributor.prelive.domain  TXT [Id: /hostedzone/ID]"
time="2023-06-29T11:27:47Z" level=error msg="Failure in zone domain. [Id: /hostedzone/ID] when submitting change batch: InvalidChangeBatch: [Tried to create resource record set [name='tempo-distributor.prelive.domain.', type='A'] but it already exists, Tried to create resource record set [name='tempo-distributor.prelive.domain.', type='TXT'] but it already exists]\n\tstatus code: 400, request id: ID"
time="2023-06-29T11:27:48Z" level=fatal msg="failed to submit all changes for the following zones: [/hostedzone/ID]"
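
For reference, the pre-existing records that the change batch collides with can be confirmed directly with the AWS CLI; a rough sketch (the zone ID is a placeholder, the record name is taken from the log above):

    # list everything Route53 already has for the name external-dns is trying to CREATE
    aws route53 list-resource-record-sets \
      --hosted-zone-id ZONEID \
      --query "ResourceRecordSets[?Name=='tempo-distributor.prelive.domain.']"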

Command line args (for both versions):

    Args:
      --log-level=info
      --log-format=text
      --interval=1m
      --source=service
      --source=ingress
      --policy=upsert-only
      --registry=txt
      --provider=aws
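
Note: the "external-dns/owner=default" value in the heritage TXT records above comes from the --txt-owner-id flag, which we do not set, so it presumably falls back to its default; setting it explicitly would look roughly like:

      # hypothetical explicit value; "default" appears to be what external-dns uses when the flag is unset
      --txt-owner-id=default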

Environment:

  • External-DNS version (use external-dns --version): v0.13.5
  • DNS provider: Route53
@dmitriishaburov dmitriishaburov added the kind/bug Categorizes issue or PR as related to a bug. label Jun 29, 2023
@zeqk

zeqk commented Jun 30, 2023

May be related to these, which look like a similar problem:
#3484
#3706

@iodeslykos

iodeslykos commented Jul 4, 2023

We encountered the same issue described by the OP after upgrading the Helm Chart to version 1.13.0 (external-dns v0.13.5) from 1.12.2 (external-dns v0.13.4): Pods entered CrashLoopBackOff after repeated failures to create records that already existed in the target Route53 Hosted Zone.

The current working solution is to downgrade back to Helm Chart v1.12.2 (external-dns v0.13.4).
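
A minimal sketch of the rollback, assuming the chart is installed from the kubernetes-sigs Helm repository (release name and namespace are placeholders):

    # pin the chart back to 1.12.2 while keeping the existing release values
    helm upgrade external-dns external-dns/external-dns \
      --namespace external-dns \
      --version 1.12.2 \
      --reuse-values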

@aardbol

aardbol commented Jul 5, 2023

Same issue with Google Cloud DNS. Downgrading to Helm Chart v1.12.2 (v0.13.4) works.

@jbilliau-rcd

We are having the same issue: the pod outright crashes on a routine error we have seen all the time, through multiple version upgrades over the years.

@alfredkrohmer
Contributor

This seems to be caused by these changes: #3009

@szuecs
Contributor

szuecs commented Sep 8, 2023

cname-tempo-distributor.prelive.domain TXT Simple - No "heritage=external-dns,external-dns/owner=default,external-dns/resource=service/tempo/tempo-distributed-distributor"
tempo-distributor.prelive.domain A Simple - Yes k8s-tempo-tempodis-xxxx.elb.eu-west-1.amazonaws.com.
tempo-distributor.prelive.domain TXT Simple - No "heritage=external-dns,external-dns/owner=default,external-dns/resource=service/tempo/tempo-distributed-distributor"

What do you mean by Yes/No?

@iodeslykos

cname-tempo-distributor.prelive.domain TXT Simple - No "heritage=external-dns,external-dns/owner=default,external-dns/resource=service/tempo/tempo-distributed-distributor"
tempo-distributor.prelive.domain A Simple - Yes k8s-tempo-tempodis-xxxx.elb.eu-west-1.amazonaws.com.
tempo-distributor.prelive.domain TXT Simple - No "heritage=external-dns,external-dns/owner=default,external-dns/resource=service/tempo/tempo-distributed-distributor"

What do you mean by Yes/No?

Yes and No are part of the record as returned by the AWS CLI for Route53 and indicate whether or not the record is an Alias, a Route53-specific extension to standard DNS functionality.
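
If it helps, the Alias flag can also be checked from the CLI; something like the following (zone ID is a placeholder) should return an AliasTarget block for alias records and nothing for plain ones:

    # project only the AliasTarget of the matching record set; non-alias records produce no output
    aws route53 list-resource-record-sets \
      --hosted-zone-id ZONEID \
      --query "ResourceRecordSets[?Name=='tempo-distributor.prelive.domain.'].AliasTarget"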

@CAR6807

CAR6807 commented Nov 16, 2023

Any update on this?
Is this fixed in 1.14.0?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 14, 2024
@krmichelos

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 14, 2024
@CAR6807

CAR6807 commented Mar 13, 2024

Bump

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 11, 2024
@FernandoMiguel

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 12, 2024
@iodeslykos

To ensure that the maintainers understand this is still an issue: it is.

We did upgrade past v1.12.x, but it required a significant number of record deletions in order to allow external-dns to recreate the records it had previously managed without issue. Now everything is working fine.
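
Roughly, the registry TXT records we had to review before deleting anything can be listed with something like this (zone ID is a placeholder):

    # dump all TXT record sets in the zone; the external-dns ownership records carry "heritage=external-dns" values
    aws route53 list-resource-record-sets \
      --hosted-zone-id ZONEID \
      --query "ResourceRecordSets[?Type=='TXT']"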

If there is a conflict in DNS we would expect the error message, but not for the entire external-dns deployment to enter CrashLoopBackOff.

@CAR6807

CAR6807 commented Jun 19, 2024

Bump.
This is preventing us from addressing high-severity vulnerabilities that are fixed in newer versions.
We would like to avoid having to delete existing records, to prevent potential outages.
Record conflicts should not cause the external-dns controller to crash outright.

@mlazowik

I'm guessing this is fixed, at least for some providers, by #4166?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 9, 2024
@aardbol

aardbol commented Dec 21, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 21, 2024