Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Become IDNA aware in Plan and DomainFilter #4689

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kimsondrup
Copy link
Contributor

@kimsondrup kimsondrup commented Aug 19, 2024

Description

Ensure support for Internationalized Domain Names for Applications (aka. domains using Unicode) using golang.org/x/net/idna

Disclaimer, this is my first Go code so please look at it with extra skepticism

Checklist

  • Unit tests updated
  • End user documentation updated

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign raffo for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 19, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @kimsondrup. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 19, 2024
@kimsondrup kimsondrup force-pushed the punycode branch 6 times, most recently from 590af75 to a26165c Compare August 19, 2024 21:03
@kimsondrup kimsondrup changed the title feat: Handle Unicode domains in Plan and DomainFilter feat: Become IDNA aware in Plan and DomainFilter Aug 19, 2024
@mloiseleur
Copy link
Contributor

Thanks for this PR @kimsondrup !
Reviewing the tests with all those emojis in FQDN are really fun :)
/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 20, 2024
@kimsondrup
Copy link
Contributor Author

kimsondrup commented Aug 20, 2024

After looking at the code a bit more, I'm a little confused.
I am trying to figure out the reasoning behind some of the places that External DNS normalize values.
For example, the function RemoveDuplicates() in endpoint/endpoint.go or the places that use strings.ToLower() in registry/txt.go could look like some places where IDNA awareness might be wanted.
I am tempted to think that all the places should be normalized, but I am unsure of the impact.
If there were more well-defined tests for normalization in these places, like the ones seen in Plan and DomainFilter, I would be less hesitant to touch things.

But I could give a shoot at implementing it if you think it make sense to do so?

@kimsondrup
Copy link
Contributor Author

kimsondrup commented Aug 21, 2024

After looking at bit more at the code I think I will just summit the changes for registry/txt.go IDNA awareness in another pull request.
It is less cut and dry since txtPrefix and txtSuffix might conflict with how Ponycode prefix each level of the domain with xn-- and again suffix each level with the encoded characters.
But that is a problem for future me.

@mloiseleur if this pull request looks good to you then I have nothing else to add.

// normalizeDomain converts a domain to a canonical form, so that we can filter on it
// it: trim "." suffix, get Unicode version of domain complient with Section 5 of RFC 5891
func normalizeDomain(domain string) string {
s, _ := idna.Lookup.ToUnicode(strings.TrimSuffix(domain, "."))
Copy link
Contributor

@mloiseleur mloiseleur Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wdyt about catching the error and, at least, log a warning ?

Copy link
Contributor Author

@kimsondrup kimsondrup Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is opinionated to return a partial string on errors (source) so I judged that we could get away with ignoring it, even if just to avoid log spam should something happen. But you we could log it as a warning or debug depending what you prefer?

@mloiseleur
Copy link
Contributor

It is less cut and dry since txtPrefix and txtSuffix might conflict with how Ponycode prefix each level of the domain with xn-- and again suffix each level with the encoded characters.

🤔 If the prefix and suffix are applied before the idna toAscii is called, then it should work, isn't it ?

@kimsondrup
Copy link
Contributor Author

It is less cut and dry since txtPrefix and txtSuffix might conflict with how Ponycode prefix each level of the domain with xn-- and again suffix each level with the encoded characters.

🤔 If the prefix and suffix are applied before the idna toAscii is called, then it should work, isn't it ?

Yes, I would also prefer a solution where prefix and suffix are applied before the IDNA toAscii is called, but given I don't know the codebase that well, I would like to either unquestioningly trust someone to tell me how it should be done or get some more time to read the code myself 😅

Some providers might be IDNA aware themself so that I can imagine an External DNS playing around with prefix and suffix introducing some problems.

Just an example of some results

prefix Unicode concat Unicode concat ASCII concat
extd. ørsted.dk extd.xn--rsted-uua.dk extd.xn--rsted-uua.dk
extd. orsted.dk extd.orsted.dk extd.orsted.dk
extd- ørsted.dk xn--extd-rsted-4cb.dk extd-xn--rsted-uua.dk
extd- orsted.dk extd-orsted.dk extd-orsted.dk
xn- ørsted.dk xn--xn-rsted-74a.dk xn-xn--rsted-uua.dk
xn- orsted.dk xn--xn-rsted-74a.dk xn-xn--rsted-uua.dk
xn--1 ørsted.dk xn--xn--rsted-o8a.dk xn--xn--rsted-o8a.dk
xn--1 orsted.dk xn--orsted.dk xn--orsted.dk

I think that when adding the logic to registry/txt.go, it should also include some sane constraints on --txt-prefix and --txt-suffix with the most simple being something like no unicode and no double dash in the params.

So, a "finished" implementation™ for IDNA awareness in External DNS should, IMHO, also handle whether a provider might normalize the domains themselves and then return either the Punycode, Unicode or the raw string used to create it, which could be a mix of the two2 at their discretion. To help with testing this, I would like the InMemory provider to support these three different behaviours so that they can be used in tests.

I don't mind making all of this now that I am already invested in the task, but I would like that when the changelog says, "IDNA awareness now brings Unicode support", it is a thoroughly tested implementation.

1 I can't see a reason why anyone would prefix with xn--, but the potential to shoot oneself in the foot this bad feels like something that should be blocked so that we don't end up with something like this idna.ToUnicode("xn--orsted.dk") -> 夹夺夃.dk (I hope this random string doesn't mean anything rude)

2 Example morsø.xn--rsted-uua.dk

@mloiseleur
Copy link
Contributor

I think that when adding the logic to registry/txt.go, it should also include some sane constraints on --txt-prefix and --txt-suffix with the most simple being something like no unicode and no double dash in the params.

That sounds reasonable, yes 👍 .

So, a "finished" implementation™ for IDNA awareness in External DNS should, IMHO, also handle whether a provider might normalize the domains themselves and then return either the Punycode, Unicode or the raw string used to create it, which could be a mix of the two at their discretion. To help with testing this, I would like the InMemory provider to support these three different behaviours so that they can be used in tests.

🤔 Why would we want to enter in such a messy and complex world ? External Dns is maintained by very few people on their spare time, so we clearly need something as simple as possible. What about a switch like --enable-idna, which would enable or disable this feature ? Disabled by default to start, and enabled by default when most providers support it ?

I don't mind making all of this now that I am already invested in the task, but I would like that when the changelog says, "IDNA awareness now brings Unicode support", it is a thoroughly tested implementation.

Yes, sure 💯 !

@kimsondrup
Copy link
Contributor Author

🤔 Why would we want to enter in such a messy and complex world ? External Dns is maintained by very few people on their spare time, so we clearly need something as simple as possible. What about a switch like --enable-idna, which would enable or disable this feature ? Disabled by default to start, and enabled by default when most providers support it ?

Great point!
Perhaps a --enable-idna flag could be combined with something like how SupportedRecordType() bool is implemented so that providers could implement SupportIDNA() bool to tell External DNS if IDNA is supported without the users of External DNS having to think about it?

@mloiseleur
Copy link
Contributor

mloiseleur commented Sep 7, 2024

Perhaps a --enable-idna flag could be combined with something like how SupportedRecordType() bool is implemented so that providers could implement SupportIDNA() bool to tell External DNS if IDNA is supported without the users of External DNS having to think about it?

Following KISS principle, I suggest to just add the feature in this PR and provide this additional feature to switch default in an other PR. You'll need to design it also for webhook providers.

@kimsondrup
Copy link
Contributor Author

@mloiseleur I would like to remove the changes made to the Plan logic from this PR and only leave in the DomainFilter changes.

The reason is that I am already prototyping the larger change with the switch and, in the process, realized that DomainFilter is used all over the place in both External DNS and for external providers. So, I can't see a good way to make a switch global without using an environment variable, which I'm not much for.

So my suggestion would be to let DomainFilter be IDNA aware by default with this PR, focusing on the best effort where encoding problems are logged but the domain is still processed. Then, in a separate PR, the logic for an --idna-enabled will be added, and one of the changes will enforce that the domain must be valid for IDNA encoded/decoded.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 7, 2024
@mloiseleur
Copy link
Contributor

/lgtm
/assign @Raffo

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 22, 2024
@mloiseleur mloiseleur removed their assignment Dec 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants