Skip to content

dns: disable the use of search namespaces when using the c-ares resolver#16237

Merged
mattklein123 merged 18 commits intoenvoyproxy:mainfrom
suniltheta:area_flag_nosearch_option
Jun 2, 2021
Merged

dns: disable the use of search namespaces when using the c-ares resolver#16237
mattklein123 merged 18 commits intoenvoyproxy:mainfrom
suniltheta:area_flag_nosearch_option

Conversation

@suniltheta
Copy link
Copy Markdown
Contributor

@suniltheta suniltheta commented Apr 29, 2021

Signed-off-by: Sunil Narasimhamurthy sunnrs@amazon.com

Commit Message: Modify API to support configuration of ARES flags to disable the use of search namespaces when using the c-ares resolver and reconcile DNS options in a single message
Additional Description:
Risk Level: low
Testing: unit test
Docs Changes: yes
Release Notes: updated new feature and deprecated list
Platform Specific Features:
Issues: #12432
Deprecated: field 'use_tcp_for_dns_lookups' in bootstrap, cluster & dynamic_forward_proxy xDS APIs

…of search

namespaces when using the c-ares resolver and reconcile DNS options in a single message

Signed-off-by: Sunil Narasimhamurthy <sunnrs@amazon.com>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
envoyproxy/api-shepherds assignee is @adisuissa
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #16237 was opened by suniltheta.

see: more, trace.

@suniltheta
Copy link
Copy Markdown
Contributor Author

suniltheta commented Apr 29, 2021

Hi, this is a draft PR to address the issue #12432 .
I want to get the opinion and suggestions about the DnsLookupOptions fields. Can I get the tips on how I can propagate this message throughout the code where the DNS resolvers are instantiated.

Also need consensus on this change where use_tcp_for_dns_lookups option will be deprecated and will be moved inside new protobuf message DnsLookupOptions.

Note: I have not included the files from running the tools/proto_format/proto_format.sh fix, to keep the change set small. So CI might be failing.

Also @rshriram please let me know if you have any suggestion on this PR.

Thank you.

Signed-off-by: Sunil Narasimhamurthy <sunnrs@amazon.com>
}

// Configuration of DNS Lookup option flags which control the behavior of the DNS resolver.
message DnsLookupOptions {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I think we are heading towards a model of pluggable DNS resolvers in the not too distant future. I wonder if the DNS lookup options should reflect that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good idea. I never thought of that. Thank you for your suggestion. I will keep this in mind and take care of it in upcoming commits.

Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. I've added a few comments.

// ``envoy.restart_features.use_apple_api_for_dns_lookups`` runtime value is true during
// server startup. Apple' API only uses UDP for DNS resolution.
bool use_tcp_for_dns_lookups = 8;
bool use_tcp_for_dns_lookups = 8
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add to the comment a reference to the configuration that should be used (instead of the deprecated one)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the comment.

google.protobuf.BoolValue use_tcp_for_dns_lookups = 1;

// Do not use the default search domains; only query hostnames as-is or as aliases.
google.protobuf.BoolValue no_defalt_search_domain = 2;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
google.protobuf.BoolValue no_defalt_search_domain = 2;
google.protobuf.BoolValue no_default_search_domain = 2;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took care of this manually. thanks for noticing.

// specified.
// Setting this value causes failure if the
// ``envoy.restart_features.use_apple_api_for_dns_lookups`` runtime value is true during
// server startup. Apple' API only uses UDP for DNS resolution.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// server startup. Apple' API only uses UDP for DNS resolution.
// server startup. Apple's API only uses UDP for DNS resolution.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took care of this manually. thanks for noticing.

@suniltheta suniltheta force-pushed the area_flag_nosearch_option branch 2 times, most recently from 009f522 to 5862a19 Compare April 30, 2021 06:28
…dressed few review comments

Signed-off-by: Sunil Narasimhamurthy <sunnrs@amazon.com>
Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
Please fix link issues, as it blocks CI.

// server startup. Apple' API only uses UDP for DNS resolution.
bool use_tcp_for_dns_lookups = 20;
// This field is deprecated. Instead use
// :ref:`dns_lookup_options <envoy_api_field_config.config.bootstrap.v3.Bootstrap.dns_lookup_options>`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a wrong link here. Should probably be:
envoy_api_field_config.bootstrap.v3.Bootstrap.dns_lookup_options.
Please fix here and in other places.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look. I am working on these changes on the side. I will try to push rest of the changes by tomorrow. I will fix this :ref: link as well.

// server startup. Apple's API only uses UDP for DNS resolution.
google.protobuf.BoolValue use_tcp_for_dns_lookups = 1;

// Do not use the default search domains; only query hostnames as-is or as aliases.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think this should be tagged with [#not-implemented-hide:], because IIUC it is not implemented yet.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the changes are committed.

// specified.
// Setting this value causes failure if the
// ``envoy.restart_features.use_apple_api_for_dns_lookups`` runtime value is true during
// server startup. Apple's API only uses UDP for DNS resolution.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch this new message replaces a deprecated field (use_tcp_for_dns_lookups). However, it also modifies the field type from bool to google.protobuf.BoolValue.
I'm assuming it is ok, as the other field is deprecated, just want your opinion on this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine. What is the default value for this field? BoolValue is mostly useful when we anticipate wanting to change the default (in particular for security related fields).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value is protobuf's bool default value (false).
Actually, the name of the field use_tcp_for_dns_lookups implies it is meant to be "enabled". I'm not certain what's the added value of BoolValue in this case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here is my approach to keep supporting the deprecated field "bool use_tcp_for_dns_lookups". I would love to get your opinions on them.

Even though the field "bool use_tcp_for_dns_lookups" is marked deprecated, we expect previous implementations of xDSv3 control plane to still set this value. So the logic will check if the new field "google.protobuf.BoolValue use_tcp_for_dns_lookups" is being set via dns_lookup_options or not. The code will rely on has_use_tcp_for_dns_lookups() to check if this value is being programmed by the server. If this dns_lookup_options.has_use_tcp_for_dns_lookups() is false (which means value is not set) then the code will check if the deprecated field bool use_tcp_for_dns_lookups has the value true and uses it. This makes sure of the backward compatibility of the changes.

Correct me if I am wrong here, by just using bool field type we will not be able to know if the config (true or false) is applied or not.

One might ask can we just do OR of bool use_tcp_for_dns_lookups and dns_lookup_options.use_tcp_for_dns_lookups().value(), like -> config.dns_lookup_options.use_tcp_for_dns_lookups().value() || config.use_tcp_for_dns_lookups. But as per the logic which is being worked upon now, the new field
google.protobuf.BoolValue dns_lookup_options.use_tcp_for_dns_lookups will get preference and override any value set by bare bool use_tcp_for_dns_lookups value.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use the presence of the dns_lookup_options message alone to detect that we aren't dealing with legacy configuration..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions. Not using BoolValue.

Sunil Narasimhamurthy added 2 commits May 6, 2021 04:39
…of search namespaces

when using the c-ares resolver and reconcile DNS options in a single message.

Signed-off-by: Sunil Narasimhamurthy <sunnrs@amazon.com>
@suniltheta suniltheta changed the title [WIP] dns filter: disable the use of search namespaces when using the c-ares resolver dns: disable the use of search namespaces when using the c-ares resolver May 6, 2021
@suniltheta suniltheta marked this pull request as ready for review May 6, 2021 04:48
Sunil Narasimhamurthy added 2 commits May 6, 2021 16:36
Signed-off-by: Sunil Narasimhamurthy <sunnrs@amazon.com>
Signed-off-by: Sunil Narasimhamurthy <sunnrs@amazon.com>
Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API looks good.
Left some drive-by comments on the non-api code.

Comment on lines 206 to 207
* @param use_tcp_for_dns_lookups if set to true, tcp will be used to perform dns lookups.
* Otherwise, udp is used.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be removed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. thanks for noticing

"using TCP for DNS lookups is not possible when using Apple APIs for DNS "
"resolution. Apple' API only uses UDP for DNS resolution. Use UDP or disable "
"the envoy.restart_features.use_apple_api_for_dns_lookups runtime feature.");
return std::make_shared<Network::AppleDnsResolverImpl>(*this, api_.randomGenerator(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that in this case the new no_default_search_domain flag won't be used to initialize the AppleDnsResolver.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need some clarity here. Originally AppleDnsResolver has problem using TCP for DNS resolution. Now as we introduce no_default_search_domain as DNS lookup/resolver options, nowhere in this PR we are making this options available for AppleDnsResolver.

Can you please suggest what we can do here?

  1. Should we find a way to pass this no_default_search_domain into the DNS resolver library that AppleDnsResolver uses, or
  2. Highlight that setting any options via dns_resolver_options is not allowed for AppleDnsResolver? Before this PR there was already a RELEASE_ASSERT which make sure the user is not setting use_tcp_for_dns_lookups . Should that be the norm for any options set via dns_resolver_options going forward ?

Please let me know. Thanks in advance.

max_hosts_(PROTOBUF_GET_WRAPPED_OR_DEFAULT(config, max_hosts, 1024)) {
envoy::config::core::v3::DnsResolverOptions dns_resolver_options;
if (config.has_dns_resolver_options()) {
const auto& config_dns_options = config.dns_resolver_options();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.
Also not sure if the two pieces of code can be refactored, as they are essentially the same.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took care of this change. Need your thoughts here. In the updated changes the code looks like

    envoy::config::core::v3::DnsResolverOptions dns_resolver_options;
    if (cluster.has_dns_resolver_options()) {
      dns_resolver_options.CopyFrom(cluster.dns_resolver_options());
    } else {
      // <code comment>
      dns_resolver_options.set_use_tcp_for_dns_lookups(cluster.use_tcp_for_dns_lookups());
    }

technically we don't need both the if & else condition. We could just do with

  envoy::config::core::v3::DnsResolverOptions dns_resolver_options;
  dns_resolver_options.CopyFrom(cluster.dns_resolver_options());
  if (!cluster.has_dns_resolver_options()) {
    dns_resolver_options.set_use_tcp_for_dns_lookups(cluster.use_tcp_for_dns_lookups());
  }

But I thought it is better if we make it explicit that if legacy control plane has not configured dns_resolver_options then we stick to just setting dns_resolver_options.use_tcp... based on deprecated field use_tcp_for_dns_lookups.

const bool use_tcp_for_dns_lookups = bootstrap_.use_tcp_for_dns_lookups();
dns_resolver_ = dispatcher_->createDnsResolver({}, use_tcp_for_dns_lookups);
envoy::config::core::v3::DnsResolverOptions dns_resolver_options;
if (bootstrap_.has_dns_resolver_options()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.
I think this can be refactored to avoid code dup.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +101 to +105
const auto& cluster_dns_options = cluster.dns_resolver_options();
dns_resolver_options.set_use_tcp_for_dns_lookups(
cluster_dns_options.use_tcp_for_dns_lookups());
dns_resolver_options.set_no_default_search_domain(
cluster_dns_options.no_default_search_domain());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use something like CopyFrom here?
Otherwise, more fields will require manual copying.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. thanks.

Sunil Narasimhamurthy added 2 commits May 8, 2021 00:04
@suniltheta
Copy link
Copy Markdown
Contributor Author

Waiting on #16294 to be merged before merging this.

@suniltheta
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16237 (comment) was created by @suniltheta.

see: more, trace.

@suniltheta
Copy link
Copy Markdown
Contributor Author

suniltheta commented May 28, 2021

The job running on agent Azure Pipelines 17 ran longer than the maximum time of 180 minutes.

So restarting the tests.

@suniltheta
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16237 (comment) was created by @suniltheta.

see: more, trace.

@suniltheta
Copy link
Copy Markdown
Contributor Author

Hi @htuch & @adisuissa, this PR is ready for review.
The CI is failing with the following reason below. I thought let me just get the review comments and address those comments so that subsequent commit can trigger a new CI run. Thank you for your time.

##[error]The agent did not connect within the alloted time of 45 minute(s).
,##[error]The request: 395683 was abandoned due to an infrastructure failure. Notification of assignment to an agent was never received.

// This may be overridden on a per-cluster basis in cds_config, when
// :ref:`dns_resolution_config <envoy_v3_api_field_config.cluster.v3.Cluster.dns_resolution_config>`
// is specified.
core.v3.DnsResolutionConfig dns_resolution_config = 30;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanjunxiang-google this is the proto message that DNS resolver extension should live in.

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm api

bool use_tcp_for_dns_lookups = 1;

// Do not use the default search domains; only query hostnames as-is or as aliases.
bool no_default_search_domain = 2;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should add a version history entry for this change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

Signed-off-by: Sunil Narasimhamurthy <sunnrs@amazon.com>
@adisuissa
Copy link
Copy Markdown
Contributor

/lgtm api

@repokitteh-read-only repokitteh-read-only bot removed the api label Jun 1, 2021
@suniltheta
Copy link
Copy Markdown
Contributor Author

suniltheta commented Jun 1, 2021

Hi Maintainers, this PR is ready for merge?
@htuch & @adisuissa have given first pass approval. This commit should fix the issue #12432. Thank you for your time.
Tagging for attention @mattklein123, @htuch @dio, @lizan

@mattklein123
Copy link
Copy Markdown
Member

cc @alyssawilk this is a case for the bot of a PR not getting a non-API reviewer. I will review.

@mattklein123 mattklein123 self-assigned this Jun 2, 2021
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@mattklein123 mattklein123 merged commit 2c0e1f2 into envoyproxy:main Jun 2, 2021
@alyssawilk
Copy link
Copy Markdown
Contributor

Ugh, this was definitely missed with old and new versions of the script, though I also see cases in maintainer-oncall where the API bug is definitely caught. Can't debug this easily since it's merged but please let me know if you see other cases :-(

@alyssawilk
Copy link
Copy Markdown
Contributor

on no wait, this was caught, I just searched for the wrong string. So should be fixed forward as long as on-call goes through the list :-)

@YaseenAlk
Copy link
Copy Markdown
Contributor

Hi, I am running buf on some previous commits to the envoy api to see if there were any commits that break api compatibility and it detected this change as being non-backwards-compatible:

api/envoy/extensions/common/dynamic_forward_proxy/v3/dns_cache.proto:110:3:Field "9" on message "DnsCacheConfig" changed type from "envoy.config.core.v3.DnsResolver" to "envoy.config.core.v3.DnsResolutionConfig".
api/envoy/extensions/common/dynamic_forward_proxy/v3/dns_cache.proto:110:38:Field "9" on message "DnsCacheConfig" changed name from "dns_resolver" to "dns_resolution_config".
api/envoy/extensions/common/dynamic_forward_proxy/v3/dns_cache.proto:110:3:Field "9" with name "dns_resolution_config" on message "DnsCacheConfig" changed option "json_name" from "dnsResolver" to "dnsResolutionConfig".

Just wanted to flag it in case it was unintentional! cc @adisuissa

@htuch
Copy link
Copy Markdown
Member

htuch commented Aug 13, 2021

@YaseenAlk what is the SHA or PR this happened at? I seem to remember OKing something like that because it was within a short window of the original commit, where we allow these changes, but I should verify.

@suniltheta
Copy link
Copy Markdown
Contributor Author

The incompatibility is between this PR and the PR at #16294
And anything that is incompatible is not across a release. So we should be good here?

@htuch
Copy link
Copy Markdown
Member

htuch commented Aug 13, 2021

Yep, we allow breaking API changes within a 14 day window, see https://github.com/envoyproxy/envoy/blob/main/api/API_VERSIONING.md.

Good catch though!

@jpeach
Copy link
Copy Markdown
Contributor

jpeach commented Aug 30, 2021

@htuch The upstream_resolvers field in DnsFilterConfig was present in Envoy 1.18, but absent in Envoy 1.19. AFAICT that makes this an incompatible change, and it's also difficult for a control plane to support both since the field number was re-used.

@suniltheta
Copy link
Copy Markdown
Contributor Author

https://github.com/envoyproxy/envoy/blob/main/api/API_VERSIONING.md

API versions tagged vNalpha. Within an alpha major version, arbitrary breaking changes are allowed.

@jpeach That is true, but the API is still v3alpha. Hope that is allowed.

@jpeach
Copy link
Copy Markdown
Contributor

jpeach commented Aug 31, 2021

https://github.com/envoyproxy/envoy/blob/main/api/API_VERSIONING.md

API versions tagged vNalpha. Within an alpha major version, arbitrary breaking changes are allowed.

@jpeach That is true, but the API is still v3alpha. Hope that is allowed.

Is it possible to upgrade across the breaking change? i.e. I have to serve 1.18 and 1.19 clients concurrently.

leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
…ver (envoyproxy#16237)

Signed-off-by: Sunil Narasimhamurthy <sunnrs@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants