Skip to content

dns: add dns resolver implementation based on Apple APIs#13074

Merged
mattklein123 merged 48 commits intoenvoyproxy:masterfrom
junr03:apple-dns
Sep 30, 2020
Merged

dns: add dns resolver implementation based on Apple APIs#13074
mattklein123 merged 48 commits intoenvoyproxy:masterfrom
junr03:apple-dns

Conversation

@junr03
Copy link
Member

@junr03 junr03 commented Sep 13, 2020

Commit Message: adding DNS resolver based on Apple's DNSServiceGetAddrInfo functionality.
Additional Description: This is particularly required in builds of Envoy that run on iOS devices >version 14. Apple added new local network permissions that are triggered with DNS resolution over the local DNS server on WiFi networks. The Apple API is excluded from the new permissions.
Risk Level: high for users of the new resolver, as the resolver does not have the amount of production as the c-ares based implementation. However, this is fully opt-in and only available on Apple platforms.
Testing: unit tests that run on mac ci
Release Notes: added

Signed-off-by: Jose Nino jnino@lyft.com

Jose Nino added 5 commits September 10, 2020 18:23
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
@junr03 junr03 changed the title Apple dns dns: add dns resolver implementation based on Apple APIs Sep 13, 2020
Jose Nino added 2 commits September 13, 2020 11:13
Signed-off-by: Jose Nino <jnino@lyft.com>
fmt
Signed-off-by: Jose Nino <jnino@lyft.com>
@junr03
Copy link
Member Author

junr03 commented Sep 13, 2020

@goaway @mattklein123 alright, I have battle tested this a bit with envoy mobile and different network conditions (although need to talk to @sberrevoets about testing on a iOS14 device tomorrow. I also ran the (majority) of dns_impl_test suite and it looks good.

I wanted your eyes on this as early as possible, so I think this is ready for a draft stage review. I have left some POINTS FOR DISCUSSION as inline comments, would like to hear your thoughts.

I'll, obviously, add a test suite once we run through first comments. It will look very close to dns_impl_test. I also will add the runtime usage and selective compilation after first comments.

cc @htuch @Reflejo -- given you all were involved in the original writing of the c-ares implementation, which this borrows from in spirit.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this out so quickly. Flushing out some comments from the first pass. Let me know when you want me to take a look again. Thank you!

/wait

@junr03
Copy link
Member Author

junr03 commented Sep 14, 2020

@mattklein123 @goaway thanks for the initial pass, will push update ASAP

Signed-off-by: Jose Nino <jnino@lyft.com>
@junr03
Copy link
Member Author

junr03 commented Sep 14, 2020

@goaway @mattklein123 updated!

I added plentiful comments. I agree with both of you that there are some non-intuitive aspects of the Apple API. Let me know if I went a little overboard.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this LGTM at a high level. Dropped a few more comments. Let me know when ready for a full review!

/wait

Comment on lines +45 to +54
// This implementation uses a shared connection for two main reasons:
// 1. Efficiency of concurrent resolutions by sharing the same underlying UDS to the DNS
// daemon.
// 2. An error on a connection to the daemon is good indication that other connection,
// even if not shared, would not succeed. So it is better to share one connection and
// promptly cancel all outstanding queries, rather than individually wait for all
// connections to error out.
// However, using a shared connection brings some complexities detailed in the inline comments
// for kDNSServiceFlagsShareConnection in dns_sd.h, and copied (and edited) in this implementation
// where relevant.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about it, but is this a premature optimization? Should we just do a connection per resolution for now and keep it simple?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ultimately decided to leave it. I think the reasons in the inline comment are strong enough to leave it. Mostly around "failing fast" in mobile environments were instead of potentially waiting for independent timeouts that may stagger, all outstanding queries will fail.

Jose Nino added 3 commits September 18, 2020 14:59
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Jose Nino added 4 commits September 21, 2020 19:55
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
static bool use_apple_api_for_dns_lookups =
Runtime::runtimeFeatureEnabled("envoy.restart_features.use_apple_api_for_dns_lookups");
if (use_apple_api_for_dns_lookups) {
RELEASE_ASSERT(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be rejecting the config earlier in a more graceful way? If so, shouldn't these be ASSERTs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should be exceptions that fail during config load.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It all comes down to snapping the runtime once by making it static local state here vs checking earlier in callers of this function.

I guess I could change this function to return nullptr on failure, and have callers deal with the return value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the DNS resolver is created during config load time in all cases. I think you can just throw an exception here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the case for all xDS, assuming this also holds for dynamic forward proxy, SG.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It holds for the dynamic proxy. Let me check in the UDP filter. If that's the case, I'll update with an exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch @mattklein123 unfortunately, the UDP DNS filter creates a resolver (underneath its own wrapper) for every filter instance construction:

resolver_ = std::make_unique<DnsFilterResolver>(resolver_callback_, config->resolvers(),

So can't throw an exception as is (and per @htuch the assert is not great either).

The underlying resolver in the UDP DNS filter should definitely be shared, and I can work to untangle. However, I don't want to block this PR too much (we need it to be able to run envoy mobile on ios 14 devices). How do you feel about landing this PR with a warn log in place of the assert. Then I can work on untangling the filter situation, and subsequently upgrading the warn log here into an exception?

Copy link
Member

@mattklein123 mattklein123 Sep 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually rather you just leave the RELEASE_ASSERT and make sure the error message is good so the OSX user can flip the runtime flag, then come back and clean it up. So just leave a TODO for that? @htuch?

"error_code={}, hostname={}",
dns_name_, flags, flags & kDNSServiceFlagsMoreComing ? "yes" : "no",
flags & kDNSServiceFlagsAdd ? "yes" : "no", interface_index, error_code, hostname);
ASSERT(interface_index == 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How certain are you of some of these ASSERTs on state that comes from the NS libs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As certain as I can be from what they write in the comments in dns_sd.h 😅

@@ -0,0 +1,325 @@
#include "common/network/apple_dns_impl.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just got around to looking into the details of this PR now. One question I have; what is the threading model here? Who controls the callbacks, which threads do they land on? Maybe a top-level comment explaining this would help. Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junr03
Copy link
Member Author

junr03 commented Sep 29, 2020

@htuch answered two of your questions, and I'd like to hear your opinion on the third.

&individual_sd_ref_, kDNSServiceFlagsShareConnection | kDNSServiceFlagsTimeout, 0, protocol,
dns_name_.c_str(),
/*
* About Thread Safety (taken from inline documentation there):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd push this to the top of the file. It's one of the most important thing to have in mind when reading the code.

Signed-off-by: Jose Nino <jnino@lyft.com>
mattklein123
mattklein123 previously approved these changes Sep 30, 2020
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mattklein123
Copy link
Member

Oops looks like legit CI failure.

/wait

Signed-off-by: Jose Nino <jnino@lyft.com>
@junr03
Copy link
Member Author

junr03 commented Sep 30, 2020

alright, now it's all green. Thanks for the reviews @mattklein123!

@mattklein123 mattklein123 merged commit d3ecdef into envoyproxy:master Sep 30, 2020
rebello95 added a commit to envoyproxy/envoy-mobile that referenced this pull request Sep 30, 2020
Updating to pull in envoyproxy/envoy#13074

Signed-off-by: Michael Rebello <me@michaelrebello.com>
rebello95 added a commit to envoyproxy/envoy-mobile that referenced this pull request Oct 1, 2020
Updating to pull in envoyproxy/envoy#13074 which will use Apple's DNS resolver for iOS in order to allow Envoy Mobile to ship on iOS 14 without triggering the "local area network" modal.

Signed-off-by: Michael Rebello <me@michaelrebello.com>
"//test/mocks/runtime:runtime_mocks",
"//test/mocks/thread_local:thread_local_mocks",
] + select({
"//bazel:apple": ["//source/common/network:dns_lib"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be: "//bazel:apple": ["//source/common/network:apple_dns_lib"],

jpsim pushed a commit that referenced this pull request Nov 28, 2022
Updating to pull in #13074 which will use Apple's DNS resolver for iOS in order to allow Envoy Mobile to ship on iOS 14 without triggering the "local area network" modal.

Signed-off-by: Michael Rebello <me@michaelrebello.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 29, 2022
Updating to pull in #13074 which will use Apple's DNS resolver for iOS in order to allow Envoy Mobile to ship on iOS 14 without triggering the "local area network" modal.

Signed-off-by: Michael Rebello <me@michaelrebello.com>
Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants