dns: add dns resolver implementation based on Apple APIs#13074
dns: add dns resolver implementation based on Apple APIs#13074mattklein123 merged 48 commits intoenvoyproxy:masterfrom junr03:apple-dns
Conversation
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
|
@goaway @mattklein123 alright, I have battle tested this a bit with envoy mobile and different network conditions (although need to talk to @sberrevoets about testing on a iOS14 device tomorrow. I also ran the (majority) of dns_impl_test suite and it looks good. I wanted your eyes on this as early as possible, so I think this is ready for a draft stage review. I have left some I'll, obviously, add a test suite once we run through first comments. It will look very close to dns_impl_test. I also will add the runtime usage and selective compilation after first comments. cc @htuch @Reflejo -- given you all were involved in the original writing of the c-ares implementation, which this borrows from in spirit. |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks for getting this out so quickly. Flushing out some comments from the first pass. Let me know when you want me to take a look again. Thank you!
/wait
|
@mattklein123 @goaway thanks for the initial pass, will push update ASAP |
|
@goaway @mattklein123 updated! I added plentiful comments. I agree with both of you that there are some non-intuitive aspects of the Apple API. Let me know if I went a little overboard. |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks this LGTM at a high level. Dropped a few more comments. Let me know when ready for a full review!
/wait
| // This implementation uses a shared connection for two main reasons: | ||
| // 1. Efficiency of concurrent resolutions by sharing the same underlying UDS to the DNS | ||
| // daemon. | ||
| // 2. An error on a connection to the daemon is good indication that other connection, | ||
| // even if not shared, would not succeed. So it is better to share one connection and | ||
| // promptly cancel all outstanding queries, rather than individually wait for all | ||
| // connections to error out. | ||
| // However, using a shared connection brings some complexities detailed in the inline comments | ||
| // for kDNSServiceFlagsShareConnection in dns_sd.h, and copied (and edited) in this implementation | ||
| // where relevant. |
There was a problem hiding this comment.
I don't feel strongly about it, but is this a premature optimization? Should we just do a connection per resolution for now and keep it simple?
There was a problem hiding this comment.
I ultimately decided to leave it. I think the reasons in the inline comment are strong enough to leave it. Mostly around "failing fast" in mobile environments were instead of potentially waiting for independent timeouts that may stagger, all outstanding queries will fail.
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
| static bool use_apple_api_for_dns_lookups = | ||
| Runtime::runtimeFeatureEnabled("envoy.restart_features.use_apple_api_for_dns_lookups"); | ||
| if (use_apple_api_for_dns_lookups) { | ||
| RELEASE_ASSERT( |
There was a problem hiding this comment.
Shouldn't we be rejecting the config earlier in a more graceful way? If so, shouldn't these be ASSERTs?
There was a problem hiding this comment.
Yes, this should be exceptions that fail during config load.
There was a problem hiding this comment.
It all comes down to snapping the runtime once by making it static local state here vs checking earlier in callers of this function.
I guess I could change this function to return nullptr on failure, and have callers deal with the return value?
There was a problem hiding this comment.
I believe the DNS resolver is created during config load time in all cases. I think you can just throw an exception here.
There was a problem hiding this comment.
I think that's the case for all xDS, assuming this also holds for dynamic forward proxy, SG.
There was a problem hiding this comment.
It holds for the dynamic proxy. Let me check in the UDP filter. If that's the case, I'll update with an exception.
There was a problem hiding this comment.
@htuch @mattklein123 unfortunately, the UDP DNS filter creates a resolver (underneath its own wrapper) for every filter instance construction:
So can't throw an exception as is (and per @htuch the assert is not great either).
The underlying resolver in the UDP DNS filter should definitely be shared, and I can work to untangle. However, I don't want to block this PR too much (we need it to be able to run envoy mobile on ios 14 devices). How do you feel about landing this PR with a warn log in place of the assert. Then I can work on untangling the filter situation, and subsequently upgrading the warn log here into an exception?
There was a problem hiding this comment.
I would actually rather you just leave the RELEASE_ASSERT and make sure the error message is good so the OSX user can flip the runtime flag, then come back and clean it up. So just leave a TODO for that? @htuch?
| "error_code={}, hostname={}", | ||
| dns_name_, flags, flags & kDNSServiceFlagsMoreComing ? "yes" : "no", | ||
| flags & kDNSServiceFlagsAdd ? "yes" : "no", interface_index, error_code, hostname); | ||
| ASSERT(interface_index == 0); |
There was a problem hiding this comment.
How certain are you of some of these ASSERTs on state that comes from the NS libs?
There was a problem hiding this comment.
As certain as I can be from what they write in the comments in dns_sd.h 😅
| @@ -0,0 +1,325 @@ | |||
| #include "common/network/apple_dns_impl.h" | |||
There was a problem hiding this comment.
Sorry, just got around to looking into the details of this PR now. One question I have; what is the threading model here? Who controls the callbacks, which threads do they land on? Maybe a top-level comment explaining this would help. Thanks.
There was a problem hiding this comment.
| &individual_sd_ref_, kDNSServiceFlagsShareConnection | kDNSServiceFlagsTimeout, 0, protocol, | ||
| dns_name_.c_str(), | ||
| /* | ||
| * About Thread Safety (taken from inline documentation there): |
There was a problem hiding this comment.
I'd push this to the top of the file. It's one of the most important thing to have in mind when reading the code.
Signed-off-by: Jose Nino <jnino@lyft.com>
|
Oops looks like legit CI failure. /wait |
|
alright, now it's all green. Thanks for the reviews @mattklein123! |
Updating to pull in envoyproxy/envoy#13074 Signed-off-by: Michael Rebello <me@michaelrebello.com>
Updating to pull in envoyproxy/envoy#13074 which will use Apple's DNS resolver for iOS in order to allow Envoy Mobile to ship on iOS 14 without triggering the "local area network" modal. Signed-off-by: Michael Rebello <me@michaelrebello.com>
| "//test/mocks/runtime:runtime_mocks", | ||
| "//test/mocks/thread_local:thread_local_mocks", | ||
| ] + select({ | ||
| "//bazel:apple": ["//source/common/network:dns_lib"], |
There was a problem hiding this comment.
Should this be: "//bazel:apple": ["//source/common/network:apple_dns_lib"],
Updating to pull in #13074 which will use Apple's DNS resolver for iOS in order to allow Envoy Mobile to ship on iOS 14 without triggering the "local area network" modal. Signed-off-by: Michael Rebello <me@michaelrebello.com> Signed-off-by: JP Simard <jp@jpsim.com>
Updating to pull in #13074 which will use Apple's DNS resolver for iOS in order to allow Envoy Mobile to ship on iOS 14 without triggering the "local area network" modal. Signed-off-by: Michael Rebello <me@michaelrebello.com> Signed-off-by: JP Simard <jp@jpsim.com>
Commit Message: adding DNS resolver based on Apple's DNSServiceGetAddrInfo functionality.
Additional Description: This is particularly required in builds of Envoy that run on iOS devices >version 14. Apple added new local network permissions that are triggered with DNS resolution over the local DNS server on WiFi networks. The Apple API is excluded from the new permissions.
Risk Level: high for users of the new resolver, as the resolver does not have the amount of production as the c-ares based implementation. However, this is fully opt-in and only available on Apple platforms.
Testing: unit tests that run on mac ci
Release Notes: added
Signed-off-by: Jose Nino jnino@lyft.com