-
Notifications
You must be signed in to change notification settings - Fork 5.5k
dns resolvers: add All lookup mode #18464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 9 commits
c0b7ccf
51a3419
2272329
386c2cb
8e29fbe
9472b9e
23917d0
d41d879
b51e69c
4b99dc5
66f204a
db288e6
a11ff20
8411c1c
8b2ae4d
d1c8b52
a8aa2ad
368cee4
52d1a88
a718d06
8b7ad1a
4c2ecd6
fc45363
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -108,10 +108,11 @@ class AppleDnsResolverImpl : public DnsResolver, protected Logger::Loggable<Logg | |
|
|
||
| // Small wrapping struct to accumulate addresses from firings of the | ||
| // onDNSServiceGetAddrInfoReply callback. | ||
| struct FinalResponse { | ||
| struct PendingResponse { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a much better name! |
||
| ResolutionStatus status_; | ||
| std::list<DnsResponse> v4_responses_; | ||
| std::list<DnsResponse> v6_responses_; | ||
| std::list<DnsResponse> all_responses_; | ||
| }; | ||
|
|
||
| AppleDnsResolverImpl& parent_; | ||
|
|
@@ -127,7 +128,7 @@ class AppleDnsResolverImpl : public DnsResolver, protected Logger::Loggable<Logg | |
| // DNSServiceGetAddrInfo fires one callback DNSServiceGetAddrInfoReply callback per IP address, | ||
| // and informs via flags if more IP addresses are incoming. Therefore, these addresses need to | ||
| // be accumulated before firing callback_. | ||
| FinalResponse pending_cb_; | ||
| PendingResponse pending_response_; | ||
| DnsLookupFamily dns_lookup_family_; | ||
| }; | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -104,16 +104,24 @@ void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, i | |
| // ARES_ECONNREFUSED. If the PendingResolution has not been cancelled that means that the | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The c-ares based resolver changed a bit more because it was not originally written to accumulate responses from multiple c-ares callbacks.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I refactored for ease of understanding but open to comments to simplify further! |
||
| // callback_ target _should_ still be around. In that case, raise the callback_ so the target | ||
| // can be done with this query and initiate a new one. | ||
| if (!cancelled_) { | ||
| ENVOY_LOG_EVENT(debug, "cares_dns_resolution_destroyed", "dns resolution for {} destroyed", | ||
| dns_name_); | ||
|
|
||
| callback_(ResolutionStatus::Failure, {}); | ||
| ENVOY_LOG_EVENT(debug, "cares_dns_resolution_destroyed", "dns resolution for {} destroyed", | ||
| dns_name_); | ||
|
|
||
| if (!pending_response_.address_list_.empty()) { | ||
| ASSERT(dns_lookup_family_ == DnsLookupFamily::All && !dual_resolution_); | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here and below I considered a failure in the second resolution an overall success if the first resolution was successful, and hence fired the callback with the existing addresses and a success state. Up to discussing if this should be the behavior.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah IMO a resolve success of either v4 or v6 should be considered success.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1, success of v4 OR v6 should mean overall success since we have usable addresses |
||
| // The only way the resolver would have addresses here is if it is configured to resolve All | ||
| // families, and the first resolution was successful. In that case, we might as well return | ||
| // the addresses that resolved as a success. | ||
| } else { | ||
| pending_response_.status_ = ResolutionStatus::Failure; | ||
| } | ||
| delete this; | ||
| // Nothing can follow a call to finishResolve due to the deletion of this object upon | ||
| // finishResolve(). | ||
| finishResolve(); | ||
| return; | ||
| } | ||
| if (!fallback_if_failed_) { | ||
|
|
||
| if (!dual_resolution_) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hm, are we holding all forward progress on the dns resolver returning v4 and v6 addreses? That might work for a first pass but I believe that the chrome folks have told me it's a bad idea in general for latency / Quality of Experience reasons. If they confirm, I think we need a TODO to handle ipv4 and ipv6 asynchronously.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I was thinking about this as well from two different angles:
(1) is easy and I could do that in this PR. (2) is a bit harder, and I wouldn't want to do it in this PR.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Personally I would do (1) and not (2) for now.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @alyssawilk @mattklein123 re:(1) do we have a preference for all three modes (
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hm, less sure on that - I'll defer to Matt and David.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @alyssawilk @mattklein123 I settled on doing both lookups concurrrently for I am already part of the way on the TODO, so will put that up as a separate PR.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree that waiting for both means users are waiting unnecessarily. But tackling in a separate PR with TODO here sounds good
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. SG. Do we have a TODO on not waiting for both?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. decided to create an issue #18572
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This definitely seems like a good way to handle this for now. While suboptimal because it block progres until both return, Chrome does exactly this today. Solving #18572 (later) is a great chance to make envoy mobile better than Chrome! |
||
| completed_ = true; | ||
|
|
||
| // If c-ares returns ARES_ECONNREFUSED and there is no fallback we assume that the channel_ is | ||
|
|
@@ -130,10 +138,8 @@ void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, i | |
| } | ||
| } | ||
|
|
||
| std::list<DnsResponse> address_list; | ||
| ResolutionStatus resolution_status; | ||
| if (status == ARES_SUCCESS) { | ||
| resolution_status = ResolutionStatus::Success; | ||
| pending_response_.status_ = ResolutionStatus::Success; | ||
| if (addrinfo != nullptr && addrinfo->nodes != nullptr) { | ||
| if (addrinfo->nodes->ai_family == AF_INET) { | ||
| for (const ares_addrinfo_node* ai = addrinfo->nodes; ai != nullptr; ai = ai->ai_next) { | ||
|
|
@@ -143,7 +149,7 @@ void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, i | |
| address.sin_port = 0; | ||
| address.sin_addr = reinterpret_cast<sockaddr_in*>(ai->ai_addr)->sin_addr; | ||
|
|
||
| address_list.emplace_back( | ||
| pending_response_.address_list_.emplace_back( | ||
| DnsResponse(std::make_shared<const Address::Ipv4Instance>(&address), | ||
| std::chrono::seconds(ai->ai_ttl))); | ||
| } | ||
|
|
@@ -154,66 +160,50 @@ void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, i | |
| address.sin6_family = AF_INET6; | ||
| address.sin6_port = 0; | ||
| address.sin6_addr = reinterpret_cast<sockaddr_in6*>(ai->ai_addr)->sin6_addr; | ||
| address_list.emplace_back( | ||
| pending_response_.address_list_.emplace_back( | ||
| DnsResponse(std::make_shared<const Address::Ipv6Instance>(address), | ||
| std::chrono::seconds(ai->ai_ttl))); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if (!address_list.empty()) { | ||
| if (!pending_response_.address_list_.empty() && dns_lookup_family_ != DnsLookupFamily::All) { | ||
| completed_ = true; | ||
| } | ||
|
|
||
| ASSERT(addrinfo != nullptr); | ||
| ares_freeaddrinfo(addrinfo); | ||
| } else { | ||
| resolution_status = ResolutionStatus::Failure; | ||
| if (!pending_response_.address_list_.empty()) { | ||
| ASSERT(dns_lookup_family_ == DnsLookupFamily::All && !dual_resolution_); | ||
| // The only way the resolver would have addresses here is if it is configured to resolve All | ||
| // families, and the first resolution was successful. In that case, we might as well return | ||
| // the addresses that resolved. | ||
| } else { | ||
| pending_response_.status_ = ResolutionStatus::Failure; | ||
| } | ||
| } | ||
|
|
||
| if (timeouts > 0) { | ||
| ENVOY_LOG(debug, "DNS request timed out {} times", timeouts); | ||
| } | ||
|
|
||
| if (completed_) { | ||
| if (!cancelled_) { | ||
| // Use a raw try here because it is used in both main thread and filter. | ||
| // Can not convert to use status code as there may be unexpected exceptions in server fuzz | ||
| // tests, which must be handled. Potential exception may come from getAddressWithPort() or | ||
| // portFromTcpUrl(). | ||
| // TODO(chaoqin-li1123): remove try catch pattern here once we figure how to handle unexpected | ||
| // exception in fuzz tests. | ||
| ENVOY_LOG_EVENT(debug, "cares_dns_resolution_complete", | ||
| "dns resolution for {} completed with status {}", dns_name_, | ||
| resolution_status); | ||
|
|
||
| TRY_NEEDS_AUDIT { callback_(resolution_status, std::move(address_list)); } | ||
| catch (const EnvoyException& e) { | ||
| ENVOY_LOG(critical, "EnvoyException in c-ares callback: {}", e.what()); | ||
| dispatcher_.post([s = std::string(e.what())] { throw EnvoyException(s); }); | ||
| } | ||
| catch (const std::exception& e) { | ||
| ENVOY_LOG(critical, "std::exception in c-ares callback: {}", e.what()); | ||
| dispatcher_.post([s = std::string(e.what())] { throw EnvoyException(s); }); | ||
| } | ||
| catch (...) { | ||
| ENVOY_LOG(critical, "Unknown exception in c-ares callback"); | ||
| dispatcher_.post([] { throw EnvoyException("unknown"); }); | ||
| } | ||
| } | ||
| if (owned_) { | ||
| delete this; | ||
| return; | ||
| } | ||
| finishResolve(); | ||
| // Nothing can follow a call to finishResolve due to the deletion of this object upon | ||
| // finishResolve(). | ||
| return; | ||
| } | ||
|
|
||
| if (!completed_ && fallback_if_failed_) { | ||
| fallback_if_failed_ = false; | ||
| if (dual_resolution_) { | ||
| dual_resolution_ = false; | ||
|
|
||
| // Perform a second lookup for DnsLookupFamily::Auto and DnsLookupFamily::V4Preferred, given | ||
| // that the first lookup failed to return any addresses. Note that DnsLookupFamily::All issues | ||
| // both lookups concurrently so there is no need to fire a second lookup here. | ||
| if (dns_lookup_family_ == DnsLookupFamily::Auto) { | ||
| getAddrInfo(AF_INET); | ||
| } else { | ||
| ASSERT(dns_lookup_family_ == DnsLookupFamily::V4Preferred); | ||
| } else if (dns_lookup_family_ == DnsLookupFamily::V4Preferred) { | ||
| getAddrInfo(AF_INET6); | ||
| } | ||
|
|
||
|
|
@@ -223,6 +213,40 @@ void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, i | |
| } | ||
| } | ||
|
|
||
| void DnsResolverImpl::PendingResolution::finishResolve() { | ||
| if (!cancelled_) { | ||
| // Use a raw try here because it is used in both main thread and filter. | ||
| // Can not convert to use status code as there may be unexpected exceptions in server fuzz | ||
| // tests, which must be handled. Potential exception may come from getAddressWithPort() or | ||
| // portFromTcpUrl(). | ||
| // TODO(chaoqin-li1123): remove try catch pattern here once we figure how to handle unexpected | ||
| // exception in fuzz tests. | ||
| ENVOY_LOG_EVENT(debug, "cares_dns_resolution_complete", | ||
| "dns resolution for {} completed with status {}", dns_name_, | ||
| pending_response_.status_); | ||
|
|
||
| TRY_NEEDS_AUDIT { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does TRY_NEEDS_AUDIT imply that we're trying to avoids exceptions on the data plane but we're required to use them here? (Is there an actual audit happening?)
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah I think there is a push to avoid exceptions on the data plane, except some times on the main thread. But this code does not always run on the main thread so it was left as one of the remaining uses of try that needs to be addressed. Looks like @chaoqin-li1123 has a TODO to clean up. |
||
| callback_(pending_response_.status_, std::move(pending_response_.address_list_)); | ||
| } | ||
| catch (const EnvoyException& e) { | ||
| ENVOY_LOG(critical, "EnvoyException in c-ares callback: {}", e.what()); | ||
| dispatcher_.post([s = std::string(e.what())] { throw EnvoyException(s); }); | ||
| } | ||
| catch (const std::exception& e) { | ||
| ENVOY_LOG(critical, "std::exception in c-ares callback: {}", e.what()); | ||
| dispatcher_.post([s = std::string(e.what())] { throw EnvoyException(s); }); | ||
| } | ||
| catch (...) { | ||
| ENVOY_LOG(critical, "Unknown exception in c-ares callback"); | ||
| dispatcher_.post([] { throw EnvoyException("unknown"); }); | ||
| } | ||
| } | ||
| if (owned_) { | ||
| delete this; | ||
| return; | ||
| } | ||
| } | ||
|
|
||
| void DnsResolverImpl::updateAresTimer() { | ||
| // Update the timeout for events. | ||
| timeval timeout; | ||
|
|
@@ -283,15 +307,30 @@ ActiveDnsQuery* DnsResolverImpl::resolve(const std::string& dns_name, | |
| auto pending_resolution = std::make_unique<PendingResolution>( | ||
| *this, callback, dispatcher_, channel_, dns_name, dns_lookup_family); | ||
| if (dns_lookup_family == DnsLookupFamily::Auto || | ||
| dns_lookup_family == DnsLookupFamily::V4Preferred) { | ||
| pending_resolution->fallback_if_failed_ = true; | ||
| dns_lookup_family == DnsLookupFamily::V4Preferred || | ||
| dns_lookup_family == DnsLookupFamily::All) { | ||
| pending_resolution->dual_resolution_ = true; | ||
| } | ||
|
|
||
| if (dns_lookup_family == DnsLookupFamily::V4Only || | ||
| dns_lookup_family == DnsLookupFamily::V4Preferred) { | ||
| switch (dns_lookup_family) { | ||
| case DnsLookupFamily::V4Only: | ||
| case DnsLookupFamily::V4Preferred: | ||
| pending_resolution->getAddrInfo(AF_INET); | ||
| break; | ||
| case DnsLookupFamily::V6Only: | ||
| case DnsLookupFamily::Auto: | ||
| pending_resolution->getAddrInfo(AF_INET6); | ||
| break; | ||
| // NOTE: DnsLookupFamily::All performs both lookups concurrently as addresses from both families | ||
| // are being requested. | ||
| // TODO: DnsLookupFamily::Auto and DnsLookupFamily::V4Preferred could also do concurrent lookups. | ||
|
junr03 marked this conversation as resolved.
Outdated
|
||
| // This will require some refactoring and should be done in a subsequent PR. | ||
| case DnsLookupFamily::All: | ||
| pending_resolution->getAddrInfo(AF_INET); | ||
| } else { | ||
| pending_resolution->getAddrInfo(AF_INET6); | ||
| break; | ||
| default: | ||
| NOT_REACHED_GCOVR_EXCL_LINE; | ||
| } | ||
|
|
||
| if (pending_resolution->completed_) { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding this correctly, it's adding v4 addresses before v6 addresses. In general I'd suggest v6 first but you can get fancier as per the happy eyeballs spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, that's the case. But the callback receiver can do what they want with the list, and is where I would expect the ordering complexity to happen; so I am not too opinionated with order here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should follow the RFC 8305 (the happy eyeballs spec) as David suggest, and I agree with Jose that we should probably do that in HappyEyeballsConnectionImpl. I'd be happy to take a whack at that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to do this outside.