Skip to content

dns_cache: tracking ttl but not yet using it#17951

Merged
snowp merged 10 commits intoenvoyproxy:mainfrom
alyssawilk:dns_ttl
Sep 13, 2021
Merged

dns_cache: tracking ttl but not yet using it#17951
snowp merged 10 commits intoenvoyproxy:mainfrom
alyssawilk:dns_ttl

Conversation

@alyssawilk
Copy link
Copy Markdown
Contributor

  1. Fixing the dns cache to not fake resolution
  2. tracking TTL and staleness
  3. flushing ttl and last resolution time to disk

This shouldn't have a functional change (apart from what we cache to disk, which isn't yet used) but I have mild preference for keeping the refactor separate from resolution + lifetime changes.

Risk Level: medium (refactoring code)
Testing: updated unit tests, integration test
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Copy Markdown
Contributor

@RyanTheOptimist RyanTheOptimist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet!

const std::string& resolvedHost() const override { return resolved_host_; }
bool isIpAddress() const override { return is_ip_address_; }
void touch() final { last_used_time_ = time_source_.monotonicTime().time_since_epoch(); }
void updateStale(uint64_t time) { stale_at_time_ = time; }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there by any chance a data type we can use here which expresses the semantics here? For example is this a time_t (seconds since the UNIX epoch) or is it more of a TTL (a time delta)? Perhaps one of the std::chrono types? Oh, or maybe just SystemTime directly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

uint64_t ms_since_epoch =
std::chrono::duration_cast<std::chrono::milliseconds>(now.time_since_epoch()).count();
uint64_t stale_at = ms_since_epoch + std::chrono::duration_cast<std::chrono::seconds>(response.front().ttl_).count();
primary_host_info->host_info_->updateStale(stale_at);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It might make sense for this method to take the ttl directly and do the work in the method of coverting the ttl to whatever the internal representation is of "now + ttl"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Network::Address::InstanceConstSharedPtr address;
const auto parts = StringUtil::splitToken(value, "|");
std::chrono::seconds ttl(0);
if (parts.size() == 3) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misreading this code but it seems like up on line 452, we write 3 things into the cache value. Down here, we spilt on | and ensure that we have 2 parts. This all makes sense. But I only see code looking at parts[0] (line 478) and parts[1] (line 483) but I don't see a reference to parts[2]. Is this expected?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I wasn't using part two yet. I was going to do in the PR where we start using stale, but I've worked it into this one with the time updates.
I'll say I'm not 100% on the conversion back and forth but it's hard to test until we have stale expirey which is coming in the next PR!

uint64_t ms_since_epoch =
std::chrono::duration_cast<std::chrono::milliseconds>(now.time_since_epoch()).count();
const std::string value = absl::StrCat(address->asString(), "|", ttl.count(), "|",
ms_since_epoch);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but it seems like we could add a wrapper around KeyValueStore in which the Value is allowed to be a type which implements an interfaces that has methods like std::string serialize() and bool deserialise(const std::string&). Then this wrapper class could be something like: TypedKeyValueStore which would have a vanilla KeyValueStore as a member. When reading from / writing to the store, it would transparently transform the Value types to/from strings.

/shrug. Maybe something to think about down the road...

EXPECT_CALL(*timeout_timer, disableTimer());
// Make sure the store gets the first insert.
EXPECT_CALL(*store, addOrUpdate("foo.com", "10.0.0.1:80"));
EXPECT_CALL(*store, addOrUpdate("foo.com", ContainsRegex("10.0.0.1:80|30|\\d*)")));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using a Regex, could we use the actual value? Maybe that's hard cause we're using the "real" clock and not a mock clock? (If so, would it be possible to use a mock clock? Probably!)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can now!

sendRequestAndWaitForResponse(request_headers, 1024, default_response_headers_, 1024);
checkSimpleRequestSuccess(1024, 1024, response.get());
EXPECT_EQ(1, test_server_->counter("dns_cache.foo.cache_load")->value());
EXPECT_EQ(1, test_server_->counter("dns_cache.foo.dns_query_attempt")->value());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be valid to check that this is 0 since the cached entry won't be re-queried?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might race becasue it reregisters for refresh and I'm not sure how long it'll take

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
RyanTheOptimist
RyanTheOptimist previously approved these changes Sep 2, 2021
Copy link
Copy Markdown
Contributor

@RyanTheOptimist RyanTheOptimist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Copy Markdown
Contributor Author

/wait on ci

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits but otherwise LGTM

void addCacheEntry(const std::string& host,
const Network::Address::InstanceConstSharedPtr& address);
const Network::Address::InstanceConstSharedPtr& address,
const std::chrono::seconds& ttl);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think normally we'd pass this by value since it's just a wrapper around an integer

const std::string& resolvedHost() const override { return resolved_host_; }
bool isIpAddress() const override { return is_ip_address_; }
void touch() final { last_used_time_ = time_source_.monotonicTime().time_since_epoch(); }
void updateStale(MonotonicTime resolution_time, const std::chrono::seconds ttl) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: avoid const value params

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Copy Markdown
Contributor Author

Matt can you pick up with snow out?

@mattklein123
Copy link
Copy Markdown
Member

@snowp is still in today. @snowp if it's just a final pass can you review today? Otherwise I can take it over.

Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@snowp snowp merged commit c51e30c into envoyproxy:main Sep 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants