dns: configuring a basic key value store to persist to disk#17745
dns: configuring a basic key value store to persist to disk#17745alyssawilk merged 17 commits intoenvoyproxy:mainfrom
Conversation
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
|
/wait on CI |
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
I think this would potentially be a nice solution to think about it. FWIW we had discussed something similar today for wrapping DNS in order to get DNS specific analytic events. cc @snowp |
|
I'm not sure what the ask is for this PR. I don't want this tied to DNS but I think it'd be reasonable to make sure the DNS refactor allows for all the platform resolvers to use the cache. |
|
I'm suggesting the DNS resolver itself is a cache. |
|
i.e. that we build DNS caching inside the DNS resolver framework. I'm not sure why we want to reuse the existing data plane caching framework for this. |
RyanTheOptimist
left a comment
There was a problem hiding this comment.
A few small questions to get started...
| // | ||
| // All keys and values are flushed to a single file as | ||
| // [length]\n[key][length]\n[value] | ||
| class FileBasedKeyValueStore : public KeyValueStoreBase { |
There was a problem hiding this comment.
I was a bit surprised to see that the interface here is Envoy::KeyValueStore but the concrete class is Envoy::Extensions::Cache::KeyValueCache. I had expected them to both live in the same namespace, roughly. But maybe this is a typical way to lay things out?
There was a problem hiding this comment.
yeah, per offline conversation with harvey I moved all the things to a location he prefered, so now it's KeyValue and then KeyValueStore which is at least a substring?
alyssawilk
left a comment
There was a problem hiding this comment.
FYI there's an outstanding issue with both clang tidy and docs build, but API should be ready for review. PTAL.
| // | ||
| // All keys and values are flushed to a single file as | ||
| // [length]\n[key][length]\n[value] | ||
| class FileBasedKeyValueStore : public KeyValueStoreBase { |
There was a problem hiding this comment.
yeah, per offline conversation with harvey I moved all the things to a location he prefered, so now it's KeyValue and then KeyValueStore which is at least a substring?
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
htuch
left a comment
There was a problem hiding this comment.
API looks good modulo comments.
|
|
||
| // [#not-implemented-hide:] | ||
| // Configuration to flush the DNS cache to long term storage. | ||
| key_value.v3.KeyValueStoreConfig persistent_cache_config = 13; |
There was a problem hiding this comment.
Nit on naming: given this is referencing an extension point, do we know that there will always be persistence guaranteed?
There was a problem hiding this comment.
Good question. I can't think of a use case where we'd add a key-value store and not persist it. We already have a DNS cache so the point of this is to persist. That said, just because I can't think of a use case it certainly doesn't hurt to future-proof :-)
|
|
||
| // The interval at which the key value store should be flushed to long term | ||
| // storage. | ||
| google.protobuf.Duration flush_interval = 2; |
There was a problem hiding this comment.
I guess this gets to my earlier question, do we know that all implementations will flush? Or will flushing be a property of a specific key-value story, e.g. the FileBasedKeyValueStoreConfig?
There was a problem hiding this comment.
Ditto - current uses all involve persisting but can't hurt to move it to file based and just dup the single value for mobile based.
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
alyssawilk
left a comment
There was a problem hiding this comment.
looking at docs pr - must have failed a string somewhere
|
|
||
| // [#not-implemented-hide:] | ||
| // Configuration to flush the DNS cache to long term storage. | ||
| key_value.v3.KeyValueStoreConfig persistent_cache_config = 13; |
There was a problem hiding this comment.
Good question. I can't think of a use case where we'd add a key-value store and not persist it. We already have a DNS cache so the point of this is to persist. That said, just because I can't think of a use case it certainly doesn't hurt to future-proof :-)
|
|
||
| // The interval at which the key value store should be flushed to long term | ||
| // storage. | ||
| google.protobuf.Duration flush_interval = 2; |
There was a problem hiding this comment.
Ditto - current uses all involve persisting but can't hurt to move it to file based and just dup the single value for mobile based.
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
| @@ -0,0 +1,27 @@ | |||
| syntax = "proto3"; | |||
|
|
|||
| package envoy.extensions.common.key_value.file_based.v3; | |||
There was a problem hiding this comment.
Nit: this should be in api/envoy/extensions/key_value/filed_based/v3/config.proto and same package namespace.
The rationale is that the KeyValueConfig is a common extension point declaration reused across protos. The FileBasedKkeyValueStoreConfig is a key_value extension, and all extension classes have a root like api/envoy/extensions/<extension class>.
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
|
argh sorry, force pushed format fixes. Bad @alyssawilk no biscuit |
snowp
left a comment
There was a problem hiding this comment.
Overall this looks pretty good, just a few nits and a question on the interaction between cache load and DNS resolution
envoy/common/key_value_store.h
Outdated
There was a problem hiding this comment.
Should we just pass ServerFactoryContext instead of specifically passing the file system? Might future proof against some other use cases
There was a problem hiding this comment.
Unfortunately we can't - the DNS store is created both from ServerFactoryContext and from the cluster's TransportSocketFactoryContextImpl which have wide overlap but no common base class.
I have that filed in away in my clean up list but I think resolving it is out of scope for this PR
| // TODO(alyssawilk) change finishResolve to actually use the TTL rather than | ||
| // putting 0 here, return the remaining TTL or indicate the result is stale. | ||
| response.emplace_back(Network::DnsResponse(address, std::chrono::seconds(0) /* ttl */)); | ||
| startCacheLoad(key, address->ip()->port()); |
There was a problem hiding this comment.
I'm not that familiar with this code, but it looks like this will kick off a DNS query? And then we immediately resolve it?
If so I wonder if this will cause unforeseen consequences, like having entries being resolved twice (once from cache and once from DNS) and confusing stats (e.g. stat increment for DNS resolution and cache load for same entry)
Thoughts?
There was a problem hiding this comment.
it works today (see integration test) but I agree it's not great, hence the TODO :-)
What I want to have happen is be able to differentiate stale cached results from fresh. I do want to immediately kick off a resolve, but also have stale data in there, config for how long we wait for fresh data before we use stale data. I plan to do that in the follow up
There was a problem hiding this comment.
For my understanding, would a DNS resolution then overwrite this value once it arrives?
There was a problem hiding this comment.
yeah, fresh resolve always overwrites older data (if it's different)
in this case it'd be the same so not trigger any changes.
| MockKeyValueStoreFactory factory; | ||
| EXPECT_CALL(factory, createEmptyConfigProto()).WillRepeatedly(Invoke([]() { | ||
| return std::make_unique< | ||
| envoy::extensions::key_value::file_based::v3::FileBasedKeyValueStoreConfig>(); | ||
| })); | ||
| MockKeyValueStore* store{}; | ||
| EXPECT_CALL(factory, createStore(_, _, _, _)).WillOnce(Invoke([&store]() { | ||
| auto ret = std::make_unique<NiceMock<MockKeyValueStore>>(); | ||
| store = ret.get(); | ||
| // Make sure there's an attempt to load from the key value store. | ||
| EXPECT_CALL(*store, iterate); | ||
| return ret; | ||
| })); |
There was a problem hiding this comment.
Should this be implemented on the MockKeyValueStoreFactory itself? Having them in the test case here is a bit distracting from what the test is actually testing
There was a problem hiding this comment.
maybe? Honestly I regularly get bit by expectations in mock timer where I don't understand what's going on, so I'm hesitant to add complexity like that to the mock base classes
There was a problem hiding this comment.
Sure not a big deal, just what I usually see done with mocks. Agree that MockTimer is especially confusing :)
| InSequence s; | ||
| ASSERT(store != nullptr); | ||
|
|
||
| // Make sure the store gets the first insert. |
There was a problem hiding this comment.
It's a bit hard to follow how the below code block verifies anything with the store, maybe make that clearer with comments? (I think it's L1067 but there is a lot of boiler plate here)
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
|
@davinci26 @wrowe any ideas on the windows failure? I've commented out with TODO to unblock this but I'm wondering if you have ideas what went wrong. For example are there known gotchas with file paths on windows CI? |
snowp
left a comment
There was a problem hiding this comment.
LG! We definitely need to figure out the Windows thing, but I'm okay with doing that in a follow up
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
|
@alyssawilk I am not sure, I will have to debug locally but it is super weird because the same test passes on windows with clang. |
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
|
coverage flake unrelated: from a combination of quic refactor and setec patches |
* main: config: fix dfp config validation (envoyproxy#17835) docs: updating where meetings are uploaded (envoyproxy#17832) h2: moving a comment (envoyproxy#17846) quiche: early fail listener config with both quic and connection_balencer (envoyproxy#17834) dns: configuring a basic key value store to persist to disk (envoyproxy#17745) quic: fix receiving STOP_SENDING (envoyproxy#17815) tooling: Add Github release manager (envoyproxy#17741) tooling: Use upstream pytest-patches (envoyproxy#17809) Remove `hidden_envoy_deprecated_use_http2` (envoyproxy#17805) kafka: produce request for mesh-filter (envoyproxy#17818) Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Clean up inspired by #17745 Risk Level: low (interface refactor) Testing: n/a Docs Changes: n/a Release Notes: n/a Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Risk Level: low (config guarded)
Testing: new unit, integration tests
Docs Changes: n/a (hidden)
Release Notes: n/a
Part of envoyproxy/envoy-mobile#1587