filter: add HazelcastHttpCache by enozcan · Pull Request #10536 · envoyproxy/envoy

enozcan · 2020-03-26T15:06:54Z

Description: Implements HttpCache interface introduced by cache filter #7198, using Hazelcast IMDG.
The plugin connects to a Hazelcast member (started as a sidecar, or a member in the same cluster, etc.) using Hazelcast cpp client and stores cached responses in a distributed map. Multiple cache filters from different apps can use the same cache as long as they use the same app prefix in the config file.

The plugin offers two different modes for Http cache:

Unified cache mode

A cached Http response is stored as a single entry in the Hazelcast map. On a range Http request, regardless of the requested range, the whole response body is fetched from the map and then only the desired bytes are served along with the headers and trailers (if any). This mode is handy where response body sizes are reasonably large, or range requests are not frequent, or they are not allowed at all.

Divided cache mode

Two separate maps are used to store a single response. In one of them, response headers, body size, and trailers (if supported by the filter) are stored. In the other one, the corresponding response body is stored in multiple entries each of which has a certain size configured via partition size. These entries store the response in a contiguous manner regardless of the size of insertBody calls made by the filter. That is, for a response of size 5 MB, if partition size is 2 MB then the maps will look like:

<key(long)> --> {headers, trailers, body size} 

<key(string) + "0"> --> body{0-2 MB}
<key(string) + "1"> --> body{2-4 MB}
<key(string) + "2"> --> body{4-5 MB}

On a range request, not the whole body for a response but only the necessary partitions are fetched from the cache. This option helps to serve range requests faster and in a stream-like fashion but comes with a cost. Every body entry has its own fixed memory cost and partitions entries will occupy memory larger than the actual body size. Also, to keep these two maps even, extra operations might be needed (i.e. cleaning up a malformed body sequence, recovery from version mismatch between body and header, etc.)

Risk Level: Low. Introduces an optional plugin and not related to any internal, not used by any component.

Testing: Two of the cache modes have their tests. Instead of running a real Hazelcast instance before tests, TestableLocalCache is used which stores data locally and mimics a real instance. TestableRemoteCache on the other hand needs a running Hazelcast instance and tests the cache just like in deployments. Local one is used by default for the sake of CI tools.

Docs Changes: source/docs/filters/http/cache/hazelcast_cache_plugin.md

Release Notes: N/A

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

jmarantz · 2020-03-27T15:08:32Z

@toddmgreer also

ping when this is no longer in 'draft' mode, thank you! Great to see an application of this interface.

wrowe · 2020-03-27T18:37:15Z

The windows build failure is trivial, you must not inject -Wno-deprecated on windows, that's a syntax error to cl.exe. We already globally force that exception using "/wd" flags on Windows, anyways.

https://dev.azure.com/cncf/envoy/_build/results?buildId=36037&view=logs&j=2d2b3007-3c5c-5840-9bb0-2b1ea49925f3&t=168f295b-0553-5364-35f7-923225ecd8b3&l=168

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

jmarantz · 2020-03-30T16:50:49Z

/wait

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

enozcan · 2020-04-07T17:23:40Z

bazel/repository_locations.bzl

+        sha256 = "3c43c81135e415ce708486564dc125bde93c2c9f8965d5af4b603ec91ff52f6e",
+        strip_prefix = "hazelcast-cpp-client-3.12.1",
+        # Using non official tarball due to missing submodule files in the official release.
+        # TODO(enozcan): Use official release with init & updating submodules


Is it possible to update submodules before building the external dependency?

enozcan · 2020-04-07T17:25:45Z

source/docs/filters/http/cache/hazelcast_cache_plugin.md

@@ -0,0 +1,186 @@
+### Hazelcast Http Cache Plugin


I see the docs here for the filter are kind of for developers. This doc targets the users, not developers. Is that the right place?

enozcan · 2020-04-07T17:28:56Z

source/extensions/filters/http/cache/hazelcast_http_cache/hazelcast_context.cc

+    raw_key.add_custom_fields(std::move(header.second));
+  }
+
+  // TODO(enozcan): Ensure the generation of the same key for the same response independent


I could not decide if parsing all vary headers for a response and then sorting them would be a right approach.

source/extensions/filters/http/cache/hazelcast_http_cache/hazelcast_http_cache_impl.cc

enozcan · 2020-04-07T17:42:18Z

I left a few comments at some points in the code where I could not conclude how they should be.
Also, the coverage build fails most likely due to the local cache used for the tests. I'm not sure because I could not manage to run the build locally. I will try to figure out this in the meantime.
Other than the coverage issue, I think it's ready for review. cc @jmarantz

stale · 2020-04-14T21:03:39Z

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

enozcan · 2020-05-30T10:57:34Z

The only failing check is the Windows one and fixing it will not affect the code but BUILD file. Could you please take a look @toddmgreer?

toddmgreer · 2020-06-01T20:55:28Z

Happy to--I'll dig in immediately.

…

On Sat, May 30, 2020 at 3:57 AM Enes Ozcan ***@***.***> wrote: The only failing check is the Windows one and fixing it will not affect the code but BUILD file. Could you please take a look @toddmgreer <https://github.com/toddmgreer>? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#10536 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AFRAWPMAROR75VZBPA6NPALRUDRCVANCNFSM4LUJVUAQ> .

source/docs/filters/http/cache/hazelcast_cache_plugin.md

source/extensions/filters/http/cache/hazelcast_http_cache/config.proto

source/extensions/filters/http/cache/hazelcast_http_cache/hazelcast_http_cache.h

source/extensions/filters/http/cache/hazelcast_http_cache/hazelcast_http_cache.cc

toddmgreer · 2020-06-02T00:48:32Z

source/extensions/filters/http/cache/hazelcast_http_cache/hazelcast_http_cache.cc

+  if (!cache_) {
+    HazelcastHttpCacheConfig hz_cache_config;
+    MessageUtil::unpackTo(config.typed_config(), hz_cache_config);
+    cache_ = std::make_unique<HazelcastHttpCache>(hz_cache_config);


This isn't threadsafe. Envoy creates exactly one HazelcastHttpCacheFactory, but getCache is called multiple times, so more than one thread may try to assign to cache_ at the same time. Making cache_ a thread-local static might be the simplest efficient fix.

tbh I could not manage to make it thread safe with static thread_local unique ptr. However I'd like to hear your opinion on creating the cache_ via call_once or double-checked locking.

First, two bits of background, because I have a hard time keeping some of this straight:

There is one HazelcastHttpCacheFactory, created and owned by Registry::RegisterFactory<HazelcastHttpCacheFactory, HttpCacheFactory>.

For every new connection, HazelcastHttpCacheFactory::getCache will be called on whichever worker thread is serving that connection, so getCache must expect concurrent calls.

HazelcastHttpCacheFactory::getCache returns pointers to the same HazelcastHttpCache. Is that the intent? If so, then HazelcastHttpCache must be threadsafe (and so documented), and I suggest creating cache_ in HazelcastHttpCacheFactory's constructor.

If you want to have one HazelcastHttpCache per worker thread, then a static thread_local unique_ptr would be a likely choice, and would make getCache threadsafe.

If you want to have one HazelcastHttpCache per connection, then getCache should create and return one, except the getCache interface makes lifetime difficult. If that's the intent, we'll need to fix (mea culpa) the interface to better support it.

HazelcastHttpCacheFactory::getCache returns pointers to the same HazelcastHttpCache. Is that the intent?

Yes indeed. It uses thread safe APIs of Hazelcast. Other than the connect and shutdown methods, the cache itself is actually thread safe. I will modify both accordingly and document about it. Also I think it's better to use only one client (and so cache) for now and make them per-worker or per-connection or whatever later on in case of performance issues.

and I suggest creating cache_ in HazelcastHttpCacheFactory's constructor.

But in that case it would not be able to connect until the first getCache call as it needs typed_config from CacheConfig. So a similar logic to ptr check would be required.

Different calls to getCache can have different CacheConfigs, due to configuration changing over time, or due to different listeners instantiating the CacheFilter with different configurations. Does Hazelcast need global and/or static config?

Well, I afraid I did not take this dynamic configuration change into consideration..
HazelcastClient must be configured before it starts. It takes a few seconds to be started (and terminated) and hence it's not applicable to do that for each getCache call. A possible solution might be using only one HazelcastClient configured with the initial config values. So getCache will simply ignore the client connection configurations but will create a new cache according to the typed config with the existing HazelcastClient (created with the very first getCache call.). This could be somehow acceptable. Because once a client connects, it can keep its connection alive regardless of the changes happened afterwards. However, creating a new cache per call sounds really poor to me. In that case I should probably get rid of start, shutdown, etc. stuff as much as possible and minimize the cost of cache initialization.

@jmarantz, can you advise on how an extension should get static config?

@toddmgreer, I've been thinking on this and decided to configure Hazelcast statically. However, I could not come up with a solution for other configurations (i.e. HazelcastHttpCacheConfig::unified/body_partition_size/app_prefix).

I tried to create a single cache and expose these configurations to lookup and insertion contexts. That would solve the dynamic configuration issue for the cache since contexts can be arranged to operate with these values. But passing them to contexts seems not possible using a single cache for all threads.

I also considered to create a cache per getCache call with the passed CacheConfig's non-static configurations (unified, body_partition_size and app_prefix). But since we're returning a reference to the cache here, I got stuck again.

Let me ask this way: How the implementation would be like if SimpleHttpCache needed a config value from typed_config during lookup and/or insertion? If caches are supposed to be independent from this type of configuration, what's the point of typed_config then?

@toddmgreer, a kindly reminder as I will change the impl. according to your answer.

@enozcan, I apologize for not responding to your June 17th comment--I somehow didn't see it until just now.

While some types of config (such as those that HazelCast requires at startup) are naturally static, there are others that could be different for each request. Suppose SimpleHttpCache had a configuration option that specified a list of Content-Types not to insert into cache, for the purpose of not wasting cache space on content-types known (for that deployment) to get poor hit rates. SimpleHttpCacheFactory could create a request-specific HttpCache implementation that rejects those insertions, and otherwise inserts into a global hash map. Requests that hit differently configured filters might have different Content-Type blocklists, and that's fine.

I don't know the best way to handle global config. I intend to find out, as it's probably needed by all cache implementations (e.g. global limits for SimpleHttpCacheFactory, where to send RPCs for anything distributed). Unfortunately, there's some other work I have to get done first. If you're able to figure it out first, I'll try to be more prompt with discussions and PR reviews.

enozcan · 2020-06-05T09:26:16Z

Thanks @toddmgreer. Before addressing your reviews, I want to ask about a few points.

When I give it a try (both Hazelcast and PoC cache) I see that response is cached and served from the cache as expected. But after each cache hit, the cached response is updated without hitting the backend service. Is that update - happening after each hit, the expected behavior of the filter or happening due to being not implemented completely at (as I see another age header added after each cache hit):

envoy/source/extensions/filters/http/cache/cache_filter.cc

Line 118 in 8ad0884

result.headers_->addReferenceKey(Http::Headers::get().Age, 0);

Also per #10019 (comment), I could not find a way of shutting down the cluster connection in the cache destructor without an Envoy crash. That is, the cache causes fault when tearing down with the current destruction logic. Do you have any suggestion to overcome this or is that possible to bring onDestroy calls back?

toddmgreer · 2020-06-05T20:06:28Z

When we serve a response from cache without hitting the backend, the response served to the client should have exactly one Age header, and the response stored in cache should not change. If you're seeing otherwise, that's a CacheFilter bug that I wasn't aware of--please submit an issue. When exactly do you want to shut down the cluster connection? Are they per-process, per-worker-thread, per-filter-instance, or something else? Thank you, Todd

…

On Fri, Jun 5, 2020 at 2:26 AM Enes Ozcan ***@***.***> wrote: Thanks @toddmgreer <https://github.com/toddmgreer>. Before addressing your reviews, I want to ask about a few points. When I give it a try (both Hazelcast and PoC cache) I see that response is cached and served from the cache as expected. But after each cache hit, the cached response is updated without hitting the backend service. Is that update - happening after each hit, the expected behavior of the filter or happening due to being not implemented completely at *(as I see another age header added after each cache hit)*: https://github.com/envoyproxy/envoy/blob/8ad0884d0ecc4c5e6e0671b7727b1e5abad47653/source/extensions/filters/http/cache/cache_filter.cc#L118 Also per #10019 (comment) <#10019 (comment)>, I could not find a way of shutting down the cluster connection in the cache destructor without an Envoy crash. That is, the cache causes fault when tearing down with the current destruction logic. Do you have any suggestion to overcome this or is that possible to bring onDestroy calls back? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#10536 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AFRAWPODPCNJ734Z3EZKCOLRVC24RANCNFSM4LUJVUAQ> .

enozcan · 2020-06-05T23:04:12Z

I will double check and then create an issue with a reproducer for the age header case.

I thought HttpCache also had onDestroy method prior to the changes and that's why I asked about it. Now that I see they were only for contexts, please ignore the question as it's not related to onDestroy calls.

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

stale · 2020-08-22T19:26:16Z

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

stale · 2020-08-30T23:32:59Z

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

stale · 2020-09-07T01:40:21Z

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

Add HazelcastHttpCache

3ff732d

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

mattklein123 assigned jmarantz Mar 26, 2020

mstrYoda mentioned this pull request Mar 27, 2020

add request middleware or something similar istio/istio#22541

Closed

Secure body key uniqueness and set defaults.

c39db10

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Use Envoy's random for versioning.

f359a8e

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

repokitteh-read-only bot added the waiting label Mar 30, 2020

Handle all connection failures. Use parameterized test for common ones.

d6f2568

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

repokitteh-read-only bot removed the waiting label Apr 1, 2020

enozcan added 3 commits April 3, 2020 16:19

Add local cache to use during tests.

8630701

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Add plugin doc and fix spell check.

2232761

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Fix code format & clean up.

8ca71fc

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

enozcan commented Apr 7, 2020

View reviewed changes

source/extensions/filters/http/cache/hazelcast_http_cache/hazelcast_http_cache_impl.cc Outdated Show resolved Hide resolved

enozcan marked this pull request as ready for review April 7, 2020 17:42

enozcan requested a review from jmarantz as a code owner April 7, 2020 17:42

stale bot added the stale stalebot believes this issue/PR has not been touched recently label Apr 14, 2020

Fix clang_tidy errors.

a3a1af4

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Apr 15, 2020

enozcan requested review from alyssawilk, dio, htuch, junr03 and lizan as code owners April 17, 2020 17:07

enozcan added 2 commits May 29, 2020 20:01

Cover remote calls over offline client.

bb9ce1e

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

tsan fix

ce6651b

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

toddmgreer suggested changes Jun 2, 2020

View reviewed changes

enozcan added 12 commits June 10, 2020 11:11

Address reviews.

8e3129a

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Use max_body_bytes from CacheConfig.

570e362

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Use core.SocketAddress for member address.

66a1f01

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Fix format.

0acc4d1

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Fix Windows lib name.

d3b0cf5

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Merge branch 'master' into hazelcast_http_cache

d40185f

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Fix format.

eb50942

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Remove accessor cast and move logs to accessor.

7413c54

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Pass accessor via cache::start

a1167a1

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Revert back to unit64 max body bytes.

639bfbb

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Add tests.

ab2f3fd

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

Add Windows build defines.

49ca85b

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>

enozcan mentioned this pull request Jul 24, 2020

cache: fix cache proto's max_body_bytes type #12273

Closed

jmarantz removed the no stalebot Disables stalebot from closing an issue label Aug 15, 2020

stale bot added stale stalebot believes this issue/PR has not been touched recently and removed stale stalebot believes this issue/PR has not been touched recently labels Aug 22, 2020

stale bot added the stale stalebot believes this issue/PR has not been touched recently label Aug 30, 2020

stale bot closed this Sep 7, 2020

enozcan mentioned this pull request Nov 23, 2023

Can you provide example envoy config to use this filter after merging this PR and building? enozcan/hazelcast-envoy-http-cache#1

Open

Conversation

enozcan commented Mar 26, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jmarantz commented Mar 27, 2020

Uh oh!

wrowe commented Mar 27, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jmarantz commented Mar 30, 2020

Uh oh!

enozcan Apr 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

enozcan Apr 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

enozcan commented Apr 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stale bot commented Apr 14, 2020

Uh oh!

enozcan commented May 30, 2020

Uh oh!

toddmgreer commented Jun 1, 2020 via email

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

toddmgreer Jun 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

enozcan Jun 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

enozcan Jun 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

enozcan commented Jun 5, 2020

Uh oh!

toddmgreer commented Jun 5, 2020 via email

Uh oh!

enozcan commented Jun 5, 2020

Uh oh!

stale bot commented Aug 22, 2020

Uh oh!

stale bot commented Aug 30, 2020

Uh oh!

stale bot commented Sep 7, 2020

Uh oh!

Reviewers

Assignees

enozcan commented Mar 26, 2020 •

edited

Loading

wrowe commented Mar 27, 2020 •

edited

Loading

enozcan Apr 7, 2020 •

edited

Loading

enozcan Apr 7, 2020 •

edited

Loading

enozcan commented Apr 7, 2020 •

edited

Loading

toddmgreer Jun 12, 2020 •

edited

Loading

enozcan Jun 13, 2020 •

edited

Loading

enozcan Jun 17, 2020 •

edited

Loading