filter: add HazelcastHttpCache #10536
Conversation
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
@toddmgreer also ping when this is no longer in 'draft' mode, thank you! Great to see an application of this interface.

The Windows build failure is trivial: you must not inject -Wno-deprecated on Windows, as that is a syntax error to cl.exe. We already globally force that exception using "/wd" flags on Windows anyway.
/wait
sha256 = "3c43c81135e415ce708486564dc125bde93c2c9f8965d5af4b603ec91ff52f6e",
strip_prefix = "hazelcast-cpp-client-3.12.1",
# Using non-official tarball due to missing submodule files in the official release.
# TODO(enozcan): Use official release with init & updating submodules
Is it possible to update submodules before building the external dependency?
### Hazelcast Http Cache Plugin
I see the docs here for the filter are written mostly for developers, while this location targets users, not developers. Is this the right place?
raw_key.add_custom_fields(std::move(header.second));
}
// TODO(enozcan): Ensure the generation of the same key for the same response independent
I could not decide whether parsing all Vary headers for a response and then sorting them would be the right approach.
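For what it's worth, the sorting approach mentioned above can be sketched as follows (a minimal illustration with hypothetical names, not the PR's actual code): sorting the Vary header names before folding them into the key makes the key independent of the order in which the headers arrive.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: fold a response's Vary header names into a key
// fragment. Sorting first makes the fragment order-independent, so the
// same set of headers always yields the same cache key.
std::string varyKeyFragment(std::vector<std::string> vary_headers) {
  std::sort(vary_headers.begin(), vary_headers.end());
  std::string fragment;
  for (const std::string& name : vary_headers) {
    fragment += name;
    fragment += '\n'; // separator; newlines cannot appear in header names
  }
  return fragment;
}
```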
I left a few comments at points in the code where I could not decide how they should behave.
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
The only failing check is the Windows one, and fixing it will not affect the code, only the BUILD file. Could you please take a look, @toddmgreer?
Happy to; I'll dig in immediately.
if (!cache_) {
  HazelcastHttpCacheConfig hz_cache_config;
  MessageUtil::unpackTo(config.typed_config(), hz_cache_config);
  cache_ = std::make_unique<HazelcastHttpCache>(hz_cache_config);
This isn't threadsafe. Envoy creates exactly one HazelcastHttpCacheFactory, but getCache is called multiple times, so more than one thread may try to assign to cache_ at the same time. Making cache_ a thread-local static might be the simplest efficient fix.
To be honest, I could not manage to make it thread-safe with a static thread_local unique_ptr. However, I'd like to hear your opinion on creating cache_ via std::call_once or double-checked locking.
First, two bits of background, because I have a hard time keeping some of this straight:
- There is one HazelcastHttpCacheFactory, created and owned by Registry::RegisterFactory<HazelcastHttpCacheFactory, HttpCacheFactory>.
- For every new connection, HazelcastHttpCacheFactory::getCache will be called on whichever worker thread is serving that connection, so getCache must expect concurrent calls.
HazelcastHttpCacheFactory::getCache returns pointers to the same HazelcastHttpCache. Is that the intent? If so, then HazelcastHttpCache must be threadsafe (and so documented), and I suggest creating cache_ in HazelcastHttpCacheFactory's constructor.
If you want to have one HazelcastHttpCache per worker thread, then a static thread_local unique_ptr would be a likely choice, and would make getCache threadsafe.
If you want to have one HazelcastHttpCache per connection, then getCache should create and return one, except the getCache interface makes lifetime difficult. If that's the intent, we'll need to fix (mea culpa) the interface to better support it.
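For reference, the std::call_once option raised above can be sketched roughly like this (the config and cache types are illustrative stand-ins, not the PR's real classes): the first caller's config wins, later concurrent callers block until initialization finishes, and every call returns the same instance.

```cpp
#include <memory>
#include <mutex>

// Illustrative stand-ins for the real config and cache types.
struct CacheConfig { int body_partition_size; };
struct Cache {
  explicit Cache(const CacheConfig& cfg) : cfg_(cfg) {}
  CacheConfig cfg_;
};

class CacheFactory {
public:
  // Safe under concurrent calls: std::call_once runs the initializer
  // exactly once; later calls return the same instance and their
  // config argument is ignored.
  Cache& getCache(const CacheConfig& cfg) {
    std::call_once(once_, [&] { cache_ = std::make_unique<Cache>(cfg); });
    return *cache_;
  }

private:
  std::once_flag once_;
  std::unique_ptr<Cache> cache_;
};
```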
HazelcastHttpCacheFactory::getCache returns pointers to the same HazelcastHttpCache. Is that the intent?
Yes indeed. It uses the thread-safe APIs of Hazelcast; other than the connect and shutdown methods, the cache itself is actually thread-safe. I will modify both accordingly and document it. Also, I think it's better to use only one client (and hence one cache) for now, and make them per-worker or per-connection later in case of performance issues.
and I suggest creating cache_ in HazelcastHttpCacheFactory's constructor.
But in that case it would not be able to connect until the first getCache call, since it needs the typed_config from CacheConfig. So logic similar to the pointer check would still be required.
Different calls to getCache can have different CacheConfigs, due to configuration changing over time, or due to different listeners instantiating the CacheFilter with different configurations. Does Hazelcast need global and/or static config?
Well, I'm afraid I did not take dynamic configuration changes into consideration.

The HazelcastClient must be configured before it starts. It takes a few seconds to start (and to terminate), so doing that for each getCache call is not feasible. A possible solution might be to use only one HazelcastClient, configured with the initial config values. getCache would then ignore the client connection configuration but create a new cache, according to the typed config, on the existing HazelcastClient (created on the very first getCache call). This could be somewhat acceptable, because once a client connects it can keep its connection alive regardless of later configuration changes. However, creating a new cache per call sounds really poor to me. In that case I should probably get rid of the start, shutdown, etc. logic as much as possible and minimize the cost of cache initialization.
@jmarantz, can you advise on how an extension should get static config?
@toddmgreer, I've been thinking about this and decided to configure Hazelcast statically. However, I could not come up with a solution for the other configurations (i.e. HazelcastHttpCacheConfig::unified/body_partition_size/app_prefix).

I tried to create a single cache and expose these configurations to the lookup and insertion contexts. That would solve the dynamic configuration issue for the cache, since the contexts can be arranged to operate with these values. But passing them to the contexts does not seem possible with a single cache shared by all threads.

I also considered creating a cache per getCache call with the passed CacheConfig's non-static settings (unified, body_partition_size, and app_prefix). But since we return a reference to the cache here, I got stuck again.

Let me ask this way: what would the implementation look like if SimpleHttpCache needed a config value from typed_config during lookup and/or insertion? If caches are supposed to be independent of this type of configuration, what is the point of typed_config?
@toddmgreer, a kind reminder, as I will change the implementation according to your answer.
@enozcan, I apologize for not responding to your June 17th comment--I somehow didn't see it until just now.
While some types of config (such as those that Hazelcast requires at startup) are naturally static, there are others that could be different for each request. Suppose SimpleHttpCache had a configuration option that specified a list of Content-Types not to insert into the cache, for the purpose of not wasting cache space on content types known (for that deployment) to get poor hit rates. SimpleHttpCacheFactory could create a request-specific HttpCache implementation that rejects those insertions and otherwise inserts into a global hash map. Requests that hit differently configured filters might have different Content-Type blocklists, and that's fine.
I don't know the best way to handle global config. I intend to find out, as it's probably needed by all cache implementations (e.g. global limits for SimpleHttpCacheFactory, where to send RPCs for anything distributed). Unfortunately, there's some other work I have to get done first. If you're able to figure it out first, I'll try to be more prompt with discussions and PR reviews.
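The Content-Type blocklist example above could be sketched roughly as follows (hypothetical names, not the real HttpCache interface): each filter instance gets a thin per-request view that filters insertions into one shared global store.

```cpp
#include <set>
#include <string>
#include <unordered_map>

// Hypothetical shared global store, analogous to SimpleHttpCache's map.
class SharedStore {
public:
  void insert(const std::string& key, const std::string& body) { entries_[key] = body; }
  bool contains(const std::string& key) const { return entries_.count(key) > 0; }

private:
  std::unordered_map<std::string, std::string> entries_;
};

// Per-filter view configured with that listener's blocklist: insertions
// with blocked Content-Types are silently rejected, everything else goes
// to the shared store.
class BlocklistCacheView {
public:
  BlocklistCacheView(SharedStore& store, std::set<std::string> blocked)
      : store_(store), blocked_(std::move(blocked)) {}

  void insert(const std::string& key, const std::string& content_type,
              const std::string& body) {
    if (blocked_.count(content_type) == 0) {
      store_.insert(key, body);
    }
  }

private:
  SharedStore& store_;
  std::set<std::string> blocked_;
};
```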
Thanks @toddmgreer. Before addressing your reviews, I want to ask about a few points. When I try it out (with both the Hazelcast and the PoC cache) I see that the response is cached and served from the cache as expected. But after each cache hit, the cached response is updated without hitting the backend service. Is that update after each hit the expected behavior of the filter, or does it happen because the implementation is not yet complete (I see another Age header added after each cache hit)? Also, per #10019 (comment), I could not find a way of shutting down the cluster connection in the cache destructor without an Envoy crash. That is, the cache causes a fault when tearing down with the current destruction logic. Do you have any suggestion to overcome this, or would it be possible to bring the onDestroy calls back?
When we serve a response from cache without hitting the backend, the
response served to the client should have exactly one Age header, and the
response stored in cache should not change. If you're seeing otherwise,
that's a CacheFilter bug that I wasn't aware of--please submit an issue.
When exactly do you want to shut down the cluster connection? Are they
per-process, per-worker-thread, per-filter-instance, or something else?
Thank you,
Todd
I will double check and then create an issue with a reproducer for the Age header case. I thought HttpCache also had
This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Description: Implements the HttpCache interface introduced by the cache filter (#7198), using Hazelcast IMDG.
The plugin connects to a Hazelcast member (started as a sidecar, a member in the same cluster, etc.) using the Hazelcast C++ client and stores cached responses in a distributed map. Multiple cache filters from different apps can use the same cache as long as they use the same app prefix in the config file.
The plugin offers two modes for the HTTP cache:

1. A cached HTTP response is stored as a single entry in the Hazelcast map. On a range request, regardless of the requested range, the whole response body is fetched from the map and only the desired bytes are served along with the headers and trailers (if any). This mode is handy when response body sizes are reasonably small, when range requests are infrequent, or when they are not allowed at all.

2. Two separate maps are used to store a single response. One stores the response headers, body size, and trailers (if supported by the filter). The other stores the corresponding response body in multiple entries, each of a fixed size configured via the partition size. These entries store the response contiguously regardless of the size of the insertBody calls made by the filter: for a response of 5 MB with a partition size of 2 MB, the body map holds three entries of 2 MB, 2 MB, and 1 MB. On a range request, only the necessary partitions are fetched from the cache rather than the whole body. This option helps serve range requests faster and in a stream-like fashion, but comes at a cost: every body entry has a fixed memory overhead, so the partition entries occupy more memory than the actual body size. Also, keeping the two maps consistent may require extra operations (e.g. cleaning up a malformed body sequence, or recovering from a version mismatch between body and header).
Risk Level: Low. Introduces an optional plugin; it does not touch any internals and is not used by any other component.
Testing: Both cache modes have tests. Instead of running a real Hazelcast instance before tests, TestableLocalCache is used, which stores data locally and mimics a real instance. TestableRemoteCache, on the other hand, needs a running Hazelcast instance and tests the cache just as in deployments. The local one is used by default for the sake of CI tools.
Docs Changes: source/docs/filters/http/cache/hazelcast_cache_plugin.md
Release Notes: N/A