Skip to content

filter: add HazelcastHttpCache#10536

Closed
enozcan wants to merge 35 commits intoenvoyproxy:masterfrom
enozcan:hazelcast_http_cache
Closed

filter: add HazelcastHttpCache#10536
enozcan wants to merge 35 commits intoenvoyproxy:masterfrom
enozcan:hazelcast_http_cache

Conversation

@enozcan
Copy link

@enozcan enozcan commented Mar 26, 2020

Description: Implements HttpCache interface introduced by cache filter #7198, using Hazelcast IMDG.
The plugin connects to a Hazelcast member (started as a sidecar, or a member in the same cluster, etc.) using Hazelcast cpp client and stores cached responses in a distributed map. Multiple cache filters from different apps can use the same cache as long as they use the same app prefix in the config file.

The plugin offers two different modes for Http cache:

  • Unified cache mode

A cached Http response is stored as a single entry in the Hazelcast map. On a range Http request, regardless of the requested range, the whole response body is fetched from the map and then only the desired bytes are served along with the headers and trailers (if any). This mode is handy where response body sizes are reasonably large, or range requests are not frequent, or they are not allowed at all.

  • Divided cache mode

Two separate maps are used to store a single response. In one of them, response headers, body size, and trailers (if supported by the filter) are stored. In the other one, the corresponding response body is stored in multiple entries each of which has a certain size configured via partition size. These entries store the response in a contiguous manner regardless of the size of insertBody calls made by the filter. That is, for a response of size 5 MB, if partition size is 2 MB then the maps will look like:

<key(long)> --> {headers, trailers, body size} 

<key(string) + "0"> --> body{0-2 MB}
<key(string) + "1"> --> body{2-4 MB}
<key(string) + "2"> --> body{4-5 MB}

On a range request, not the whole body for a response but only the necessary partitions are fetched from the cache. This option helps to serve range requests faster and in a stream-like fashion but comes with a cost. Every body entry has its own fixed memory cost and partitions entries will occupy memory larger than the actual body size. Also, to keep these two maps even, extra operations might be needed (i.e. cleaning up a malformed body sequence, recovery from version mismatch between body and header, etc.)


Risk Level: Low. Introduces an optional plugin and not related to any internal, not used by any component.

Testing: Two of the cache modes have their tests. Instead of running a real Hazelcast instance before tests, TestableLocalCache is used which stores data locally and mimics a real instance. TestableRemoteCache on the other hand needs a running Hazelcast instance and tests the cache just like in deployments. Local one is used by default for the sake of CI tools.

Docs Changes: source/docs/filters/http/cache/hazelcast_cache_plugin.md

Release Notes: N/A

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
@jmarantz
Copy link
Contributor

@toddmgreer also

ping when this is no longer in 'draft' mode, thank you! Great to see an application of this interface.

@wrowe
Copy link
Contributor

wrowe commented Mar 27, 2020

The windows build failure is trivial, you must not inject -Wno-deprecated on windows, that's a syntax error to cl.exe. We already globally force that exception using "/wd" flags on Windows, anyways.

https://dev.azure.com/cncf/envoy/_build/results?buildId=36037&view=logs&j=2d2b3007-3c5c-5840-9bb0-2b1ea49925f3&t=168f295b-0553-5364-35f7-923225ecd8b3&l=168

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
@jmarantz
Copy link
Contributor

/wait

Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
enozcan added 3 commits April 3, 2020 16:19
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
sha256 = "3c43c81135e415ce708486564dc125bde93c2c9f8965d5af4b603ec91ff52f6e",
strip_prefix = "hazelcast-cpp-client-3.12.1",
# Using non official tarball due to missing submodule files in the official release.
# TODO(enozcan): Use official release with init & updating submodules
Copy link
Author

@enozcan enozcan Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to update submodules before building the external dependency?

@@ -0,0 +1,186 @@
### Hazelcast Http Cache Plugin
Copy link
Author

@enozcan enozcan Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the docs here for the filter are kind of for developers. This doc targets the users, not developers. Is that the right place?

raw_key.add_custom_fields(std::move(header.second));
}

// TODO(enozcan): Ensure the generation of the same key for the same response independent
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not decide if parsing all vary headers for a response and then sorting them would be a right approach.

@enozcan
Copy link
Author

enozcan commented Apr 7, 2020

I left a few comments at some points in the code where I could not conclude how they should be.
Also, the coverage build fails most likely due to the local cache used for the tests. I'm not sure because I could not manage to run the build locally. I will try to figure out this in the meantime.
Other than the coverage issue, I think it's ready for review. cc @jmarantz

@enozcan enozcan marked this pull request as ready for review April 7, 2020 17:42
@enozcan enozcan requested a review from jmarantz as a code owner April 7, 2020 17:42
@stale
Copy link

stale bot commented Apr 14, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Apr 14, 2020
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Apr 15, 2020
enozcan added 2 commits May 29, 2020 20:01
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
@enozcan
Copy link
Author

enozcan commented May 30, 2020

The only failing check is the Windows one and fixing it will not affect the code but BUILD file. Could you please take a look @toddmgreer?

@toddmgreer
Copy link
Contributor

toddmgreer commented Jun 1, 2020 via email

if (!cache_) {
HazelcastHttpCacheConfig hz_cache_config;
MessageUtil::unpackTo(config.typed_config(), hz_cache_config);
cache_ = std::make_unique<HazelcastHttpCache>(hz_cache_config);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't threadsafe. Envoy creates exactly one HazelcastHttpCacheFactory, but getCache is called multiple times, so more than one thread may try to assign to cache_ at the same time. Making cache_ a thread-local static might be the simplest efficient fix.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I could not manage to make it thread safe with static thread_local unique ptr. However I'd like to hear your opinion on creating the cache_ via call_once or double-checked locking.

Copy link
Contributor

@toddmgreer toddmgreer Jun 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, two bits of background, because I have a hard time keeping some of this straight:

  1. There is one HazelcastHttpCacheFactory, created and owned by Registry::RegisterFactory<HazelcastHttpCacheFactory, HttpCacheFactory>.
  2. For every new connection, HazelcastHttpCacheFactory::getCache will be called on whichever worker thread is serving that connection, so getCache must expect concurrent calls.

HazelcastHttpCacheFactory::getCache returns pointers to the same HazelcastHttpCache. Is that the intent? If so, then HazelcastHttpCache must be threadsafe (and so documented), and I suggest creating cache_ in HazelcastHttpCacheFactory's constructor.

If you want to have one HazelcastHttpCache per worker thread, then a static thread_local unique_ptr would be a likely choice, and would make getCache threadsafe.

If you want to have one HazelcastHttpCache per connection, then getCache should create and return one, except the getCache interface makes lifetime difficult. If that's the intent, we'll need to fix (mea culpa) the interface to better support it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HazelcastHttpCacheFactory::getCache returns pointers to the same HazelcastHttpCache. Is that the intent?

Yes indeed. It uses thread safe APIs of Hazelcast. Other than the connect and shutdown methods, the cache itself is actually thread safe. I will modify both accordingly and document about it. Also I think it's better to use only one client (and so cache) for now and make them per-worker or per-connection or whatever later on in case of performance issues.

and I suggest creating cache_ in HazelcastHttpCacheFactory's constructor.

But in that case it would not be able to connect until the first getCache call as it needs typed_config from CacheConfig. So a similar logic to ptr check would be required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different calls to getCache can have different CacheConfigs, due to configuration changing over time, or due to different listeners instantiating the CacheFilter with different configurations. Does Hazelcast need global and/or static config?

Copy link
Author

@enozcan enozcan Jun 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I afraid I did not take this dynamic configuration change into consideration..
HazelcastClient must be configured before it starts. It takes a few seconds to be started (and terminated) and hence it's not applicable to do that for each getCache call. A possible solution might be using only one HazelcastClient configured with the initial config values. So getCache will simply ignore the client connection configurations but will create a new cache according to the typed config with the existing HazelcastClient (created with the very first getCache call.). This could be somehow acceptable. Because once a client connects, it can keep its connection alive regardless of the changes happened afterwards. However, creating a new cache per call sounds really poor to me. In that case I should probably get rid of start, shutdown, etc. stuff as much as possible and minimize the cost of cache initialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmarantz, can you advise on how an extension should get static config?

Copy link
Author

@enozcan enozcan Jun 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toddmgreer, I've been thinking on this and decided to configure Hazelcast statically. However, I could not come up with a solution for other configurations (i.e. HazelcastHttpCacheConfig::unified/body_partition_size/app_prefix).

I tried to create a single cache and expose these configurations to lookup and insertion contexts. That would solve the dynamic configuration issue for the cache since contexts can be arranged to operate with these values. But passing them to contexts seems not possible using a single cache for all threads.

I also considered to create a cache per getCache call with the passed CacheConfig's non-static configurations (unified, body_partition_size and app_prefix). But since we're returning a reference to the cache here, I got stuck again.

Let me ask this way: How the implementation would be like if SimpleHttpCache needed a config value from typed_config during lookup and/or insertion? If caches are supposed to be independent from this type of configuration, what's the point of typed_config then?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toddmgreer, a kindly reminder as I will change the impl. according to your answer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enozcan, I apologize for not responding to your June 17th comment--I somehow didn't see it until just now.

While some types of config (such as those that HazelCast requires at startup) are naturally static, there are others that could be different for each request. Suppose SimpleHttpCache had a configuration option that specified a list of Content-Types not to insert into cache, for the purpose of not wasting cache space on content-types known (for that deployment) to get poor hit rates. SimpleHttpCacheFactory could create a request-specific HttpCache implementation that rejects those insertions, and otherwise inserts into a global hash map. Requests that hit differently configured filters might have different Content-Type blocklists, and that's fine.

I don't know the best way to handle global config. I intend to find out, as it's probably needed by all cache implementations (e.g. global limits for SimpleHttpCacheFactory, where to send RPCs for anything distributed). Unfortunately, there's some other work I have to get done first. If you're able to figure it out first, I'll try to be more prompt with discussions and PR reviews.

@enozcan
Copy link
Author

enozcan commented Jun 5, 2020

Thanks @toddmgreer. Before addressing your reviews, I want to ask about a few points.

When I give it a try (both Hazelcast and PoC cache) I see that response is cached and served from the cache as expected. But after each cache hit, the cached response is updated without hitting the backend service. Is that update - happening after each hit, the expected behavior of the filter or happening due to being not implemented completely at (as I see another age header added after each cache hit):

result.headers_->addReferenceKey(Http::Headers::get().Age, 0);

Also per #10019 (comment), I could not find a way of shutting down the cluster connection in the cache destructor without an Envoy crash. That is, the cache causes fault when tearing down with the current destruction logic. Do you have any suggestion to overcome this or is that possible to bring onDestroy calls back?

@toddmgreer
Copy link
Contributor

toddmgreer commented Jun 5, 2020 via email

@enozcan
Copy link
Author

enozcan commented Jun 5, 2020

I will double check and then create an issue with a reproducer for the age header case.

I thought HttpCache also had onDestroy method prior to the changes and that's why I asked about it. Now that I see they were only for contexts, please ignore the question as it's not related to onDestroy calls.

enozcan added 12 commits June 10, 2020 11:11
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
Signed-off-by: Enes Ozcan <enes.ozcan@hazelcast.com>
@jmarantz jmarantz removed the no stalebot Disables stalebot from closing an issue label Aug 15, 2020
@stale
Copy link

stale bot commented Aug 22, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added stale stalebot believes this issue/PR has not been touched recently and removed stale stalebot believes this issue/PR has not been touched recently labels Aug 22, 2020
@stale
Copy link

stale bot commented Aug 30, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Aug 30, 2020
@stale
Copy link

stale bot commented Sep 7, 2020

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale stalebot believes this issue/PR has not been touched recently

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants