
Replace caffeine cache with reference to full map #2214

Merged
merged 45 commits into master from break_up_caffeine_caching on Aug 6, 2021

Conversation

@rosalind210 (Contributor) commented Jun 24, 2021:

Originally we had the CaffeineCache set up in the RequestResource, keyed by user and call parameters, but the return values couldn't be cached across users because of differing authorization and settings, and we still saw 504s when SingularityClient's getRequests got hit hard again. We then split the cache to cover the heaviest ZK calls in the getRequest method: RequestManager's fetchRequests and DeployManager's fetchDeployStatesByRequestIds, which had no caching around client calls. That in turn degraded getRequest performance because of cache expiry.

Now we are replacing the Caffeine cache altogether with an AtomicReference to the full map of requests and deploys, updated every five seconds by a background thread. We originally refreshed every second, but that overloaded the CuratorFramework with calls for request and deploy data.

cc: @jschlather
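The reference-swap approach described above can be sketched roughly as follows. This is a simplified illustration, not the actual Singularity code; the class and method names (ApiCacheSketch, refresh) are hypothetical, and the Supplier stands in for the real ZK fetch:

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: readers always see a complete, consistent map,
// and only the background reloader talks to ZooKeeper.
class ApiCacheSketch<K, V> {
  private final AtomicReference<Map<K, V>> cache = new AtomicReference<>(
    Collections.emptyMap()
  );
  private final Supplier<Map<K, V>> loader;
  private final ScheduledExecutorService executor;

  ApiCacheSketch(Supplier<Map<K, V>> loader, ScheduledExecutorService executor) {
    this.loader = loader;
    this.executor = executor;
  }

  // Swap in a freshly fetched full map; readers are never blocked.
  void refresh() {
    cache.set(loader.get());
  }

  // Reload every five seconds, per the PR description.
  void startReloader() {
    executor.scheduleAtFixedRate(this::refresh, 5, 5, TimeUnit.SECONDS);
  }

  V get(K key) {
    return cache.get().get(key);
  }

  Map<K, V> getAll() {
    return cache.get();
  }
}
```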

return values;
}

public void put(K key, V value) {


I think we want to use a loader method. If we do it this way, then we won't get any debouncing from the cache.

Contributor Author:

Both caches are now LoadingCaches.
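For context on the debouncing point above: with a loader-based cache, concurrent callers for a missing key share a single load instead of each hitting ZK. Caffeine's LoadingCache provides this out of the box; the stdlib sketch below (a hypothetical DebouncingLoader, with no expiry) only illustrates the shared-load behavior:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration of loader-style debouncing: the expensive
// load runs once per key, no matter how many callers ask for it.
class DebouncingLoader<K, V> {
  private final ConcurrentHashMap<K, CompletableFuture<V>> loads = new ConcurrentHashMap<>();
  private final Function<K, V> loader;

  DebouncingLoader(Function<K, V> loader) {
    this.loader = loader;
  }

  V get(K key) {
    // computeIfAbsent guarantees a single future per key, so
    // concurrent callers join the same in-flight load.
    return loads
      .computeIfAbsent(key, k -> CompletableFuture.supplyAsync(() -> loader.apply(k)))
      .join();
  }
}
```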

LOG.trace("Grabbed mapped values for {} from cache", keys);
}

return values.isEmpty() ? null : values;
Contributor:

I think it's more idiomatic to return an empty map instead of null
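The suggested change would look something like the sketch below (getValues is a hypothetical stand-in for the method in the diff above):

```java
import java.util.Collections;
import java.util.Map;

class EmptyMapExample {
  // Returning an empty map instead of null means callers can iterate
  // or call size() without a null check.
  static <K, V> Map<K, V> getValues(Map<K, V> values) {
    return values.isEmpty() ? Collections.emptyMap() : values;
  }
}
```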

Collections.singletonList(getRequestDeployStatePath(requestId)),
requestDeployStateTranscoder
)
.get(0)
Contributor:

What happens if this throws an IndexOutOfBoundsException?

Contributor Author:

I added a check on the returned list before grabbing the first element.
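A sketch of that guard, assuming the underlying fetch returns a possibly-empty list (firstOf is a hypothetical helper, not the actual code):

```java
import java.util.List;
import java.util.Optional;

class ListGuardExample {
  // Checking the list before calling get(0) avoids the
  // IndexOutOfBoundsException the reviewer asked about.
  static <T> Optional<T> firstOf(List<T> results) {
    return results.isEmpty() ? Optional.empty() : Optional.of(results.get(0));
  }
}
```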

}

public void setCaffeineCacheTtl(int caffeineCacheTtl) {
  this.caffeineCacheTtl = caffeineCacheTtl;
}

public void setDeployCaffeineCacheTtl(int deployCaffeineCacheTtl) {
@pschoenfelder (Contributor) commented Jun 25, 2021:

Nit: can we remove caffeine from the config terminology here? Users don't need to be aware of the cache choice/implementation, and we could change caches in the future without any awkwardness.

Contributor Author:

I've also updated useCaffeineCache to useZKFastCache.

}

public int getRequestCaffeineCacheTtl() {
return requestCaffeineCacheTtl;
Contributor:

One last nit: can we append the time unit to this name so users don't have to search the code for it?

@jschlather:
LGTM

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ManagerCache<K, V> {
Member:

Check out the ZkCache class. It's basically the same thing as this and might be better suited (at least for the deployManager part of things)

Contributor Author:

The ZkCache just uses a Guava cache, and we were hoping to get the debouncing benefit of the LoadingCache created by CaffeineCache, since we were also getting timeouts on the DeployManager's requests. Do you still think I should update the DeployManager's usage?

Member:

Ah ok, thought the Zk one was more similar. Can keep the separate one then

Comment on lines 643 to 648
if (requestsCache.isEnabled()) {
List<SingularityRequestWithState> requests = requestsCache.get("all");
if (requests != null) {
return requests;
}
}
Member:

I wonder if we should just not use the web cache here if this is the case. Seems overkill to have the leaderCache -> web cache -> zk cache all here. Lots of duplicate memory usage to store it all 3 times

Contributor Author:

As discussed offline, to make a revert easy, the removal of the web cache will be in a separate PR.

configuration.useApiCacheInDeployManager(),
configuration.getDeployCacheTtl(),
this::fetchAllDeployStates,
executorServiceFactory.get("deploy-api-cache-reloader")
Member:

I think there is a version of this that returns a single threaded executor, which is all we should need for this case

@rosalind210 (Contributor Author) commented Jul 28, 2021:

We don't have an option for a single-threaded ScheduledExecutorService in SingularityManagedScheduledExecutorServiceFactory or SingularityManagedThreadPoolFactory, but Executors has newSingleThreadScheduledExecutor, which I can add.
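Executors.newSingleThreadScheduledExecutor is part of the JDK. A minimal usage sketch (the wrapper class and method name are illustrative only):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

class SingleThreadReloaderExample {
  // One thread is plenty for a periodic cache reload task.
  static ScheduledExecutorService newReloaderExecutor() {
    return Executors.newSingleThreadScheduledExecutor();
  }
}
```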

@@ -90,6 +93,7 @@
);

private final Map<Class<? extends SingularityExpiringRequestActionParent<? extends SingularityExpiringRequestParent>>, Transcoder<? extends SingularityExpiringRequestActionParent<? extends SingularityExpiringRequestParent>>> expiringTranscoderMap;
private final ApiCache<String, SingularityRequestWithState> requestsCache;
Member:

Should the getRequests(List<String> requestIds) and the singular getRequest(String requestId) also use the cache? Doesn't look like they do at the moment.

Contributor Author:

I didn't update elsewhere yet because I wasn't sure if that's what we wanted, since we were most concerned with the endpoint that gets all requests.

Member:

Since it all pulls from the same place, and we are constantly updating everything, I think it'd be worth updating. I believe the individual request endpoint was also pretty high up in usage. It can always be a follow-up PR if we want to check how effective this is first.

@rosalind210 (Contributor Author) commented Jul 30, 2021:

I added the cache to getRequests(List<String> requestIds) and the singular getRequest(String requestId), and the singular request now has a skip-cache flag for Orion usage.

this.executor = executor;
}

public void startReloader() {
Member:

Should this also synchronously perform the first fetch, I wonder? That would avoid the case where getAll() was incorrectly empty or get(K key) was null.
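One way to implement that suggestion, sketched with hypothetical names (the Supplier stands in for the real ZK fetch):

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class EagerReloaderSketch<K, V> {
  private final AtomicReference<Map<K, V>> cache = new AtomicReference<>(
    Collections.emptyMap()
  );
  private final Supplier<Map<K, V>> loader;
  private final ScheduledExecutorService executor =
    Executors.newSingleThreadScheduledExecutor();

  EagerReloaderSketch(Supplier<Map<K, V>> loader) {
    this.loader = loader;
  }

  void startReloader(long periodSeconds) {
    // Synchronous first fetch: getAll() is populated as soon as
    // startReloader returns, instead of waiting for the first tick.
    cache.set(loader.get());
    executor.scheduleAtFixedRate(
      () -> cache.set(loader.get()),
      periodSeconds,
      periodSeconds,
      TimeUnit.SECONDS
    );
  }

  Map<K, V> getAll() {
    return cache.get();
  }

  void stop() {
    executor.shutdownNow();
  }
}
```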

if (requestsCache.isEnabled() && !skipApiCache) {
SingularityRequestWithState request = requestsCache.get(requestId);
if (request != null) {
return Optional.of(request);
Contributor Author:

I didn't make this Optional.ofNullable(...) in case the null was a false null, i.e. the request came in between five-second reloads, but this means we will go to ZK for true nulls (where the request was removed).

@@ -544,6 +567,15 @@ public SingularityDeleteResult markDeleted(
}
}

if (requestsCache.isEnabled()) {
List<SingularityRequestWithState> requests = new ArrayList<>(
(requestsCache.getAll()).values()
Member:

nit, extra parens here?

@ssalinas (Member) left a comment:

One overall comment for ZK efficiency: do we want the reloader to short circuit if the current instance is the leader (i.e. it will have the leader cache instead)? I don't think there are any cases where we would hit this over the leader cache, so we would likely want to limit the calls flooding the leader's CuratorFramework.
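That short circuit could be as simple as a guard at the top of the reload task. A hypothetical sketch, with isLeader standing in for however leadership is actually checked (e.g. a Curator LeaderLatch):

```java
import java.util.function.BooleanSupplier;

class LeaderShortCircuitExample {
  // The leader already serves requests from its in-memory leader
  // cache, so its reloader can skip the periodic ZK fetch.
  static boolean shouldReload(BooleanSupplier isLeader) {
    return !isLeader.getAsBoolean();
  }
}
```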

return getSingleThreaded(name, false);
}

public synchronized ScheduledExecutorService getSingleThreaded(
Member:

looks like this method is unused?

Contributor Author:

I was following the pattern in SingularityManagedThreadPoolFactory for getSingleThreaded, but I'll update it to not overload that method.

Member:

🤦 Just realized the one above calls it. Never mind, I didn't even see that.

Comment on lines +413 to +416
@Deprecated
public void setSkipApiCache(boolean skipApiCache) {
this.skipApiCache = skipApiCache;
}
Member:

For structure/usability of the client: is there a reason you have this at the class level instead of as an arg on the relevant method(s)?

Contributor Author:

I looked through the request call usages in Orion and talked to Suruu, and we decided it would be easier on Deploy's side to have a class-level argument rather than updating all usages. It also makes cleanup easier for when we've solved the underlying CuratorFramework issue.

if (allValues.isEmpty()) {
LOG.debug("ApiCache getAll returned empty");
} else {
LOG.debug("getAll returned {} values", allValues.size());
Member:

For this and the debug statement above, maybe these are more like TRACE-level lines? It would get pretty noisy given that we could call these multiple times a second.

Contributor Author:

I made all of the get calls' logs trace level.

@ssalinas (Member) commented Aug 6, 2021:

🚢

@rosalind210 rosalind210 merged commit 916716d into master Aug 6, 2021
@rosalind210 rosalind210 deleted the break_up_caffeine_caching branch August 6, 2021 14:19
@ssalinas ssalinas added this to the 1.5.0 milestone May 4, 2022