Replace caffeine cache with reference to full map #2214
Conversation
return values;
}

public void put(K key, V value) {
I think we want to use a loader method. If we do it this way, then we won't get any debouncing from the cache.
Both caches are now LoadingCaches.
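For context, the debouncing benefit of a loader-based cache over put-after-fetch can be sketched with a stdlib analogy (class and method names here are hypothetical; Caffeine's `LoadingCache` adds expiry and eviction on top of this idea):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a loader-style cache "debounces" loads per key,
// while a put-after-fetch pattern lets every concurrent miss hit the
// backing store (ZK) before any put lands.
class LoaderStyleCache<K, V> {
  private final Map<K, V> map = new ConcurrentHashMap<>();
  private final Function<K, V> loader;

  LoaderStyleCache(Function<K, V> loader) {
    this.loader = loader;
  }

  V get(K key) {
    // computeIfAbsent invokes the loader at most once per absent key,
    // even under concurrent callers asking for the same key.
    return map.computeIfAbsent(key, loader);
  }
}
```

With a plain `put`, two simultaneous misses for the same key both go to ZK; a loading cache collapses them into a single load.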
LOG.trace("Grabbed mapped values for {} from cache", keys);
}

return values.isEmpty() ? null : values;
I think it's more idiomatic to return an empty map instead of null
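A minimal sketch of the suggested shape (method name hypothetical):

```java
import java.util.Collections;
import java.util.Map;

class ValuesHolder {
  // Returning an empty map instead of null lets callers iterate, stream,
  // or call isEmpty() without a null check. Collections.emptyMap() is a
  // shared immutable instance, so this costs nothing extra.
  static <K, V> Map<K, V> orEmpty(Map<K, V> values) {
    return values == null ? Collections.emptyMap() : values;
  }
}
```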
Collections.singletonList(getRequestDeployStatePath(requestId)),
requestDeployStateTranscoder
)
.get(0)
What happens if this throws an IndexOutOfBoundsException?
I added a check of that returned list before grabbing the first.
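A minimal sketch of that guard (names hypothetical, not the PR's actual helper):

```java
import java.util.List;
import java.util.Optional;

class FirstElement {
  // Check the list before taking the first element so an empty fetch
  // yields Optional.empty() instead of an IndexOutOfBoundsException.
  static <T> Optional<T> firstOf(List<T> results) {
    return results.isEmpty() ? Optional.empty() : Optional.ofNullable(results.get(0));
  }
}
```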
}

public void setCaffeineCacheTtl(int caffeineCacheTtl) {
this.caffeineCacheTtl = caffeineCacheTtl;
public void setDeployCaffeineCacheTtl(int deployCaffeineCacheTtl) {
Nit — can we remove `caffeine` from the config terminology here? Users don't need to be aware of the cache choice/impl, and we could change caches in the future if we wanted to without any awkwardness.
I've also updated `useCaffeineCache` to `useZKFastCache`.
}

public int getRequestCaffeineCacheTtl() {
return requestCaffeineCacheTtl;
One last nit — can we append the time unit to this so users don't have to search code for it?
LGTM
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ManagerCache<K, V> {
Check out the ZkCache class. It's basically the same thing as this and might be better suited (at least for the deployManager part of things)
The ZkCache just uses a Guava cache, and we were hoping to get the debouncing benefits of the LoadingCache created by Caffeine, since we were also seeing timeouts from the DeployManager's requests. Do you still think I should update the DeployManager's usage?
Ah ok, thought the Zk one was more similar. Can keep the separate one then
if (requestsCache.isEnabled()) {
List<SingularityRequestWithState> requests = requestsCache.get("all");
if (requests != null) {
return requests;
}
}
I wonder if we should just not use the web cache here if this is the case. Seems overkill to have the leaderCache -> web cache -> zk cache all here. Lots of duplicate memory usage to store it all 3 times
As discussed offline, to have ease in revert, the removal of web cache will be in a separate PR
configuration.useApiCacheInDeployManager(),
configuration.getDeployCacheTtl(),
this::fetchAllDeployStates,
executorServiceFactory.get("deploy-api-cache-reloader")
I think there is a version of this that returns a single threaded executor, which is all we should need for this case
We don't have an option for a single threaded ScheduledExecutorService in SingularityManagedScheduledExecutorServiceFactory or SingularityManagedThreadPoolFactory but Executors has newSingleThreadScheduledExecutor that I can add.
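`Executors.newSingleThreadScheduledExecutor` is a JDK method; a minimal sketch of using it for a periodic refresh task (the task itself is a stand-in here):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SingleThreadReloader {
  public static void main(String[] args) throws InterruptedException {
    // One thread is all a periodic cache-refresh task needs;
    // a larger pool would sit idle between ticks.
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    AtomicInteger reloads = new AtomicInteger();
    executor.scheduleAtFixedRate(reloads::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);
    Thread.sleep(100); // let a few refresh ticks fire
    executor.shutdown();
    System.out.println("reloads > 0: " + (reloads.get() > 0)); // prints true
  }
}
```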
@@ -90,6 +93,7 @@
);

private final Map<Class<? extends SingularityExpiringRequestActionParent<? extends SingularityExpiringRequestParent>>, Transcoder<? extends SingularityExpiringRequestActionParent<? extends SingularityExpiringRequestParent>>> expiringTranscoderMap;
private final ApiCache<String, SingularityRequestWithState> requestsCache;
Should the `getRequests(List<String> requestIds)` and the singular `getRequest(String requestId)` also use the cache? Doesn't look like they do at the moment.
I didn't update elsewhere yet because I wasn't sure if that's what we wanted to do, since we were most concerned about the endpoint that gets all requests.
Since it all pulls from the same place, and we are constantly updating everything, I think it'd be worth updating. I believe the individual request endpoint was also pretty high up in usage. It can always be a follow-up PR if we want to check how effective this is first.
I added the cache to `getRequests(List<String> requestIds)` and the singular `getRequest(String requestId)`, and the singular request call now has a non-cache flag for Orion usage.
this.executor = executor;
}

public void startReloader() {
Should this also synchronously perform the first fetch, I wonder? It would avoid the case where `getAll` was incorrectly empty or `get(K key)` was null.
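A sketch of the suggested change, assuming a reloader shaped roughly like the one in this PR (all names hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical reloader: the fix is the blocking fetch in startReloader()
// before the periodic task is scheduled.
class ReloadingCache<K, V> {
  private final AtomicReference<Map<K, V>> snapshot = new AtomicReference<>(Map.of());
  private final Supplier<Map<K, V>> fetcher;
  private final ScheduledExecutorService executor;
  private final long ttlSeconds;

  ReloadingCache(Supplier<Map<K, V>> fetcher, ScheduledExecutorService executor, long ttlSeconds) {
    this.fetcher = fetcher;
    this.executor = executor;
    this.ttlSeconds = ttlSeconds;
  }

  void startReloader() {
    // Synchronous first fetch: getAll()/get() are populated before the
    // first scheduled refresh, so early reads are never spuriously empty.
    snapshot.set(fetcher.get());
    executor.scheduleAtFixedRate(
      () -> snapshot.set(fetcher.get()), ttlSeconds, ttlSeconds, TimeUnit.SECONDS);
  }

  Map<K, V> getAll() {
    return snapshot.get();
  }

  V get(K key) {
    return snapshot.get().get(key);
  }
}
```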
if (requestsCache.isEnabled() && !skipApiCache) {
SingularityRequestWithState request = requestsCache.get(requestId);
if (request != null) {
return Optional.of(request);
I didn't make this `Optional.ofNullable(...)` in case it was a false null, i.e. the request came in during the 5-second reload window, but this means that we will go to ZK for true nulls (the request was removed).
@@ -544,6 +567,15 @@ public SingularityDeleteResult markDeleted(
}
}

if (requestsCache.isEnabled()) {
List<SingularityRequestWithState> requests = new ArrayList<>(
(requestsCache.getAll()).values()
nit, extra parens here?
One overall comment for ZK efficiency: do we want to have the reloader short-circuit if the current instance is the leader (i.e. it will have the leader cache instead)? I don't think there are any cases where we would hit this over the leader cache, so we would likely want to limit the calls flooding the leader's CuratorFramework.
return getSingleThreaded(name, false);
}

public synchronized ScheduledExecutorService getSingleThreaded(
looks like this method is unused?
I was following the pattern in the SingularityManagedThreadPoolFactory for `getSingleThreaded`, but I'll update to not overload that method.
🤦 just realized the one above calls it. Never mind, didn't even see that.
@Deprecated
public void setSkipApiCache(boolean skipApiCache) {
this.skipApiCache = skipApiCache;
}
For structure/usability of the client: is there a reason you have this at the class level instead of just as an arg on the relevant method(s)?
I looked through the request call usages in Orion and talked to Suruu and we decided it would be easier on Deploy's side to have a class level argument rather than updating all usages. It also makes clean up easier for when we've solved the underlying CuratorFramework issue
if (allValues.isEmpty()) {
LOG.debug("ApiCache getAll returned empty");
} else {
LOG.debug("getAll returned {} values", allValues.size());
For this and the debug statement above, maybe these are more like TRACE-level lines? It would get pretty noisy given that we could call these multiple times a second.
I made all of the get calls' logs trace level.
🚢
Originally had the CaffeineCache set up in the RequestResource, keyed by user and call parameters, but the return values couldn't be cached across all users because of differing authorization and settings. We ended up still experiencing 504s when SingularityClient's `getRequests` got hit hard again. We then split the cache to cover the heaviest ZK calls in the `getRequest` method: `RequestManager`'s `fetchRequests` and `DeployManager`'s `fetchDeployStatesByRequestIds`, which don't have caching around client calls, but that caused a performance degradation in the `getRequest` call because of the cache expiry.

Now we are replacing the caffeine cache altogether with an AtomicReference to the full map of requests and deploys, updated every five seconds by a separate thread. The change from one second to five seconds came after overloading the CuratorFramework with every-second calls for request and deploy data.

cc: @jschlather
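The snapshot-swap idea described above can be sketched as follows (names are hypothetical, not Singularity's actual classes): a background thread replaces the whole map every interval, so ZK sees one bulk fetch per interval no matter how many API reads arrive in between.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class SnapshotCacheDemo {
  static final AtomicInteger zkFetches = new AtomicInteger();

  // Stands in for the real bulk ZK read of all requests/deploys.
  static Map<String, String> fetchAllFromZk() {
    zkFetches.incrementAndGet();
    return Map.of("request-1", "ACTIVE", "request-2", "PAUSED");
  }

  public static void main(String[] args) {
    AtomicReference<Map<String, String>> snapshot = new AtomicReference<>(Map.of());

    // What the five-second reloader thread does on each tick.
    snapshot.set(fetchAllFromZk());

    // Thousands of API reads between ticks cost zero extra ZK calls:
    // they all hit the same in-memory immutable snapshot.
    for (int i = 0; i < 1000; i++) {
      snapshot.get().get("request-1");
    }
    System.out.println("zk fetches: " + zkFetches.get()); // prints 1
  }
}
```

Readers never block on a refresh: `AtomicReference.set` atomically swaps in the new map while in-flight reads finish against the old one.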