Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Background refreshes not happening #120

Closed
panghy opened this issue Sep 2, 2016 · 9 comments
Closed

Background refreshes not happening #120

panghy opened this issue Sep 2, 2016 · 9 comments

Comments

@panghy
Copy link

panghy commented Sep 2, 2016

We are running 2.3.3 in production and we have noticed that a particular Caffeine cache has simply stopped refreshing entries (it's still serving loaded ones) even though it's configured to do so after 30 seconds.

Caffeine.newBuilder().refreshAfterWrite(30, TimeUnit.SECONDS)

It's a deduction at this point since we have a log message during the refresh (but we are not seeing it) and we are certainly not seeing any exceptions being thrown either. Thread dumps do not show any stuck ForkJoinPool workers (they all seem to be awaiting work).

This is a somewhat new issue I'd reckon since we only recently upgraded to 2.3.x.

Thanks for looking into it.

@ben-manes
Copy link
Owner

Sorry to hear about this. If you can help write a test case that demonstrates the problem, I'd appreciate it. It would be helpful to know if you're cache is synchronous or asynchronous. The asynchronous variant adds complexity to avoid additional threads, premature refresh while the original future is loading, etc.

I'll try to take advantage of the long weekend to see if I can reproduce and debug the problem.

@ben-manes
Copy link
Owner

ben-manes commented Sep 2, 2016

I have a theory that I'll need to explore further.

refreshIfNeeded ends up in a compute method and assumes that it will update the write timestamp when complete. However an afterWrite() is costly, so it has an optimization to use afterRead() if possible. The final check only takes into account expiresAfterWrite() and not refreshAfterWrite(). When using an AsyncLoadingCache the timestamp is bumped to infinity due to the async client computation, so without an external write to the entry it would be stuck. An immediate solution, if I'm right, is to enable expiresAfterWrite with an excessively high value.

I'll have to write a test to fail, understand why my coverage and validation are inadequate, and review all other writes for a similar oversight. So assuming that's the culprit it will still take me the weekend to get a proper fix in place.

@ben-manes
Copy link
Owner

In the described case, the writeTime is properly set during the computation. That is perhaps redundant in the common case, but okay. So nothing stands out when reading through the code and debugging a test case.

The next possibility is a visibility problem where the relaxed write is lost so the infinite timestamp remains. That's going to require a threaded test to see if we can observe it. In general a relaxed write is fine as best effort is good enough, but this timestamp is thorny. Of all the features, for me refresh is the most confusing due to all of the interactions at play and less familiarity by not writing Guava's version. I don't recall why I felt the infinite was needed and only for an async cache, meaning either it was unnecessary or I'm forgetting the scenario being guarded against.

@panghy
Copy link
Author

panghy commented Sep 2, 2016

Thanks for looking into this and the detailed analysis. If anything, we noticed that nothing was being refreshed. Since the cache had many entries and stack frames show that the cache was returning results, it's going to be a case of somehow the cache getting into a state where all refresh attempts are ignored. We are turning on record stats to get more insights into it at this time.

@ben-manes
Copy link
Owner

My best guess is that you're using an async cache and all entries somehow are stuck as if the refresh is in flight so no others for that entry should be triggered. If that's not the case then I'm stumped unless the executor is hostile by ignoring our submissions.

@ben-manes
Copy link
Owner

Please provide more details so that we can determine if it's a bug in the cache or usage. So far I haven't been able to reproduce the issue, so a failing test case would be very useful.

@panghy
Copy link
Author

panghy commented Sep 9, 2016

Not a problem. I am on vacation right now, when I get back, I'll try to see
if I can create one. Perhaps an OOM during refresh?

On Sep 9, 2016 10:21 AM, "Ben Manes" [email protected] wrote:

Please provide more details so that we can determine if it's a bug in the
cache or usage. So far I haven't been able to reproduce the issue, so a
failing test case would be very useful.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#120 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAgKM3Lf5_mwmeRgUubHeDSHGp3ytWz_ks5qoRb0gaJpZM4JzTbw
.

@panghy
Copy link
Author

panghy commented Sep 15, 2016

I think at this point we can't really reproduce this so I think it's safe to say that perhaps it was an OOM (which usually causes havoc in many places anyways). We have since turned on exit on OOM just in case but I don't think it's something that's specifically wrong with Caffeine.

Thanks for looking into this. I am not sure if you will be attending JavaOne this weekend but I am going to be speaking at the keynote briefly with Georges Saab from Oracle. I'll make sure to give a shout out to Caffeine :)

@panghy panghy closed this as completed Sep 15, 2016
@ben-manes
Copy link
Owner

Thanks for the update. I'm not attending JavaOne, but will be watching the videos. Have a great time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants