fix: lru_cache issues + meta info missing #72
Codecov Report
@@ Coverage Diff @@
## main #72 +/- ##
=====================================
Coverage 95.30 95.30
=====================================
Files 693 693
Lines 14734 14736 +2
=====================================
+ Hits 14041 14044 +3
+ Misses 693 692 -1
good catch on there only being one ArchiveField instance for the class just like

so the goal here is to avoid multiple requests to archive storage for the same resource, right? do we already know why those requests happen or is this speculative?

the ideal caching approach depends on the access characteristics, i guess. the per-instance caching is the best match if, hypothetically,

i have a weak preference for the class-wide caching, which i think is your option 2. but i don't think we need to build anything complex; i think we can just use

that fixes the memory leak problem + behaves like your option 2. and it picks a default cache size, but leaves it open to configure per field if we ever need to

thanks for putting so much thought into this!
My understanding of the "memory leak" is that it holds on to the references of the objects passed to

We then need to be careful with the

EDIT
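The reference-holding behaviour can be sketched as follows, assuming the cache in question is `functools.lru_cache` applied directly to a method. `Loader` and `instance_survives_deletion` are hypothetical names used only for illustration; they are not part of this PR.

```python
import functools
import gc
import weakref


class Loader:
    """Hypothetical class whose method is wrapped in lru_cache."""

    @functools.lru_cache(maxsize=None)
    def load(self, key):
        # `self` is part of the cache key, so the cache keeps a strong
        # reference to every instance that has called load().
        return f"value-for-{key}"


def instance_survives_deletion():
    """Return True if the instance is still alive after `del` and a GC pass."""
    loader = Loader()
    loader.load("a")
    ref = weakref.ref(loader)
    del loader
    gc.collect()
    return ref() is not None
```

With an unbounded cache the instance stays alive until `Loader.load.cache_clear()` is called; with a bounded `maxsize` it stays alive until its entries are evicted.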
Context: codecov/engineering-team#119

So the real issue with the meta info is fixed in codecov/shared#22. Spoiler: reusing the report details cached values and _changing_ them is not a good idea.

However, in the process of debugging that, @matt-codecov pointed out that we were not using `lru_cache` correctly. Check this very well made video: https://www.youtube.com/watch?v=sVjtp6tGo0g

So the present changes upgrade shared so we fix the meta info stuff AND address the cache issue.

There are further complications with the caching situation, which explain why I decided to add the cached value in the `obj` instead of `self`. The thing is that there's only 1 instance of `ArchiveField` shared among ALL instances of the model class (for example, all `ReportDetails` instances). This kinda makes sense because we only create an instance of `ArchiveField` in the declaration of the `ReportDetails` class.

Because of that, if the cache is in the `self` of `ArchiveField`, different instances of `ReportDetails` will see the dirty cached values of other `ReportDetails` instances and we get wrong values. To fix that I envision 3 possibilities:

1. Putting the cached value in the `ReportDetails` instance directly (the `obj`), and checking for the presence of that value. If it's there, it's guaranteed that we put it there, and we can update it on writes, so that we can always use it. Because it is stored per `ReportDetails` instance we always get the correct value, and it's cleared when the instance is killed and garbage collected.
2. Storing an entire table of cached values in the `self` (`ArchiveField`) and using the appropriate cache value when possible. The problem here is that we need to manage the cache ourselves (which is not that hard, honestly) and probably set a max size. Then we will populate the cache and over time evict old values. The 2nd problem is that the values themselves might be too big to hold in memory (which can be fixed by setting a very small cache size). There's a fine line there, but it's more work than option 1 anyway.
3. We move the getting and parsing of the value outside `ArchiveField` (so it's a normal function) and use `lru_cache` on that function. Because the `rehydrate` function takes a reference to `obj`, I don't think we should pass that, so the issue here is that we can't cache the rehydrated value and would have to rehydrate every time (which currently is not expensive at all in any model).

This is an instance cache, so it shouldn't need to be cleaned for the duration of the instance's life (because it is updated on the SET).

closes codecov/engineering-team#119
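The difference between caching on `self` and caching on `obj` can be sketched with two toy descriptors. This is a minimal illustration, not the actual `ArchiveField` code: `SharedCacheField`, `PerInstanceCacheField`, `FakeModel`, `load_from_storage`, and the cache attribute name are all hypothetical.

```python
_MISSING = object()


class SharedCacheField:
    """Problem case: the cache lives on the descriptor (`self`)."""

    def __init__(self):
        self._cached = _MISSING

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self._cached is _MISSING:
            # Only the first instance to read ever triggers a load.
            self._cached = obj.load_from_storage()
        return self._cached


class PerInstanceCacheField:
    """Option 1: the cache lives on the model instance (`obj`)."""

    CACHE_ATTR = "_archive_field_cached_value"  # hypothetical name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = getattr(obj, self.CACHE_ATTR, _MISSING)
        if value is _MISSING:
            value = obj.load_from_storage()
            setattr(obj, self.CACHE_ATTR, value)
        return value

    def __set__(self, obj, value):
        # A real field would also write to storage here.
        setattr(obj, self.CACHE_ATTR, value)


class FakeModel:
    # One descriptor instance each, shared by every FakeModel instance.
    shared = SharedCacheField()
    per_instance = PerInstanceCacheField()

    def __init__(self, payload):
        self.payload = payload

    def load_from_storage(self):
        return self.payload
```

Given `a = FakeModel({"x": 1})` and `b = FakeModel({"x": 2})`, reading `a.shared` and then `b.shared` returns `a`'s payload both times, while `a.per_instance` and `b.per_instance` each return their own payload.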
These changes are similar to codecov/codecov-api#72. Same reasoning applies.

* Handle case where a single model uses multiple archived fields (dynamic archived field cached property name)
* Concentrate getting/setting cache in `__get__` and `__set__` methods
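One way to get a distinct cache attribute per field is to derive its name from the field's own name via `__set_name__`. This is a sketch under assumptions: `CachedField`, its `loader` argument, the `_<name>_cached_value` naming scheme, and the `Report` model are hypothetical and may differ from what the commit actually does.

```python
_MISSING = object()


class CachedField:
    def __init__(self, loader):
        # Hypothetical callable taking the model instance and returning the value.
        self.loader = loader

    def __set_name__(self, owner, name):
        # Called once per class attribute, so each field gets its own cache slot.
        self.cache_attr = f"_{name}_cached_value"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = getattr(obj, self.cache_attr, _MISSING)
        if value is _MISSING:
            value = self.loader(obj)
            setattr(obj, self.cache_attr, value)
        return value

    def __set__(self, obj, value):
        # All cache writes go through __set__; all reads through __get__.
        setattr(obj, self.cache_attr, value)


class Report:
    files = CachedField(lambda obj: ["a.py"])
    totals = CachedField(lambda obj: {"lines": 10})
```

Writing to `files` on an instance leaves the cached `totals` untouched, because the two fields store their values under different attribute names.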
These changes are similar to codecov/codecov-api#72. Same reasoning applies.

The test change happens because we now update the cache on write. Because of that, we no longer run the encode/decode/rehydrate operations on that path, so the data you put in is the data you get back. On one hand, we could run those operations to guarantee consistency. On the other, this is no different from what we had before `ArchiveField`, AND such operations might be expensive.
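The trade-off can be illustrated with a toy field that uses JSON as a stand-in for the encode/decode step. The real encoding and rehydration logic is not shown in this PR text, so `WriteThroughField`, `Record`, and the `_encoded` / `_cached` attributes are assumptions made for the sketch.

```python
import json

_MISSING = object()


class WriteThroughField:
    def __set__(self, obj, value):
        obj._encoded = json.dumps(value)  # what "storage" would hold
        obj._cached = value               # cache updated on write, no round trip

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        cached = getattr(obj, "_cached", _MISSING)
        if cached is _MISSING:
            # Cold read: decode from "storage".
            cached = json.loads(obj._encoded)
            obj._cached = cached
        return cached


class Record:
    data = WriteThroughField()


def read_after_write_and_after_cold_load(value):
    """Return (value read right after the write, value read after dropping the cache)."""
    record = Record()
    record.data = value
    warm = record.data
    del record._cached
    cold = record.data
    return warm, cold
```

For an input such as `{"sessions": (1, 2)}`, the warm read returns the very object that was written, while the cold read returns `{"sessions": [1, 2]}` because JSON turns the tuple into a list. That difference is the kind of inconsistency a test may observe.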