
fix: lru_cache issues + meta info missing #72

Merged
giovanni-guidini merged 3 commits into main from gio/fix-session-totals-warning on Aug 11, 2023

Conversation

giovanni-guidini (Contributor) commented Aug 8, 2023

Context: codecov/engineering-team#119

So the real issue with the meta info is fixed in codecov/shared#22.
spoiler: reusing the report details cached values and changing them is not a good idea.

However, in the process of debugging that, @matt-codecov pointed out that we were not using lru_cache correctly.
Check this very well-made video: https://www.youtube.com/watch?v=sVjtp6tGo0g

So the present changes upgrade shared so that we fix the meta info issue AND address the cache issue.
There are further complications with the caching situation, which explains why I decided to put the cached value on the
obj instead of self. The thing is that there's only 1 instance of ArchiveField shared among ALL instances of
the model class (for example, all ReportDetails instances). This kinda makes sense because we only create an instance
of ArchiveField in the declaration of the ReportDetails class.

Because of that, if the cache is on the self of ArchiveField, different instances of ReportDetails will see stale cached values from other ReportDetails instances and we get wrong values. To fix that I envision 3 possibilities:

  1. Putting the cached value in the ReportDetails instance directly (the obj), and checking for the presence of that value.
    If it's there, it's guaranteed that we put it there, and we can update it on writes, so we can always use it. Because the cache lives
    on each ReportDetails instance we always get the correct value, and it's cleared when the instance is killed and garbage collected (a rough sketch of this follows below).

  2. Storing an entire table of cached values in the self (ArchiveField) and using the appropriate cached value when possible. The problem here is that we need to manage the cache ourselves (which is not that hard, honestly) and probably set a maximum size. Then we populate the cache and evict old values over time. The 2nd problem is that the values themselves might be too big to hold in memory (which can be mitigated by setting a very small cache size). There's a fine line there, but it's more work than option 1 anyway.

  3. We move the getting and parsing of the value outside ArchiveField (so it's a normal function) and use lru_cache on that function. Because the rehydrate function takes a reference to obj, I don't think we should pass that, so the issue here is that we can't cache the rehydrated value and would have to rehydrate every time (which currently is not expensive at all in any model).

This is an instance cache, so it shouldn't need to be cleaned for the duration of the instance's life
(because it is updated on writes, i.e. on SET).
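
Roughly, option 1 looks like the sketch below. This is just a minimal illustration of the idea, not the actual implementation; `_load_from_archive`, `_write_to_archive` and the cache attribute name are hypothetical stand-ins for the real archive calls:

```python
class ArchiveField:
    """Sketch of option 1: the cached value lives on the model instance (obj)."""

    def __init__(self, rehydrate=None):
        self.rehydrate = rehydrate

    def __set_name__(self, owner, name):
        # One cache attribute per field name, so a model that declares several
        # archive fields doesn't collide with itself.
        self.cached_attr = f"__archive_field_cached_{name}"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Cache on obj, not on self: different ReportDetails instances never see
        # each other's values, and the cache dies with the instance.
        if not hasattr(obj, self.cached_attr):
            raw = self._load_from_archive(obj)  # hypothetical storage helper
            value = self.rehydrate(obj, raw) if self.rehydrate else raw
            setattr(obj, self.cached_attr, value)
        return getattr(obj, self.cached_attr)

    def __set__(self, obj, value):
        self._write_to_archive(obj, value)  # hypothetical storage helper
        # Update the per-instance cache on writes so reads stay consistent.
        setattr(obj, self.cached_attr, value)

    def _load_from_archive(self, obj):
        ...

    def _write_to_archive(self, obj, value):
        ...
```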

closes codecov/engineering-team#119

matt-codecov — this comment was marked as outdated.

giovanni-guidini changed the title from "fix: session totals missing meta warning" to "fix: lru_cache issues + meta info missing" on Aug 10, 2023
codecov-staging bot commented Aug 10, 2023

Codecov Report

Patch coverage is 100.00% of modified lines.

| Files Changed | Coverage Δ |
| --- | --- |
| utils/model_utils.py | 96.77% <100.00%> (+1.77%) ⬆️ |


codecov bot commented Aug 10, 2023

Codecov Report

Merging #72 (6528bd0) into main (b37dcd5) will not change coverage.
The diff coverage is 100.00%.

@@          Coverage Diff          @@
##            main     #72   +/-   ##
=====================================
  Coverage   95.30%  95.30%           
=====================================
  Files        693     693           
  Lines      14734   14736    +2     
=====================================
+ Hits       14041   14044    +3     
+ Misses       693     692    -1     
| Flag | Coverage Δ |
| --- | --- |
| unit | 95.24% <100.00%> (+<0.01%) ⬆️ |
| unit-latest-uploader | 95.24% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files Changed | Coverage Δ |
| --- | --- |
| utils/model_utils.py | 96.87% <100.00%> (+1.71%) ⬆️ |

matt-codecov (Contributor) commented:
good catch on there only being one ArchiveField instance for the class just like lru_cache

so the goal here is to avoid multiple requests to archive storage for the same resource, right? do we already know why those requests happen or is this speculative?

the ideal caching approach depends on the access characteristics, i guess. the per-instance caching is the best match if, hypothetically, ReportDetails repeatedly dereferences files_array during its processing. the class-wide caching is the best match if, hypothetically, each Commit instance only dereferences its ArchiveField once, but we create many Commit instances for the same commit by querying it from the db whenever we need it / across requests. and if you pick a good cache size for the class-wide cache, it also helps the first case as well

i have a weak preference for the class-wide caching, which i think is your option 2. but i don't think we need to build anything complex; i think we can just use @lru_cache slightly differently:

from functools import lru_cache

class ArchiveField:
    def __init__(self, maxsize=30):
        # wrap the bound method in a per-descriptor lru_cache instead of decorating it at class-definition time
        self._get_value_from_archive = lru_cache(maxsize=maxsize)(self._uncached_get_value_from_archive)

    def _uncached_get_value_from_archive(self, obj):
        # the actual archive fetch for obj goes here
        pass

that fixes the memory leak problem + behaves like your option 2. and it picks a default cache size, but leaves it open to configure per field if we ever need to
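
For example, if we went this route, the cache size could be tuned per field where it's declared (hypothetical usage; the field names here are just placeholders):

```python
class ReportDetails:
    # files_array values can be large, so give it a small cache;
    # the second field falls back to the default maxsize=30.
    files_array = ArchiveField(maxsize=5)
    report_metadata = ArchiveField()
```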

thanks for putting so much thought into this!

giovanni-guidini (Contributor, Author) commented Aug 10, 2023

that fixes the memory leak problem + behaves like your option 2. and it picks a default cache size, but leaves it open to configure per field if we ever need to

My understanding of the "memory leak" is that it holds on to the references of the objects passed to lru_cache. That was the case with self in the video. Your suggestion does fix that (although we can argue that it was not a problem, because we only have 1 ArchiveField instance and it lives until the end of the process anyway).

We then need to be careful with the obj reference that is passed to _uncached_get_value_from_archive. Otherwise those references might leak, and that would be a big leak. It is doable, but I personally think that if we are to do that it's better to go straight to option 3 and move the function outside ArchiveField completely... maybe a compromise somewhere between 2 and 3, using the lru_cache inside the ArchiveField class, is a good option.

EDIT
Given that the cached objects can be on the order of hundreds of MBs (for commit.report), I think we should stick with option 1: put the value in the object instance. It uses less memory, and we can be sure the object references will not stick around in the ArchiveField cache. When the object is done, the memory is freed completely.
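
To make the concern about obj references concrete, here is a hypothetical, self-contained illustration (not project code) of the difference between holding values in an lru_cache keyed by the instance and storing them on the instance itself:

```python
import gc
import weakref
from functools import lru_cache

class Model:
    pass

@lru_cache(maxsize=30)
def load_from_archive(obj):
    # obj becomes a cache key, i.e. a strong reference held by the cache
    return "imagine hundreds of MBs here"

m = Model()
ref = weakref.ref(m)
load_from_archive(m)
del m
gc.collect()
print(ref() is not None)  # True: the cache still holds the instance and the payload

m2 = Model()
m2._cached_value = "imagine hundreds of MBs here"  # option 1: cache on the instance
ref2 = weakref.ref(m2)
del m2
gc.collect()
print(ref2() is None)  # True: instance and cached value were freed together
```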

giovanni-guidini added a commit to codecov/worker that referenced this pull request Aug 10, 2023
These changes are similar to codecov/codecov-api#72
Same reasoning applies.
* Handle case where a single model uses multiple archived fields
(dynamic archived field cached property name)
* Concentrate getting/setting cache in `__get__` and `__set__` methods
@giovanni-guidini giovanni-guidini merged commit e2c6b1c into main Aug 11, 2023
11 of 12 checks passed
@giovanni-guidini giovanni-guidini deleted the gio/fix-session-totals-warning branch August 11, 2023 15:42
giovanni-guidini added a commit to codecov/worker that referenced this pull request Aug 11, 2023
These changes are similar to codecov/codecov-api#72
Same reasoning applies.

The test change happens because we now update the cache on the write. Because of that we
are not doing the encode/decode/rehydrate operations, so the data you put in is the data you get.
On one hand we could do such operations to guarantee consistency. On the other, this is no different from
what we had before `ArchiveField` AND such operations might be expensive.
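
As a self-contained illustration of that write-through behaviour (hypothetical names, not the real ArchiveField code): the setter caches the exact object that was written, so a later read returns it without a serialize/parse round trip.

```python
import json

class WriteThroughField:
    def __set_name__(self, owner, name):
        self.cached_attr = f"__cached_{name}"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if hasattr(obj, self.cached_attr):
            return getattr(obj, self.cached_attr)
        # Fallback: pretend we fetch the serialized blob and parse it.
        return json.loads(getattr(obj, "_stored", "null"))

    def __set__(self, obj, value):
        obj._stored = json.dumps(value)        # the "archive" write
        setattr(obj, self.cached_attr, value)  # cache the raw value as-is

class ReportDetailsLike:
    files_array = WriteThroughField()

d = ReportDetailsLike()
payload = [{"filename": "awesome.py"}]
d.files_array = payload
assert d.files_array is payload  # the data you put in is the data you get
```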
Development

Successfully merging this pull request may close these issues.

Investigate why GCS data is being saved without meta info
2 participants