Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: change cache of ArchiveField #55

Merged
merged 3 commits into from
Aug 11, 2023

Conversation

giovanni-guidini
Copy link
Contributor

These changes are similar to codecov/codecov-api#72
Same reasoning applies.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

These changes are similar to codecov/codecov-api#72
Same reasoning applies.
Copy link
Contributor

@matt-codecov matt-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hopefully this'll be the last iteration and you can wipe your hands of this bug soon. great work with the investigation and considering all the options for the fix!

i am happy with this approach provided you get one more version up to centralize the caching stuff in __get__()

for your consideration: i just realized we could also solve the "one ArchiveField for the whole class" problem the same way we would solve it for lru_cache:

class ArchiveField:
    def __init__(self):
        self.cached = lru_cache(maxsize=1)(self._uncached)

class ReportDetails:
    def __init__(self):
        self.files_array = ArchiveField(...)

i don't think defining a class property in __init__() is necessarily cleaner than getattr()/setattr() (which we're already doing anyway), and lru_cache has the downside that __set__() can't seed the cache, you have to re-request on next access. so i'm happy with the approach you're taking in the current version of the PR, but i'll leave it up to you

def _get_value_from_archive(self, obj):
repository = obj.get_repository()
archive_service = ArchiveService(repository=repository)
archive_field = getattr(obj, self.archive_field_name)
if archive_field:
try:
file_str = archive_service.read_file(archive_field)
return self.rehydrate_fn(obj, json.loads(file_str))
value = self.rehydrate_fn(obj, json.loads(file_str))
setattr(obj, "__archive_field_cached_value", value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one model could potentially have more than one ArchiveField, i think you should tie this name to self.public_name. maybe in __init__() add self.cached_value_name = f"__{self.public_name}_cached_value"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a very good observation. you are absolutely correct

@@ -106,9 +107,14 @@ def _get_value_from_archive(self, obj):
return self.default_value

def __get__(self, obj, objtype=None):
cached_value = getattr(obj, "__archive_field_cached_value", None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it'd be nice to keep all the cache-related stuff into __get__() and __set__() and out of the other functions

def __get__(self, obj, objtype=None):
    if getattr(obj, self.cached_value_name) is None:
        db_field = getattr(obj, self.db_field_name)
        if db_field is not None:
            value = ...
        else:
            value = self._get_value_from_archive(obj)
        setattr(obj, self.cached_value_name, value)
    return getattr(obj, self.cached_value_name)

* Set varialbe cached_value property name, for the case were a single model might have
multiple archived fields
* Concentrate all archive getting/setting in the `__get__` and `__set__` functions

The test changed happens because now we update the cache on the write. Because of that we
are not doing the encode/decode/rehydrate operations. So the data you put in is the data you get.
On one hand we can do such operations to guarantee consistency. On the other this is no different than
what we used before `ArchiveField` AND such operations might be expensive.
@codecov
Copy link

codecov bot commented Aug 11, 2023

Codecov Report

Merging #55 (1558f14) into main (3ee8958) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main      #55   +/-   ##
=======================================
  Coverage   98.61%   98.61%           
=======================================
  Files         357      357           
  Lines       25942    25943    +1     
=======================================
+ Hits        25583    25584    +1     
  Misses        359      359           
Flag Coverage Δ
integration 98.58% <100.00%> (+<0.01%) ⬆️
latest-uploader-overall 98.58% <100.00%> (+<0.01%) ⬆️
onlysomelabels 98.61% <100.00%> (+<0.01%) ⬆️
unit 98.58% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 97.36% <100.00%> (+<0.01%) ⬆️
OutsideTasks 98.35% <100.00%> (+<0.01%) ⬆️
Files Changed Coverage Δ
services/tests/test_report.py 100.00% <ø> (ø)
database/tests/unit/test_model_utils.py 100.00% <100.00%> (ø)
database/tests/unit/test_models.py 100.00% <100.00%> (ø)
database/utils.py 96.92% <100.00%> (+0.25%) ⬆️
Related Entrypoints
run/app.tasks.upload.Upload
run/app.tasks.upload.UploadProcessor
run/app.tasks.notify.Notify
run/app.tasks.pulls.Sync
run/app.tasks.upload.UploadFinisher
run/app.tasks.compute_comparison.ComputeComparison
run/app.tasks.timeseries.backfill_commits

@giovanni-guidini giovanni-guidini merged commit 319753b into main Aug 11, 2023
12 checks passed
@giovanni-guidini giovanni-guidini deleted the gio/fix-archive-field-cache branch August 11, 2023 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants