Investigate why GCS data is being saved without meta info #119
I checked several commits in several repos and they all had the "meta" info. The problem hasn't happened for the last 2 days, though.
Marking as in review. Please check it again before the end of the sprint, and let's close it unless more cases arise.
In an effort to understand why we're getting `SessionArray` data from GCS without 'meta' info we'll add a temporary log line when saving data from the API. Mostly because I don't think this should happen, so I don't expect to see that line. But who knows... Part of codecov/engineering-team#119
There are still cases of this happening. It's like a silent error, because since the fix it doesn't matter anymore. But I'll add some log lines to help us debug where the problem is.
You can explore the logs using the following messages:
From what I see in the logs, we had 2 occurrences of data without meta (when creating the report details, the data was …). So I think this leads us to believe that:
UPDATE
I reproduced this locally. The thing that uses this is the sunburst chart in the UI (see image). Now it starts to get interesting. I changed the retrieve function to do this:

```python
# Roughly https://github.com/codecov/codecov-api/blob/14dfd7c868e34a840387f9529b51194924c2a669/utils/model_utils.py#L87
@lru_cache(maxsize=1)
def _get_value_from_archive(self, obj):
    repository = obj.get_repository()
    archive_service = ArchiveService(repository=repository)
    archive_field = getattr(obj, self.archive_field_name)
    if archive_field:
        try:
            file_str = archive_service.read_file(archive_field)
            if self.public_name == "files_array":
                rehydrated_data = self.rehydrate_fn(obj, json.loads(file_str))
                for file in rehydrated_data:
                    if "session_totals" not in file or "meta" not in file["session_totals"]:
                        log.warning("META NOT IN SESSION_TOTALS", extra=dict(data=file))
            return self.rehydrate_fn(obj, json.loads(file_str))
        except FileNotInStorageError:
            log.error(
                "Archive enabled field not in storage",
                extra=dict(
                    storage_path=archive_field,
                    object_id=obj.id,
                    commit=obj.get_commitid(),
                ),
            )
```

... so we would get the … At the same time I changed the place where we use this in a similar fashion:

```python
# Roughly https://github.com/codecov/codecov-api/blob/14dfd7c868e34a840387f9529b51194924c2a669/services/report.py#L206
files_array = report_details.files_array
for file in files_array:
    if "session_totals" not in file or "meta" not in file["session_totals"]:
        log.warning("really missing meta info", extra=dict(data=file))
return {
    file["filename"]: ReportFileSummary(
        file_index=file["file_index"],
        file_totals=ReportTotals(*file["file_totals"]),
        session_totals=file["session_totals"],
        diff_totals=file["diff_totals"],
    )
    for file in files_array
}
```

And... well...

```
WARNING: really missing meta info --- {"asctime": "2023-08-08 21:38:27,275", "name": "services.report", "lineno": 210, "pathname": "/app/services/report.py", "funcName": "build_files", "threadName": "MainThread", "data": {"filename": "services/fetch_data/tests/test_fetch_data.py", "file_index": 24, "file_totals": [0, 1, 1, 0, 0, "100", 0, 0, 0, 0, 0, 0, 0], "session_totals": {"0": [0, 1, 1, 0, 0, "100"]}, "diff_totals": null}}
```

EDIT
Context: codecov/engineering-team#119

So it seems that, after much investigation, the issue was not in our code per se, but in the `lru_cache` function, probably because of the complex nature of both the arguments and the return value. I propose this solution as quick and simple. The idea is that string values are safer to use as the cached return. Obviously there's some extra processing that will need to be done on each access, but I suppose it's an acceptable trade-off if we get the correct values every time.

closes codecov/engineering-team#119
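The idea above can be sketched like this. This is a minimal, hypothetical illustration (the class and the stubbed storage read are stand-ins, not the real `ArchiveField`): cache the raw JSON string, which is immutable, and `json.loads` it on every access so each caller gets a brand-new dict.

```python
# Hypothetical sketch: cache the immutable JSON string, parse on each access.
import json
from functools import lru_cache


class ArchiveFieldSketch:
    @lru_cache(maxsize=1)
    def _get_raw_from_archive(self, key):
        # Stand-in for the GCS read; returns the stored JSON string.
        return json.dumps(
            {"session_totals": {"0": [0, 1], "meta": {"session_count": 1}}}
        )

    def get_value(self, key):
        # json.loads builds a fresh dict each call, so a caller mutating
        # its copy cannot corrupt what the cache holds.
        return json.loads(self._get_raw_from_archive(key))


field = ArchiveFieldSketch()
first = field.get_value("path/a")
first["session_totals"].pop("meta")   # a destructive change by one caller...
second = field.get_value("path/a")
assert "meta" in second["session_totals"]  # ...doesn't leak into later reads
```

The extra `json.loads` per access is the "extra processing" trade-off mentioned above.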
Recently we had [issues](codecov/engineering-team#119) with meta info missing from the encoded data of the sessions array. After much investigation and head scratching, the reason is that moving data to GCS and *caching* the data causes us to re-use the dictionary instance of `files_array`. Why does this matter? Because dictionaries are passed by *reference*: if you mutate the dict, the change is visible through every reference to that dictionary (read: the *cached value* in `ArchiveField`). So basically we were shooting ourselves in the foot.

The good news is that data is being saved and fetched correctly. We just need to not make destructive changes to the encoded data, because it might be reused in the future. The current changes address that, making the building of `SessionTotalArray` non-destructive for the encoded data.

closes codecov/engineering-team#119
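A minimal reproduction of that failure mode (with hypothetical, simplified names — `load_files_array` stands in for the cached archive read): the cache hands every caller the same dict object, so one caller's destructive change shows up on every later cache hit.

```python
# Minimal repro: lru_cache returns the SAME object on every hit, so a
# caller's in-place mutation corrupts all subsequent reads.
from functools import lru_cache


@lru_cache(maxsize=1)
def load_files_array(path):
    # Stand-in for rehydrating data fetched from GCS.
    return [{"session_totals": {"0": [0, 1], "meta": {"session_count": 1}}}]


files = load_files_array("x")
del files[0]["session_totals"]["meta"]  # "building SessionTotalArray" destructively

# The next read is a cache hit on the mutated object: meta is silently gone.
assert "meta" not in load_files_array("x")[0]["session_totals"]
```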
Context: codecov/engineering-team#119

So the real issue with the meta info is fixed in codecov/shared#22. Spoiler: reusing the report details cached values and _changing_ them is not a good idea. However, in the process of debugging that, @matt-codecov pointed out that we were not using `lru_cache` correctly. Check this very well made video: https://www.youtube.com/watch?v=sVjtp6tGo0g

So the present changes upgrade shared so we fix the meta info stuff AND address the cache issue.

There are further complications with the caching situation, which explain why I decided to put the cached value in the `obj` instead of `self`. The thing is that there's only 1 instance of `ArchiveField` shared among ALL instances of the model class (for example, all `ReportDetails` instances). This kinda makes sense, because we only create an instance of `ArchiveField` in the declaration of the `ReportDetails` class. Because of that, if the cache lives in the `self` of `ArchiveField`, different `ReportDetails` instances will see dirty cached values from other `ReportDetails` instances, and we get wrong values.

To fix that I envision 3 possibilities:

1. Putting the cached value in the `ReportDetails` instance directly (the `obj`) and checking for the presence of that value. If it's there, it's guaranteed that we put it there, and we can update it on writes, so we can always use it. Because it lives on each `ReportDetails` instance we always get the correct value, and it's cleared when the instance is killed and garbage collected.
2. Storing an entire table of cached values in `self` (the `ArchiveField`) and using the appropriate cached value when possible. The problem here is that we need to manage the cache ourselves (which is not that hard, honestly) and probably set a max size, then populate the cache and evict old values over time. The 2nd problem is that the values themselves might be too big to hold in memory (which can be mitigated by setting a very small cache size). There's a fine line there, but it's more work than option 1 anyway.
3. Moving the getting and parsing of the value outside `ArchiveField` (so it's a normal function) and using `lru_cache` on that function. Because the `rehydrate` function takes a reference to `obj`, I don't think we should pass that, so the issue here is that we can't cache the rehydrated value and would have to rehydrate on every access (which currently is not expensive at all in any model).

This is an instance cache, so it shouldn't need to be invalidated for the duration of the instance's life (because it is updated on the SET).

closes codecov/engineering-team#119
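Option 1 can be sketched as a descriptor that keeps its cached value on the model instance. This is a simplified, hypothetical version (the class names and the stubbed archive read are illustrative, not the real shared code): because the cache attribute lives in each `obj`'s `__dict__`, two instances can never see each other's data, and writes refresh the cache in place.

```python
# Sketch of option 1: per-instance cache stored on `obj`, not on the
# single shared descriptor instance.
class ArchiveFieldSketch:
    def __set_name__(self, owner, name):
        self.cache_attr = f"_archive_cache_{name}"

    def _load_from_archive(self, obj):
        # Stand-in for the GCS read + rehydrate step.
        return {"loaded_for": obj.external_id}

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if not hasattr(obj, self.cache_attr):
            setattr(obj, self.cache_attr, self._load_from_archive(obj))
        return getattr(obj, self.cache_attr)

    def __set__(self, obj, value):
        # Writes refresh the per-instance cache, so reads after a SET
        # need no separate invalidation.
        setattr(obj, self.cache_attr, value)


class ReportDetails:
    files_array = ArchiveFieldSketch()

    def __init__(self, external_id):
        self.external_id = external_id


a, b = ReportDetails("a"), ReportDetails("b")
assert a.files_array is not b.files_array       # no cross-instance bleed
assert a.files_array["loaded_for"] == "a"
```

When an instance is garbage collected, its cached value goes with it, which is the cleanup property described above.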
* Add trial logic
* Get rid of unnecessary comments + linting
* Reintroduce print statement and revert gql nullability
* Added expiration value when expiring trial
* Add pretrial user count field
* Create a new endpoint that marks the completion step for uploads, which triggers notifs and syncs pulls (#49)
* fix: Cast UUID to str before saving in the db (#56)
* chore: add debug log for files array saving (#47)

  In an effort to understand why we're getting `SessionArray` data from GCS without 'meta' info we'll add a temporary log line when saving data from the API. Mostly because I don't think this should happen, so I don't expect to see that line. But who knows... Part of codecov/engineering-team#119
* chore: change logic to write data to storage (#48)

  After moving `ReportDetails.files_array` and `Commit.report` to archive successfully for Codecov, we want to do the same for some selected customers (in `repo_ids`), and then we will want to start the process for new fields, only for Codecov. The way things were before, if a field was set to write to GCS it was valid for all Codecov repos AND all repos under `repo_ids`, while we would only want that for Codecov repos. So by changing the `master_write_switch` to strings we can have more power over each individual field. Now we can use:
  * `True` or `codecov_access` to write *only* for Codecov repos
  * `restricted_access` to write to all repos in `repo_ids` (+ Codecov)
  * `general_access` to write to all repos

  Because currently we're only writing for Codecov repos, the change will not affect any customers. Part of codecov/engineering-team#100
* fix: Force owner login during pagination test (#58)
* merge
* Small fixes + addons
* Misc additions to the plan service + trial
* fix: Ensure we don't accidentally leak stack trace info (#59)
* Add pretrial_plan resolver
* Fix test

Co-authored-by: Dana Yaish <[email protected]>
Co-authored-by: scott-codecov <[email protected]>
Co-authored-by: Giovanni M Guidini <[email protected]>
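The three access levels of `master_write_switch` described in #48 could look roughly like the following. This is a hedged sketch under stated assumptions: the function name `should_write_to_storage` and the ID sets are illustrative, not the real shared implementation.

```python
# Hypothetical sketch of the master_write_switch access levels.
CODECOV_OWNER_IDS = {1}          # assumption: owner IDs of Codecov's own repos
RESTRICTED_REPO_IDS = {42, 99}   # assumption: the allow-listed `repo_ids`


def should_write_to_storage(master_write_switch, owner_id, repo_id):
    if master_write_switch in (True, "codecov_access"):
        # Only Codecov-owned repos write to GCS.
        return owner_id in CODECOV_OWNER_IDS
    if master_write_switch == "restricted_access":
        # Codecov repos plus the explicit allow-list.
        return owner_id in CODECOV_OWNER_IDS or repo_id in RESTRICTED_REPO_IDS
    if master_write_switch == "general_access":
        # Everyone writes to GCS.
        return True
    return False


assert should_write_to_storage("codecov_access", owner_id=1, repo_id=7)
assert not should_write_to_storage("codecov_access", owner_id=2, repo_id=42)
assert should_write_to_storage("restricted_access", owner_id=2, repo_id=42)
assert should_write_to_storage("general_access", owner_id=3, repo_id=7)
```

The string values make the rollout per-field: each `ArchiveField` can sit at a different access level independently.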
Context: https://sentry.slack.com/archives/C04M5GE2R5M/p1689792185704099