-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use encoded labels in reports with them #200
Use encoded labels in reports with them #200
Conversation
Updates shared to the version where `Report` has a `_labels_index` and makes use of them in processing.
context: codecov/engineering-team #768 Creates a rollout for label compression. This is gonna help us to test and safely release the feature in the wild. Notice that currently the label compression does nothing. There are comments similar to `TODO: needs shared update`. The update in question is codecov/shared#79 So these changes mostly prep the terrain for future changes that will actually do something, and add the guardrails to avoid issues when deploying. In particular it adds some helper methods to the `ArchiveService`, creates a kinda-stubbed `LabelsIndexService`, passes more data to the `_adjust_sessions` portion of `raw_upload_processor` where most changes will occur. If you're curious as to what the end result will probably look like see #180
These changes extend the `Report` to use the `_labels_index`. Everything is controlled via a rollout, so in practice there's a single repo that will use the compressed labels, my sentry fork (giovanni-guidini/sentry). For testing purposes. There are 2 important facts here: 1. Given worker / shared pairs we can only have (old, old) or (new, new). where "old" denote the code that doesn't do label encoding, and "new" the code that does. 2. The (new, new) pair can handle the cases the (old, old) does and the new one of label encoding, but the (old, old) pair will generate corrupted results if it tries to process an encoded report. SO it's important to woll out the (new, new) pair into prod _before_ turning the feature on for everyone. Which is what we are doing, so we're fine (probably). At first I was very much in the "let's tear everything down and build again", but because the (new, new) pair needs to handle the unencoded versions as well I decided to make smaller changes so the code can co-exist in harmony.
context: codecov/engineering-team #768 Creates a rollout for label compression. This is gonna help us to test and safely release the feature in the wild. Notice that currently the label compression does nothing. There are comments similar to `TODO: needs shared update`. The update in question is codecov/shared#79 So these changes mostly prep the terrain for future changes that will actually do something, and add the guardrails to avoid issues when deploying. In particular it adds some helper methods to the `ArchiveService`, creates a kinda-stubbed `LabelsIndexService`, passes more data to the `_adjust_sessions` portion of `raw_upload_processor` where most changes will occur. If you're curious as to what the end result will probably look like see #180
When generating the report the labels_index was being generaged with the SpecialLabel in the index. That's a problem because: 1. It's different from the default index wheere you have the corresponding label, so you end up having 2 indexes for the same label 2. THe enum value is not JSON serializable By already using the corresponding label we solve both problems at once.
While processing reports they get merged into a temporary report (because there might be multiple reports in a single upload) THe temporary reports is what ultimately gets merged with the original one, so the labels need to be passed to it.
WHen carrying forward the report we also need to carryforward the label index, if the parent_commit has one.
As pointed out by @matt-codecov, there was a difference between the old behaviour creates "outer list containing many single-entry inner lists" whereas the new behavior was creating "outer list containing a single large inner list". This difference is subtle but it matters. Each of the inner lists become a datapoint with all the labels in the list. Later on, when [deleting labels](https://github.com/codecov/shared/blob/main/shared/reports/editable.py#L52) we delete a datapoint if _any_ of the labels to delete is in it. This means that the old behavior made the datapoints stick around for longer. I should probably investigate if we ever merge datapoints... not sure Anyway, matching old behavior.
Addressing various review comments around renaming functions, updating comments and small refactors. Also updates shared again, where `_labels_index` was renamed to `labels_index`, given that it's accessed all the time. The bigest refactor here is around carrying forawrd the labels index. As suggested we are skipping the `LabelsIndexService` and directly using the `ArchiveService` to copy the file from parent commit to child commit.
lables_index is used in the following scenarios: 1. merging reports during processing 2. label_analysis In these cases we load it, use it, unload it. Unloading includes saving it to storage again, and that is how changes are persisted. ONLY scenario 1 actually makes changes to it. Now imagine that for some reason there's a label-analysis request that takes a long time with the index open(unlikely) and in that period an upload is processed (unlikely). Then we would loose the updated index when label-analysis re-writes it to storage. To prevent that without a proper locking mechanism we can just let label_analysis NOT save it back. And because the merging scenario already runs inside a lock we are fine :)
Changes the `shared` dependency to the version where the labels_index is inside the report. Removes the `LabelsIndexService` and all code associated with it.
These changes are meant to replace #189 |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #200 +/- ##
========================================
Coverage 98.34% 98.35%
========================================
Files 354 356 +2
Lines 27935 28485 +550
========================================
+ Hits 27473 28015 +542
- Misses 462 470 +8
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Codecov Report
Changes have been made to critical files, which contain lines commonly executed in production. Learn more Additional details and impacted files@@ Coverage Diff @@
## main #200 +/- ##
========================================
Coverage 98.31% 98.31%
========================================
Files 383 387 +4
Lines 28580 29180 +600
========================================
+ Hits 28098 28689 +591
- Misses 482 491 +9
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 1 file with indirect coverage changes
|
efa97e3
to
da2a563
Compare
da2a563
to
ec947a4
Compare
Codecov Report
@@ Coverage Diff @@
## main #200 +/- ##
=======================================
Coverage ? 98.35%
=======================================
Files ? 356
Lines ? 28485
Branches ? 0
=======================================
Hits ? 28015
Misses ? 470
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Uses encoded labels with the labels_index to point to labels in the right direction.