Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use encoded labels in reports with them #200

Merged
merged 18 commits into from
Dec 6, 2023

Conversation

giovanni-guidini
Copy link
Contributor

Uses encoded labels with the labels_index to point to labels in the right direction.

Updates shared to the version where `Report` has a `_labels_index` and makes use of them in processing.
context: codecov/engineering-team #768

Creates a rollout for label compression. This is gonna help us to test and safely release the feature in the wild.
Notice that currently the label compression does nothing.
There are comments similar to `TODO: needs shared update`.
The update in question is codecov/shared#79

So these changes mostly prep the terrain for future changes that will actually do something,
and add the guardrails to avoid issues when deploying.
In particular it adds some helper methods to the `ArchiveService`, creates a kinda-stubbed `LabelsIndexService`,
passes more data to the `_adjust_sessions` portion of `raw_upload_processor` where most changes will occur.

If you're curious as to what the end result will probably look like see #180
These changes extend the `Report` to use the `_labels_index`.
Everything is controlled via a rollout, so in practice there's a single repo that will use
the compressed labels, my sentry fork (giovanni-guidini/sentry). For testing purposes.

There are 2 important facts here:
1. Given worker / shared pairs we can only have (old, old) or (new, new).
    where "old" denote the code that doesn't do label encoding, and "new" the code that does.
2. The (new, new) pair can handle the cases the (old, old) does and the new one of label encoding,
    but the (old, old) pair will generate corrupted results if it tries to process an encoded report.

SO it's important to woll out the (new, new) pair into prod _before_ turning the feature on for everyone.
Which is what we are doing, so we're fine (probably).

At first I was very much in the "let's tear everything down and build again", but because the (new, new) pair
needs to handle the unencoded versions as well I decided to make smaller changes so the code can co-exist in harmony.
context: codecov/engineering-team #768

Creates a rollout for label compression. This is gonna help us to test and safely release the feature in the wild.
Notice that currently the label compression does nothing.
There are comments similar to `TODO: needs shared update`.
The update in question is codecov/shared#79

So these changes mostly prep the terrain for future changes that will actually do something,
and add the guardrails to avoid issues when deploying.
In particular it adds some helper methods to the `ArchiveService`, creates a kinda-stubbed `LabelsIndexService`,
passes more data to the `_adjust_sessions` portion of `raw_upload_processor` where most changes will occur.

If you're curious as to what the end result will probably look like see #180
When generating the report the labels_index was being generaged with the SpecialLabel
in the index. That's a problem because:
1. It's different from the default index wheere you have the corresponding label, so you end up
   having 2 indexes for the same label
2. THe enum value is not JSON serializable

By already using the corresponding label we solve both problems at once.
While processing reports they get merged into a temporary report
(because there might be multiple reports in a single upload)

THe temporary reports is what ultimately gets merged with the original one,
so the labels need to be passed to it.
WHen carrying forward the report we also need to carryforward the label index,
if the parent_commit has one.
As pointed out by @matt-codecov, there was a difference between the
old behaviour creates "outer list containing many single-entry inner
lists" whereas the new behavior was creating "outer list containing
a single large inner list".

This difference is subtle but it matters.
Each of the inner lists become a datapoint with all the labels in
the list.

Later on, when [deleting labels](https://github.com/codecov/shared/blob/main/shared/reports/editable.py#L52)
we delete a datapoint if _any_ of the labels to delete is in it.

This means that the old behavior made the datapoints stick around
for longer.
I should probably investigate if we ever merge datapoints... not sure

Anyway, matching old behavior.
Addressing various review comments around renaming functions,
updating comments and small refactors.

Also updates shared again, where `_labels_index` was renamed to
`labels_index`, given that it's accessed all the time.

The bigest refactor here is around carrying forawrd the labels
index. As suggested we are skipping the `LabelsIndexService` and
directly using the `ArchiveService` to copy the file from
parent commit to child commit.
lables_index is used in the following scenarios:
1. merging reports during processing
2. label_analysis

In these cases we load it, use it, unload it.
Unloading includes saving it to storage again, and that is
how changes are persisted.

ONLY scenario 1 actually makes changes to it.
Now imagine that for some reason there's a label-analysis
request that takes a long time with the index open(unlikely)
and in that period an upload is processed (unlikely).
Then we would loose the updated index when label-analysis
re-writes it to storage.

To prevent that without a proper locking mechanism we can
just let label_analysis NOT save it back.
And because the merging scenario already runs inside a lock
we are fine :)
Changes the `shared` dependency to the version where the labels_index is inside the report.
Removes the `LabelsIndexService` and all code associated with it.
@giovanni-guidini
Copy link
Contributor Author

These changes are meant to replace #189

@codecov-qa
Copy link

codecov-qa bot commented Dec 4, 2023

Codecov Report

Merging #200 (ec947a4) into main (58c31fa) will increase coverage by 0.00%.
The diff coverage is 98.85%.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##             main     #200    +/-   ##
========================================
  Coverage   98.34%   98.35%            
========================================
  Files         354      356     +2     
  Lines       27935    28485   +550     
========================================
+ Hits        27473    28015   +542     
- Misses        462      470     +8     
Flag Coverage Δ
integration 98.35% <98.85%> (+<0.01%) ⬆️
latest-uploader-overall 98.35% <98.85%> (+<0.01%) ⬆️
unit 98.35% <98.85%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 96.74% <95.34%> (-0.05%) ⬇️
OutsideTasks 98.14% <97.96%> (-0.02%) ⬇️
Files Coverage Δ
helpers/labels.py 100.00% <100.00%> (ø)
rollouts/__init__.py 100.00% <ø> (ø)
services/archive.py 93.44% <100.00%> (-0.68%) ⬇️
services/report/__init__.py 97.93% <100.00%> (+0.01%) ⬆️
services/report/languages/pycoverage.py 100.00% <100.00%> (ø)
...ces/report/languages/tests/unit/test_pycoverage.py 100.00% <ø> (ø)
...uages/tests/unit/test_pycoverage_encoded_labels.py 100.00% <100.00%> (ø)
services/report/tests/unit/test_report_builder.py 100.00% <ø> (ø)
...t/tests/unit/test_report_builder_encoded_labels.py 100.00% <100.00%> (ø)
services/report/tests/unit/test_sessions.py 100.00% <ø> (ø)
... and 8 more

Copy link

codecov bot commented Dec 4, 2023

Codecov Report

Merging #200 (ec947a4) into main (58c31fa) will increase coverage by 0.00%.
The diff coverage is 98.86%.

Changes have been made to critical files, which contain lines commonly executed in production. Learn more

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##             main     #200    +/-   ##
========================================
  Coverage   98.31%   98.31%            
========================================
  Files         383      387     +4     
  Lines       28580    29180   +600     
========================================
+ Hits        28098    28689   +591     
- Misses        482      491     +9     
Flag Coverage Δ
integration 98.35% <98.85%> (+<0.01%) ⬆️
latest-uploader-overall 98.35% <98.85%> (+<0.01%) ⬆️
onlysomelabels 98.31% <98.86%> (+0.05%) ⬆️
unit 98.35% <98.85%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 96.62% <95.34%> (-0.05%) ⬇️
OutsideTasks 98.14% <97.96%> (-0.02%) ⬇️
Files Coverage Δ
helpers/labels.py 100.00% <100.00%> (ø)
rollouts/__init__.py 100.00% <ø> (ø)
services/archive.py Critical 93.49% <100.00%> (-0.67%) ⬇️
services/report/__init__.py Critical 97.93% <100.00%> (+0.01%) ⬆️
services/report/languages/pycoverage.py 100.00% <100.00%> (ø)
...ces/report/languages/tests/unit/test_pycoverage.py 100.00% <100.00%> (ø)
...uages/tests/unit/test_pycoverage_encoded_labels.py 100.00% <100.00%> (ø)
services/report/tests/unit/test_report_builder.py 100.00% <ø> (ø)
...t/tests/unit/test_report_builder_encoded_labels.py 100.00% <100.00%> (ø)
services/report/tests/unit/test_sessions.py 100.00% <ø> (ø)
... and 8 more

... and 1 file with indirect coverage changes

Related Entrypoints
run/app.tasks.profiling.normalizer
run/app.tasks.pulls.Sync
run/app.tasks.upload.UploadProcessor
run/app.tasks.label_analysis.process_request

helpers/labels.py Outdated Show resolved Hide resolved
helpers/labels.py Outdated Show resolved Hide resolved
services/report/__init__.py Outdated Show resolved Hide resolved
services/report/__init__.py Outdated Show resolved Hide resolved
@giovanni-guidini giovanni-guidini force-pushed the gio/label-compressed/report-with-label-index branch from efa97e3 to da2a563 Compare December 6, 2023 13:55
@giovanni-guidini giovanni-guidini force-pushed the gio/label-compressed/report-with-label-index branch from da2a563 to ec947a4 Compare December 6, 2023 14:18
Copy link

codecov-public-qa bot commented Dec 6, 2023

Codecov Report

❗ No coverage uploaded for pull request base (main@58c31fa). Click here to learn what that means.
The diff coverage is 98.85%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #200   +/-   ##
=======================================
  Coverage        ?   98.35%           
=======================================
  Files           ?      356           
  Lines           ?    28485           
  Branches        ?        0           
=======================================
  Hits            ?    28015           
  Misses          ?      470           
  Partials        ?        0           
Flag Coverage Δ
integration 98.35% <98.85%> (?)
latest-uploader-overall 98.35% <98.85%> (?)
unit 98.35% <98.85%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 96.74% <95.34%> (?)
OutsideTasks 98.14% <97.96%> (?)
Files Coverage Δ
helpers/labels.py 100.00% <100.00%> (ø)
rollouts/__init__.py 100.00% <ø> (ø)
services/archive.py 93.44% <100.00%> (ø)
services/report/__init__.py 97.93% <100.00%> (ø)
services/report/languages/pycoverage.py 100.00% <100.00%> (ø)
...ces/report/languages/tests/unit/test_pycoverage.py 100.00% <ø> (ø)
...uages/tests/unit/test_pycoverage_encoded_labels.py 100.00% <100.00%> (ø)
services/report/tests/unit/test_report_builder.py 100.00% <ø> (ø)
...t/tests/unit/test_report_builder_encoded_labels.py 100.00% <100.00%> (ø)
services/report/tests/unit/test_sessions.py 100.00% <ø> (ø)
... and 8 more

@giovanni-guidini giovanni-guidini merged commit 455f7e4 into main Dec 6, 2023
22 of 26 checks passed
@giovanni-guidini giovanni-guidini deleted the gio/label-compressed/report-with-label-index branch December 6, 2023 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants