Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - encode labels into index #180

Merged
merged 1 commit into from
Dec 6, 2023
Merged

WIP - encode labels into index #180

merged 1 commit into from
Dec 6, 2023

Conversation

giovanni-guidini
Copy link
Contributor

This is a lot of code, and I'll probably break this PR in several others. I'm looking for early feedback here. It's a lot of code and a lot of changes.

This depends on the changes in codecov/shared#79. Check that PR for context.

New things

  • services/report/labels_index.py houses the LabelIndexService. This little bit of code controls read/write in the label index file (that is separate from chunks.txt). It does the loading and unloading of the Report._labels_index as needed.
  • MinioEndpoints.label_index and functions to write-to and read-from the file. There can be 1 such file for each CommitReport.
  • services/report/raw_upload_processor.py::make_sure_orginal_report_is_using_label_ids encodes the labels in a Report prior to merging it with another report (if needed)
  • services/report/raw_upload_processor.py::make_sure_label_indexes_match "fixes" the index of the new report to be merged based on the original report's index so that we can be sure that both indexes point to the same labels.

Important changes

  • pycoverage.py language processor now spits out reports with a label lookup table and CoverageDatapoint already encoded
  • The processing of raw reports with labels has extra steps (make_sure_orginal_report_is_using_label_ids, make_sure_label_indexes_match)
  • The label_analysis task works with label_ids and only translates them in the end.

Things missing

  • Feature flag changes. I just changes everything to the "ideal" end state, but we'd have issues deploying this with different worker deploys co-existing
  • label_analysis has to work if the report is not encoded with IDs as well.

Updates shared to the version where `Report` has a `_labels_index` and makes use of them in processing.
@giovanni-guidini giovanni-guidini marked this pull request as ready for review November 21, 2023 14:24
giovanni-guidini added a commit that referenced this pull request Nov 21, 2023
context: codecov/engineering-team #768

Creates a rollout for label compression. This is gonna help us to test and safely release the feature in the wild.
Notice that currently the label compression does nothing.
There are comments similar to `TODO: needs shared update`.
The update in question is codecov/shared#79

So these changes mostly prep the terrain for future changes that will actually do something,
and add the guardrails to avoid issues when deploying.
In particular it adds some helper methods to the `ArchiveService`, creates a kinda-stubbed `LabelsIndexService`,
passes more data to the `_adjust_sessions` portion of `raw_upload_processor` where most changes will occur.

If you're curious as to what the end result will probably look like see #180
@giovanni-guidini
Copy link
Contributor Author

DON'T MERGE
This is a reference guide for the changes that will happen. It's the "end state" we're looking for. But I will try to separate the changes into slightly smaller PRs that are more digestible and easier to review. I will also add rollout capabilities for a safer rollout.

giovanni-guidini added a commit that referenced this pull request Nov 21, 2023
context: codecov/engineering-team #768

Creates a rollout for label compression. This is gonna help us to test and safely release the feature in the wild.
Notice that currently the label compression does nothing.
There are comments similar to `TODO: needs shared update`.
The update in question is codecov/shared#79

So these changes mostly prep the terrain for future changes that will actually do something,
and add the guardrails to avoid issues when deploying.
In particular it adds some helper methods to the `ArchiveService`, creates a kinda-stubbed `LabelsIndexService`,
passes more data to the `_adjust_sessions` portion of `raw_upload_processor` where most changes will occur.

If you're curious as to what the end result will probably look like see #180
giovanni-guidini added a commit that referenced this pull request Nov 24, 2023
context: codecov/engineering-team #768

Creates a rollout for label compression. This is gonna help us to test and safely release the feature in the wild.
Notice that currently the label compression does nothing.
There are comments similar to `TODO: needs shared update`.
The update in question is codecov/shared#79

So these changes mostly prep the terrain for future changes that will actually do something,
and add the guardrails to avoid issues when deploying.
In particular it adds some helper methods to the `ArchiveService`, creates a kinda-stubbed `LabelsIndexService`,
passes more data to the `_adjust_sessions` portion of `raw_upload_processor` where most changes will occur.

If you're curious as to what the end result will probably look like see #180
giovanni-guidini added a commit that referenced this pull request Nov 29, 2023
context: codecov/engineering-team #768

Creates a rollout for label compression. This is gonna help us to test and safely release the feature in the wild.
Notice that currently the label compression does nothing.
There are comments similar to `TODO: needs shared update`.
The update in question is codecov/shared#79

So these changes mostly prep the terrain for future changes that will actually do something,
and add the guardrails to avoid issues when deploying.
In particular it adds some helper methods to the `ArchiveService`, creates a kinda-stubbed `LabelsIndexService`,
passes more data to the `_adjust_sessions` portion of `raw_upload_processor` where most changes will occur.

If you're curious as to what the end result will probably look like see #180
@giovanni-guidini giovanni-guidini merged commit e495820 into main Dec 6, 2023
2 checks passed
@giovanni-guidini giovanni-guidini deleted the gio/label-index branch December 6, 2023 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant