Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

recreate base task metrics with prometheus #144

Merged
merged 9 commits into from
Oct 24, 2023

Conversation

matt-codecov
Copy link
Contributor

@matt-codecov matt-codecov commented Oct 12, 2023

depends on codecov/shared#64

by default, worker runs with 2 celery worker processes. we thus have to use prometheus's multiprocess mode to report our metrics. when deploying, we should set the $PROMETHEUS_MULTIPROC_DIR env var to something like /var/tmp/prometheus.

this PR mirrors the statsd base task metrics which include:

  • number of retries/success/failures per task
  • runtimes per task
  • total time in queue per task

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

@codecov
Copy link

codecov bot commented Oct 19, 2023

Codecov Report

Merging #144 (9e4edd0) into main (b8ac0fc) will decrease coverage by 0.02%.
The diff coverage is 95.17%.

Changes have been made to critical files, which contain lines commonly executed in production. Learn more

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #144      +/-   ##
==========================================
- Coverage   98.37%   98.35%   -0.02%     
==========================================
  Files         374      374              
  Lines       27929    28048     +119     
==========================================
+ Hits        27475    27587     +112     
- Misses        454      461       +7     
Flag Coverage Δ
integration 98.39% <95.17%> (-0.03%) ⬇️
latest-uploader-overall 98.39% <95.17%> (-0.03%) ⬇️
unit 98.39% <95.17%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 96.81% <96.77%> (-0.01%) ⬇️
OutsideTasks 98.18% <88.46%> (-0.02%) ⬇️
Files Coverage Δ
tasks/backfill_commit_data_to_storage.py Critical 97.26% <100.00%> (-0.04%) ⬇️
tasks/base.py Critical 98.56% <100.00%> (+0.34%) ⬆️
tasks/brolly_stats_rollup.py 100.00% <100.00%> (ø)
tasks/commit_update.py 100.00% <100.00%> (ø)
tasks/compute_comparison.py 100.00% <100.00%> (ø)
tasks/delete_owner.py 100.00% <100.00%> (ø)
tasks/flush_repo.py 100.00% <100.00%> (ø)
tasks/github_app_webhooks_check.py 100.00% <100.00%> (ø)
tasks/github_marketplace.py 98.01% <100.00%> (-0.02%) ⬇️
tasks/health_check.py 100.00% <100.00%> (ø)
... and 31 more

... and 1 file with indirect coverage changes

Related Entrypoints
run/app.tasks.pulls.Sync
run/app.tasks.notify.Notify
run/app.tasks.upload.Upload
run/app.tasks.profiling.normalizer
run/app.tasks.compute_comparison.ComputeComparison
run/app.tasks.upload.UploadFinisher
run/app.tasks.upload.UploadProcessor
run/app.tasks.static_analysis.check_suite
run/app.tasks.status.SetError
run/app.cron.profiling.findinguncollected
run/app.cron.hourly_check.HourlyCheckTask
run/app.tasks.sync_repos.SyncRepos
run/app.tasks.commit_update.CommitUpdate
run/app.tasks.label_analysis.process_request
run/app.tasks.new_user_activated.NewUserActivated
run/app.tasks.sync_teams.SyncTeams
run/app.tasks.timeseries.backfill_commits
run/app.tasks.plan.TrialExpirationTask
run/app.tasks.profiling.collection
run/app.tasks.profiling.summarization
run/app.cron.healthcheck.HealthCheckTask
run/app.cron.daily.GitHubAppWebhooksCheckTask
run/app.tasks.timeseries.backfill_dataset
run/app.cron.plan.TrialExpirationCronTask
run/app.tasks.archive.BackfillCommitDataToStorage
run/app.cron.daily.BrollyStatsRollupTask
run/app.tasks.timeseries.delete
run/app.tasks.sync_plans.SyncPlans

@codecov-staging
Copy link

codecov-staging bot commented Oct 19, 2023

Codecov Report

❗ No coverage uploaded for pull request base (main@b8ac0fc). Click here to learn what that means.
Report is 1 commits behind head on main.
The diff coverage is 95.17%.

❗ Current head 2692665 differs from pull request most recent head 9e4edd0. Consider uploading reports for the commit 9e4edd0 to get more accurate results

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #144   +/-   ##
=======================================
  Coverage        ?   98.40%           
=======================================
  Files           ?      346           
  Lines           ?    27100           
  Branches        ?        0           
=======================================
  Hits            ?    26669           
  Misses          ?      431           
  Partials        ?        0           
Flag Coverage Δ
integration 98.40% <95.17%> (?)
latest-uploader-overall 98.40% <95.17%> (?)
unit 98.40% <95.17%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 96.95% <96.77%> (?)
OutsideTasks 98.21% <88.46%> (?)
Files Coverage Δ
tasks/backfill_commit_data_to_storage.py 97.26% <100.00%> (ø)
tasks/base.py 98.56% <100.00%> (ø)
tasks/brolly_stats_rollup.py 100.00% <100.00%> (ø)
tasks/commit_update.py 100.00% <100.00%> (ø)
tasks/compute_comparison.py 100.00% <100.00%> (ø)
tasks/delete_owner.py 100.00% <100.00%> (ø)
tasks/flush_repo.py 100.00% <100.00%> (ø)
tasks/github_app_webhooks_check.py 100.00% <100.00%> (ø)
tasks/github_marketplace.py 98.00% <100.00%> (ø)
tasks/health_check.py 100.00% <100.00%> (ø)
... and 31 more

@codecov-qa
Copy link

codecov-qa bot commented Oct 19, 2023

Codecov Report

Merging #144 (9e4edd0) into main (b8ac0fc) will decrease coverage by 0.03%.
The diff coverage is 95.17%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #144      +/-   ##
==========================================
- Coverage   98.41%   98.39%   -0.03%     
==========================================
  Files         348      348              
  Lines       27433    27464      +31     
==========================================
+ Hits        26998    27022      +24     
- Misses        435      442       +7     
Flag Coverage Δ
integration 98.39% <95.17%> (-0.03%) ⬇️
latest-uploader-overall 98.39% <95.17%> (-0.03%) ⬇️
unit 98.39% <95.17%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 96.89% <96.77%> (-0.03%) ⬇️
OutsideTasks 98.18% <88.46%> (-0.02%) ⬇️
Files Coverage Δ
tasks/backfill_commit_data_to_storage.py 97.26% <100.00%> (-0.04%) ⬇️
tasks/base.py 98.56% <100.00%> (+0.34%) ⬆️
tasks/brolly_stats_rollup.py 100.00% <100.00%> (ø)
tasks/commit_update.py 100.00% <100.00%> (ø)
tasks/compute_comparison.py 100.00% <100.00%> (ø)
tasks/delete_owner.py 100.00% <100.00%> (ø)
tasks/flush_repo.py 100.00% <100.00%> (ø)
tasks/github_app_webhooks_check.py 100.00% <100.00%> (ø)
tasks/github_marketplace.py 98.00% <100.00%> (-0.02%) ⬇️
tasks/health_check.py 100.00% <100.00%> (ø)
... and 31 more

@codecov-public-qa
Copy link

codecov-public-qa bot commented Oct 19, 2023

Codecov Report

Merging #144 (9e4edd0) into main (b8ac0fc) will decrease coverage by 0.03%.
The diff coverage is 95.17%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #144      +/-   ##
==========================================
- Coverage   98.41%   98.39%   -0.03%     
==========================================
  Files         348      348              
  Lines       27433    27464      +31     
==========================================
+ Hits        26998    27022      +24     
- Misses        435      442       +7     
Flag Coverage Δ
integration 98.39% <95.17%> (-0.03%) ⬇️
latest-uploader-overall 98.39% <95.17%> (-0.03%) ⬇️
unit 98.39% <95.17%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 96.89% <96.77%> (-0.03%) ⬇️
OutsideTasks 98.18% <88.46%> (-0.02%) ⬇️
Files Coverage Δ
tasks/backfill_commit_data_to_storage.py 97.26% <100.00%> (-0.04%) ⬇️
tasks/base.py 98.56% <100.00%> (+0.34%) ⬆️
tasks/brolly_stats_rollup.py 100.00% <100.00%> (ø)
tasks/commit_update.py 100.00% <100.00%> (ø)
tasks/compute_comparison.py 100.00% <100.00%> (ø)
tasks/delete_owner.py 100.00% <100.00%> (ø)
tasks/flush_repo.py 100.00% <100.00%> (ø)
tasks/github_app_webhooks_check.py 100.00% <100.00%> (ø)
tasks/github_marketplace.py 98.00% <100.00%> (-0.02%) ⬇️
tasks/health_check.py 100.00% <100.00%> (ø)
... and 31 more

@matt-codecov matt-codecov marked this pull request as ready for review October 20, 2023 00:09
)
TASK_TIME_IN_QUEUE = Histogram(
"worker_tasks_timers_time_in_queue_seconds",
"Time in {TODO} spent waiting in the queue before being run",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Time in {TODO}"?...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh thank you! i am used to a linter that stops me from committing if there are TODO comments that aren't accompanied with a task number haha. i'll see if i can set that up with precommit. this was a stand-in for units, which i know now and will update before merging

@property
def metrics_prefix(self):
return f"worker.task.{self.name}"
def __init_subclass__(cls, name=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know this was a thing.... interesting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i haven't gotten into metaclasses yet but i guess this is like metaclasses-lite?

another approach i could maybe have taken (assuming self.__class__ will refer to a subclass and not the base class) is something like:

global_run_counter = Counter('number of runs', 'docstring', labels=['task'])
global_retry_counter = Counter('number of retries', 'docstring', labels=['task'])
...

class PerTaskMetrics:
    def __init__(self, name):
        self.task_run_counter = global_run_counter.labels(task=name)
        self.task_retry_counter = global_retry_counter.labels(task=name)
        ...

def BaseTask:
    @property
    def metrics(self):
        # lazily initialize the per-task metrics
        if not hasattr(self.__class__, '_metrics'):
            self.__class__._metrics = PerTaskMetrics(self.name)
        return self.__class__._metrics

    async def apply_async(self, ...):
        self.metrics.task_run_counter.inc()
        ...

i don't really know what the tradeoffs are, but i have a weak preference for the approach taken in the PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. the approach in the PR is quite elegant

@matt-codecov matt-codecov merged commit b9e2bf6 into main Oct 24, 2023
13 of 26 checks passed
@matt-codecov matt-codecov deleted the matt/add-prometheus-task-metrics branch October 24, 2023 19:49
Copy link

sentry-io bot commented Nov 1, 2023

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ ConnectionError: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /app/install... app.tasks.compute_comparison.ComputeComparison View Issue
  • ‼️ ConnectionError: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /app/install... app.tasks.upload.UploadProcessor View Issue
  • ‼️ ConnectionError: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /app/install... app.tasks.pulls.Sync View Issue
  • ‼️ ConnectionError: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /app/install... app.tasks.upload.Upload View Issue
  • ‼️ ConnectionError: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /app/install... app.tasks.status.SetError View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants