Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(alerts): Add ratelimiting for metric alerts #58032

Merged
merged 15 commits into from
Oct 23, 2023

Conversation

isabellaenriquez
Copy link
Member

@isabellaenriquez isabellaenriquez commented Oct 12, 2023

Ratelimiting is added (under an options-backed feature flag) to ensure metric alerts don't fire more than every 10 minutes. Action item from INC-454.

Closes #54730

@isabellaenriquez isabellaenriquez self-assigned this Oct 12, 2023
@isabellaenriquez isabellaenriquez changed the title feat(metric-alerts): Add ratelimiting feat(alerts): Add ratelimiting for metric alerts Oct 12, 2023
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Oct 12, 2023
@codecov
Copy link

codecov bot commented Oct 12, 2023

Codecov Report

Merging #58032 (119ab22) into master (e88e83c) will increase coverage by 0.02%.
Report is 146 commits behind head on master.
The diff coverage is 100.00%.

❗ Current head 119ab22 differs from pull request most recent head f56d3e2. Consider uploading reports for the commit f56d3e2 to get more accurate results

@@            Coverage Diff             @@
##           master   #58032      +/-   ##
==========================================
+ Coverage   79.03%   79.06%   +0.02%     
==========================================
  Files        5131     5138       +7     
  Lines      223537   223607      +70     
  Branches    37640    37650      +10     
==========================================
+ Hits       176678   176784     +106     
+ Misses      41188    41158      -30     
+ Partials     5671     5665       -6     
Files Coverage Δ
src/sentry/incidents/subscription_processor.py 89.41% <100.00%> (+0.24%) ⬆️
src/sentry/options/defaults.py 100.00% <100.00%> (ø)

... and 21 files with indirect coverage changes

@scttcper
Copy link
Member

this feels like something that will need to be on the docs since before it could fire every minute

Copy link
Member

@JoshFerge JoshFerge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that this is a very sensitive change, could we consider putting it behind an options backed feature flag? so if something goes wrong we can simply turn off? and test internally first?

[project.id for project in last_incident.projects.all()] if last_incident else []
)
minutes_since_last_incident = (
(timezone.now() - last_incident.date_added).seconds / 60 if last_incident else None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I slightly prefer using timedelta to raw second math

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you elaborate what you mean by that? i believe the .seconds is a timedelta property

Copy link
Member

@wedamija wedamija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking generally good to me. Since we need indexes here I'd recommend making a separate pr for them and merging that pr first


# If an incident was created for this rule, trigger type, and subscription
# within the last 10 minutes, don't make another one
last_incident: Incident | None = trigger.triggered_incidents.order_by("-date_added").first()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to add an index on IncidentTrigger on alert_rule_trigger, date_added as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure i understand the use of an additional index here. i know it would've been helpful had we kept the original approach with Incident.objects.filter(...), but since we use trigger.triggered_incidents.order_by("-date_added").first() to get last_incident, how does an additional index help us make this more efficent?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh sorry, I misread the code and thought it was sorting on date_added in IncidentTrigger, but it's in Incident

We might need an index on the Incident table though... (id, date_added)

Here's what this query will look like

SELECT *
FROM "sentry_incident" 
INNER JOIN "sentry_incidenttrigger" ON ("sentry_incident"."id" = "sentry_incidenttrigger"."incident_id") 
WHERE "sentry_incidenttrigger"."alert_rule_trigger_id" = 1 
ORDER BY "sentry_incident"."date_added" DESC

isabellaenriquez added a commit that referenced this pull request Oct 17, 2023
This PR adds a new index to `Incident` to make querying for the metric
alert rate limiting feature more efficient (#58032).
isabellaenriquez added a commit that referenced this pull request Oct 19, 2023
This PR gets rid of the index on `id, date_added`. This was originally
added in #58207 to make a query in #58032 more efficient, however, after
it was added, we determined that it was still really slow. We'll be
using a different approach, so this one is no longer needed.
@wedamija
Copy link
Member

Just a note to not merge this until the index is created

isabellaenriquez added a commit that referenced this pull request Oct 20, 2023
@isabellaenriquez isabellaenriquez merged commit 03e3b04 into master Oct 23, 2023
50 checks passed
@isabellaenriquez isabellaenriquez deleted the isabella/metric-alert-rate-limit branch October 23, 2023 20:18
@github-actions github-actions bot locked and limited conversation to collaborators Nov 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

INC-454 Metric Alerts Ratelimiting
5 participants