Org stats for accepted message counts stopped showing up after 24.3.0 (self-hosted) #67849

Closed
caseyduquettesc opened this issue Mar 28, 2024 · 8 comments

Comments

@caseyduquettesc

Environment

self-hosted (https://develop.sentry.dev/self-hosted/)

Steps to Reproduce

Upgraded from 23.6.2 to 24.3.0 and everything seems to be working fine. Errors and transactions show up. Alerts fire. However, as an administrator, I've lost visibility into how many events are coming in to Sentry.

24.3.0 seems to only show the number of dropped events.

I found #55786, which gave me a pointer on where to look. I checked our outcomes table in ClickHouse and it had no recent rows with outcome = 0, which I expect is the successful (accepted) outcome. I then checked when the last accepted outcome was recorded, and it was just before the upgrade.

SELECT *
FROM outcomes_hourly_dist
ORDER BY timestamp DESC
LIMIT 1000

Query id: 5b887ecb-beb8-4245-98e7-77470835dd8d

┌─org_id─┬─project_id─┬─key_id─┬───────────timestamp─┬─category─┬─outcome─┬─reason───────────────┬─quantity─┬─times_seen─┬─bytes_received─┐
│      0 │        112 │      0 │ 2024-03-28 06:00:00 │        1 │       3 │ too_large            │        2 │          2 │              0 │
│      0 │         33 │      0 │ 2024-03-28 06:00:00 │        1 │       2 │ key_quota            │        1 │          1 │              0 │
│      0 │         33 │      0 │ 2024-03-28 06:00:00 │        1 │       2 │ key_quota            │        1 │          1 │              0 │
│      0 │         33 │      0 │ 2024-03-28 06:00:00 │        1 │       2 │ key_quota            │       20 │          6 │              0 │
│      1 │        217 │    224 │ 2024-03-28 06:00:00 │        5 │       5 │ network_error        │        1 │          1 │              0 │
│      1 │        209 │    216 │ 2024-03-28 06:00:00 │        2 │       3 │ too_large            │        1 │          1 │              0 │
│      1 │        201 │    208 │ 2024-03-28 06:00:00 │        1 │       1 │ web-crawlers         │        1 │          1 │              0 │
│      1 │        209 │    216 │ 2024-03-28 06:00:00 │        2 │       3 │ too_large            │        1 │          1 │              0 │
│      1 │         25 │     26 │ 2024-03-28 06:00:00 │        2 │       3 │ invalid_transaction  │        1 │          1 │              0 │
│      1 │         50 │     55 │ 2024-03-28 06:00:00 │        1 │       5 │ network_error        │        2 │          2 │              0 │
│      1 │        166 │    172 │ 2024-03-28 06:00:00 │        1 │       5 │ network_error        │        1 │          1 │              0 │
│      1 │          1 │      0 │ 2024-03-28 06:00:00 │       10 │       3 │ invalid_monitor      │       20 │         20 │              0 │
│      1 │        209 │    216 │ 2024-03-28 06:00:00 │        1 │       5 │ event_processor      │      198 │         51 │              0 │

SELECT *
FROM outcomes_raw_dist
WHERE outcome = 0
ORDER BY timestamp DESC
LIMIT 100

Query id: 29538f8f-b892-4e3a-b769-2f4a23343595

┌─org_id─┬─project_id─┬─key_id─┬───────────timestamp─┬─outcome─┬─reason─┬─event_id─────────────────────────────┬─quantity─┬─category─┬─size─┐
│      1 │        215 │    222 │ 2024-03-27 20:38:04 │       0 │ ᴺᵁᴸᴸ   │ 2adcf6bf-fde6-4c91-9d62-d07f2db46e88 │        1 │        1 │    0 │
│      1 │        227 │    234 │ 2024-03-27 20:38:04 │       0 │ ᴺᵁᴸᴸ   │ d8811d73-9476-44fc-88a5-b4cbbdaefbf5 │        1 │        2 │    0 │
│      1 │         33 │     36 │ 2024-03-27 20:38:04 │       0 │ ᴺᵁᴸᴸ   │ a845bfd9-2df0-035f-01e0-636c69ad66d6 │        1 │        1 │    0 │
│      1 │        108 │    113 │ 2024-03-27 20:38:04 │       0 │ ᴺᵁᴸᴸ   │ 56999b04-2363-4a82-a7b0-e45441fee795 │        1 │        2 │    0 │
│      1 │        201 │    208 │ 2024-03-27 20:38:04 │       0 │ ᴺᵁᴸᴸ   │ b2d48a4a-8a08-4230-b380-efb4279ad044 │        1 │        1 │    0 │
│      1 │        108 │    113 │ 2024-03-27 20:38:04 │       0 │ ᴺᵁᴸᴸ   │ 303d7581-7529-4217-8cad-ec6deca4392f │        1 │        2 │    0 │
│      1 │        231 │    239 │ 2024-03-27 20:38:03 │       0 │ ᴺᵁᴸᴸ   │ 2f7d6b6c-c8c4-4cb1-bf34-acae6a41095c │        1 │        2 │    0 │

Then I tailed the outcomes Kafka topic. Data was flowing in, but every message had a non-zero outcome.

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic outcomes --offset 16204899386 --partition 0
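
Rather than eyeballing the stream, the consumer output can be piped into a small tally script. This is a minimal sketch, assuming each message is printed as one JSON object per line with an integer `outcome` field (mirroring the ClickHouse column of the same name); `tally_outcomes` is a hypothetical helper and not part of Sentry or Kafka.

```python
import json
from collections import Counter
from typing import Iterable


def tally_outcomes(lines: Iterable[str]) -> Counter:
    """Count messages per `outcome` value, skipping anything that is not a JSON object."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. log noise mixed into the consumer's stdout
        # Assumption: the outcome code lives in a top-level "outcome" field.
        if isinstance(message, dict) and "outcome" in message:
            counts[message["outcome"]] += 1
    return counts


# Usage (hypothetical): feed it the captured consumer output, for example
#   import sys; print(tally_outcomes(sys.stdin))
```

If the tally never contains a `0` key, no accepted outcomes are arriving on that topic. Because the console consumer runs until interrupted, limiting it with `--max-messages` makes the pipe terminate.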

Was there a change in Relay to stop emitting accepted outcomes? And if not, what conditions could cause Relay not to report accepted?

Expected Result

To see accepted and dropped counts in the org stats page.

Actual Result

Only non-accepted counts are shown.

Product Area

Stats

Link

No response

DSN

No response

Version

24.3.0

@getsantry
Contributor

getsantry bot commented Mar 28, 2024

Assigning to @getsentry/support for routing ⏲️

@getsantry
Contributor

getsantry bot commented Mar 28, 2024

Routing to @getsentry/product-owners-settings-relay for triage ⏲️

@Dav1dde
Member

Dav1dde commented Mar 29, 2024

Relay only emits negative (non-accepted) outcomes. It can't emit accepted outcomes because later stages of the pipeline can still decide whether an event stays or is dropped. So what you're seeing is probably only the outcomes emitted by Relay (negative ones, such as dropped events).

If I had to guess, one of your Sentry consumers isn't running.

Let's see if we can get this routed to someone who can help.

@getsantry
Contributor

getsantry bot commented Mar 29, 2024

Routing to @getsentry/product-owners-stats for triage ⏲️

@getsantry
Contributor

getsantry bot commented Mar 29, 2024

Routing to @getsentry/product-owners-ingestion-and-filtering for triage ⏲️

@caseyduquettesc
Author

caseyduquettesc commented Mar 29, 2024

Figured it out. Accepted outcomes are now considered billing outcomes, so they're published to the outcomes-billing topic, but I had nothing consuming from that topic.

    def is_billing(self) -> bool:
        return self in (Outcome.ACCEPTED, Outcome.RATE_LIMITED)

The 24.3.0 release has no such consumer in its docker-compose.yml (https://github.com/getsentry/self-hosted/blob/24.3.0/docker-compose.yml), but I noticed one was added after the release in getsentry/self-hosted#2909.

I started up the snuba-outcomes-billing-consumer and data started populating.
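
For anyone hitting the same thing, the routing that causes it can be sketched roughly like this. It is an illustration only, not Sentry's actual code: `is_billing` and the topic names `outcomes` and `outcomes-billing` come from this thread, while the extra enum members, the numeric values other than 0, and the `topic_for` helper are assumptions made for the example.

```python
from enum import IntEnum


class Outcome(IntEnum):
    # 0 is the accepted outcome per this thread; the other values are assumed.
    ACCEPTED = 0
    FILTERED = 1
    RATE_LIMITED = 2
    INVALID = 3

    def is_billing(self) -> bool:
        return self in (Outcome.ACCEPTED, Outcome.RATE_LIMITED)


def topic_for(outcome: Outcome) -> str:
    """Hypothetical helper: choose the Kafka topic an outcome is published to."""
    return "outcomes-billing" if outcome.is_billing() else "outcomes"


# Accepted outcomes go to the billing topic, so a deployment that only
# consumes "outcomes" never sees them.
print(topic_for(Outcome.ACCEPTED))
print(topic_for(Outcome.INVALID))
```

With nothing consuming `outcomes-billing`, accepted outcomes are published but never reach ClickHouse, which matches the missing `outcome = 0` rows above.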

@caseyduquettesc caseyduquettesc changed the title Relay (self-hosted) stopped emitting accepted outcomes Org stats for accepted message counts stopped showing up after 24.3.0 Mar 29, 2024
@caseyduquettesc caseyduquettesc changed the title Org stats for accepted message counts stopped showing up after 24.3.0 Org stats for accepted message counts stopped showing up after 24.3.0 (self-hosted) Mar 29, 2024
@jjbayer
Member

jjbayer commented Apr 2, 2024

@caseyduquettesc thank you for figuring this out!

@github-actions github-actions bot locked and limited conversation to collaborators Apr 17, 2024
@hubertdeng123
Member

Great job digging through to find this. This was a regression that we did not notice until the bug report was submitted. We then notified the team involved and submitted a fix. Going forward, we're looking to improve end-to-end testing so that we catch issues like this.
