
feat(issue-platform): process extra fields from event #3571

Merged

Conversation

barkbarkimashark (Contributor)

This PR extracts additional field values from the event payload so they can be used for querying.

@barkbarkimashark barkbarkimashark requested a review from a team January 5, 2023 17:46
@barkbarkimashark barkbarkimashark requested a review from a team as a code owner January 5, 2023 17:46

codecov-commenter commented Jan 5, 2023

Codecov Report

Base: 92.23% // Head: 92.44% // Increases project coverage by +0.20% 🎉

Coverage data is based on head (85d4e64) compared to base (66197e5).
Patch coverage: 99.32% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3571      +/-   ##
==========================================
+ Coverage   92.23%   92.44%   +0.20%     
==========================================
  Files         724      725       +1     
  Lines       33772    33779       +7     
==========================================
+ Hits        31151    31228      +77     
+ Misses       2621     2551      -70     
Impacted Files Coverage Δ
tests/test_search_issues_api.py 100.00% <ø> (ø)
tests/datasets/test_search_issues_processor.py 99.34% <99.11%> (+1.06%) ⬆️
...uba/datasets/processors/search_issues_processor.py 93.38% <100.00%> (+3.03%) ⬆️
snuba/consumers/strategy_factory.py 79.59% <0.00%> (-11.15%) ⬇️
snuba/web/views.py 69.64% <0.00%> (-0.25%) ⬇️
snuba/clickhouse/formatter/expression.py 94.57% <0.00%> (-0.25%) ⬇️
snuba/cli/consumer.py 94.44% <0.00%> (-0.11%) ⬇️
snuba/web/query.py 98.00% <0.00%> (-0.06%) ⬇️
snuba/clickhouse/formatter/query.py 98.76% <0.00%> (-0.05%) ⬇️
snuba/datasets/table_storage.py 94.65% <0.00%> (-0.05%) ⬇️
... and 14 more


@@ -61,10 +82,10 @@ class SearchIssueEvent(TypedDict, total=False):
organization_id: int
project_id: int
event_id: str
group_id: int # backwards compatibility
group_ids: Sequence[int]
group_id: int
Member

Why is this changing to singular group_id now? I thought we'd always have multiple ids for non-error issue stuff.

Contributor Author

Yeah, I had the same discussion with Dan. An occurrence event will be sent for each group produced when processing that event. So if multiple issues are detected for a single event, then multiple occurrence events will be sent to the eventstream.
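
To illustrate (the values below are made up; field names follow the SearchIssueEvent shape in the diff above):

    # One ingested event matched by two detected issues (groups 3 and 4) yields two
    # occurrence events on the eventstream, each carrying a single group_id.
    event_id = "9f8e7d6c5b4a3f2e1d0c9b8a7f6e5d4c"  # made-up example id, shared by both occurrences
    occurrence_events = [
        {"organization_id": 2, "project_id": 1, "group_id": 3, "event_id": event_id},
        {"organization_id": 2, "project_id": 1, "group_id": 4, "event_id": event_id},
    ]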

@@ -77,6 +98,13 @@ def ensure_uuid(value: str) -> str:
class SearchIssuesMessageProcessor(DatasetMessageProcessor):
FINGERPRINTS_HARD_LIMIT_SIZE = 100

PROMOTED_TAGS = {
Member

What does promoted mean in this context?

@barkbarkimashark (Contributor Author) Jan 6, 2023

These are 'special' tag values we lift directly out of tags and place in top-level columns. I basically took this pattern from how transactions was written. I presume this was done for performance: instead of indexing into the tags array at query time, we do some special processing when ingesting events and normalize each promoted tag into its own dedicated column.
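
As a rough sketch of that pattern (a simplification, not the exact code in this PR; the contents of PROMOTED_TAGS and the column names here are assumptions):

    from typing import Any, Mapping, MutableMapping

    # Assumed set of promoted tags; the real constant lives on SearchIssuesMessageProcessor.
    PROMOTED_TAGS = {"environment", "sentry:release", "sentry:user", "transaction"}

    def promote_tags(processed: MutableMapping[str, Any], tags: Mapping[str, str]) -> None:
        # Copy known tags into dedicated top-level columns at ingest time, so queries
        # can filter on e.g. `environment` directly instead of scanning tags.key/tags.value.
        if "environment" in tags:
            processed["environment"] = tags["environment"]
        if "sentry:release" in tags:
            processed["release"] = tags["sentry:release"]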

) -> None:
tags_maybe = event_data.get("tags", None)
if not tags_maybe:
processed["tags.key"], processed["tags.value"] = [], []
Member

I think you should return here.

Contributor Author

I actually had it return early before, but then the extra params won't get processed:

processed["release"] = promoted_tags.get(
            "sentry:release",
            event_data.get("release"),
        )

So events like:

{
    "project_id": 1,
    "organization_id": 2,
    "group_id": 3,
    "event_id": str(uuid.uuid4()),
    "retention_days": 90,
    "primary_hash": str(uuid.uuid4()),
    "datetime": datetime.utcnow().isoformat() + "Z",
    "platform": "other",
    "data": {
        "tags": {},
        "release": "[email protected]",
    },
    ...
}

would get skipped entirely.

Member

OK but you're processing the tags twice then. Why call extract_extra_tags if there are no tags?

Contributor Author

Ahh, right. I rearranged process_tags to be a bit more readable and to avoid doing dupe work.
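
Roughly the shape that rearrangement takes (a sketch, not the merged code; extract_extra_tags is named in the discussion above, everything else here is an assumption):

    from typing import Any, Mapping, MutableMapping, Sequence, Tuple

    PROMOTED_TAGS = {"environment", "sentry:release", "sentry:user", "transaction"}  # assumed contents

    def extract_extra_tags(tags: Mapping[str, Any]) -> Tuple[Sequence[str], Sequence[str]]:
        # Stand-in for the real helper: flatten tags into parallel key/value arrays.
        keys = sorted(str(k) for k in tags)
        return keys, [str(tags[k]) for k in keys]

    def process_tags(processed: MutableMapping[str, Any], event_data: Mapping[str, Any]) -> None:
        tags = event_data.get("tags") or {}
        promoted_tags = {k: v for k, v in tags.items() if k in PROMOTED_TAGS}

        # Walk the tags map at most once, and only when it is non-empty.
        if tags:
            processed["tags.key"], processed["tags.value"] = extract_extra_tags(tags)
        else:
            processed["tags.key"], processed["tags.value"] = [], []

        # Promoted fields are still filled in when tags are empty, so events that carry
        # e.g. a top-level release are no longer skipped by an early return.
        processed["release"] = promoted_tags.get("sentry:release", event_data.get("release"))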

processed["http_method"] = http_data["http_method"]
processed["http_referer"] = http_data["http_referer"]

def _process_sdk_data(
Member

Do you need this data?

Contributor Author

Not totally sure, but I think they're needed to support searching like this?
[Screenshot: example search query using sdk fields]

Member

Let me rephrase that question: do you have an sdk_name and sdk_version in your table? If not, then this function is adding columns that are just being ignored when writing to ClickHouse.

Contributor Author

Yeah we do:

{ name: sdk_name, type: String, args: { schema_modifiers: [ low_cardinality, nullable ] } },
{ name: sdk_version, type: String, args: { schema_modifiers: [ low_cardinality, nullable ] } },
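
For context, a minimal sketch of what a hook like _process_sdk_data might do to fill those columns (the shape of the sdk block in the event payload is an assumption):

    from typing import Any, Mapping, MutableMapping

    def process_sdk_data(processed: MutableMapping[str, Any], event_data: Mapping[str, Any]) -> None:
        # Assumes the event carries an "sdk" block like {"name": "sentry.python", "version": "1.12.0"}.
        sdk = event_data.get("sdk") or {}
        processed["sdk_name"] = sdk.get("name")
        processed["sdk_version"] = sdk.get("version")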

Member

OK cool, just making sure.

@@ -65,6 +65,14 @@ schema:
readable_storage: search_issues
writable_storage: search_issues
query_processors:
- processor: TimeSeriesProcessor
Member

I am trying to deprecate this processor, so I would challenge whether it is necessary on new datasets. It provides two functions:

  1. Syntactic sugar for grouping by a particular time bucket (toStartOfDay(...)). This is easily doable with the Snuba SDK (see the sketch after this list), so it isn't necessary for Snuba to do it. This functionality also tends to cause confusion because users don't understand why one date-based column works differently from another; this has come up many times with the time vs. timestamp columns in discover.
  2. Converting conditions on the time_parse_columns to use datetimes instead of strings. This is a legacy problem: before SnQL, everything was strings, and there was no way to tell Snuba that a certain string was supposed to be a datetime (timestamp >= '2011...'). SnQL allows conditions on datetimes, so you should never be sending conditions as strings anyway.
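
For illustration, both of those things can be done client-side with the Snuba SDK rather than relying on TimeSeriesProcessor. A minimal sketch (exact Query construction varies by snuba-sdk version, and the entity and column names here are assumptions):

    from datetime import datetime, timedelta
    from snuba_sdk import Column, Condition, Entity, Function, Op, Query

    # Group into day buckets client-side instead of relying on server-side sugar.
    bucketed_time = Function("toStartOfDay", [Column("timestamp")], "time")

    query = Query(
        match=Entity("search_issues"),
        select=[bucketed_time, Function("count", [], "count")],
        groupby=[bucketed_time],
        where=[
            # Conditions use real datetimes, not strings, so no server-side coercion is needed.
            Condition(Column("timestamp"), Op.GTE, datetime.utcnow() - timedelta(days=1)),
            Condition(Column("project_id"), Op.EQ, 1),
        ],
    )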

Contributor Author

Fair enough. I recall you mentioning TimeSeriesProcessor before. One of the legacy queries I was trying to get working appears to group on time, so let me take a crack at porting that bit over. Appreciate the clarification.

@evanh evanh (Member) left a comment

This looks fine to me.

@barkbarkimashark barkbarkimashark merged commit 6bc04b9 into master Jan 6, 2023
@barkbarkimashark barkbarkimashark deleted the sharky/search-issues-processing-extra-fields branch January 6, 2023 20:03
barkbarkimashark added a commit to getsentry/sentry that referenced this pull request Jan 10, 2023
…or Issue stats (#42865)

Tests will fail until the following PRs are merged in:
* getsentry/snuba#3571
* getsentry/snuba#3575

Resolves #42038