feat(issue-platform): process extra fields from event #3571
Conversation
Codecov Report: Base: 92.23% // Head: 92.44% // Increases project coverage by +0.20%.
Additional details and impacted files:

@@            Coverage Diff             @@
##           master    #3571      +/-   ##
==========================================
+ Coverage   92.23%   92.44%   +0.20%
==========================================
  Files         724      725       +1
  Lines       33772    33779       +7
==========================================
+ Hits        31151    31228      +77
+ Misses       2621     2551      -70
@@ -61,10 +82,10 @@ class SearchIssueEvent(TypedDict, total=False):
     organization_id: int
     project_id: int
     event_id: str
-    group_id: int  # backwards compatibility
-    group_ids: Sequence[int]
+    group_id: int
Why is this changing to singular group_id now? I thought we'd always have multiple ids for non-error issue stuff.
Yeah, I had the same discussion with Dan. An occurrence event will be sent for each group that resulted from processing that event. So if multiple issues are detected for a single event, then multiple events will be sent to the eventstream.
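For illustration only (this payload shape is my own sketch based on the SearchIssueEvent fields shown above, not something taken from the PR), the fan-out would look roughly like this:

# Hypothetical fan-out: one ingested event matched two detected issues, so two
# occurrence messages share the same event_id but carry different group_ids.
import uuid

event_id = str(uuid.uuid4())
occurrence_messages = [
    {"organization_id": 2, "project_id": 1, "group_id": 3, "event_id": event_id},
    {"organization_id": 2, "project_id": 1, "group_id": 4, "event_id": event_id},
]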
@@ -77,6 +98,13 @@ def ensure_uuid(value: str) -> str:
 class SearchIssuesMessageProcessor(DatasetMessageProcessor):
     FINGERPRINTS_HARD_LIMIT_SIZE = 100

+    PROMOTED_TAGS = {
What does promoted mean in this context?
These are 'special' tag values we lift directly from tags and place in top-level columns. I basically took this pattern from how the transactions processor was written. I presume this was done for performance: instead of indexing into the tags array at query time, we do some special processing when ingesting events and normalize each promoted tag into its own column.
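To make "promoted" concrete, here is a rough sketch of the idea (the PROMOTED_TAGS contents and the helper name here are assumptions for illustration, not the PR's exact code):

# Sketch: lift selected tag values out of the tags mapping into dedicated columns,
# so queries can filter on them without indexing into tags.key / tags.value.
PROMOTED_TAGS = {"environment", "sentry:release", "sentry:user"}  # assumed contents

def promote_tags(event_data: dict, processed: dict) -> None:
    tags = event_data.get("tags") or {}
    promoted = {k: v for k, v in tags.items() if k in PROMOTED_TAGS}
    processed["environment"] = promoted.get("environment")
    # Fall back to the top-level event field, mirroring the snippet quoted
    # later in this conversation.
    processed["release"] = promoted.get("sentry:release", event_data.get("release"))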
) -> None:
    tags_maybe = event_data.get("tags", None)
    if not tags_maybe:
        processed["tags.key"], processed["tags.value"] = [], []
I think you should return here.
I actually had it return early before, but then the extra params won't get processed:
processed["release"] = promoted_tags.get(
"sentry:release",
event_data.get("release"),
)
So events like:
{
    "project_id": 1,
    "organization_id": 2,
    "group_id": 3,
    "event_id": str(uuid.uuid4()),
    "retention_days": 90,
    "primary_hash": str(uuid.uuid4()),
    "datetime": datetime.utcnow().isoformat() + "Z",
    "platform": "other",
    "data": {
        "tags": {},
        "release": "[email protected]",
    },
    ...
}
would get skipped entirely.
OK but you're processing the tags twice then. Why call extract_extra_tags if there are no tags?
Ahh, right. I rearranged process_tags to be a bit more readable and to avoid doing duplicate work.
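The rearranged version looks something like this sketch (not the exact PR code; it assumes extract_extra_tags returns a (keys, values) pair, consistent with how the empty case is assigned above):

def _process_tags(processed: dict, event_data: dict) -> None:
    tags = event_data.get("tags") or {}
    promoted_tags = {k: v for k, v in tags.items() if k in PROMOTED_TAGS}
    if tags:
        # Only walk the tags when there is actually something to extract.
        processed["tags.key"], processed["tags.value"] = extract_extra_tags(tags)
    else:
        processed["tags.key"], processed["tags.value"] = [], []
    # Promoted / top-level fields are filled either way, so an event with empty
    # tags (like the example above) is no longer skipped.
    processed["release"] = promoted_tags.get("sentry:release", event_data.get("release"))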
processed["http_method"] = http_data["http_method"] | ||
processed["http_referer"] = http_data["http_referer"] | ||
|
||
def _process_sdk_data( |
Do you need this data?
Let me rephrase that question: do you have an sdk_name and sdk_version in your table? If not, then this function is adding columns that are just being ignored when writing to ClickHouse.
Yeah we do:
snuba/snuba/datasets/configuration/issues/entities/search_issues.yaml
Lines 41 to 42 in 8fad778
{ name: sdk_name, type: String, args: { schema_modifiers: [ low_cardinality, nullable ] } },
{ name: sdk_version, type: String, args: { schema_modifiers: [ low_cardinality, nullable ] } },
OK cool, just making sure.
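For reference, populating those two columns presumably reduces to something like the following sketch (assuming the event payload carries SDK info under an "sdk" key, as standard Sentry events do):

def _process_sdk_data(processed: dict, event_data: dict) -> None:
    sdk = event_data.get("sdk") or {}
    # Both columns are nullable in search_issues.yaml, so missing SDK info
    # simply writes NULLs.
    processed["sdk_name"] = sdk.get("name") or None
    processed["sdk_version"] = sdk.get("version") or None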
@@ -65,6 +65,14 @@ schema:
readable_storage: search_issues
writable_storage: search_issues
query_processors:
  - processor: TimeSeriesProcessor
I am trying to deprecate this processor, so I would challenge whether this is necessary on new datasets. It provides two functions:

- Allow syntactic sugar for grouping to a particular time range (toStartOfDay(...)). This is easily doable using the Snuba SDK, so it isn't necessary for Snuba to do it. This functionality also tends to cause confusion because users don't understand why one date-based column works differently from another. This has come up many times with time vs. timestamp columns in discover.
- Convert conditions on the time_parse_columns to use datetimes instead of strings. This is a legacy problem. Before SnQL, everything was strings; there was no way to tell Snuba that a certain string was supposed to be a datetime (timestamp >= '2011...'). SnQL allows conditions on datetimes, so you should never be sending conditions using strings anyway.
Fair enough. I recall you mentioning before about TimeSeriesProcessor. One of the legacy queries I was trying to get working looks to be grouping on time. Let me take a crack at porting that bit over. Appreciate the clarification.
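For what it's worth, that porting could look roughly like the sketch below using the Snuba SDK, bucketing on timestamp explicitly instead of relying on a magic time column. The entity and column names are assumptions based on this dataset, and snuba_sdk constructor signatures differ somewhat between versions:

from datetime import datetime, timedelta

from snuba_sdk import Column, Condition, Entity, Function, Op, Query

# Group by day explicitly rather than asking Snuba to rewrite a `time` column.
day_bucket = Function("toStartOfDay", [Column("timestamp")], alias="time")

query = Query(
    match=Entity("search_issues"),
    select=[day_bucket, Function("count", [], alias="count")],
    groupby=[day_bucket],
    where=[
        Condition(Column("project_id"), Op.EQ, 1),
        # SnQL accepts real datetimes, so no string-based time conditions are needed.
        Condition(Column("timestamp"), Op.GTE, datetime.utcnow() - timedelta(days=7)),
    ],
)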
This looks fine to me.
…or Issue stats (#42865)
Tests will fail until the following PRs are merged in:
* getsentry/snuba#3571
* getsentry/snuba#3575
Resolves #42038
This PR extracts additional field values from the event payload so they can be used for querying.