-
-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(mep): subscript fix for multiple value arrays #2914
feat(mep): subscript fix for multiple value arrays #2914
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2914 +/- ##
=======================================
Coverage 92.90% 92.90%
=======================================
Files 632 632
Lines 29100 29137 +37
=======================================
+ Hits 27034 27071 +37
Misses 2066 2066
Continue to review full report at Codecov.
|
…or-multiple-value-arrays
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my comments inline
@@ -135,6 +135,7 @@ class SubscriptableMapper(SubscriptableReferenceMapper): | |||
from_column_name: str | |||
to_nested_col_table: Optional[str] | |||
to_nested_col_name: str | |||
value_subcolumn_name: str = "value" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is more work you will need to do to make this work properly.
The mapper (here) transform the SubscriptableReference into a Clickhouse expression and it is managed well here.
The problem is that we also have mappers that identify the Clickhouse expression after the translation and extract parts of them for optimization. In some cases they reference explicitly the .value
column.
Specifically you want to find for usages of this pattern:
https://github.com/getsentry/snuba/blob/master/snuba/clickhouse/translators/snuba/mappers.py#L241-L256
And see if anybody uses this parameter VALUE_COL_MAPPING_PARAM
, which contains the name of the value column.
I think I found only this place that relies on the column to be called value
https://github.com/getsentry/snuba/blob/master/snuba/query/processors/mapping_optimizer.py#L171
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into this I also noticed that we're not applying this MappingOptimizer on release health metrics. That seems like an oversight, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verified that the only hardcoding of .value
was in the file you pointed out (at least as far as uses of mapping_pattern
is concerned) and fixed it.
snuba/query/snql/parser.py
Outdated
@@ -185,7 +185,7 @@ | |||
null_literal = ~r"NULL"i | |||
subscriptable = column_name open_square column_name close_square | |||
column_name = ~r"[a-zA-Z_][a-zA-Z0-9_\.:]*" | |||
tag_column = "tags" open_square tag_name close_square | |||
tag_column = ("tags_raw"/"tags") open_square tag_name close_square |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this is a quite fragile approach.
@evanh do you remember why we rely on the subscriptable columns to be called tags
in parsing? it seems that anything[anything_else]
should always parse as a SubscriptableReference.
Also how does it works for context[.....]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I didn't love this either but parsimonious was blowing up on (imo) valid snql queries until I added the special workaround
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We delineate between tag and subscriptable columns, but they are both valid. You can see subscriptable
as an expression two lines above this. That's how context
and any other subscriptable should work. We treat tags
expressions differently from other subscriptable calls, which is why I created a separate expression for it.
@onewland What was the problem you ran into when you didn't add this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got this parse error:
2022-07-07 10:21:17,064 Invalid query
Traceback (most recent call last):
File "/Users/oliver/workplace/snuba/./snuba/query/snql/parser.py", line 919, in parse_snql_query_initial
exp_tree = snql_grammar.parse(body)
File "/Users/oliver/workplace/snuba/.venv/lib/python3.8/site-packages/parsimonious/grammar.py", line 115, in parse
return self.default_rule.parse(text, pos=pos)
File "/Users/oliver/workplace/snuba/.venv/lib/python3.8/site-packages/parsimonious/expressions.py", line 122, in parse
raise IncompleteParseError(text, node.end, self)
parsimonious.exceptions.IncompleteParseError: Rule 'query_exp' matched in its entirety, but it didn't consume all the text. The non-matching portion of the text begins with 'AND tags_raw[5] = 'h' (line 1, column 175).
for the following query:
MATCH (generic_metrics_distributions) SELECT min(value) AS total_seconds, arrayJoin(tags.raw_value) BY project_id, org_id, tags.raw_value WHERE org_id = 1 AND project_id = 2 AND tags_raw[5] = 'h' AND metric_id = 8 AND granularity = 2 AND timestamp >= toDateTime('2022-06-01T19:00:00') AND timestamp < toDateTime('2022-06-30T20:00:00')
and I could substitute any value for tags_raw
and get the same error.
I think the issue is that column_name in subscriptable
can't be purely a number/literal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind if you want to make the column_name
in the inner brackets support a numeric value as well.
subscriptable = column_name open_square subscriptable_key close_square
subscriptable_key = column_name / integer_string
integer_string = ~r"[0-9]+"
You'll have to add visit_
functions for those as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We delineate between tag and subscriptable columns, but they are both valid. You can see
subscriptable
as an expression two lines above this. That's howcontext
and any other subscriptable should work. We treattags
expressions differently from other subscriptable calls, which is why I created a separate expression for it.
I don't mind if you want to make the
column_name
in the inner brackets support a numeric value as well.subscriptable = column_name open_square subscriptable_key close_square subscriptable_key = column_name / integer_string integer_string = ~r"[0-9]+"
You'll have to add
visit_
functions for those as well.
This is an extension of tags so unless we want to make subscriptable completely generic and to handle the tags case, I think I am taking the right approach here. Open to disagreement but I can also imagine that we might one day want to accept string literals or other arbitrary data as the key and then we'll just be redefining tags in subscriptable.
Would we find it cleaner to break out ("tags_raw"/tags")
into something like special_tag_column_names
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's back up a bit, I think the point here is that what you're trying to do shouldn't require SnQL changes. You should be able to extend the the subscriptable mapper to pick the value column, and that's it.
Looking at this more, can you change subscriptable
to take a tag_column
value (maybe rename tag_column
to something more generic)? Then you don't need to hardcode more column names into the SnQL language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this preferable to you? (diff is based off HEAD of this branch, not master)
diff --git a/snuba/query/snql/parser.py b/snuba/query/snql/parser.py
index 8d03601c..a00d616f 100644
--- a/snuba/query/snql/parser.py
+++ b/snuba/query/snql/parser.py
@@ -183,9 +183,9 @@ snql_grammar = Grammar(
true_literal = ~r"TRUE"i
false_literal = ~r"FALSE"i
null_literal = ~r"NULL"i
- subscriptable = column_name open_square column_name close_square
+ subscriptable = column_name open_square (column_name/tag_name) close_square
column_name = ~r"[a-zA-Z_][a-zA-Z0-9_\.:]*"
- tag_column = ("tags_raw"/"tags") open_square tag_name close_square
+ tag_column = "tags" open_square tag_name close_square
tag_name = ~r"[^\[\]]*"
identifier = backtick ~r"[a-zA-Z_][a-zA-Z0-9_]*" backtick
function_name = ~r"[a-zA-Z_][a-zA-Z0-9_]*"
(END)
My new e2e test passes with this change, but not sure about the whole suite yet.
edit: all tests pass if I just make subscriptable more flexible (that tag_column definition is the same in master)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this makes more sense to me. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also write a test that uses the SDK to generate the SnQL query. I looked over the code and it shouldn't be a problem, but I would test it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
…or-multiple-value-arrays
Functional change required: make
tags[key] = int_value
andtags_raw[key] = text_value
work in queriesMechanism:
tags
ortags_raw