Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(mep): subscript fix for multiple value arrays #2914

Merged
merged 15 commits into from
Jul 11, 2022

Conversation

onewland
Copy link
Contributor

@onewland onewland commented Jul 7, 2022

Functional change required: make tags[key] = int_value and tags_raw[key] = text_value work in queries

Mechanism:

  1. change SubscriptableMapper to take a value subcolumn argument
  2. change parsimonious special-casing around tag columns to accept tags or tags_raw

@codecov-commenter
Copy link

codecov-commenter commented Jul 7, 2022

Codecov Report

Merging #2914 (20bf50a) into master (0dd2049) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #2914   +/-   ##
=======================================
  Coverage   92.90%   92.90%           
=======================================
  Files         632      632           
  Lines       29100    29137   +37     
=======================================
+ Hits        27034    27071   +37     
  Misses       2066     2066           
Impacted Files Coverage Δ
snuba/query/snql/parser.py 97.01% <ø> (ø)
tests/clickhouse/test_query_profiler.py 100.00% <ø> (ø)
...s/clickhouse/translators/snuba/test_translation.py 100.00% <ø> (ø)
tests/pipeline/test_composite_planner.py 100.00% <ø> (ø)
tests/query/processors/query_builders.py 100.00% <ø> (ø)
snuba/clickhouse/translators/snuba/mappers.py 98.85% <100.00%> (+0.01%) ⬆️
snuba/datasets/entities/generic_metrics.py 97.14% <100.00%> (ø)
snuba/query/processors/mapping_optimizer.py 99.17% <100.00%> (+<0.01%) ⬆️
tests/test_generic_metrics_api.py 100.00% <100.00%> (ø)
tests/test_snql_sdk_api.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0dd2049...20bf50a. Read the comment docs.

@onewland onewland marked this pull request as ready for review July 7, 2022 18:58
@onewland onewland requested a review from a team as a code owner July 7, 2022 18:58
Base automatically changed from e2e-gen-metric-dists to master July 7, 2022 19:00
Copy link
Contributor

@fpacifici fpacifici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comments inline

@@ -135,6 +135,7 @@ class SubscriptableMapper(SubscriptableReferenceMapper):
from_column_name: str
to_nested_col_table: Optional[str]
to_nested_col_name: str
value_subcolumn_name: str = "value"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is more work you will need to do to make this work properly.
The mapper (here) transform the SubscriptableReference into a Clickhouse expression and it is managed well here.
The problem is that we also have mappers that identify the Clickhouse expression after the translation and extract parts of them for optimization. In some cases they reference explicitly the .value column.

Specifically you want to find for usages of this pattern:
https://github.com/getsentry/snuba/blob/master/snuba/clickhouse/translators/snuba/mappers.py#L241-L256
And see if anybody uses this parameter VALUE_COL_MAPPING_PARAM, which contains the name of the value column.

I think I found only this place that relies on the column to be called value
https://github.com/getsentry/snuba/blob/master/snuba/query/processors/mapping_optimizer.py#L171

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into this I also noticed that we're not applying this MappingOptimizer on release health metrics. That seems like an oversight, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified that the only hardcoding of .value was in the file you pointed out (at least as far as uses of mapping_pattern is concerned) and fixed it.

@@ -185,7 +185,7 @@
null_literal = ~r"NULL"i
subscriptable = column_name open_square column_name close_square
column_name = ~r"[a-zA-Z_][a-zA-Z0-9_\.:]*"
tag_column = "tags" open_square tag_name close_square
tag_column = ("tags_raw"/"tags") open_square tag_name close_square
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is a quite fragile approach.
@evanh do you remember why we rely on the subscriptable columns to be called tags in parsing? it seems that anything[anything_else] should always parse as a SubscriptableReference.

Also how does it works for context[.....]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I didn't love this either but parsimonious was blowing up on (imo) valid snql queries until I added the special workaround

Copy link
Member

@evanh evanh Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We delineate between tag and subscriptable columns, but they are both valid. You can see subscriptable as an expression two lines above this. That's how context and any other subscriptable should work. We treat tags expressions differently from other subscriptable calls, which is why I created a separate expression for it.

@onewland What was the problem you ran into when you didn't add this change?

Copy link
Contributor Author

@onewland onewland Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this parse error:

2022-07-07 10:21:17,064 Invalid query
Traceback (most recent call last):
  File "/Users/oliver/workplace/snuba/./snuba/query/snql/parser.py", line 919, in parse_snql_query_initial
    exp_tree = snql_grammar.parse(body)
  File "/Users/oliver/workplace/snuba/.venv/lib/python3.8/site-packages/parsimonious/grammar.py", line 115, in parse
    return self.default_rule.parse(text, pos=pos)
  File "/Users/oliver/workplace/snuba/.venv/lib/python3.8/site-packages/parsimonious/expressions.py", line 122, in parse
    raise IncompleteParseError(text, node.end, self)
parsimonious.exceptions.IncompleteParseError: Rule 'query_exp' matched in its entirety, but it didn't consume all the text. The non-matching portion of the text begins with 'AND tags_raw[5] = 'h' (line 1, column 175).

for the following query:

MATCH (generic_metrics_distributions) SELECT min(value) AS total_seconds, arrayJoin(tags.raw_value) BY project_id, org_id, tags.raw_value WHERE org_id = 1 AND project_id = 2 AND tags_raw[5] = 'h' AND metric_id = 8 AND granularity = 2 AND timestamp >= toDateTime('2022-06-01T19:00:00') AND timestamp < toDateTime('2022-06-30T20:00:00')

and I could substitute any value for tags_raw and get the same error.

I think the issue is that column_name in subscriptable can't be purely a number/literal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind if you want to make the column_name in the inner brackets support a numeric value as well.

subscriptable         = column_name open_square subscriptable_key close_square
subscriptable_key     = column_name / integer_string
integer_string        = ~r"[0-9]+"

You'll have to add visit_ functions for those as well.

Copy link
Contributor Author

@onewland onewland Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We delineate between tag and subscriptable columns, but they are both valid. You can see subscriptable as an expression two lines above this. That's how context and any other subscriptable should work. We treat tags expressions differently from other subscriptable calls, which is why I created a separate expression for it.

I don't mind if you want to make the column_name in the inner brackets support a numeric value as well.

subscriptable         = column_name open_square subscriptable_key close_square
subscriptable_key     = column_name / integer_string
integer_string        = ~r"[0-9]+"

You'll have to add visit_ functions for those as well.

This is an extension of tags so unless we want to make subscriptable completely generic and to handle the tags case, I think I am taking the right approach here. Open to disagreement but I can also imagine that we might one day want to accept string literals or other arbitrary data as the key and then we'll just be redefining tags in subscriptable.

Would we find it cleaner to break out ("tags_raw"/tags") into something like special_tag_column_names?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's back up a bit, I think the point here is that what you're trying to do shouldn't require SnQL changes. You should be able to extend the the subscriptable mapper to pick the value column, and that's it.

Looking at this more, can you change subscriptable to take a tag_column value (maybe rename tag_column to something more generic)? Then you don't need to hardcode more column names into the SnQL language.

Copy link
Contributor Author

@onewland onewland Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this preferable to you? (diff is based off HEAD of this branch, not master)

diff --git a/snuba/query/snql/parser.py b/snuba/query/snql/parser.py
index 8d03601c..a00d616f 100644
--- a/snuba/query/snql/parser.py
+++ b/snuba/query/snql/parser.py
@@ -183,9 +183,9 @@ snql_grammar = Grammar(
     true_literal          = ~r"TRUE"i
     false_literal         = ~r"FALSE"i
     null_literal          = ~r"NULL"i
-    subscriptable         = column_name open_square column_name close_square
+    subscriptable         = column_name open_square (column_name/tag_name) close_square
     column_name           = ~r"[a-zA-Z_][a-zA-Z0-9_\.:]*"
-    tag_column            = ("tags_raw"/"tags") open_square tag_name close_square
+    tag_column            = "tags" open_square tag_name close_square
     tag_name              = ~r"[^\[\]]*"
     identifier            = backtick ~r"[a-zA-Z_][a-zA-Z0-9_]*" backtick
     function_name         = ~r"[a-zA-Z_][a-zA-Z0-9_]*"
(END)

My new e2e test passes with this change, but not sure about the whole suite yet.

edit: all tests pass if I just make subscriptable more flexible (that tag_column definition is the same in master)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this makes more sense to me. Thanks!

Copy link
Member

@evanh evanh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also write a test that uses the SDK to generate the SnQL query. I looked over the code and it shouldn't be a problem, but I would test it anyway.

@onewland onewland requested a review from fpacifici July 8, 2022 16:05
Copy link
Member

@evanh evanh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@onewland onewland enabled auto-merge (squash) July 8, 2022 21:16
@onewland onewland disabled auto-merge July 8, 2022 21:16
@onewland onewland merged commit d6eb346 into master Jul 11, 2022
@onewland onewland deleted the tag-subscript-fix-for-multiple-value-arrays branch July 11, 2022 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants