feat(mep): subscript fix for multiple value arrays #2914

onewland · 2022-07-07T16:27:49Z

Functional change required: make tags[key] = int_value and tags_raw[key] = text_value work in queries

Mechanism:

change SubscriptableMapper to take a value subcolumn argument
change parsimonious special-casing around tag columns to accept tags or tags_raw

codecov-commenter · 2022-07-07T16:40:13Z

Codecov Report

Merging #2914 (20bf50a) into master (0dd2049) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #2914   +/-   ##
=======================================
  Coverage   92.90%   92.90%           
=======================================
  Files         632      632           
  Lines       29100    29137   +37     
=======================================
+ Hits        27034    27071   +37     
  Misses       2066     2066

Impacted Files	Coverage Δ
snuba/query/snql/parser.py	`97.01% <ø> (ø)`
tests/clickhouse/test_query_profiler.py	`100.00% <ø> (ø)`
...s/clickhouse/translators/snuba/test_translation.py	`100.00% <ø> (ø)`
tests/pipeline/test_composite_planner.py	`100.00% <ø> (ø)`
tests/query/processors/query_builders.py	`100.00% <ø> (ø)`
snuba/clickhouse/translators/snuba/mappers.py	`98.85% <100.00%> (+0.01%)`	⬆️
snuba/datasets/entities/generic_metrics.py	`97.14% <100.00%> (ø)`
snuba/query/processors/mapping_optimizer.py	`99.17% <100.00%> (+<0.01%)`	⬆️
tests/test_generic_metrics_api.py	`100.00% <100.00%> (ø)`
tests/test_snql_sdk_api.py	`100.00% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0dd2049...20bf50a. Read the comment docs.

…or-multiple-value-arrays

fpacifici

Please see my comments inline

fpacifici · 2022-07-07T23:09:55Z

snuba/clickhouse/translators/snuba/mappers.py

@@ -135,6 +135,7 @@ class SubscriptableMapper(SubscriptableReferenceMapper):
    from_column_name: str
    to_nested_col_table: Optional[str]
    to_nested_col_name: str
+    value_subcolumn_name: str = "value"


I think there is more work you will need to do to make this work properly.
The mapper (here) transform the SubscriptableReference into a Clickhouse expression and it is managed well here.
The problem is that we also have mappers that identify the Clickhouse expression after the translation and extract parts of them for optimization. In some cases they reference explicitly the .value column.

Specifically you want to find for usages of this pattern:
https://github.com/getsentry/snuba/blob/master/snuba/clickhouse/translators/snuba/mappers.py#L241-L256
And see if anybody uses this parameter VALUE_COL_MAPPING_PARAM, which contains the name of the value column.

I think I found only this place that relies on the column to be called value
https://github.com/getsentry/snuba/blob/master/snuba/query/processors/mapping_optimizer.py#L171

Looking into this I also noticed that we're not applying this MappingOptimizer on release health metrics. That seems like an oversight, right?

Verified that the only hardcoding of .value was in the file you pointed out (at least as far as uses of mapping_pattern is concerned) and fixed it.

fpacifici · 2022-07-07T23:13:35Z

snuba/query/snql/parser.py

@@ -185,7 +185,7 @@
    null_literal          = ~r"NULL"i
    subscriptable         = column_name open_square column_name close_square
    column_name           = ~r"[a-zA-Z_][a-zA-Z0-9_\.:]*"
-    tag_column            = "tags" open_square tag_name close_square
+    tag_column            = ("tags_raw"/"tags") open_square tag_name close_square


I feel this is a quite fragile approach.
@evanh do you remember why we rely on the subscriptable columns to be called tags in parsing? it seems that anything[anything_else] should always parse as a SubscriptableReference.

Also how does it works for context[.....]

yeah, I didn't love this either but parsimonious was blowing up on (imo) valid snql queries until I added the special workaround

We delineate between tag and subscriptable columns, but they are both valid. You can see subscriptable as an expression two lines above this. That's how context and any other subscriptable should work. We treat tags expressions differently from other subscriptable calls, which is why I created a separate expression for it.

@onewland What was the problem you ran into when you didn't add this change?

I got this parse error:

2022-07-07 10:21:17,064 Invalid query Traceback (most recent call last): File "/Users/oliver/workplace/snuba/./snuba/query/snql/parser.py", line 919, in parse_snql_query_initial exp_tree = snql_grammar.parse(body) File "/Users/oliver/workplace/snuba/.venv/lib/python3.8/site-packages/parsimonious/grammar.py", line 115, in parse return self.default_rule.parse(text, pos=pos) File "/Users/oliver/workplace/snuba/.venv/lib/python3.8/site-packages/parsimonious/expressions.py", line 122, in parse raise IncompleteParseError(text, node.end, self) parsimonious.exceptions.IncompleteParseError: Rule 'query_exp' matched in its entirety, but it didn't consume all the text. The non-matching portion of the text begins with 'AND tags_raw[5] = 'h' (line 1, column 175).

for the following query:

MATCH (generic_metrics_distributions) SELECT min(value) AS total_seconds, arrayJoin(tags.raw_value) BY project_id, org_id, tags.raw_value WHERE org_id = 1 AND project_id = 2 AND tags_raw[5] = 'h' AND metric_id = 8 AND granularity = 2 AND timestamp >= toDateTime('2022-06-01T19:00:00') AND timestamp < toDateTime('2022-06-30T20:00:00')

and I could substitute any value for tags_raw and get the same error.

I think the issue is that column_name in subscriptable can't be purely a number/literal

I don't mind if you want to make the column_name in the inner brackets support a numeric value as well.

subscriptable = column_name open_square subscriptable_key close_square subscriptable_key = column_name / integer_string integer_string = ~r"[0-9]+"

You'll have to add visit_ functions for those as well.

We delineate between tag and subscriptable columns, but they are both valid. You can see subscriptable as an expression two lines above this. That's how context and any other subscriptable should work. We treat tags expressions differently from other subscriptable calls, which is why I created a separate expression for it.

I don't mind if you want to make the column_name in the inner brackets support a numeric value as well.

subscriptable = column_name open_square subscriptable_key close_square subscriptable_key = column_name / integer_string integer_string = ~r"[0-9]+"

You'll have to add visit_ functions for those as well.

This is an extension of tags so unless we want to make subscriptable completely generic and to handle the tags case, I think I am taking the right approach here. Open to disagreement but I can also imagine that we might one day want to accept string literals or other arbitrary data as the key and then we'll just be redefining tags in subscriptable.

Would we find it cleaner to break out ("tags_raw"/tags") into something like special_tag_column_names?

Let's back up a bit, I think the point here is that what you're trying to do shouldn't require SnQL changes. You should be able to extend the the subscriptable mapper to pick the value column, and that's it.

Looking at this more, can you change subscriptable to take a tag_column value (maybe rename tag_column to something more generic)? Then you don't need to hardcode more column names into the SnQL language.

Is this preferable to you? (diff is based off HEAD of this branch, not master)

diff --git a/snuba/query/snql/parser.py b/snuba/query/snql/parser.py index 8d03601c..a00d616f 100644 --- a/snuba/query/snql/parser.py +++ b/snuba/query/snql/parser.py @@ -183,9 +183,9 @@ snql_grammar = Grammar( true_literal = ~r"TRUE"i false_literal = ~r"FALSE"i null_literal = ~r"NULL"i - subscriptable = column_name open_square column_name close_square + subscriptable = column_name open_square (column_name/tag_name) close_square column_name = ~r"[a-zA-Z_][a-zA-Z0-9_\.:]*" - tag_column = ("tags_raw"/"tags") open_square tag_name close_square + tag_column = "tags" open_square tag_name close_square tag_name = ~r"[^\[\]]*" identifier = backtick ~r"[a-zA-Z_][a-zA-Z0-9_]*" backtick function_name = ~r"[a-zA-Z_][a-zA-Z0-9_]*" (END)

My new e2e test passes with this change, but not sure about the whole suite yet.

edit: all tests pass if I just make subscriptable more flexible (that tag_column definition is the same in master)

Yeah this makes more sense to me. Thanks!

evanh

I would also write a test that uses the SDK to generate the SnQL query. I looked over the code and it shouldn't be a problem, but I would test it anyway.

evanh

Thanks!

…or-multiple-value-arrays

onewland added 7 commits July 6, 2022 14:23

[wip] wire up distributions

e510e81

copy/paste fixes

9a6e004

basic e2e for distributions

c3f5394

add percentiles test

d12556b

remove unnecessary comment

a101d00

[wip] fix tag subscripting to support multiple value columns

02d6006

fix tag key type

1be9000

add support for raw tag access

cdd6241

onewland added the do not merge label Jul 7, 2022

onewland added 2 commits July 7, 2022 11:53

verify e2e behavior of tags and tags_raw mappers

4e55a65

add test to subscriptable mapper special value subcolumn

65d631d

onewland marked this pull request as ready for review July 7, 2022 18:58

onewland requested a review from a team as a code owner July 7, 2022 18:58

Base automatically changed from e2e-gen-metric-dists to master July 7, 2022 19:00

onewland removed the do not merge label Jul 7, 2022

Merge remote-tracking branch 'origin/master' into tag-subscript-fix-f…

1f225f1

…or-multiple-value-arrays

fpacifici reviewed Jul 7, 2022

View reviewed changes

evanh reviewed Jul 8, 2022

View reviewed changes

onewland mentioned this pull request Jul 8, 2022

ref(MEP): Add new option to query tag values as strings from clickhouse [INGEST-1396] getsentry/sentry#36397

Merged

onewland added 2 commits July 8, 2022 07:51

fix MappingOptimizer

9df8839

add snql SDK test for tags_raw

62093cf

onewland requested a review from fpacifici July 8, 2022 16:05

evanh approved these changes Jul 8, 2022

View reviewed changes

change parser rules to remove tags_raw hardcoding

9dc1a0d

fpacifici approved these changes Jul 8, 2022

View reviewed changes

Merge remote-tracking branch 'origin/master' into tag-subscript-fix-f…

20bf50a

…or-multiple-value-arrays

onewland enabled auto-merge (squash) July 8, 2022 21:16

onewland disabled auto-merge July 8, 2022 21:16

onewland merged commit d6eb346 into master Jul 11, 2022

onewland deleted the tag-subscript-fix-for-multiple-value-arrays branch July 11, 2022 15:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(mep): subscript fix for multiple value arrays #2914

feat(mep): subscript fix for multiple value arrays #2914

onewland commented Jul 7, 2022 •

edited

Loading

codecov-commenter commented Jul 7, 2022 •

edited

Loading

fpacifici left a comment

fpacifici Jul 7, 2022

onewland Jul 8, 2022

onewland Jul 8, 2022

fpacifici Jul 7, 2022

onewland Jul 7, 2022

evanh Jul 8, 2022 •

edited

Loading

onewland Jul 8, 2022 •

edited

Loading

evanh Jul 8, 2022

onewland Jul 8, 2022 •

edited

Loading

evanh Jul 8, 2022

onewland Jul 8, 2022 •

edited

Loading

evanh Jul 8, 2022

evanh left a comment

evanh left a comment

feat(mep): subscript fix for multiple value arrays #2914

feat(mep): subscript fix for multiple value arrays #2914

Conversation

onewland commented Jul 7, 2022 • edited Loading

codecov-commenter commented Jul 7, 2022 • edited Loading

Codecov Report

fpacifici left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evanh Jul 8, 2022 • edited Loading

Choose a reason for hiding this comment

onewland Jul 8, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

onewland Jul 8, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

onewland Jul 8, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evanh left a comment

Choose a reason for hiding this comment

evanh left a comment

Choose a reason for hiding this comment

onewland commented Jul 7, 2022 •

edited

Loading

codecov-commenter commented Jul 7, 2022 •

edited

Loading

evanh Jul 8, 2022 •

edited

Loading

onewland Jul 8, 2022 •

edited

Loading

onewland Jul 8, 2022 •

edited

Loading

onewland Jul 8, 2022 •

edited

Loading