
feat(mql): MQL JOIN subquery generator #6101

Merged 19 commits into master from enocht/basic-metrics-subquery-generator on Jul 15, 2024

Conversation

enochtangg
Member

This PR contains the last piece needed to run multi-entity formula metric queries in Snuba. This work was first specced in #5717. The main change is the metrics subquery generator, which now supports "MQL-shaped" JOINs (see snuba/query/joins/metrics_subquery_generator.py). For example, the following query:

SELECT g(f(d0.a), f(d1.b)), t() as d0.time, t() as d1.time, d0.gb, d1.gb
FROM d0 INNER JOIN d1 ON ...
WHERE d0.m = 1 AND d1.m = 2
GROUP BY d0.time, d1.time, d0.gb, d1.gb
ORDER BY ...

becomes

SELECT g(d0._snuba_a, d1._snuba_b), d0._snuba_time, d1._snuba_time, d0._snuba_gb, d1._snuba_gb
FROM (
    SELECT f(a) as _snuba_a, t() as _snuba_time, gb as _snuba_gb
    FROM ...
    WHERE m = 1
    GROUP BY _snuba_time, _snuba_gb
) d0 INNER JOIN (
    SELECT f(b) as _snuba_b, t() as _snuba_time, gb as _snuba_gb
    FROM ...
    WHERE m = 2
    GROUP BY _snuba_time, _snuba_gb
) d1 ON ...
ORDER BY ...

See the docstring of generate_metrics_subqueries() for the reasoning behind processing subqueries this way.
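
For intuition, here is a minimal, hypothetical Python sketch of the push-down idea; the Subquery type and push_down function are illustrative simplifications, not the actual snuba API (which lives in snuba/query/joins/metrics_subquery_generator.py):

from dataclasses import dataclass, field

# Hypothetical, simplified AST types for illustration only.
@dataclass
class Subquery:
    alias: str                                            # e.g. "d0"
    select: dict[str, str] = field(default_factory=dict)  # output alias -> expression
    where: list[str] = field(default_factory=list)
    groupby: list[str] = field(default_factory=list)

def push_down(entity_exprs: dict[str, dict[str, str]],
              entity_filters: dict[str, list[str]],
              groupby_cols: dict[str, list[str]]) -> dict[str, Subquery]:
    """Build one subquery per join alias: each aggregate, filter, and
    group-by column is pushed into the subquery for its own entity and
    exported under a mangled _snuba_* alias, so the outer query can
    combine results without re-aggregating."""
    subqueries: dict[str, Subquery] = {}
    for alias, exprs in entity_exprs.items():
        sq = Subquery(alias=alias)
        for col, expr in exprs.items():
            sq.select[f"_snuba_{col}"] = expr
        sq.where = entity_filters.get(alias, [])
        for col in groupby_cols.get(alias, []):
            # Export the group-by column unless an expression already claims it.
            sq.select.setdefault(f"_snuba_{col}", col)
            sq.groupby.append(f"_snuba_{col}")
        subqueries[alias] = sq
    return subqueries

# Mirrors the example above: f(a) and f(b) are aggregated inside d0/d1,
# and only g(...) over the exported aliases remains in the outer query.
subqueries = push_down(
    entity_exprs={"d0": {"a": "f(a)", "time": "t()"},
                  "d1": {"b": "f(b)", "time": "t()"}},
    entity_filters={"d0": ["m = 1"], "d1": ["m = 2"]},
    groupby_cols={"d0": ["time", "gb"], "d1": ["time", "gb"]},
)
print(subqueries["d0"].select)
# {'_snuba_a': 'f(a)', '_snuba_time': 't()', '_snuba_gb': 'gb'}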

In addition to the subquery generator change, the indexer_mapping resolver was updated to resolve columns in the ON clause. Because GROUP BY columns get pushed onto the join condition, tag columns that end up in the join condition must also be resolved.
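
As a hedged illustration of that ON-clause resolution: the mapping dict, tag IDs, and resolve_on_clause helper below are hypothetical stand-ins for the actual indexer_mapping resolver.

import re

# Hypothetical indexer mapping: raw tag key -> integer ID, as produced by
# the metrics indexer (illustrative values, not real IDs).
INDEXER_MAPPING = {"transaction": 111111, "environment": 222222}

def resolve_on_clause(condition: str) -> str:
    """Rewrite tags[<key>] references in a join condition to their
    indexed form, e.g. tags[transaction] -> tags_raw[111111]. A tag
    group-by pushed onto the join condition would otherwise reach the
    storage layer unresolved."""
    def repl(match: re.Match) -> str:
        key = match.group(1)
        return f"tags_raw[{INDEXER_MAPPING[key]}]"
    return re.sub(r"tags\[(\w+)\]", repl, condition)

print(resolve_on_clause("d0.tags[transaction] = d1.tags[transaction]"))
# d0.tags_raw[111111] = d1.tags_raw[111111]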

Testing

When this PR is merged, nothing changes immediately; the rollout is controlled through runtime config. First, the run_new_mql_parser option samples a percentage of MQL queries and runs them through the new parser, which now converts formula queries into JOINs. Then, in the entity processing stage of the pipeline, the metrics subquery generator executes when it sees such a query. The rollout plan:

1. SET run_new_mql_parser = 0.01  # 1% of queries try the new pipeline (a sketch of the sampling check follows these steps)
2. Monitor in Sentry and Datadog, and keep increasing run_new_mql_parser until it reaches 100%.
3. Once the rollout is complete, remove the old parser and other dead code.
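
For illustration, a minimal sketch of how such a sampled rollout flag might be consumed; get_config here is a hypothetical stand-in modeled on snuba's runtime-config helpers, and the in-memory dict replaces the real config store.

import random

# Stand-in for a runtime-config lookup; snuba reads flags like this from
# its runtime config, but this dict and function name are illustrative.
_RUNTIME_CONFIG = {"run_new_mql_parser": 0.01}

def get_config(key: str, default: float = 0.0) -> float:
    return _RUNTIME_CONFIG.get(key, default)

def should_use_new_parser() -> bool:
    """Sample queries into the new MQL parser at the configured rate:
    0.01 routes ~1% of queries, 1.0 routes all of them."""
    return random.random() < get_config("run_new_mql_parser", 0.0)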

@enochtangg enochtangg requested a review from a team as a code owner July 12, 2024 15:01

codecov bot commented Jul 12, 2024

Test Failures Detected: Due to failing tests, we cannot provide coverage reports at this time.

❌ Failed Test Results:

Completed 1702 tests with 1 failed, 1699 passed and 2 skipped.


pytest

  • Class name: tests.query.parser.test_formula_mql_query
    Test name: test_formula_with_nested_functions

    Traceback (most recent call last):
    File ".../query/parser/test_formula_mql_query.py", line 946, in test_formula_with_nested_functions
    assert eq, reason
    AssertionError: get_from_clause:
    JoinClause(
        left_node=IndividualNode(alias='d1', data_source=Entity(key=EntityKey.GENERIC_METRICS_DISTRIBUTIONS, schema=ColumnSet([...]), sample=None)),
        right_node=IndividualNode(alias='d0', data_source=Entity(key=EntityKey.GENERIC_METRICS_DISTRIBUTIONS, schema=ColumnSet([...]), sample=None)),
        keys=[JoinCondition(left=JoinConditionExpression(table_alias='d1', column='d1.time'), right=JoinConditionExpression(table_alias='d0', column='d0.time'))],
        join_type=<JoinType.INNER: 'INNER'>, join_modifier=None)
    !=
    JoinClause(
        ... identical nodes and schema ...,
        keys=[JoinCondition(left=JoinConditionExpression(table_alias='d1', column='time'), right=JoinConditionExpression(table_alias='d0', column='time'))],
        join_type=<JoinType.INNER: 'INNER'>, join_modifier=None)
    (Four identical GENERIC_METRICS_DISTRIBUTIONS ColumnSet schemas elided; the only difference between the two clauses is the JoinCondition column names: 'd1.time'/'d0.time' vs 'time'/'time'.)
    assert False

@enochtangg enochtangg merged commit b3e81c3 into master Jul 15, 2024
29 checks passed
@enochtangg enochtangg deleted the enocht/basic-metrics-subquery-generator branch July 15, 2024 14:00