
feat(mep): support time bucketing in queries #2937

Merged: 5 commits into master, Jul 14, 2022

Conversation

@onewland (Contributor) commented Jul 13, 2022

add support for bucketed time, and handle the mapping of granularity to its enum value in query processing

@wmak @ahmedetefy for visibility

@codecov-commenter commented Jul 13, 2022

Codecov Report

Merging #2937 (1b8f5ef) into master (5d9755c) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #2937      +/-   ##
==========================================
- Coverage   92.93%   92.88%   -0.05%     
==========================================
  Files         635      635              
  Lines       29069    29127      +58     
==========================================
+ Hits        27016    27056      +40     
- Misses       2053     2071      +18     
Impacted Files Coverage Δ
snuba/datasets/entities/generic_metrics.py 97.29% <100.00%> (+0.15%) ⬆️
snuba/query/processors/granularity_processor.py 100.00% <100.00%> (ø)
...sts/query/processors/test_granularity_processor.py 100.00% <100.00%> (ø)
tests/test_generic_metrics_api.py 100.00% <100.00%> (ø)
snuba/settings/settings_distributed.py 0.00% <0.00%> (-100.00%) ⬇️
snuba/settings/settings_test_distributed.py 0.00% <0.00%> (-100.00%) ⬇️
...ts/0010_groupedmessages_onpremise_compatibility.py 95.55% <0.00%> (-4.45%) ⬇️
snuba/optimize.py 87.50% <0.00%> (-3.75%) ⬇️
snuba/migrations/table_engines.py 95.50% <0.00%> (-3.38%) ⬇️
...nsactions_onpremise_fix_orderby_and_partitionby.py 81.33% <0.00%> (-2.67%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5d9755c...1b8f5ef.

@onewland onewland marked this pull request as ready for review July 13, 2022 23:12
@onewland onewland requested a review from a team as a code owner July 13, 2022 23:12
snuba/query/processors/granularity_processor.py (3 resolved review threads on an outdated diff)

def process_query(self, query: Query, query_settings: QuerySettings) -> None:
    granularity = self.__get_granularity(query)
    query.add_condition_to_ast(
Member:

Is it just a matter of adding the condition to the query? Is there no existing granularity condition to be removed or replaced?

Contributor (author):

yes, that's correct

Member:

Why is that though? What was the mechanism by which the granularity was being set on the query previously and wouldn't that still be happening?

Contributor (author):

it wasn't being set previously on generic_metrics unless you added a where clause with the granularity column
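
A sketch of what the truncated process_query snippet above resolves to (the condition helpers are Snuba's, but the call shape is my reconstruction, not a quote from this PR):

from snuba.query.conditions import ConditionFunctions, binary_condition
from snuba.query.expressions import Column, Literal
from snuba.query.logical import Query

def add_granularity_condition(query: Query, granularity_enum: int) -> None:
    # Nothing is removed or replaced: generic_metrics queries carried no
    # granularity condition before, so the processor only appends one.
    query.add_condition_to_ast(
        binary_condition(
            ConditionFunctions.EQ,
            Column(None, None, "granularity"),
            Literal(None, granularity_enum),
        )
    )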

@@ -180,7 +180,7 @@ def test_raw_tags(self) -> None:
  AND tags_raw[{tag_key}] = '{value_as_string}'
  AND timestamp >= toDateTime('{self.start_time}')
  AND timestamp < toDateTime('{self.end_time}')
- GRANULARITY 1
+ GRANULARITY 60
Member:

I don't think I follow why these queries changed? Are you saying they were wrong before?

Contributor (author):

I'm going to remove that second condition so now they will be wrong, but I just wanted to verify that we're handling the "raw" granularities correctly

Member:

Which one are you saying is the correct value, and which is the wrong one? And if you're saying it will be wrong after this PR, can you give some info about why, and how it will be addressed in the future?

Contributor (author):

When I originally put up the PR, I thought accepting both raw and enum values was correct. I've reconsidered, and I now think we should accept only the "raw" (in seconds) granularity value to reduce confusion.

Our ideal is to take this non-infrastructure concern and move it to the product side (e.g. sentry should maintain the list of enum values for granularities it cares about and use them directly in queries). They don't have the flexibility in their query layer at the moment, though, to have two different approaches for building requests.

How it will be addressed in the future is undetermined, but they're aware that we don't like having this logic on our side.
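
To make the two spellings concrete (the pairing below is inferred from the GRANULARITY 1 / GRANULARITY 60 diff above, so treat it as an assumption): 1 is the enum for the one-minute rollup and 60 is the same rollup in raw seconds.

# Assumed pairing behind the diff above: enum 1 <-> 60 raw seconds.
RAW_SECONDS_TO_ENUM = {60: 1}

# After this PR, only the raw spelling is accepted:
assert 60 in RAW_SECONDS_TO_ENUM      # GRANULARITY 60 maps to granularity = 1
assert 1 not in RAW_SECONDS_TO_ENUM   # GRANULARITY 1 (enum spelling) is rejected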


def __init__(
    self,
    accepted_granularities: Sequence[Tuple[int, int]],
Member:

Why do you pass these in? Everywhere this class is used, it's used with the constants above.

@lynnagara (Member) commented Jul 14, 2022:

I actually really like that this processor is not tied to the specific granularity values we happen to be storing in this table for metrics. Even though it's only used in one place today I can imagine us experimenting with different granularities in different tables in the future.

Contributor (author):

because "generic" metrics supports arbitrary granularities depending on the need of the customer, I wanted to make it flexible (e.g. we can support 10s, but performance doesn't use it, so we're not mapping it here)

    accepted_granularities=PERFORMANCE_GRANULARITIES,
    default_granularity_enum=DEFAULT_MAPPED_GRANULARITY_ENUM,
),
TimeSeriesProcessor({"bucketed_time": "timestamp"}, ("timestamp",)),
Member:

Do you need this processor if you have the granularity processor above?

Contributor (author):

yes, because the product wants to round timestamps to odd intervals like 15m or 3h, which isn't handled by the granularity processor alone
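
A sketch of why both processors are needed (the expression shape is modeled on how Snuba rewrites time columns generally, so treat the details as assumptions): the granularity condition only picks which rollup to read, while bucketed_time floors each timestamp to an interval that can be any multiple of that rollup.

def bucketed_time_expr(column: str, interval_seconds: int) -> str:
    """Build a ClickHouse expression flooring `column` to its bucket start."""
    return (
        f"toDateTime(multiply(intDiv(toUInt32({column}), "
        f"{interval_seconds}), {interval_seconds}))"
    )

# A 15-minute series over the 60s rollup: the granularity processor pins
# `granularity = <enum for 60s>`, and the query then groups by:
print(bucketed_time_expr("timestamp", 900))
# toDateTime(multiply(intDiv(toUInt32(timestamp), 900), 900))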

@onewland onewland requested a review from lynnagara July 14, 2022 18:05
@onewland onewland merged commit 8f38b4d into master Jul 14, 2022
@onewland onewland deleted the mep-support-time-bucketing-in-queries branch July 14, 2022 22:26