
feat(mep): support time bucketing in queries #2937

Merged: 5 commits into master, Jul 14, 2022

Conversation

@onewland (Contributor) commented Jul 13, 2022

add support for bucketed time, and handle the mapping of granularity to its enum value in query processing

@wmak @ahmedetefy for visibility

@codecov-commenter commented Jul 13, 2022

Codecov Report

Merging #2937 (1b8f5ef) into master (5d9755c) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #2937      +/-   ##
==========================================
- Coverage   92.93%   92.88%   -0.05%     
==========================================
  Files         635      635              
  Lines       29069    29127      +58     
==========================================
+ Hits        27016    27056      +40     
- Misses       2053     2071      +18     
Impacted Files Coverage Δ
snuba/datasets/entities/generic_metrics.py 97.29% <100.00%> (+0.15%) ⬆️
snuba/query/processors/granularity_processor.py 100.00% <100.00%> (ø)
...sts/query/processors/test_granularity_processor.py 100.00% <100.00%> (ø)
tests/test_generic_metrics_api.py 100.00% <100.00%> (ø)
snuba/settings/settings_distributed.py 0.00% <0.00%> (-100.00%) ⬇️
snuba/settings/settings_test_distributed.py 0.00% <0.00%> (-100.00%) ⬇️
...ts/0010_groupedmessages_onpremise_compatibility.py 95.55% <0.00%> (-4.45%) ⬇️
snuba/optimize.py 87.50% <0.00%> (-3.75%) ⬇️
snuba/migrations/table_engines.py 95.50% <0.00%> (-3.38%) ⬇️
...nsactions_onpremise_fix_orderby_and_partitionby.py 81.33% <0.00%> (-2.67%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5d9755c...1b8f5ef.

@onewland onewland marked this pull request as ready for review July 13, 2022 23:12
@onewland onewland requested a review from a team as a code owner July 13, 2022 23:12
snuba/query/processors/granularity_processor.py (3 resolved review threads on an outdated diff)

def process_query(self, query: Query, query_settings: QuerySettings) -> None:
    granularity = self.__get_granularity(query)
    query.add_condition_to_ast(
Member:

Is it just a matter of adding the condition to the query? Is there no existing granularity condition to be removed or replaced?

Contributor (author):

yes, that's correct

Member:

Why is that though? What was the mechanism by which the granularity was being set on the query previously and wouldn't that still be happening?

Contributor (author):

it wasn't being set previously on generic_metrics unless you added a where clause with the granularity column
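
A sketch of what the truncated process_query snippet above resolves to (the condition helpers are Snuba's, but the call shape is my reconstruction, not a quote from this PR):

from snuba.query.conditions import ConditionFunctions, binary_condition
from snuba.query.expressions import Column, Literal
from snuba.query.logical import Query

def add_granularity_condition(query: Query, granularity_enum: int) -> None:
    # Nothing is removed or replaced: generic_metrics queries carried no
    # granularity condition before, so the processor only appends one.
    query.add_condition_to_ast(
        binary_condition(
            ConditionFunctions.EQ,
            Column(None, None, "granularity"),
            Literal(None, granularity_enum),
        )
    )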

@@ -180,7 +180,7 @@ def test_raw_tags(self) -> None:
  AND tags_raw[{tag_key}] = '{value_as_string}'
  AND timestamp >= toDateTime('{self.start_time}')
  AND timestamp < toDateTime('{self.end_time}')
- GRANULARITY 1
+ GRANULARITY 60
Member:

I don't think I follow why these queries changed? Are you saying they were wrong before?

Contributor (author):

I'm going to remove that second condition so now they will be wrong, but I just wanted to verify that we're handling the "raw" granularities correctly

Member:

Which one are you saying is the correct value, and which is the wrong one? And if you're saying it will be wrong after this PR, can you give some info about why, and how it will be addressed in the future?

Contributor (author):

When I originally put up the PR, I thought accepting both raw and enum values was correct. I've reconsidered, and I now think we should accept only the "raw" (in seconds) granularity value to reduce confusion.

Our ideal is to take this non-infrastructure concern and move it to the product side (e.g. sentry should maintain the list of enum values for granularities it cares about and use them directly in queries). They don't have the flexibility in their query layer at the moment, though, to have two different approaches for building requests.

How it will be addressed in the future is undetermined, but they're aware that we don't like having this logic on our side.
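
To make the two spellings concrete (the pairing below is inferred from the GRANULARITY 1 / GRANULARITY 60 diff above, so treat it as an assumption): 1 is the enum for the one-minute rollup and 60 is the same rollup in raw seconds.

# Assumed pairing behind the diff above: enum 1 <-> 60 raw seconds.
RAW_SECONDS_TO_ENUM = {60: 1}

# After this PR, only the raw spelling is accepted:
assert 60 in RAW_SECONDS_TO_ENUM      # GRANULARITY 60 maps to granularity = 1
assert 1 not in RAW_SECONDS_TO_ENUM   # GRANULARITY 1 (enum spelling) is rejected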


def __init__(
    self,
    accepted_granularities: Sequence[Tuple[int, int]],
Member:

Why do you pass these in? Everywhere this class is used, it's used with the constants above.

@lynnagara (Member) commented Jul 14, 2022:

I actually really like that this processor is not tied to the specific granularity values we happen to be storing in this table for metrics. Even though it's only used in one place today I can imagine us experimenting with different granularities in different tables in the future.

Contributor (author):

because "generic" metrics supports arbitrary granularities depending on the need of the customer, I wanted to make it flexible (e.g. we can support 10s, but performance doesn't use it, so we're not mapping it here)

    accepted_granularities=PERFORMANCE_GRANULARITIES,
    default_granularity_enum=DEFAULT_MAPPED_GRANULARITY_ENUM,
),
TimeSeriesProcessor({"bucketed_time": "timestamp"}, ("timestamp",)),
Member:

Do you need this processor if you have the granularity processor above?

Contributor (author):

yes, because the product wants to round timestamps to odd intervals like 15m or 3h, which isn't handled by the granularity processor alone
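
A sketch of why both processors are needed (the expression shape is modeled on how Snuba rewrites time columns generally, so treat the details as assumptions): the granularity condition only picks which rollup to read, while bucketed_time floors each timestamp to an interval that can be any multiple of that rollup.

def bucketed_time_expr(column: str, interval_seconds: int) -> str:
    """Build a ClickHouse expression flooring `column` to its bucket start."""
    return (
        f"toDateTime(multiply(intDiv(toUInt32({column}), "
        f"{interval_seconds}), {interval_seconds}))"
    )

# A 15-minute series over the 60s rollup: the granularity processor pins
# `granularity = <enum for 60s>`, and the query then groups by:
print(bucketed_time_expr("timestamp", 900))
# toDateTime(multiply(intDiv(toUInt32(timestamp), 900), 900))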

@onewland onewland requested a review from lynnagara July 14, 2022 18:05
@onewland onewland merged commit 8f38b4d into master Jul 14, 2022
@onewland onewland deleted the mep-support-time-bucketing-in-queries branch July 14, 2022 22:26