Enforce uniq in HAVING and SELECT #2339
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2339 +/- ##
==========================================
- Coverage 92.97% 92.96% -0.01%
==========================================
Files 563 565 +2
Lines 25932 26038 +106
==========================================
+ Hits 24109 24206 +97
- Misses 1823 1832 +9
Continue to review full report at Codecov.
I think the approach is fine.
Please see my comment inline around alias expansion.
for col in selected_columns:
    col.expression.accept(matcher)
    if not all(matcher.found_expressions):
        should_throw = get_config("throw_on_uniq_select_and_having", False)
Regarding the dry run, I would simply log a warning in Sentry. We should not have too many of these queries to begin with, but you can do a quick estimate from the querylog of how many queries have a `uniq` in the `HAVING` clause, to ensure they are not arriving in the hundreds per second.
for i, exp_to_match in enumerate(self.expressions_to_match):
    if (
        exp_to_match.function_name == exp.function_name
        and _expressions_equal_ignore_aliases(exp_to_match, exp)
Is this the reason why you were looking for a way to generate a matcher from an expression?
yes
I should have responded to your question in Slack, but you can also programmatically build matchers. I don't necessarily think that is the right solution here, but just FYI.
for exp in expressions_to_match:
    matcher = FunctionCallMatch(String(exp.function_name))
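For illustration, a self-contained sketch of building matchers programmatically from a list of expressions. The `String` and `FunctionCallMatch` classes below are simplified stand-ins for Snuba's matcher API, not the real implementations:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class String:
    """Matches an exact string value."""
    value: str

    def matches(self, other: str) -> bool:
        return self.value == other


@dataclass(frozen=True)
class FunctionCall:
    """Simplified AST node for a function call expression."""
    function_name: str
    parameters: tuple


class FunctionCallMatch:
    """Matches any FunctionCall whose name matches the given String pattern."""

    def __init__(self, name: String) -> None:
        self.name = name

    def matches(self, exp: object) -> bool:
        return isinstance(exp, FunctionCall) and self.name.matches(exp.function_name)


def build_matchers(expressions_to_match):
    # One matcher per expression we expect to also appear in the SELECT clause.
    return [FunctionCallMatch(String(exp.function_name)) for exp in expressions_to_match]
```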
I know that this was happening only on transactions, but presumably it could also happen on errors, right?
Also, can you clarify how this is intended to work with aliases? If there is an alias of the `uniq` expression in the select, is that sufficient?
Good point, will add it.
It is sufficient for ClickHouse, yes. For example, this will not crash it:

SELECT
    (project_id AS _snuba_project_id), transaction_name,
    _snuba_count_unique_user -- <----- only the alias will work.
FROM transactions_dist_PROD
WHERE
    greaterOrEquals((finish_ts AS _snuba_finish_ts), toDateTime('2021-12-18T19:20:14', 'Universal')) AND
    less(_snuba_finish_ts, toDateTime('2021-12-29T21:55:02', 'Universal')) AND
    in((project_id AS _snuba_project_id), [173225])
GROUP BY transaction_name, _snuba_project_id
HAVING
    less((divide(countIf(notEquals(transaction_status, 0) AND notEquals(transaction_status, 1) AND notEquals(transaction_status, 2)), count()) AS _snuba_failure_rate), 1.0) AND
    greater((uniq(user) AS _snuba_count_unique_user), 1.0)
ORDER BY _snuba_count_unique_user DESC LIMIT 3 OFFSET 0

I originally even had this as a case that was handled, but @fpacifici requested I remove it because the parser expands all aliases.
(
    Literal(None, datetime(2020, 4, 20, 16, 20)),
    "datetime(2020-04-20T16:20:00)",
TEST_CASES = [
The cases have stayed the same; I just pulled them out into a list.
@fpacifici @evanh I've added functional equivalency as a function on Expression. I think this is easier than generating matchers from expressions. Please take a look when you can.
This works fine, but I wonder if the logic of `function_eq` should be the default behaviour. Generally speaking, when we do equality, the alias shouldn't be part of it.
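A sketch of what alias-ignoring structural equality could look like. The AST classes below are simplified stand-ins; Snuba's real `Expression` hierarchy differs:

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Column:
    alias: Optional[str]
    name: str


@dataclass(frozen=True)
class FunctionCall:
    alias: Optional[str]
    function_name: str
    parameters: Tuple["Expression", ...]


Expression = Union[Column, FunctionCall]


def functional_eq(left: Expression, right: Expression) -> bool:
    """Structural equality that ignores aliases at every level of the tree."""
    if type(left) is not type(right):
        return False
    if isinstance(left, Column):
        return left.name == right.name
    # FunctionCall: compare name and parameters recursively, never the alias.
    return (
        left.function_name == right.function_name
        and len(left.parameters) == len(right.parameters)
        and all(functional_eq(l, r) for l, r in zip(left.parameters, right.parameters))
    )
```

With this in place, `uniq(user) AS _snuba_count_unique_user` compares equal to a bare `uniq(user)`, which is exactly the comparison the HAVING/SELECT check needs.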
        return False
    if len(self.parameters) != len(other.parameters):
        return False
    for i, param in enumerate(self.parameters):
You don't need to change this, but you could shave one line off by doing:
for param, other_param in zip(self.parameters, other.parameters):
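The two loop shapes, side by side, as a runnable sketch (function names are illustrative; the length check from the snippet above is assumed to have already passed):

```python
def params_equal_enumerate(self_params, other_params):
    # Index-based version: look up the matching parameter by position.
    for i, param in enumerate(self_params):
        if param != other_params[i]:
            return False
    return True


def params_equal_zip(self_params, other_params):
    # zip-based version suggested in review: no index bookkeeping needed.
    for param, other_param in zip(self_params, other_params):
        if param != other_param:
            return False
    return True
```

Note that `zip` silently stops at the shorter sequence, which is safe here only because the lengths were compared first.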
Context
This is a hacky query processor put in to mitigate ClickHouse crashes. When a query that has a `uniq` in the `HAVING` clause but not in the `SELECT` clause is sent to ClickHouse, the query node will crash. This may be fixed in future versions of ClickHouse, but we have not upgraded yet. This processor exists as a check so that we return an exception to the user rather than crashing the ClickHouse query node.
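A minimal sketch of such a guard, using simplified stand-in AST types (the class and function names are illustrative, not Snuba's actual ones): it collects every `uniq` call in the HAVING tree and rejects the query unless each one also appears in the SELECT expressions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FunctionCall:
    """Simplified stand-in for an AST function-call node."""
    alias: Optional[str]
    function_name: str
    parameters: Tuple = ()

    def walk(self):
        # Yield this node and every nested FunctionCall in its parameters.
        yield self
        for p in self.parameters:
            if isinstance(p, FunctionCall):
                yield from p.walk()


class MismatchedAggregationException(Exception):
    pass


def validate_uniq(select_exprs, having_expr) -> None:
    """Reject queries whose HAVING uses uniq() without a matching SELECT uniq()."""
    having_uniqs = [e for e in having_expr.walk() if e.function_name == "uniq"]
    select_uniqs = [
        e for s in select_exprs for e in s.walk() if e.function_name == "uniq"
    ]
    for h in having_uniqs:
        if not any(h.parameters == s.parameters for s in select_uniqs):
            raise MismatchedAggregationException(
                "uniq in HAVING clause is missing from SELECT"
            )
```

Raising before the query is dispatched turns a node crash into an ordinary client-facing error.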
Solution
Feedback desired