-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Add support for PostgreSQL-style shorthand casts #27392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
core/trino-grammar/src/main/antlr4/io/trino/grammar/sql/SqlBase.g4
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/analyzer/ExpressionAnalyzer.java
Outdated
Show resolved
Hide resolved
| // the target is a primaryExpression to support PostgreSQL-style casts | ||
| // of the form <complex expression>::<type>, which are syntactically ambiguous with | ||
| // static method calls | ||
| | primaryExpression DOUBLE_COLON identifier ('(' (expression (',' expression)*)? ')')? #staticMethodCall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use type on the rhs?
AFAIR engines allow a::timestamp with time zone and likely a::double precision too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can’t use type, and hence the limitation I mentioned in the description. The standard SQL syntax requires this to be a function invocation, so it has to look like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to me, "syntactically ambiguous with static method calls" means that foo::bar(1) is ambiguous in a sense that it can be two different things
- PostgreSQL-style cast of expression
footo typebar(1) - a static method
barinvocation on typefoowith parameters1
it is surprising, but it seems to me that foo::bar is also ambiguous because argument braces are optional
now,
foo::timestamp already departs from the spec
foo::timestamp with time zone also departs from the spec, but is consistent with foo::timestamp. And is not ambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"syntactically ambiguous with static method calls" means that foo::bar(1) is ambiguous in a sense that it can be two different things
Exactly. That's a syntactic ambiguity that's easy to solve during analysis. We parse all occurrences of :: as static method calls and decide later if it was meant to be a call or a cast depending on what's on the left side of the expression. If it's a type-producing expression (currently, an identifier matching a type name), then we treat it as a static method call. Otherwise, we treat it as a cast.
foo::timestamp with time zone also departs from the spec, but is consistent with foo::timestamp. And is not ambiguous.
Right, but to be able to parse that we'd need to allow either identifier ('(' (expression (',' expression)*)? ')') or type on the right side, which introduces all sorts of problems. See my comment below.
Also, as an aside, note that there's another ambiguous usage (I mention it in the comment below, too) with:
SELECT xxx::double precision
In Trino, that could be interpreted either as SELECT xxx::double AS precision or as a cast to double precision. PostgreSQL doesn't have that problem since double is not a valid type. Which goes to say, sometimes we can't just adopt a language feature from another database, since there are intricate interactions between all features that need to be taken into account.
core/trino-main/src/main/java/io/trino/sql/analyzer/ExpressionAnalyzer.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/analyzer/ExpressionAnalyzer.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/analyzer/ExpressionAnalyzer.java
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/analyzer/ExpressionAnalyzer.java
Show resolved
Hide resolved
core/trino-main/src/test/java/io/trino/operator/scalar/TestStaticMethodCall.java
Show resolved
Hide resolved
f712f7a to
0c9ebcf
Compare
core/trino-main/src/test/java/io/trino/operator/scalar/TestStaticMethodCall.java
Show resolved
Hide resolved
| assertThatThrownBy(() -> SQL_PARSER.createExpression("1::double precision")) | ||
| .hasMessageMatching(".*mismatched input.*"); | ||
|
|
||
| assertThatThrownBy(() -> SQL_PARSER.createExpression("'abc'::timestamp with time zone")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should support this & decimals.
it's not nice to a user to allow foo::type for most but not all types that are commonly cast in sql
this syntax really helps write sql, but we punish users because they will have to sometimes go back to old syntax just because they need decimal instead of a double, or point in time instead of a zoneless timestamp
it's ok not to support this within this PR, but i would want to understand what it takes to support it in the future. also, can help with the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To do that, we'll need to unify these two productions:
identifier ('(' (expression (',' expression)*)? ')')?type
Some implications:
typewill need to be a valid expression, since, for instance, you can have arbitrary nesting:array(array(array(double precision)))- there's a syntactic ambiguity between
identifier ('(' (expression (',' expression)*)? ')')?andidentifier ('(' typeParameter (',' typeParameter)* ')')?if the arguments are all integer numbers. It can be parsed either as a static method invocation or as a generic type - we'll need to figure out how to deal with the common that could lead to expensive backtracking. For example, in
array(array(array(double precision))), the parser wouldn't know which alternative to follow ahead of time. It would have to go down one of the alternatives until it finds it can't parse it, backtrack all the way to the top and try the other alternative - We'll have to resolve other ambiguities or backtracking elsewhere. For example,
SELECT xxx::double precisioncould be interpreted either as a cast todouble precisionor asSELECT xxx::double AS precision
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typewill need to be a valid expression,
this i don't understand.
types are not expressions in current trino (type is not a first class value) and i don't have intention to make them be.
we'll need to figure out how to deal with the common that could lead to expensive backtracking. For example, in array(array(array(double precision))), the parser wouldn't know which alternative to follow ahead of time. It would have to go down one of the alternatives until it finds it can't parse it, backtrack all the way to the top and try the other alternative
yes, it could require backtracking. but would this be expensive backtracking?
it's not exponential, it's just two passes, right?
it could become exponential if in two passes we also can do backtracking, however :: is not valid part of a type definition
SELECT xxx::double precisioncould be interpreted either as a cast todouble precisionor asSELECT xxx::double AS precision
that's a very nice example. some systems solve ambiguity by having deterministic precedence, but i get the point.
i am not concerned about users really needing to type ::double precision. it's a understandable limitation to require them to type this as ::double. they can still write what they need
i am more concern about casting to timestamp with time zone (that's my favorite type btw). how can we allow that?
Standard SQL supports static method calls on types via the :: operator.
While this syntax is generally incompatible with PostgreSQL shorthand cast
syntax, there's a subset of that can be safely repurposed to support that
functionality.
The SQL specification defines static method calls as:
<static method invocation> ::=
<path-resolved user-defined type name> <double colon> <method name>
[ <SQL argument list> ]
where <path-resolved user-defined type name> translates to:
<user-defined type name> ::=
[ <schema name> <period> ] <qualified identifier>
To support casts, we need to extend the rule to support arbitrary expressions
as the target of the invocation. To disambiguate a static method call from
a cast, we distinguish between type-producing expressions and expressions that
produce regular values. For the latter, if the method matches the name of a
well-known type, we treat it as a cast. Otherwise, the expression assumed
to be a static method call and fail the evaluation with a "not yet supported"
error.
One limitation is that casts are only supported for simple types. Parametric
types are not yet supported, but it wouldn't be too hard to add. Types whose
name don't match the syntax of SQL function calls are not supported either
and will be harder to support. That will require introducing type-producing
expressions into the language and making types first-class expressions.
0c9ebcf to
9c0147b
Compare
Standard SQL supports static method calls on types via the :: operator. While this syntax is generally incompatible with PostgreSQL shorthand cast syntax, there's a subset of that can be safely repurposed to support that functionality.
The SQL specification defines static method calls as:
where
<path-resolved user-defined type name>translates to:To support casts, we need to extend the rule to support arbitrary expressions as the target of the invocation. To disambiguate a static method call from a cast, we distinguish between type-producing expressions and expressions that produce regular values. For the latter, if the method matches the name of a well-known type, we treat it as a cast. Otherwise, the expression assumed to be a static method call and fail the evaluation with a "not yet supported" error.
One limitation is that casts are only supported for simple types. Parametric types are not yet supported, but it wouldn't be too hard to add. Types whose name don't match the syntax of SQL function calls are not supported either and will be harder to support. That will require introducing type-producing expressions into the language and making types first-class expressions.
Fixes #23795
Release notes
(x) Release notes are required, with the following suggested text: