TVF Part 5/X: Complete Analysis for table functions#26071
TVF Part 5/X: Complete Analysis for table functions#26071mohsaka merged 1 commit intoprestodb:masterfrom
Conversation
Reviewer's GuideThis PR completes the analyzer implementation for SQL-invoked table functions by fully wiring argument binding, result‐type derivation, partitioning/order/prune semantics, co‐partitioning, alias handling, and related validations, and augments the test suite accordingly. Sequence diagram for table function invocation analysis and validationsequenceDiagram
participant "StatementAnalyzer"
participant "Analysis"
participant "TableFunction"
participant "TableFunctionInvocationAnalysis"
"StatementAnalyzer"->>"TableFunction": analyze(session, transactionHandle, arguments)
"TableFunction"-->>"StatementAnalyzer": TableFunctionAnalysis
"StatementAnalyzer"->>"Analysis": setTableFunctionAnalysis(node, TableFunctionInvocationAnalysis)
"StatementAnalyzer"->>"Analysis": addPolymorphicTableFunction(node) (if applicable)
"StatementAnalyzer"->>"Analysis": setRelationName(relation, name)
"StatementAnalyzer"->>"Analysis": addAliased(relation)
"StatementAnalyzer"->>"Analysis": isAliased(node)
"StatementAnalyzer"->>"Analysis": isPolymorphicTableFunction(node)
"StatementAnalyzer"->>"TableFunctionInvocationAnalysis": validate required columns, copartitioning, result type
"StatementAnalyzer"-->>"Analysis": createAndAssignScope(node, scope, fields)
Class diagram for new and updated table function analysis typesclassDiagram
class StatementAnalyzer {
+visitTableFunctionInvocation()
+analyzeArguments()
+analyzeCopartitioning()
+validateNoNestedTableFunction()
+aliasTableFunctionInvocation()
+ArgumentAnalysis
+ArgumentsAnalysis
}
class Analysis {
+setRelationName()
+getRelationName()
+addAliased()
+isAliased()
+addPolymorphicTableFunction()
+isPolymorphicTableFunction()
+TableArgumentAnalysis
+TableFunctionInvocationAnalysis
}
class TableArgumentAnalysis {
+argumentName: String
+name: Optional<QualifiedName>
+relation: Relation
+partitionBy: Optional<List<Expression>>
+orderBy: Optional<OrderBy>
+pruneWhenEmpty: boolean
+rowSemantics: boolean
+passThroughColumns: boolean
+Builder
}
class TableFunctionInvocationAnalysis {
+connectorId: ConnectorId
+functionName: String
+arguments: Map<String, Argument>
+tableArgumentAnalyses: List<TableArgumentAnalysis>
+requiredColumns: Map<String, List<Integer>>
+copartitioningLists: List<List<String>>
+properColumnsCount: int
+connectorTableFunctionHandle: ConnectorTableFunctionHandle
+transactionHandle: ConnectorTransactionHandle
}
class ArgumentAnalysis {
+argument: Argument
+tableArgumentAnalysis: Optional<TableArgumentAnalysis>
}
class ArgumentsAnalysis {
+passedArguments: Map<String, Argument>
+tableArgumentAnalyses: List<TableArgumentAnalysis>
}
StatementAnalyzer --> ArgumentAnalysis
StatementAnalyzer --> ArgumentsAnalysis
Analysis --> TableArgumentAnalysis
Analysis --> TableFunctionInvocationAnalysis
TableFunctionInvocationAnalysis --> TableArgumentAnalysis
ArgumentAnalysis --> TableArgumentAnalysis
ArgumentsAnalysis --> TableArgumentAnalysis
TableArgumentAnalysis <|.. TableArgumentAnalysis.Builder
Flow diagram for argument analysis and binding in table function invocationflowchart TD
A["TableFunctionInvocation node"] --> B["analyzeArguments()"]
B --> C["ArgumentsAnalysis (passedArguments, tableArgumentAnalyses)"]
C --> D["TableFunction.analyze()"]
D --> E["TableFunctionAnalysis"]
E --> F["Validate required columns"]
F --> G["Analyze copartitioning"]
G --> H["Determine result relation type"]
H --> I["Create TableFunctionInvocationAnalysis"]
I --> J["Assign scope and fields"]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
- The visitTableFunctionInvocation method has grown very large—consider extracting argument mapping, return‐type resolution, and copartitioning into separate helper classes or methods to improve readability and maintainability.
- The analyzeCopartitioning logic contains multiple distinct steps (resolving names, validating partitions, type coercion)—splitting it into smaller focused methods would reduce cognitive load and make testing easier.
- There is duplicated alias and conflict validation between aliasTableFunctionInvocation and the planner—centralizing common validation routines could help avoid divergence and simplify future changes.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The visitTableFunctionInvocation method has grown very large—consider extracting argument mapping, return‐type resolution, and copartitioning into separate helper classes or methods to improve readability and maintainability.
- The analyzeCopartitioning logic contains multiple distinct steps (resolving names, validating partitions, type coercion)—splitting it into smaller focused methods would reduce cognitive load and make testing easier.
- There is duplicated alias and conflict validation between aliasTableFunctionInvocation and the planner—centralizing common validation routines could help avoid divergence and simplify future changes.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java:1722-1731` </location>
<code_context>
+ if (tableArgument.getOrderBy().isPresent()) {
</code_context>
<issue_to_address>
**suggestion (bug_risk):** OrderBy validation does not check for duplicate sort keys.
Please add a check to ensure sort keys in the ORDER BY clause are unique to prevent ambiguity.
</issue_to_address>
### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java:1897-1899` </location>
<code_context>
+ }
+
+ // coerce corresponding copartition columns to common supertype
+ for (int index = 0; index < partitioningColumns.get(0).size(); index++) {
+ Type commonSuperType = analysis.getType(partitioningColumns.get(0).get(index));
+ // find common supertype
</code_context>
<issue_to_address>
**suggestion:** The coercion logic for partitioning columns may not handle all type mismatch scenarios.
Consider providing more informative error messages or guidance to assist users in resolving type mismatches when coercion fails.
```suggestion
if (!superType.isPresent()) {
String columnName = column.toString();
Type leftType = commonSuperType;
Type rightType = analysis.getType(columnList.get(index));
String errorMessage = String.format(
"Partitioning columns in copartitioned tables have incompatible types at column index %d ('%s'): %s vs %s. " +
"Please ensure that all copartitioned columns at this position have compatible types or can be coerced to a common supertype.",
index,
columnName,
leftType.getDisplayName(),
rightType.getDisplayName()
);
throw new SemanticException(TYPE_MISMATCH, nameList.get(0).getOriginalParts().get(0), errorMessage);
}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java
Show resolved
Hide resolved
aditi-pandit
left a comment
There was a problem hiding this comment.
Thanks for the code @mohsaka.
Have only one minor comment.
presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/Analysis.java
Show resolved
Hide resolved
presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/Analysis.java
Show resolved
Hide resolved
| TOO_MANY_GROUPING_SETS, | ||
|
|
||
| INVALID_OFFSET_ROW_COUNT, | ||
| INVALID_COPARTITIONING, |
There was a problem hiding this comment.
I am wondering if we should have separate error codes for table functions, like TABLE_FUNCTION_.*
| .findFirst().ifPresent(unpartitioned -> { | ||
| throw new SemanticException(TABLE_FUNCTION_INVALID_COPARTITIONING, unpartitioned.getRelation(), "Table %s referenced in COPARTITION clause is not partitioned", unpartitioned.getName().orElseThrow(() -> new IllegalStateException("Missing unpartitioned TableArgumentAnalysis name"))); | ||
| }); | ||
| // TODO make sure that copartitioned tables cannot have empty partitioning lists. |
There was a problem hiding this comment.
Can we log a github issue and link it
There was a problem hiding this comment.
Done. Added issue number 26147 to TODO
| QualifiedObjectName functionName = new QualifiedObjectName(connectorId.getCatalogName(), function.getSchema(), function.getName()); | ||
|
|
||
| Map<String, Argument> passedArguments = analyzeArguments(node, function.getArguments(), node.getArguments()); | ||
| ArgumentsAnalysis argumentsAnalysis = analyzeArguments(node, function.getArguments(), scope); |
There was a problem hiding this comment.
This function visitTableFunctionInvocation is becoming too large (> 100 lines), can we split the logic into smaller functions? preferrable 2-3
There was a problem hiding this comment.
Added two helper functions
verifyInputColumns
verifyProperColumnsDescriptor
| @@ -1346,95 +1361,181 @@ protected Scope visitTableFunctionInvocation(TableFunctionInvocation node, Optio | |||
| ConnectorId connectorId = tableFunctionMetadata.getConnectorId(); | |||
|
|
|||
| QualifiedObjectName functionName = new QualifiedObjectName(connectorId.getCatalogName(), function.getSchema(), function.getName()); | |||
There was a problem hiding this comment.
This variable is not used and can be removed
| } | ||
|
|
||
| @Test | ||
| public void testTableArgument() |
There was a problem hiding this comment.
Can we add some tests for complex usages like in joins, nested selects / join within argugments? or do you plan to add integration tests?
There was a problem hiding this comment.
There is one
// query passed as the argument is correlated
analyze("SELECT * FROM t1 CROSS JOIN LATERAL (SELECT * FROM TABLE(system.table_argument_function(input => TABLE(SELECT 1 WHERE a > 0))))");
But I agree we need some more. We do plan on adding integration tests after adding our two example table functions.
| } | ||
|
|
||
| protected void assertFails(Session session, SemanticErrorCode error, String message, @Language("SQL") String query) | ||
| protected void assertFails(Session session, SemanticErrorCode error, String message, @Language("SQL") String query, boolean exact) |
There was a problem hiding this comment.
What does this exact represent?
There was a problem hiding this comment.
Exact match is required because matches is actually a regular expression match. Due to the messages having a lot of regex special characters, to have an actual string exact match we need to use .equals.
Example, one of the messages is
"line 1:57: Invalid descriptor argument SCHEMA. Descriptors should be formatted as 'DESCRIPTOR(name [type], ...)'"
The [(. characters are all special regex characters which would cause the match to fail.
| private RowType rowType; | ||
| private List<String> partitionBy = Collections.emptyList(); | ||
| private List<String> orderBy = Collections.emptyList(); | ||
| private List<String> partitionBy = new ArrayList<>(); |
There was a problem hiding this comment.
This issue was found while running the test cases. Basically Collections.emptyList() is actually immutable. So I should have not used it in the first place.
Trino had List.of(), which is supported only in Java 9+.
|
@jaystarshot Could I get an approval if everything looks good? Thanks! |
| return createAndAssignScope(node, scope, fields.build()); | ||
| } | ||
|
|
||
| private void verifyInputColumns(TableFunctionInvocation node, Map<String, List<Integer>> requiredColumns, Map<String, TableArgumentAnalysis> tableArgumentsByName) |
There was a problem hiding this comment.
Nit : Rename to verifyRequiredColumns.
Changes adapted from trino/PR#13602, PR#13653, PR#14115 Original commit: dbef4d9be37494967496230573ab400e54aab0d9 Author: kasiafi Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com> Co-authored-by: Xin Zhang <desertsxin@gmail.com>
Description
This PR contains all of changes pertaining to the Analysis of table functions.
There should not be any more Analysis changes after this PR, unless there is refactoring requested or I missed something.
Changes adapted from trino/PR#13602, PR#13653, PR#14115, PR#15256, PR#16058
Motivation and Context
Impact
Test Plan
Added tests in TestAnalyzer
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.