
TVF Part 5/X: Complete Analysis for table functions #26071

Merged
mohsaka merged 1 commit into prestodb:master from mohsaka:tvf-part-2
Sep 26, 2025

@mohsaka
Contributor

@mohsaka mohsaka commented Sep 18, 2025

Description

This PR contains all of the changes pertaining to the analysis of table functions.

There should not be any more analysis changes after this PR, unless refactoring is requested or I missed something.

Changes adapted from trino/PR#13602, PR#13653, PR#14115, PR#15256, PR#16058

Motivation and Context

Impact

Test Plan

Added tests in TestAnalyzer

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Sep 18, 2025
@prestodb-ci prestodb-ci requested review from a team, aaneja and jkhaliqi and removed request for a team September 18, 2025 02:30
@sourcery-ai
Contributor

sourcery-ai bot commented Sep 18, 2025

Reviewer's Guide

This PR completes the analyzer implementation for SQL-invoked table functions by fully wiring argument binding, result‐type derivation, partitioning/order/prune semantics, co‐partitioning, alias handling, and related validations, and augments the test suite accordingly.

Sequence diagram for table function invocation analysis and validation

sequenceDiagram
    participant "StatementAnalyzer"
    participant "Analysis"
    participant "TableFunction"
    participant "TableFunctionInvocationAnalysis"
    "StatementAnalyzer"->>"TableFunction": analyze(session, transactionHandle, arguments)
    "TableFunction"-->>"StatementAnalyzer": TableFunctionAnalysis
    "StatementAnalyzer"->>"Analysis": setTableFunctionAnalysis(node, TableFunctionInvocationAnalysis)
    "StatementAnalyzer"->>"Analysis": addPolymorphicTableFunction(node) (if applicable)
    "StatementAnalyzer"->>"Analysis": setRelationName(relation, name)
    "StatementAnalyzer"->>"Analysis": addAliased(relation)
    "StatementAnalyzer"->>"Analysis": isAliased(node)
    "StatementAnalyzer"->>"Analysis": isPolymorphicTableFunction(node)
    "StatementAnalyzer"->>"TableFunctionInvocationAnalysis": validate required columns, copartitioning, result type
    "StatementAnalyzer"-->>"Analysis": createAndAssignScope(node, scope, fields)

Class diagram for new and updated table function analysis types

classDiagram
    class StatementAnalyzer {
        +visitTableFunctionInvocation()
        +analyzeArguments()
        +analyzeCopartitioning()
        +validateNoNestedTableFunction()
        +aliasTableFunctionInvocation()
        +ArgumentAnalysis
        +ArgumentsAnalysis
    }
    class Analysis {
        +setRelationName()
        +getRelationName()
        +addAliased()
        +isAliased()
        +addPolymorphicTableFunction()
        +isPolymorphicTableFunction()
        +TableArgumentAnalysis
        +TableFunctionInvocationAnalysis
    }
    class TableArgumentAnalysis {
        +argumentName: String
        +name: Optional<QualifiedName>
        +relation: Relation
        +partitionBy: Optional<List<Expression>>
        +orderBy: Optional<OrderBy>
        +pruneWhenEmpty: boolean
        +rowSemantics: boolean
        +passThroughColumns: boolean
        +Builder
    }
    class TableFunctionInvocationAnalysis {
        +connectorId: ConnectorId
        +functionName: String
        +arguments: Map<String, Argument>
        +tableArgumentAnalyses: List<TableArgumentAnalysis>
        +requiredColumns: Map<String, List<Integer>>
        +copartitioningLists: List<List<String>>
        +properColumnsCount: int
        +connectorTableFunctionHandle: ConnectorTableFunctionHandle
        +transactionHandle: ConnectorTransactionHandle
    }
    class ArgumentAnalysis {
        +argument: Argument
        +tableArgumentAnalysis: Optional<TableArgumentAnalysis>
    }
    class ArgumentsAnalysis {
        +passedArguments: Map<String, Argument>
        +tableArgumentAnalyses: List<TableArgumentAnalysis>
    }
    StatementAnalyzer --> ArgumentAnalysis
    StatementAnalyzer --> ArgumentsAnalysis
    Analysis --> TableArgumentAnalysis
    Analysis --> TableFunctionInvocationAnalysis
    TableFunctionInvocationAnalysis --> TableArgumentAnalysis
    ArgumentAnalysis --> TableArgumentAnalysis
    ArgumentsAnalysis --> TableArgumentAnalysis
    TableArgumentAnalysis <|.. TableArgumentAnalysis.Builder

Flow diagram for argument analysis and binding in table function invocation

flowchart TD
    A["TableFunctionInvocation node"] --> B["analyzeArguments()"]
    B --> C["ArgumentsAnalysis (passedArguments, tableArgumentAnalyses)"]
    C --> D["TableFunction.analyze()"]
    D --> E["TableFunctionAnalysis"]
    E --> F["Validate required columns"]
    F --> G["Analyze copartitioning"]
    G --> H["Determine result relation type"]
    H --> I["Create TableFunctionInvocationAnalysis"]
    I --> J["Assign scope and fields"]

File-Level Changes

Change: Replace simple Map<String,Argument> with ArgumentAnalysis/ArgumentsAnalysis to track scalar, descriptor, and table arguments
Details:
  • Introduced ArgumentAnalysis and ArgumentsAnalysis classes
  • analyzeArguments returns ArgumentsAnalysis
  • mapTableFunctionsArgsByName/Position now builds TableArgumentAnalysis list
Files: StatementAnalyzer.java, Analysis.java

Change: Implement full table argument processing
Details:
  • process TABLE(...) relation into TableArgumentAnalysis
  • validate PARTITION BY, ORDER BY, PRUNE/KEEP semantics
  • build TableArgument builder with rowType/partitionBy/orderBy/prune
Files: StatementAnalyzer.java

Change: Add COPARTITION clause support
Details:
  • Resolve argument names (unqualified/qualified)
  • Validate partitioning consistency and type coercions
  • Collect and record copartitioningLists
Files: StatementAnalyzer.java

Change: Derive table function result schema combining proper and pass-through columns
Details:
  • Switch on ReturnTypeSpecification to enforce alias rules
  • Build result RelationType: proper columns first, then pass-through/partition columns
  • Attach new TableFunctionInvocationAnalysis with detailed metadata
Files: StatementAnalyzer.java, Analysis.java

Change: Enforce aliasing and sampling rules for polymorphic table functions
Details:
  • Track aliased and polymorphic TVFs in Analysis
  • aliasTableFunctionInvocation applies column aliasing rules
  • validateNoNestedTableFunction forbids TABLESAMPLE on polymorphic TVFs
Files: StatementAnalyzer.java, Analysis.java, RelationPlanner.java

Change: Expand test suite for new TVF features
Details:
  • Add tests for table arguments, descriptor args, copartitioning, requiredColumns, aliasing, sampling
  • Register new test table functions in AbstractAnalyzerTest
Files: TestAnalyzer.java, TestingTableFunctions.java, AbstractAnalyzerTest.java
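
The "proper columns first, then pass-through columns" ordering described above can be illustrated with a small sketch. This is not the StatementAnalyzer code: `ResultColumnsDemo`, `resultColumns`, and the use of plain column-name strings are hypothetical stand-ins for the RelationType and Field objects the analyzer actually builds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; the real analyzer builds Field objects, not strings.
final class ResultColumnsDemo
{
    private ResultColumnsDemo() {}

    // Proper columns (produced by the function itself) come first,
    // followed by the pass-through columns of each table argument in order.
    static List<String> resultColumns(List<String> properColumns, List<List<String>> passThroughPerTableArgument)
    {
        List<String> result = new ArrayList<>(properColumns);
        for (List<String> passThrough : passThroughPerTableArgument) {
            result.addAll(passThrough);
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args)
    {
        List<String> proper = Arrays.asList("proper_1", "proper_2");
        List<List<String>> passThrough = Arrays.asList(
                Arrays.asList("t1_a"),
                Arrays.asList("t2_a", "t2_b"));
        System.out.println(resultColumns(proper, passThrough));
    }
}
```

In this sketch the number of proper columns is simply `properColumns.size()`, which is the role a field such as properColumnsCount would play when splitting the combined list again.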

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • The visitTableFunctionInvocation method has grown very large—consider extracting argument mapping, return‐type resolution, and copartitioning into separate helper classes or methods to improve readability and maintainability.
  • The analyzeCopartitioning logic contains multiple distinct steps (resolving names, validating partitions, type coercion)—splitting it into smaller focused methods would reduce cognitive load and make testing easier.
  • There is duplicated alias and conflict validation between aliasTableFunctionInvocation and the planner—centralizing common validation routines could help avoid divergence and simplify future changes.
## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java:1722-1731` </location>
<code_context>
+            if (tableArgument.getOrderBy().isPresent()) {
</code_context>

<issue_to_address>
**suggestion (bug_risk):** OrderBy validation does not check for duplicate sort keys.

Please add a check to ensure sort keys in the ORDER BY clause are unique to prevent ambiguity.
</issue_to_address>
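
A minimal sketch of the uniqueness check suggested in Comment 1, assuming sort keys can be compared as plain strings. `SortKeyCheck` and `firstDuplicate` are hypothetical names; the analyzer works with expression nodes rather than strings, and this PR does not necessarily add such a check.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: finds the first repeated sort key in an ORDER BY list.
final class SortKeyCheck
{
    private SortKeyCheck() {}

    static Optional<String> firstDuplicate(List<String> sortKeys)
    {
        Set<String> seen = new HashSet<>();
        for (String key : sortKeys) {
            // Set.add returns false when the key was already present
            if (!seen.add(key)) {
                return Optional.of(key);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args)
    {
        System.out.println(firstDuplicate(Arrays.asList("a", "b", "a")));
        System.out.println(firstDuplicate(Arrays.asList("a", "b")));
    }
}
```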

### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java:1897-1899` </location>
<code_context>
+                }
+
+                // coerce corresponding copartition columns to common supertype
+                for (int index = 0; index < partitioningColumns.get(0).size(); index++) {
+                    Type commonSuperType = analysis.getType(partitioningColumns.get(0).get(index));
+                    // find common supertype
</code_context>

<issue_to_address>
**suggestion:** The coercion logic for partitioning columns may not handle all type mismatch scenarios.

Consider providing more informative error messages or guidance to assist users in resolving type mismatches when coercion fails.

```suggestion
                        if (!superType.isPresent()) {
                            String columnName = column.toString();
                            Type leftType = commonSuperType;
                            Type rightType = analysis.getType(columnList.get(index));
                            String errorMessage = String.format(
                                "Partitioning columns in copartitioned tables have incompatible types at column index %d ('%s'): %s vs %s. " +
                                "Please ensure that all copartitioned columns at this position have compatible types or can be coerced to a common supertype.",
                                index,
                                columnName,
                                leftType.getDisplayName(),
                                rightType.getDisplayName()
                            );
                            throw new SemanticException(TYPE_MISMATCH, nameList.get(0).getOriginalParts().get(0), errorMessage);
                        }
```
</issue_to_address>
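
The coercion step discussed in Comment 2 can be sketched with a toy type system. Everything here is an assumption made for illustration: `ToyType`, `commonSuperType`, and `coerceColumn` are hypothetical, and the widening rule (INTEGER to BIGINT to DOUBLE, VARCHAR incompatible with the rest) only stands in for Presto's real type coercion.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of folding one copartition column position to a common supertype.
final class CopartitionTypeDemo
{
    private CopartitionTypeDemo() {}

    // Toy lattice: numeric types widen in declaration order; VARCHAR only matches itself.
    enum ToyType { INTEGER, BIGINT, DOUBLE, VARCHAR }

    static Optional<ToyType> commonSuperType(ToyType left, ToyType right)
    {
        if (left == right) {
            return Optional.of(left);
        }
        if (left == ToyType.VARCHAR || right == ToyType.VARCHAR) {
            return Optional.empty();
        }
        return Optional.of(left.ordinal() > right.ordinal() ? left : right);
    }

    // partitioningColumns holds one list of column types per copartitioned table.
    static ToyType coerceColumn(int index, List<List<ToyType>> partitioningColumns)
    {
        ToyType common = partitioningColumns.get(0).get(index);
        for (List<ToyType> columns : partitioningColumns) {
            ToyType candidate = columns.get(index);
            Optional<ToyType> superType = commonSuperType(common, candidate);
            if (!superType.isPresent()) {
                throw new IllegalArgumentException(String.format(
                        "Partitioning columns at index %d have incompatible types: %s vs %s",
                        index, common, candidate));
            }
            common = superType.get();
        }
        return common;
    }

    public static void main(String[] args)
    {
        List<List<ToyType>> columns = Arrays.asList(
                Arrays.asList(ToyType.INTEGER),
                Arrays.asList(ToyType.BIGINT));
        System.out.println(coerceColumn(0, columns));
    }
}
```

Naming both the position and the two conflicting types in the error message is the kind of guidance the review comment asks for.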


Contributor

@aditi-pandit aditi-pandit left a comment

Thanks for the code @mohsaka.

Have only one minor comment.

aditi-pandit
aditi-pandit previously approved these changes Sep 23, 2025
Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @mohsaka

TOO_MANY_GROUPING_SETS,

INVALID_OFFSET_ROW_COUNT,
INVALID_COPARTITIONING,
Member

I am wondering if we should have separate error codes for table functions, like TABLE_FUNCTION_.*

Contributor Author

Done

.findFirst().ifPresent(unpartitioned -> {
throw new SemanticException(TABLE_FUNCTION_INVALID_COPARTITIONING, unpartitioned.getRelation(), "Table %s referenced in COPARTITION clause is not partitioned", unpartitioned.getName().orElseThrow(() -> new IllegalStateException("Missing unpartitioned TableArgumentAnalysis name")));
});
// TODO make sure that copartitioned tables cannot have empty partitioning lists.
Member

Can we log a GitHub issue and link it?

Contributor Author

Done. Added issue number 26147 to TODO
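
For readers following this thread, the check under discussion can be sketched in isolation. This is a simplified stand-in, not the PR's code: `TableArg` is a hypothetical reduction of TableArgumentAnalysis to a name and an optional partitioning list, and it throws IllegalArgumentException instead of SemanticException. It deliberately does not address the TODO about empty partitioning lists.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: reject a COPARTITION reference to a table argument without PARTITION BY.
final class CopartitionCheckDemo
{
    private CopartitionCheckDemo() {}

    static final class TableArg
    {
        final String name;
        final Optional<List<String>> partitionBy;

        TableArg(String name, Optional<List<String>> partitionBy)
        {
            this.name = name;
            this.partitionBy = partitionBy;
        }
    }

    static void verifyAllPartitioned(List<TableArg> copartitioned)
    {
        copartitioned.stream()
                .filter(argument -> !argument.partitionBy.isPresent())
                .findFirst()
                .ifPresent(unpartitioned -> {
                    throw new IllegalArgumentException(String.format(
                            "Table %s referenced in COPARTITION clause is not partitioned",
                            unpartitioned.name));
                });
    }

    public static void main(String[] args)
    {
        TableArg t1 = new TableArg("t1", Optional.of(Arrays.asList("a")));
        TableArg t2 = new TableArg("t2", Optional.of(Arrays.asList("b")));
        verifyAllPartitioned(Arrays.asList(t1, t2));
        System.out.println("all copartitioned tables are partitioned");
    }
}
```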

QualifiedObjectName functionName = new QualifiedObjectName(connectorId.getCatalogName(), function.getSchema(), function.getName());

Map<String, Argument> passedArguments = analyzeArguments(node, function.getArguments(), node.getArguments());
ArgumentsAnalysis argumentsAnalysis = analyzeArguments(node, function.getArguments(), scope);
Member

This function visitTableFunctionInvocation is becoming too large (> 100 lines). Can we split the logic into smaller functions, preferably 2-3?

Contributor Author

Added two helper functions
verifyInputColumns
verifyProperColumnsDescriptor

@@ -1346,95 +1361,181 @@ protected Scope visitTableFunctionInvocation(TableFunctionInvocation node, Optio
ConnectorId connectorId = tableFunctionMetadata.getConnectorId();

QualifiedObjectName functionName = new QualifiedObjectName(connectorId.getCatalogName(), function.getSchema(), function.getName());
Member

This variable is not used and can be removed

Contributor Author

Done

}

@Test
public void testTableArgument()
Member

Can we add some tests for complex usages, like in joins, nested selects, or joins within arguments? Or do you plan to add integration tests?

Contributor Author

There is one

        // query passed as the argument is correlated
        analyze("SELECT * FROM t1 CROSS JOIN LATERAL (SELECT * FROM TABLE(system.table_argument_function(input => TABLE(SELECT 1 WHERE a > 0))))");

But I agree we need some more. We do plan on adding integration tests after adding our two example table functions.

}

protected void assertFails(Session session, SemanticErrorCode error, String message, @Language("SQL") String query)
protected void assertFails(Session session, SemanticErrorCode error, String message, @Language("SQL") String query, boolean exact)
Member

What does this exact represent?

Contributor Author

An exact match is required because matches performs a regular expression match. Many of the messages contain regex special characters, so to get a literal string comparison we need to use .equals.

For example, one of the messages is
"line 1:57: Invalid descriptor argument SCHEMA. Descriptors should be formatted as 'DESCRIPTOR(name [type], ...)'"

The [(. characters are all special regex characters which would cause the match to fail.
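
The difference can be reproduced outside the test harness with the message quoted above. This is a standalone sketch, not the assertFails implementation: `ExactMatchDemo` and its methods are hypothetical, and `Pattern.quote` is shown only as an alternative to `.equals`.

```java
import java.util.regex.Pattern;

// Standalone illustration of regex matching versus literal comparison.
final class ExactMatchDemo
{
    private ExactMatchDemo() {}

    // Interprets `expected` as a regular expression, as String.matches does.
    static boolean regexMatch(String actual, String expected)
    {
        return actual.matches(expected);
    }

    // Literal comparison; regex metacharacters have no special meaning.
    static boolean exactMatch(String actual, String expected)
    {
        return actual.equals(expected);
    }

    // Keeps regex matching but quotes the expected text so it is matched literally.
    static boolean quotedRegexMatch(String actual, String expected)
    {
        return actual.matches(Pattern.quote(expected));
    }

    public static void main(String[] args)
    {
        String message = "line 1:57: Invalid descriptor argument SCHEMA. "
                + "Descriptors should be formatted as 'DESCRIPTOR(name [type], ...)'";
        System.out.println("regex match:        " + regexMatch(message, message));
        System.out.println("exact match:        " + exactMatch(message, message));
        System.out.println("quoted regex match: " + quotedRegexMatch(message, message));
    }
}
```

Used as a pattern, the message does not match itself: the parentheses become a capturing group and `[type]` becomes a character class, so the literal `(` and `[` in the text are never consumed.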

Member

@jaystarshot jaystarshot left a comment

LGTM

private RowType rowType;
private List<String> partitionBy = Collections.emptyList();
private List<String> orderBy = Collections.emptyList();
private List<String> partitionBy = new ArrayList<>();
Member

This change is not needed?

Contributor Author

This issue was found while running the test cases. Collections.emptyList() returns an immutable list, so I should not have used it in the first place.

Trino used List.of(), which is available only in Java 9+.
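
A small sketch of the failure mode described here, assuming the builder adds elements to its default list. `MutableDefaultDemo` and `addSucceeds` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Shows why an immutable default list breaks a builder that appends to it.
final class MutableDefaultDemo
{
    private MutableDefaultDemo() {}

    // Returns true if the list accepts a new element, false if it is immutable.
    static boolean addSucceeds(List<String> list)
    {
        try {
            list.add("column_a");
            return true;
        }
        catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args)
    {
        System.out.println("Collections.emptyList(): " + addSucceeds(Collections.<String>emptyList()));
        System.out.println("new ArrayList<>():       " + addSucceeds(new ArrayList<String>()));
    }
}
```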

aditi-pandit
aditi-pandit previously approved these changes Sep 25, 2025
Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @mohsaka

@mohsaka
Contributor Author

mohsaka commented Sep 25, 2025

@jaystarshot Could I get an approval if everything looks good? Thanks!

return createAndAssignScope(node, scope, fields.build());
}

private void verifyInputColumns(TableFunctionInvocation node, Map<String, List<Integer>> requiredColumns, Map<String, TableArgumentAnalysis> tableArgumentsByName)
Contributor

Nit: Rename to verifyRequiredColumns.

Contributor Author

Done

Changes adapted from trino/PR#13602, PR#13653, PR#14115
Original commit: dbef4d9be37494967496230573ab400e54aab0d9
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Co-authored-by: Xin Zhang <desertsxin@gmail.com>
@mohsaka mohsaka merged commit 9ea6d39 into prestodb:master Sep 26, 2025
74 checks passed
@mohsaka mohsaka mentioned this pull request Sep 30, 2025
6 tasks

Labels

from:IBM PR from IBM


4 participants