@codluca codluca commented Oct 9, 2025

Add Iceberg system.bucket function to Lakehouse

Fixes #26757

Description

LakehouseMetadata delegates all function-related calls to IcebergMetadata. The system.bucket function is always delegated to the Iceberg function provider.

If no connector is selected, the full query looks like this:

SELECT lakehouse.system.bucket('trino', 16)
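
For readers unfamiliar with the function, here is a minimal sketch of what a bucket transform on strings computes, assuming the definition in the Iceberg table spec (a 32-bit Murmur3 hash of the UTF-8 bytes with seed 0, masked to a non-negative value, modulo the bucket count). The class and method names are hypothetical, and this is not the code that Trino or Iceberg actually runs.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch of the Iceberg spec's bucket transform for strings.
// Names are hypothetical; this is not Trino's or Iceberg's implementation.
final class BucketSketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    private BucketSketch() {}

    // bucket = (murmur3_x86_32(utf8(value)) & Integer.MAX_VALUE) % numBuckets
    static int bucket(String value, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive: " + numBuckets);
        }
        int hash = murmur3(value.getBytes(StandardCharsets.UTF_8));
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    // 32-bit Murmur3 (x86 variant) with seed 0.
    static int murmur3(byte[] data) {
        int h = 0;
        int blocksEnd = data.length & ~3;
        // Body: consume 4 bytes at a time as a little-endian int.
        for (int i = 0; i < blocksEnd; i += 4) {
            int k = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | ((data[i + 3] & 0xff) << 24);
            h ^= mixK(k);
            h = Integer.rotateLeft(h, 13) * 5 + 0xe6546b64;
        }
        // Tail: the remaining 1 to 3 bytes, if any.
        int tail = 0;
        int rem = data.length & 3;
        if (rem == 3) {
            tail ^= (data[blocksEnd + 2] & 0xff) << 16;
        }
        if (rem >= 2) {
            tail ^= (data[blocksEnd + 1] & 0xff) << 8;
        }
        if (rem >= 1) {
            tail ^= data[blocksEnd] & 0xff;
            h ^= mixK(tail);
        }
        // Finalization: mix in the length and avalanche the bits.
        h ^= data.length;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    private static int mixK(int k) {
        k *= C1;
        k = Integer.rotateLeft(k, 15);
        k *= C2;
        return k;
    }

    public static void main(String[] args) {
        // Same arguments as the query above.
        System.out.println(bucket("trino", 16));
    }
}
```

The mask with Integer.MAX_VALUE keeps the hash non-negative before the modulo, so the result always falls in the range 0 to numBuckets - 1.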

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(X) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`26757`)

Summary by Sourcery

Introduce Iceberg 'system.bucket' function support in the Lakehouse connector by delegating function handling to the Iceberg provider and adding a new FunctionProvider implementation.

New Features:

  • Add LakehouseFunctionProvider to route the 'bucket' scalar function to Iceberg and table functions to the appropriate providers

Enhancements:

  • Delegate listFunctions, getFunctions, and getFunctionMetadata calls in LakehouseMetadata to the Iceberg provider

Tests:

  • Add end-to-end test for system.bucket function in Lakehouse

@cla-bot cla-bot bot added the cla-signed label Oct 9, 2025

sourcery-ai bot commented Oct 9, 2025

Reviewer's Guide

This PR extends the Lakehouse connector to support the Iceberg system.bucket function by delegating function metadata and implementation to the Iceberg provider, and adds an integration test to validate its behavior.

Sequence diagram for system.bucket function delegation in LakehouseMetadata

sequenceDiagram
participant Q as "Query Engine"
participant LM as LakehouseMetadata
participant IM as IcebergMetadata
Q->>LM: listFunctions/getFunctions/getFunctionMetadata
LM->>IM: Delegate function call
IM-->>LM: Return function metadata
LM-->>Q: Return function metadata
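
The delegation in the sequence diagram above can be sketched roughly as follows. This is a simplified model and not the PR's code: FunctionCatalog is a hypothetical stand-in for the function-related methods of the connector metadata interface, and the stub's return values are assumptions made for illustration.

```java
import java.util.List;

// Simplified model of the delegation shown in the diagram.
// All types here are hypothetical stand-ins, not Trino SPI classes.
final class DelegationSketch {
    private DelegationSketch() {}

    // Stand-in for the function-related part of a metadata interface.
    interface FunctionCatalog {
        List<String> listFunctions(String schemaName);
    }

    // Stand-in for IcebergMetadata: assumed to expose 'bucket' in the 'system' schema.
    static final class IcebergMetadataStub implements FunctionCatalog {
        @Override
        public List<String> listFunctions(String schemaName) {
            return "system".equals(schemaName) ? List.of("bucket") : List.of();
        }
    }

    // Stand-in for LakehouseMetadata: forwards the call unchanged.
    static final class LakehouseMetadataSketch implements FunctionCatalog {
        private final FunctionCatalog icebergMetadata;

        LakehouseMetadataSketch(FunctionCatalog icebergMetadata) {
            this.icebergMetadata = icebergMetadata;
        }

        @Override
        public List<String> listFunctions(String schemaName) {
            return icebergMetadata.listFunctions(schemaName);
        }
    }

    public static void main(String[] args) {
        FunctionCatalog lakehouse = new LakehouseMetadataSketch(new IcebergMetadataStub());
        System.out.println(lakehouse.listFunctions("system"));
    }
}
```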

Class diagram for LakehouseFunctionProvider and function delegation

classDiagram
class LakehouseFunctionProvider {
  +DeltaLakeFunctionProvider deltaLakeFunctionProvider
  +IcebergFunctionProvider icebergFunctionProvider
  +getScalarFunctionImplementation(FunctionId, BoundSignature, FunctionDependencies, InvocationConvention)
  +getTableFunctionProcessorProviderFactory(ConnectorTableFunctionHandle)
}
class DeltaLakeFunctionProvider
class IcebergFunctionProvider
class TableChangesTableFunctionHandle
class TableChangesFunctionHandle
LakehouseFunctionProvider --> DeltaLakeFunctionProvider : uses
LakehouseFunctionProvider --> IcebergFunctionProvider : uses
LakehouseFunctionProvider ..> TableChangesTableFunctionHandle : checks instance
LakehouseFunctionProvider ..> TableChangesFunctionHandle : checks instance
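
The "checks instance" edges above suggest a dispatch on the runtime type of the table function handle. A minimal sketch of that pattern follows, using hypothetical handle classes with neutral names. Which real handle class belongs to which provider is not stated in the diagram, so the mapping here is an assumption.

```java
// Sketch of type-based dispatch for table function handles.
// Handle classes and the returned provider labels are hypothetical.
final class TableFunctionDispatchSketch {
    private TableFunctionDispatchSketch() {}

    interface TableFunctionHandle {}

    static final class DeltaChangesHandle implements TableFunctionHandle {}

    static final class IcebergChangesHandle implements TableFunctionHandle {}

    static final class OtherHandle implements TableFunctionHandle {}

    // Returns a label for the provider that should process the handle.
    static String route(TableFunctionHandle handle) {
        if (handle instanceof DeltaChangesHandle) {
            return "delta";
        }
        if (handle instanceof IcebergChangesHandle) {
            return "iceberg";
        }
        // Fail loudly rather than guessing a provider for unknown handles.
        throw new UnsupportedOperationException(
                "Unrecognized table function handle: " + handle.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(route(new DeltaChangesHandle()));
        System.out.println(route(new IcebergChangesHandle()));
    }
}
```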

File-Level Changes

Change: Delegate function metadata calls in LakehouseMetadata to IcebergMetadata
Details:
  • Added listFunctions override
  • Added getFunctions override
  • Added getFunctionMetadata override
  • Added getFunctionDependencies override
File: plugin/trino-lakehouse/src/main/java/io/trino/plugin/lakehouse/LakehouseMetadata.java

Change: Add LakehouseFunctionProvider to route 'bucket' and table functions
Details:
  • Created LakehouseFunctionProvider
  • Injected DeltaLakeFunctionProvider and IcebergFunctionProvider
  • Implemented getScalarFunctionImplementation to delegate 'bucket'
  • Implemented table function dispatch logic
File: plugin/trino-lakehouse/src/main/java/io/trino/plugin/lakehouse/LakehouseFunctionProvider.java

Change: Add integration test for lakehouse.system.bucket
Details:
  • Added testSystemBucket verifying bucket output
File: plugin/trino-lakehouse/src/test/java/io/trino/plugin/lakehouse/TestLakehouseConnectorTest.java

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • listFunctions and getFunctions currently proxy all Iceberg functions into the Lakehouse connector, which may expose unintended functions—consider filtering to only the ‘bucket’ function or the system schema.
  • getFunctionDependencies always returns NO_DEPENDENCIES, which could miss actual dependencies for the bucket function—consider delegating to IcebergMetadata or returning the correct dependencies.
  • In LakehouseFunctionProvider, comparing functionId.toString() to "bucket" is brittle; use functionId.getName() or check the function’s namespace for a more reliable match.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `plugin/trino-lakehouse/src/main/java/io/trino/plugin/lakehouse/LakehouseFunctionProvider.java:54-57` </location>
<code_context>
+            FunctionDependencies functionDependencies,
+            InvocationConvention invocationConvention)
+    {
+        if ("bucket".equals(functionId.toString())) {
+            return icebergFunctionProvider.getScalarFunctionImplementation(functionId, boundSignature, functionDependencies, invocationConvention);
+        }
+        throw new UnsupportedOperationException("%s provides only 'bucket' scalar function".formatted(getClass().getName()));
+    }
+
</code_context>

<issue_to_address>
**issue:** FunctionId string comparison may be fragile and could lead to future maintenance issues.

Relying on toString() for function identification may break if the format changes or similar names exist. Use a dedicated identifier or property for more reliable matching.
</issue_to_address>
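
One way to read this suggestion is to match on an explicit name through a lookup table of supported functions instead of on a string rendering of the identifier. The sketch below assumes a hypothetical FunctionKey type with a name() accessor. It is not Trino's FunctionId, whose available accessors should be checked before applying this idea.

```java
import java.util.Map;

// Sketch of name-based scalar function dispatch.
// FunctionKey and the provider labels are hypothetical, not Trino SPI types.
final class ScalarDispatchSketch {
    static final String BUCKET = "bucket";

    // Supported function names mapped to the provider that implements them.
    private static final Map<String, String> PROVIDERS = Map.of(BUCKET, "iceberg");

    private ScalarDispatchSketch() {}

    // Hypothetical identifier that carries an explicit function name.
    static final class FunctionKey {
        private final String name;

        FunctionKey(String name) {
            this.name = name;
        }

        String name() {
            return name;
        }
    }

    static String resolveProvider(FunctionKey key) {
        String provider = PROVIDERS.get(key.name());
        if (provider == null) {
            // Name the rejected function so the failure is easy to diagnose.
            throw new UnsupportedOperationException("Unsupported scalar function: " + key.name());
        }
        return provider;
    }

    public static void main(String[] args) {
        System.out.println(resolveProvider(new FunctionKey(BUCKET)));
    }
}
```

Keeping the supported names in one map means adding another delegated function later is a one-line change, and the error message reports which function was requested.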

### Comment 2
<location> `plugin/trino-lakehouse/src/test/java/io/trino/plugin/lakehouse/TestLakehouseConnectorTest.java:369-373` </location>
<code_context>
                 )\\E""");
     }
+
+    @Test
+    void testSystemBucket()
+    {
+        assertThat(query("SELECT lakehouse.system.bucket('trino', 16)"))
+                .matches("VALUES 10");
+    }
 }
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding more test cases for system.bucket to cover edge cases.

Expand test coverage by including various string inputs, boundary values for bucket count (such as 1, 0, negative, and very large numbers), and invalid inputs to verify correct behavior and robust error handling.
</issue_to_address>

@codluca codluca force-pushed the 26757-lakehouse-system-bucket-function branch from e60bada to f6d92dd Compare October 9, 2025 09:12
@codluca codluca force-pushed the 26757-lakehouse-system-bucket-function branch 2 times, most recently from 4c1608f to eb086a6 Compare October 9, 2025 13:19
@ebyhr ebyhr left a comment

Please fix CI failures.

@codluca codluca force-pushed the 26757-lakehouse-system-bucket-function branch from 4cbcc56 to a89a9e5 Compare October 10, 2025 13:05
codluca commented Oct 10, 2025

> Please fix CI failures.

Fixed

Development

Successfully merging this pull request may close these issues.

Lakehouse connector doesn't support Iceberg system.bucket function