-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Add Iceberg system.bucket function to Lakehouse #26888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add Iceberg system.bucket function to Lakehouse #26888
Conversation
Reviewer's GuideThis PR extends the Lakehouse connector to support the Iceberg system.bucket function by delegating function metadata and implementation to the Iceberg provider, and adds an integration test to validate its behavior. Sequence diagram for system.bucket function delegation in LakehouseMetadatasequenceDiagram
participant Q as "Query Engine"
participant LM as LakehouseMetadata
participant IM as IcebergMetadata
Q->>LM: listFunctions/getFunctions/getFunctionMetadata
LM->>IM: Delegate function call
IM-->>LM: Return function metadata
LM-->>Q: Return function metadata
Class diagram for LakehouseFunctionProvider and function delegationclassDiagram
class LakehouseFunctionProvider {
+DeltaLakeFunctionProvider deltaLakeFunctionProvider
+IcebergFunctionProvider icebergFunctionProvider
+getScalarFunctionImplementation(FunctionId, BoundSignature, FunctionDependencies, InvocationConvention)
+getTableFunctionProcessorProviderFactory(ConnectorTableFunctionHandle)
}
class DeltaLakeFunctionProvider
class IcebergFunctionProvider
class TableChangesTableFunctionHandle
class TableChangesFunctionHandle
LakehouseFunctionProvider --> DeltaLakeFunctionProvider : uses
LakehouseFunctionProvider --> IcebergFunctionProvider : uses
LakehouseFunctionProvider ..> TableChangesTableFunctionHandle : checks instance
LakehouseFunctionProvider ..> TableChangesFunctionHandle : checks instance
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- listFunctions and getFunctions currently proxy all Iceberg functions into the Lakehouse connector, which may expose unintended functions—consider filtering to only the ‘bucket’ function or the system schema.
- getFunctionDependencies always returns NO_DEPENDENCIES, which could miss actual dependencies for the bucket function—consider delegating to IcebergMetadata or returning the correct dependencies.
- In LakehouseFunctionProvider, comparing functionId.toString() to "bucket" is brittle; use functionId.getName() or check the function’s namespace for a more reliable match.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- listFunctions and getFunctions currently proxy all Iceberg functions into the Lakehouse connector, which may expose unintended functions—consider filtering to only the ‘bucket’ function or the system schema.
- getFunctionDependencies always returns NO_DEPENDENCIES, which could miss actual dependencies for the bucket function—consider delegating to IcebergMetadata or returning the correct dependencies.
- In LakehouseFunctionProvider, comparing functionId.toString() to "bucket" is brittle; use functionId.getName() or check the function’s namespace for a more reliable match.
## Individual Comments
### Comment 1
<location> `plugin/trino-lakehouse/src/main/java/io/trino/plugin/lakehouse/LakehouseFunctionProvider.java:54-57` </location>
<code_context>
+ FunctionDependencies functionDependencies,
+ InvocationConvention invocationConvention)
+ {
+ if ("bucket".equals(functionId.toString())) {
+ return icebergFunctionProvider.getScalarFunctionImplementation(functionId, boundSignature, functionDependencies, invocationConvention);
+ }
+ throw new UnsupportedOperationException("%s provides only 'bucket' scalar function".formatted(getClass().getName()));
+ }
+
</code_context>
<issue_to_address>
**issue:** FunctionId string comparison may be fragile and could lead to future maintenance issues.
Relying on toString() for function identification may break if the format changes or similar names exist. Use a dedicated identifier or property for more reliable matching.
</issue_to_address>
### Comment 2
<location> `plugin/trino-lakehouse/src/test/java/io/trino/plugin/lakehouse/TestLakehouseConnectorTest.java:369-373` </location>
<code_context>
)\\E""");
}
+
+ @Test
+ void testSystemBucket()
+ {
+ assertThat(query("SELECT lakehouse.system.bucket('trino', 16)"))
+ .matches("VALUES 10");
+ }
}
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding more test cases for system.bucket to cover edge cases.
Expand test coverage by including various string inputs, boundary values for bucket count (such as 1, 0, negative, and very large numbers), and invalid inputs to verify correct behavior and robust error handling.
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
plugin/trino-lakehouse/src/main/java/io/trino/plugin/lakehouse/LakehouseFunctionProvider.java
Outdated
Show resolved
Hide resolved
e60bada
to
f6d92dd
Compare
plugin/trino-lakehouse/src/main/java/io/trino/plugin/lakehouse/LakehouseFunctionProvider.java
Outdated
Show resolved
Hide resolved
plugin/trino-lakehouse/src/main/java/io/trino/plugin/lakehouse/LakehouseFunctionProvider.java
Outdated
Show resolved
Hide resolved
plugin/trino-lakehouse/src/test/java/io/trino/plugin/lakehouse/TestLakehouseConnectorTest.java
Outdated
Show resolved
Hide resolved
4c1608f
to
eb086a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix CI failures.
LakehouseMetadata will delegate all calls related to functions to IcebergMetadata. The system.bucket function will always be delegated to Iceberg function provider. If there is no connector selected, the full query is like this: SELECT lakehouse.system.bucket('trino', 16)
4cbcc56
to
a89a9e5
Compare
Fixed |
Add Iceberg system.bucket function to Lakehouse
Fixes #26757
Description
LakehouseMetadata will delegate all calls related to functions to IcebergMetadata. The system.bucket function will always be delegated to Iceberg function provider.
If there is no connector selected, the full query is like this:
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
(X) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:
Summary by Sourcery
Introduce Iceberg 'system.bucket' function support in the Lakehouse connector by delegating function handling to the Iceberg provider and adding a new FunctionProvider implementation.
New Features:
Enhancements:
Tests: