Add projection push down for STRUCT field in big-query connector #23443
Please add a simple description. Take the description from PR #17085 as a reference.
Praveen2112
left a comment
LGTM
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitManager.java
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/TestBigQueryMetadata.java
430945f to
7e51bce
Compare
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryMetadata.java
We could create a Set of the parent column names from the column handles provided by the split and passed to the PageSource:
Set<String> projectedColumnNames = bigQuerySplit.getColumns().stream().map(BigQueryColumnHandle::name).collect(Collectors.toSet());
checkArgument(bigQuerySplit.getColumns().isEmpty() || bigQuerySplit.getColumns().stream().map(BigQueryColumnHandle::name).collect(Collectors.toSet()).equals(columns),
        "Requested columns %s do not match list in split %s", columns, bigQuerySplit.getColumns());
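The set-based comparison suggested above can be sketched in isolation. This is a minimal illustration, not the connector's code: `ColumnHandle` is a hypothetical stand-in for `BigQueryColumnHandle`, and a plain `IllegalArgumentException` replaces Guava's `checkArgument` so the snippet has no dependencies.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class SplitColumnCheck
{
    // Hypothetical stand-in for BigQueryColumnHandle; only the name matters here.
    record ColumnHandle(String name) {}

    // Passes when the split carries no columns, or when its column names
    // equal the requested names regardless of order.
    static void checkColumns(List<ColumnHandle> splitColumns, Set<String> requestedNames)
    {
        Set<String> splitNames = splitColumns.stream()
                .map(ColumnHandle::name)
                .collect(Collectors.toSet());
        if (!splitColumns.isEmpty() && !splitNames.equals(requestedNames)) {
            throw new IllegalArgumentException(String.format(
                    "Requested columns %s do not match list in split %s", requestedNames, splitNames));
        }
    }
}
```

Comparing sets rather than lists means the check does not depend on the order in which the split and the page source enumerate the columns.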
7e51bce to
70b8046
Compare
for (int index : indices) {
    checkArgument(type instanceof RowType, "type should be Row type");
    RowType rowType = (RowType) type;
    RowType.Field field = rowType.getFields().get(index);
Can field be null here?
I really hope not:
public static ProjectedColumnRepresentation createProjectedColumnRepresentation(ConnectorExpression expression)
{
    ImmutableList.Builder<Integer> ordinals = ImmutableList.builder();
    Variable target;
    while (true) {
        if (expression instanceof Variable variable) {
            target = variable;
            break;
        }
        if (expression instanceof FieldDereference dereference) {
            ordinals.add(dereference.getField());
            expression = dereference.getTarget();
        }
        else {
            throw new IllegalArgumentException("expression is not a valid dereference chain");
        }
    }
    return new ProjectedColumnRepresentation(target, ordinals.build().reverse());
}
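To make the exchange above concrete, here is a self-contained sketch of the two steps: flattening a dereference chain into ordinals, then walking a row type with those ordinals. All types here (`Expr`, `Var`, `Deref`, `Type`, `Scalar`, `Row`, `Projection`) are hypothetical stand-ins for the Trino SPI classes, chosen only so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DereferenceSketch
{
    // Hypothetical stand-ins for ConnectorExpression, Variable and FieldDereference.
    interface Expr {}
    record Var(String name) implements Expr {}
    record Deref(Expr target, int field) implements Expr {}

    // Hypothetical stand-ins for Type and RowType.
    interface Type {}
    record Scalar(String name) implements Type {}
    record Row(List<Type> fields) implements Type {}

    record Projection(Var base, List<Integer> ordinals) {}

    // Unwraps the chain from the outermost dereference inward, then reverses
    // the collected ordinals so they read from the base column outward.
    static Projection flatten(Expr expression)
    {
        List<Integer> ordinals = new ArrayList<>();
        while (expression instanceof Deref deref) {
            ordinals.add(deref.field());
            expression = deref.target();
        }
        if (!(expression instanceof Var base)) {
            throw new IllegalArgumentException("expression is not a valid dereference chain");
        }
        Collections.reverse(ordinals);
        return new Projection(base, List.copyOf(ordinals));
    }

    // Follows the ordinals through nested rows. An out-of-range ordinal makes
    // List.get throw IndexOutOfBoundsException instead of yielding null.
    static Type resolve(Type type, List<Integer> ordinals)
    {
        for (int index : ordinals) {
            if (!(type instanceof Row row)) {
                throw new IllegalArgumentException("type should be Row type");
            }
            type = row.fields().get(index);
        }
        return type;
    }
}
```

In this sketch, an expression such as `Deref(Deref(Var("root"), 0), 1)` flattens to the ordinals `[0, 1]`, and resolving them against the type of `root` returns the type of the second field inside its first field.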
/test-with-secrets sha=70b8046a133f404c46259f00fa37f6caba232de8
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/10931512904
Do we need any doc update here @ebyhr @vlad-lyutenko
Description
This PR implements dereference projection pushdown for the BigQuery connector (similar to #17085).
This significantly improves performance for queries that access nested fields inside struct/row columns, because dereference expressions are pushed down to the connector. With this feature, query execution prunes structural data eagerly and extracts only the necessary fields.
For example:
I have a table with a nested field root. When selecting root.f1, we can see the difference in the Input and Physical Input values in the query plan when running with and without dereference pushdown.
Table schema as below:
Query Plan without Dereference pushdown:
Query Plan with Dereference pushdown:
Additional context and related issues
The feature is enabled by default.
The feature can be disabled by setting the bigquery.projection-pushdown-enabled configuration property or the bigquery.projection_pushdown_enabled session property to false.
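As a rough illustration, and assuming the catalog is named bigquery and configured in a file such as etc/catalog/bigquery.properties (both are assumptions, adjust to your deployment), disabling the feature for the whole catalog might look like this:

```properties
# Assumed location: etc/catalog/bigquery.properties
bigquery.projection-pushdown-enabled=false
```

For a single session, the equivalent would be the statement `SET SESSION bigquery.projection_pushdown_enabled = false;`, where the leading `bigquery` is the assumed catalog name.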
Release notes
(X) Release notes are required, with the following suggested text: