[HUDI-3896] Porting Nested Schema Pruning optimization for Hudi's custom Relations #5428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
46 commits
1413acd
Added `customOptimizerRules` to `HoodieAnalysis`;
8ac7727
Cleaning up utils
ae95e54
Tidying up
f874974
Missing license
6fe1eae
Added `NestedSchemaPruning` Spark's Optimizer rule;
915b71f
Handle Spark's Optimizer pruned data schema (to effectively prune nes…
f77bb5f
Injecting Spark Session extensions for `TestMORDataSource`
0115eca
Disabled fallback to `HadoopFsRelation`
552ef5a
Make sure extensions are loaded in COW/MOR tests
8a78748
Fixed compilation for Scala 2.11
73c0192
Added `JFunction` utility to convert b/w Scala/Java lambdas in Scala …
df54017
Fixed compilation for Spark 2
79aa00d
Adding `HoodieSparkSessionExtensions` to quick-start tests
7145e7a
Fixing compilation in tests
d030046
Fixing tests
dda72e8
Fixed tests
38c7aa0
Tidying up
3acb652
Internalized `canPruneRelationSchema` method w/in `HoodieBaseRelation…
fb060d1
Adding missing scala-docs;
78d074d
Tidying up
35332fc
Added test for Avro ser-/de
95800de
Made `SchemaConverters` to appropriately transform Avro unions to Cat…
2e57aee
Disallow schema pruning for MT
2599694
Make sure we avoid unnecessary conversions for table's schema
1c69fea
Fixed union detection heuristic
3f3c874
Fixing test
32e8656
Cleaning up handling of `InternalSchema`
e2611ae
Restore fallback to `HadoopFsRelation`
5516a93
Fixed handling of Schema Evolution case when actual table's schema ha…
cc86798
Fixing compilation
9b5f303
Fixing Spark version checkers
85b5462
Extracted Spark version checkers into a standalone trait;
6ae8896
Extracted all Catalyst `LogicalPlan` related utilities from `SparkAda…
3e5c2a6
Missing license
991a4ec
Bifurcated Spark 3.1 vs 3.2 `HoodieCatalystPlanUtils` implementations
c639c63
Tidying up
7d28c25
Missing license
a783eb8
Added `createExplainCommand` to `HoodieCatalystPlanUtils`
0ea00e3
Added test for `NestedSchemaPruning` optimization;
eea51dd
Fixing tests for Spark 2.4;
2ec2f72
Fixed test for Spark 2.4
6c4d2fa
Fixing compilation
019ce76
Tidying up
f74fa2e
Clean up superfluous flag (MT table will be ruled out by file-format …
0caadf9
Fixing partition-path extraction for globbed paths
e0a442b
Fixed reading t/h globbed paths to properly handle case of partitione…
77 changes: 77 additions & 0 deletions
...ient/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieCatalystPlansUtils.scala
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one or more | ||
| * contributor license agreements. See the NOTICE file distributed with | ||
| * this work for additional information regarding copyright ownership. | ||
| * The ASF licenses this file to You under the Apache License, Version 2.0 | ||
| * (the "License"); you may not use this file except in compliance with | ||
| * the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| | ||
| package org.apache.spark.sql | ||
| | ||
| import org.apache.spark.sql.catalyst.{AliasIdentifier, TableIdentifier} | ||
| import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation | ||
| import org.apache.spark.sql.catalyst.expressions.Expression | ||
| import org.apache.spark.sql.catalyst.plans.JoinType | ||
| import org.apache.spark.sql.catalyst.plans.logical.{Join, LogicalPlan} | ||
| | ||
| trait HoodieCatalystPlansUtils { | ||
| | ||
| def createExplainCommand(plan: LogicalPlan, extended: Boolean): LogicalPlan | ||
| | ||
| /** | ||
| * Convert an AliasIdentifier to TableIdentifier. | ||
| */ | ||
| def toTableIdentifier(aliasId: AliasIdentifier): TableIdentifier | ||
| | ||
| /** | ||
| * Convert an UnresolvedRelation to TableIdentifier. | ||
| */ | ||
| def toTableIdentifier(relation: UnresolvedRelation): TableIdentifier | ||
| | ||
| /** | ||
| * Create Join logical plan. | ||
| */ | ||
| def createJoin(left: LogicalPlan, right: LogicalPlan, joinType: JoinType): Join | ||
| | ||
| /** | ||
| * Test if the logical plan is an Insert Into LogicalPlan. | ||
| */ | ||
| def isInsertInto(plan: LogicalPlan): Boolean | ||
| | ||
| /** | ||
| * Get the members of the Insert Into LogicalPlan. | ||
| */ | ||
| def getInsertIntoChildren(plan: LogicalPlan): | ||
| Option[(LogicalPlan, Map[String, Option[String]], LogicalPlan, Boolean, Boolean)] | ||
| | ||
| /** | ||
| * Test if the logical plan is a TimeTravelRelation LogicalPlan. | ||
| */ | ||
| def isRelationTimeTravel(plan: LogicalPlan): Boolean | ||
| | ||
| /** | ||
| * Get the members of the TimeTravelRelation LogicalPlan. | ||
| */ | ||
| def getRelationTimeTravel(plan: LogicalPlan): Option[(LogicalPlan, Option[Expression], Option[String])] | ||
| | ||
| /** | ||
| * Create an Insert Into LogicalPlan. | ||
| */ | ||
| def createInsertInto(table: LogicalPlan, partition: Map[String, Option[String]], | ||
| query: LogicalPlan, overwrite: Boolean, ifPartitionNotExists: Boolean): LogicalPlan | ||
| | ||
| /** | ||
| * Create Like expression. | ||
| */ | ||
| def createLike(left: Expression, right: Expression): Expression | ||
| | ||
| } |
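
The trait above only declares the version-sensitive operations. The commit list mentions separate Spark 3.1 and 3.2 implementations, but their bodies are not shown here. As a rough illustration of the pattern, the following dependency-free sketch picks an implementation by Spark version string. Every name in it (`PlanUtilsDispatchSketch`, `PlanUtils`, `describeJoin`, `forVersion`) is hypothetical and not taken from the PR. The strings only mimic how the Catalyst `Join` node gained a hint argument in Spark 3.x; no real plan nodes are built.

```scala
// Hypothetical sketch, not code from this PR: illustrates per-version dispatch only.
object PlanUtilsDispatchSketch {

  // Stand-in for a version-sensitive interface such as HoodieCatalystPlansUtils.
  trait PlanUtils {
    // Returns a textual description instead of a real Catalyst Join node.
    def describeJoin(left: String, right: String, joinType: String): String
  }

  // Assumed Spark 2.x shape: Join(left, right, joinType, condition).
  object Spark2PlanUtils extends PlanUtils {
    override def describeJoin(left: String, right: String, joinType: String): String =
      s"Join($left, $right, $joinType, None)"
  }

  // Assumed Spark 3.x shape: Join(left, right, joinType, condition, hint).
  object Spark3PlanUtils extends PlanUtils {
    override def describeJoin(left: String, right: String, joinType: String): String =
      s"Join($left, $right, $joinType, None, JoinHint.NONE)"
  }

  // Select the implementation from the major version prefix of the version string.
  def forVersion(sparkVersion: String): PlanUtils =
    if (sparkVersion.startsWith("2.")) Spark2PlanUtils
    else if (sparkVersion.startsWith("3.")) Spark3PlanUtils
    else throw new IllegalArgumentException(s"Unsupported Spark version: $sparkVersion")
}
```

Callers would depend only on the trait, so code shared across Spark versions never references a constructor whose signature differs between them.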