[HUDI-6963] Fix class conflict of CreateIndex from Spark3.3 #9895
Conversation
cc @codope

| IN
| INDEX
| INDEXES
| IF
Do we still need some of these tokens for other SQL statements?
No, I checked and only removed the unneeded ones.
val columnsMap: java.util.LinkedHashMap[String, java.util.Map[String, String]] =
  new util.LinkedHashMap[String, java.util.Map[String, String]]()
- columns.map(c => columnsMap.put(c._1.name, c._2.asJava))
+ columns.map(c => columnsMap.put(c._1.mkString("."), c._2.asJava))
Why change this? Is it for nested fields?
Now the column name is a Seq[String] instead of an Attribute (UnresolvedAttribute, to be more specific), since this PR resolves column names:

Line 208 in 6eb38c5
case (u: UnresolvedFieldName, prop) => resolveFieldNames(cmd.table, u.name, u) -> prop

UnresolvedAttribute is no longer needed, so here I directly use Seq[String] to represent the column name.
Got it.
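The mkString(".") change above can be sketched outside Spark (Java here for brevity; the column data and class name are illustrative — in the PR the keys come from resolved field-name paths):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnsMapSketch {
    // Joins each field-name path with "." and preserves declaration order,
    // mirroring columnsMap.put(c._1.mkString("."), ...) in the diff above.
    static Map<String, Map<String, String>> buildColumnsMap(List<List<String>> paths) {
        Map<String, Map<String, String>> columnsMap = new LinkedHashMap<>();
        for (List<String> path : paths) {
            // ["address", "city"] becomes "address.city", which is why the
            // Seq[String] representation also covers nested fields.
            columnsMap.put(String.join(".", path), Map.of());
        }
        return columnsMap;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> m = buildColumnsMap(List.of(
            List.of("ts"),
            List.of("address", "city") // nested field
        ));
        System.out.println(m.keySet()); // [ts, address.city]
    }
}
```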
ctx.accept(this).asInstanceOf[T]
}

/**
Is all of this code copied from Spark 3.3+? Wondering if the original PR introducing this intended to support CreateIndex for Spark 3.2 and below. cc @huberylee
This code was introduced in #5761; maybe it's only for supporting the index commands.
| )
| partitioned by(ts)
| location '$basePath'
if (HoodieSparkUtils.gteqSpark3_2) {
Looks like TestSecondaryIndex should also have a precondition on the Spark version.
Yeah, will fix it as well.
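The precondition being discussed can be sketched as a simple version gate (Java here for brevity; `gteqSpark3_2` is a hypothetical stand-in for the existing `HoodieSparkUtils.gteqSpark3_2` check seen in the diff above):

```java
public class SparkVersionGate {
    // Hypothetical stand-in for HoodieSparkUtils.gteqSpark3_2: true when
    // the Spark version is at least 3.2, where index SQL support begins.
    static boolean gteqSpark3_2(String sparkVersion) {
        String[] parts = sparkVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        return major > 3 || (major == 3 && minor >= 2);
    }

    public static void main(String[] args) {
        // Tests like TestSecondaryIndex would skip their body on older Sparks.
        if (gteqSpark3_2("3.3.1")) {
            System.out.println("run index tests");
        }
        if (!gteqSpark3_2("3.1.2")) {
            System.out.println("skip index tests");
        }
    }
}
```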
UnresolvedFieldPosition(typedVisit[ColumnPosition](pos))))
}

/**
Got it. So at least CreateIndex is still supported in Spark 3.2.
Yes
| createTableHeader ('(' colTypeList ')')? tableProvider?
    createTableClauses
    (AS? query)? #createTable
| CREATE INDEX (IF NOT EXISTS)? identifier ON TABLE?
Could we still maintain the grammar in a single place for all Spark versions, but fail the logical plan of the INDEX SQL statement in Spark 3.1 and below, so the grammar can be easily maintained?
We cannot put the grammar in HoodieSqlCommon.g4, since we have to parse it in HoodieSqlCommonAstBuilder to build CreateIndex, but one of its parameters, FieldName, doesn't exist below Spark 3.2.
We already do this for time travel, and the grammar is rarely modified, so is changing it like this acceptable?
Got it. @codope is working on functional indexes in Hudi and may extend the grammar here. So I'm thinking that if it's not too hard to put the grammar in a common place, we should do that.
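The suggestion above — keep one shared grammar but reject the plan on Spark 3.1 and below — could look roughly like this (Java for brevity; the names are hypothetical, not the PR's actual code):

```java
public class CreateIndexValidation {
    // Hypothetical analysis-time check: the shared grammar always parses
    // CREATE INDEX, but the resulting plan is rejected on Spark < 3.2.
    static void validateCreateIndex(boolean gteqSpark3_2) {
        if (!gteqSpark3_2) {
            throw new UnsupportedOperationException(
                "CREATE INDEX is only supported on Spark 3.2 and above");
        }
    }

    public static void main(String[] args) {
        validateCreateIndex(true); // ok on Spark 3.2+
        try {
            validateCreateIndex(false); // Spark 3.1 and below
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This would trade the duplicated grammar for a single runtime check, at the cost of parsing a statement that some Spark versions can never execute.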
UnresolvedFieldPosition(typedVisit[ColumnPosition](pos))))
}

/**
I assume the SQL parsing of INDEX SQL statement should not be different across Spark versions.
Yeah, it should be the same.
yihua left a comment
LGTM
Change Logs
CreateIndex was added in HUDI-4165, and Spark 3.3 also includes it via SPARK-36895. Since CreateIndex uses the same package org.apache.spark.sql.catalyst.plans.logical in Hudi and Spark 3.3 but its parameters are not the same, using it could introduce class conflict issues.

One simple modification to fix this would be to directly move CreateIndex to a different package path, but I don't think that's proper, given that CreateIndex belongs in org.apache.spark.sql.catalyst.plans.logical as a logical plan (like MergeInto). So here I keep the same package path as Spark, but change the parameters to match Spark's. org.apache.spark.sql.catalyst.analysis.FieldName doesn't exist below Spark 3.2 but CreateIndex requires it, so we only support index-related commands from Spark 3.2 (I could also add support for Spark 2, Spark 3.0, and Spark 3.1, but I'm not sure whether that's still needed).

Impact
Index-related commands are no longer supported below Spark 3.2.
Risk level
low
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
ticket number here and follow the instruction to make changes to the website.
Contributor's checklist