[SPARK-36895][SQL] Add Create Index syntax support #34148
Kubernetes integration test starting
Kubernetes integration test status failure
@cloud-fan @viirya
Test build #143737 has finished for PR 34148 at commit
I removed these specific index types
Kubernetes integration test starting
@dongjoon-hyun @sunchao @viirya
Kubernetes integration test status failure
This should be ignoreIfExists. I will fix this.
This should be ignoreIfExists.
Test build #143771 has finished for PR 34148 at commit
will remove
@huaxingao just a quick question: do you have an implementation of the index support in DSv2? I am asking because pandas API on Spark has already implemented this. I would like to investigate the feasibility of a migration in the future, just out of curiosity. If you don't have one now, it would be good to have one in a test or in the documentation.
@HyukjinKwon Thanks for taking a look! I am implementing this in JDBC right now, so I can test out the new code in index support.
nit: unrelated change
can we combine this with tablePropertyList?
why do we need to use UnresolvedDBObjectName here?
can we test more combinations? e.g., when there is no option given to a column, or no option is given to the index, or an index type is specified etc.
Kubernetes integration test starting
Kubernetes integration test status failure
not table property anymore.
Hmm, is index_property_value required or optional?
It's optional. I forgot to put this in [].
I double-checked the Oracle and MySQL references: an index property can have a name without a value.
We only can create index on a table, right? If so, this sounds to be UnresolvedTable and we can use createUnresolvedTable(...).
You are right. Should be UnresolvedTable.
ditto. I think we cannot create index on arbitrary database object.
Test build #143808 has finished for PR 34148 at commit
Test build #143810 has finished for PR 34148 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
- * CREATE [index_type] INDEX [index_name] ON [TABLE] table_name (column_index_property_list)
+ * CREATE [index_type] INDEX index_name ON [TABLE] table_name (column_index_property_list)
index name is not optional
The actual implementation is as simple as calling v2 createIndex API, right? I think we should throw exception in the JDBC v2 source instead.
I have JDBC v2 source createIndex implemented in #34164. We can probably merge that PR first, and then I can remove this Todo.
can we also add parser UT?
- | CREATE indexType=STRING? INDEX (IF NOT EXISTS)? identifier ON TABLE?
+ | CREATE indexType=identifier? INDEX (IF NOT EXISTS)? identifier ON TABLE?
STRING only matches 'abc', but we want to support CREATE bloom_filter INDEX ...
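The distinction in this comment can be illustrated with a simplified sketch. The two regular expressions below are rough stand-ins for a quoted-string token and an unquoted identifier token; they are assumptions made for illustration, not the actual lexer rules in Spark's grammar, and `IndexTypeToken` is a hypothetical helper.

```java
import java.util.regex.Pattern;

final class IndexTypeToken {
    // Simplified stand-in for a quoted string literal such as 'abc'.
    static final Pattern STRING = Pattern.compile("'[^']*'");
    // Simplified stand-in for an unquoted identifier such as bloom_filter.
    static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isString(String token) {
        return STRING.matcher(token).matches();
    }

    static boolean isIdentifier(String token) {
        return IDENTIFIER.matcher(token).matches();
    }

    public static void main(String[] args) {
        // An unquoted index type is not a string literal, but it is an identifier.
        System.out.println(isString("bloom_filter"));
        System.out.println(isIdentifier("bloom_filter"));
        // A quoted value is a string literal, not an identifier.
        System.out.println(isString("'abc'"));
        System.out.println(isIdentifier("'abc'"));
    }
}
```

Under these simplified rules, a grammar that only accepts the string form for the index type would force users to quote it, which is the limitation the comment points out.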
…CREATE INDEX index_name ON [TABLE] table_name [USING index_type]
…rces/v2/CreateIndexExec.scala Co-authored-by: Wenchen Fan <[email protected]>
…[String, String]]] to util.Map[NamedReference, util.Map[String, String]]
57fbf67 to fe0d4d3
Test build #144628 has started for PR 34148 at commit
LGTM. Merged into master. Thank you!
Kubernetes integration test starting
Kubernetes integration test status failure
### What changes were proposed in this pull request?
Use a property to specify the index type.

### Why are the changes needed?
To address this comment: #34148 (comment)

### Does this PR introduce _any_ user-facing change?
Yes.
```
void createIndex(String indexName,
    String indexType,
    NamedReference[] columns,
    Map<NamedReference, Map<String, String>> columnsProperties,
    Map<String, String> properties)
```
changed to
```
void createIndex(String indexName,
    NamedReference[] columns,
    Map<NamedReference, Map<String, String>> columnsProperties,
    Map<String, String> properties)
```

### How was this patch tested?
New test.

Closes #34486 from huaxingao/deleteIndexType.

Lead-authored-by: Huaxin Gao <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Huaxin Gao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
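To make the signature change concrete, here is a minimal sketch of how a caller might pass the index type once it is no longer a dedicated parameter. `IndexCatalog`, `RecordingCatalog`, `IndexTypeAsProperty`, and the `"type"` property key are hypothetical names chosen for illustration, and plain `String` column names stand in for `NamedReference`; none of them are taken from the Spark source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the connector interface. Columns are plain Strings
// here instead of Spark's NamedReference to keep the sketch self-contained.
interface IndexCatalog {
    void createIndex(String indexName,
                     String[] columns,
                     Map<String, Map<String, String>> columnsProperties,
                     Map<String, String> properties);
}

// Records the last call so the arguments can be inspected.
final class RecordingCatalog implements IndexCatalog {
    String lastIndexName;
    Map<String, String> lastProperties;

    @Override
    public void createIndex(String indexName,
                            String[] columns,
                            Map<String, Map<String, String>> columnsProperties,
                            Map<String, String> properties) {
        lastIndexName = indexName;
        lastProperties = properties;
    }
}

final class IndexTypeAsProperty {
    // Assumed property key; the real key name is not given in this thread.
    static final String TYPE_KEY = "type";

    // Returns a copy of the index-level properties with the index type folded in.
    static Map<String, String> withIndexType(Map<String, String> properties, String indexType) {
        Map<String, String> merged = new LinkedHashMap<String, String>(properties);
        if (indexType != null && !indexType.isEmpty()) {
            merged.put(TYPE_KEY, indexType);
        }
        return merged;
    }

    public static void main(String[] args) {
        RecordingCatalog catalog = new RecordingCatalog();
        Map<String, String> props =
            withIndexType(new LinkedHashMap<String, String>(), "bloom_filter");
        catalog.createIndex("idx_name", new String[] {"id"},
            new LinkedHashMap<String, Map<String, String>>(), props);
        System.out.println(catalog.lastProperties);
    }
}
```

Copying the map before adding the key keeps the caller's original properties untouched, which is a design choice of this sketch rather than something the commit specifies.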
CreateIndex is added in [HUDI-4165](https://github.com/apache/hudi/pull/5761/files), and Spark 3.3 also includes it in [SPARK-36895](apache/spark#34148). `CreateIndex` uses the same package, `org.apache.spark.sql.catalyst.plans.logical`, in both HUDI and Spark 3.3, but the params are not the same, so using it could introduce class conflict issues.

This commit keeps the same package path as Spark, but changes the following:
1. Use the same params as Spark, so there should be no class conflict
2. Only support index-related commands from **Spark3.2**, since Spark2 doesn't have `org.apache.spark.sql.catalyst.analysis.FieldName`, which `CreateIndex` requires
3. Resolve columns for CreateIndex during the Analyze stage
What changes were proposed in this pull request?
This is the 2nd PR for DSv2 index support.
This PR adds the following:
- `CreateIndex` logical node
- `CreateIndexExec` physical node

`CreateIndex` is not implemented yet in this PR. Calling `CreateIndex` will throw `SQLFeatureNotSupportedException`, and the parsed index information, such as `IndexName`, `indexType`, `columns`, and index properties, will be included in the error message for now for testing purposes.
Why are the changes needed?
To support index in DSv2
Does this PR introduce any user-facing change?
Yes, the create index syntax is as follows:
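As a rough illustration of the statement shape discussed in this thread (`CREATE INDEX index_name ON [TABLE] table_name [USING index_type]`, plus the `IF NOT EXISTS` clause and column list seen in review), a statement could be assembled as below. `CreateIndexSql` and its parameters are hypothetical helpers, and the exact clause order is an assumption rather than a quote from the merged grammar.

```java
import java.util.Arrays;
import java.util.List;

final class CreateIndexSql {
    // Builds a CREATE INDEX statement; pass a null indexType to omit USING.
    static String build(String indexName, String tableName, String indexType,
                        boolean ifNotExists, List<String> columns) {
        StringBuilder sql = new StringBuilder("CREATE INDEX ");
        if (ifNotExists) {
            sql.append("IF NOT EXISTS ");
        }
        sql.append(indexName).append(" ON TABLE ").append(tableName);
        if (indexType != null) {
            sql.append(" USING ").append(indexType);
        }
        sql.append(" (").append(String.join(", ", columns)).append(")");
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("i1", "t", "bloom_filter", true, Arrays.asList("a", "b")));
    }
}
```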
How was this patch tested?
add a UT
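The placeholder behavior described in this PR (throwing `SQLFeatureNotSupportedException` with the parsed index information in the message) and the kind of check a unit test could make against it might look like the sketch below. `CreateIndexStub`, its parameter list, and the message wording are assumptions for illustration; they are not the actual `CreateIndexExec` code.

```java
import java.sql.SQLFeatureNotSupportedException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

final class CreateIndexStub {
    // Hypothetical placeholder: always throws, echoing the parsed arguments
    // so a test can confirm that parsing carried them through.
    static void createIndex(String indexName, String indexType, String[] columns,
                            Map<String, String> properties)
            throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException(
            "CreateIndex is not supported yet."
                + " IndexName " + indexName
                + " indexType " + indexType
                + " columns " + Arrays.toString(columns)
                + " properties " + new TreeMap<String, String>(properties));
    }

    public static void main(String[] args) {
        try {
            createIndex("i1", "bloom_filter", new String[] {"a", "b"},
                new TreeMap<String, String>());
        } catch (SQLFeatureNotSupportedException e) {
            // A test would assert on the message contents instead of printing.
            System.out.println(e.getMessage());
        }
    }
}
```

Asserting on the exception message is only a stopgap for checking the parsed values; once a real `createIndex` implementation exists, the test would check the created index instead.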