[SPARK-36895][SQL] Add Create Index syntax support #34148

huaxingao · 2021-09-30T00:22:26Z

What changes were proposed in this pull request?

This is the 2nd PR for DSv2 index support.

This PR adds the following:

- Support for the CREATE INDEX syntax in the parser and analyzer
- A CreateIndex logical plan node
- A CreateIndexExec physical plan node

CreateIndex is not implemented yet in this PR. Calling CreateIndex throws SQLFeatureNotSupportedException. For now, for testing purposes, the error message includes the parsed index information: index name, index type, columns, and index properties.
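To make the interim behavior concrete, here is a minimal Java sketch of a placeholder that throws `java.sql.SQLFeatureNotSupportedException` and echoes the parsed index information in the message. The class `CreateIndexPlaceholder`, its fields, and the message format are hypothetical and are not the code in this PR; the real node (CreateIndexExec) is a Spark physical plan written in Scala.

```java
import java.sql.SQLFeatureNotSupportedException;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical placeholder for the interim CreateIndex behavior: instead of
 * creating an index, it reports the parsed arguments in the exception message.
 */
final class CreateIndexPlaceholder {
    private final String indexName;
    private final String indexType;            // empty when no index type was given
    private final List<String> columns;
    private final Map<String, String> properties;

    CreateIndexPlaceholder(String indexName, String indexType,
                           List<String> columns, Map<String, String> properties) {
        this.indexName = indexName;
        this.indexType = indexType;
        this.columns = List.copyOf(columns);
        this.properties = Map.copyOf(properties);
    }

    /** Always fails; the message echoes the parsed values so a test can assert on them. */
    void run() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException(
            "CreateIndex is not supported yet. IndexName " + indexName
                + " indexType " + indexType
                + " columns " + String.join(",", columns)
                + " properties " + properties);
    }

    public static void main(String[] args) {
        CreateIndexPlaceholder plan = new CreateIndexPlaceholder(
            "i1", "BTREE", List.of("col1", "col2"), Map.of("k1", "v1"));
        try {
            plan.run();
        } catch (SQLFeatureNotSupportedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A test can then catch the exception and check that each parsed value appears in the message, which is the testing approach the description outlines.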

Why are the changes needed?

To support indexes in DSv2.

Does this PR introduce any user-facing change?

Yes. This PR adds the following CREATE INDEX syntax:

CREATE INDEX [IF NOT EXISTS] index_name ON [TABLE] table_name [USING index_type] (column_index_property_list) [OPTIONS (indexPropertyList)]

    column_index_property_list: column_name [OPTIONS (indexPropertyList)] [, ...]
    indexPropertyList: index_property_name [= index_property_value] [, ...]
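Because `index_property_value` is optional, a property list can mix `name = value` pairs with bare names. The sketch below shows one way to turn the text between the parentheses of `OPTIONS (...)` into a map. It assumes that a bare name maps to an empty string and that names and values are simple unquoted tokens without commas; Spark's actual parser is generated from the ANTLR grammar and handles quoting differently. `IndexPropertyListParser` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the body of an indexPropertyList, e.g. "k1 = v1, k2". */
final class IndexPropertyListParser {
    private IndexPropertyListParser() {}

    /**
     * Parses comma-separated properties into an insertion-ordered map.
     * A property written without "= value" is stored with an empty-string value.
     */
    static Map<String, String> parse(String text) {
        Map<String, String> props = new LinkedHashMap<>();
        if (text == null || text.isBlank()) {
            return props;
        }
        for (String item : text.split(",")) {
            String trimmed = item.trim();
            int eq = trimmed.indexOf('=');
            String name = (eq < 0 ? trimmed : trimmed.substring(0, eq)).trim();
            String value = (eq < 0 ? "" : trimmed.substring(eq + 1)).trim();
            if (name.isEmpty()) {
                throw new IllegalArgumentException("Missing property name in: '" + item + "'");
            }
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(parse("fillfactor = 70, unique, comment = idx"));
    }
}
```

The property names in the example are illustrative only. The same helper would apply both to the index-level `OPTIONS` and to each column's `OPTIONS`.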

How was this patch tested?

Added a unit test.

SparkQA · 2021-09-30T01:08:22Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48248/

SparkQA · 2021-09-30T02:10:36Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48248/

huaxingao · 2021-09-30T02:42:03Z

@cloud-fan @viirya
Could you please take a look?
SQLKeywordSuite failed because the new index-type keywords BLOOM_FILTER_INDEX, BTREE_INDEX, and Z_ORDERING_INDEX are not in the documentation yet.

SparkQA · 2021-09-30T03:13:25Z

Test build #143737 has finished for PR 34148 at commit f2d7e76.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

huaxingao · 2021-09-30T18:02:18Z

I removed the specific index types BLOOM_FILTER_INDEX, BTREE_INDEX, and Z_ORDERING_INDEX for now so that SQLKeywordSuite passes. We can discuss which index types to support and document in the next PR, after Wenchen is back from the holiday break.

SparkQA · 2021-09-30T19:16:54Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48282/

huaxingao · 2021-09-30T20:04:50Z

@dongjoon-hyun @sunchao @viirya
This PR is ready for review. The failed test MariaDBKrbIntegrationSuite is unrelated to this change.

SparkQA · 2021-09-30T20:18:16Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48282/

huaxingao · 2021-09-30T22:42:28Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala

This should be ignoreIfExists. I will fix this.

huaxingao · 2021-09-30T22:42:33Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateIndexExec.scala

This should be ignoreIfExists.

SparkQA · 2021-09-30T23:12:40Z

Test build #143771 has finished for PR 34148 at commit 0024096.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

huaxingao · 2021-09-30T23:49:26Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

will remove

HyukjinKwon · 2021-10-01T06:07:16Z

@huaxingao Just a quick question: do you have an implementation of the index support in DSv2? I'm asking because the pandas API on Spark already implements indexes. Out of curiosity, I would like to look into the feasibility of migrating it in the future. If you don't have an implementation yet, it would be good to have one in a test or in the documentation.

huaxingao · 2021-10-01T16:29:35Z

@HyukjinKwon Thanks for taking a look! I am implementing this in the JDBC source right now so that I can test the new index support code.

sunchao · 2021-10-01T17:15:51Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

nit: unrelated change

sunchao · 2021-10-01T17:16:39Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

can we combine this with tablePropertyList?

sunchao · 2021-10-02T17:59:32Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala

why do we need to use UnresolvedDBObjectName here?

sunchao · 2021-10-02T18:01:47Z

...rc/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala

Can we test more combinations? For example: no option given to a column, no option given to the index, or an index type specified.

SparkQA · 2021-10-03T04:14:08Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48320/

SparkQA · 2021-10-03T05:13:58Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48320/

viirya · 2021-10-03T07:21:34Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

not table property anymore.

viirya · 2021-10-03T07:24:59Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

Hmm, is index_property_value required or optional?

It's optional; I forgot to put it in []. I double-checked the Oracle and MySQL references: index properties can have a name without a value.

viirya · 2021-10-03T07:27:25Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

We only can create index on a table, right? If so, this sounds to be UnresolvedTable and we can use createUnresolvedTable(...).

You are right. It should be UnresolvedTable.

viirya · 2021-10-03T07:29:18Z

...core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala

Ditto. I don't think we can create an index on an arbitrary database object.

SparkQA · 2021-10-03T08:12:29Z

Test build #143808 has finished for PR 34148 at commit 8248e60.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-10-03T16:42:00Z

Test build #143810 has finished for PR 34148 at commit bc71de3.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-10-03T17:16:24Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48322/

SparkQA · 2021-10-03T18:14:03Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48322/

cloud-fan · 2021-10-05T04:17:34Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

Suggested change

* CREATE [index_type] INDEX [index_name] ON [TABLE] table_name (column_index_property_list)

* CREATE [index_type] INDEX index_name ON [TABLE] table_name (column_index_property_list)

index name is not optional

cloud-fan · 2021-10-05T04:20:29Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateIndexExec.scala

The actual implementation is as simple as calling the v2 createIndex API, right? I think we should throw the exception in the JDBC v2 source instead.

I have implemented createIndex for the JDBC v2 source in #34164. We can probably merge that PR first, and then I can remove this TODO.
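The approach suggested here keeps the physical node thin: it calls the source's createIndex API and lets the source decide whether it supports the operation. Below is a rough Java sketch of that shape, which also shows how the `ignoreIfExists` flag mentioned earlier in the review could turn an already-exists error into a no-op. `IndexCapableTable`, `IndexExistsException`, `DelegatingCreateIndex`, and `InMemoryTable` are hypothetical, simplified stand-ins, not Spark's DSv2 classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a DSv2 table that can create indexes. */
interface IndexCapableTable {
    void createIndex(String indexName, String[] columns, Map<String, String> properties)
        throws IndexExistsException;
}

/** Hypothetical checked exception raised when the index name is already taken. */
class IndexExistsException extends Exception {
    IndexExistsException(String indexName) {
        super("Index " + indexName + " already exists");
    }
}

/** Sketch of an exec node whose only job is to delegate to the source. */
final class DelegatingCreateIndex {
    private final IndexCapableTable table;
    private final String indexName;
    private final String[] columns;
    private final Map<String, String> properties;
    private final boolean ignoreIfExists;   // true for CREATE INDEX IF NOT EXISTS

    DelegatingCreateIndex(IndexCapableTable table, String indexName, String[] columns,
                          Map<String, String> properties, boolean ignoreIfExists) {
        this.table = table;
        this.indexName = indexName;
        this.columns = columns.clone();
        this.properties = Map.copyOf(properties);
        this.ignoreIfExists = ignoreIfExists;
    }

    void run() throws IndexExistsException {
        try {
            table.createIndex(indexName, columns, properties);
        } catch (IndexExistsException e) {
            if (!ignoreIfExists) {
                throw e;   // plain CREATE INDEX: surface the error to the user
            }
            // IF NOT EXISTS: treat an existing index as success
        }
    }

    public static void main(String[] args) throws IndexExistsException {
        InMemoryTable table = new InMemoryTable();
        String[] cols = {"c1"};
        new DelegatingCreateIndex(table, "i1", cols, Map.of(), false).run();
        new DelegatingCreateIndex(table, "i1", cols, Map.of(), true).run();  // ignored
        System.out.println(table.indexes.keySet());
    }
}

/** Toy in-memory source; a source without index support could throw from createIndex instead. */
final class InMemoryTable implements IndexCapableTable {
    final Map<String, String[]> indexes = new HashMap<>();

    @Override
    public void createIndex(String indexName, String[] columns, Map<String, String> properties)
            throws IndexExistsException {
        if (indexes.containsKey(indexName)) {
            throw new IndexExistsException(indexName);
        }
        indexes.put(indexName, columns.clone());
    }
}
```

With this split, the "not supported" decision lives in each source rather than in the exec node, which matches the reviewer's suggestion to throw in the JDBC v2 source.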

cloud-fan · 2021-10-05T04:21:45Z

...rc/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala

can we also add parser UT?

cloud-fan · 2021-10-05T04:24:00Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Suggested change

| CREATE indexType=STRING? INDEX (IF NOT EXISTS)? identifier ON TABLE?

| CREATE indexType=identifier? INDEX (IF NOT EXISTS)? identifier ON TABLE?

STRING only matches a quoted string literal such as 'abc', but we want to support CREATE bloom_filter INDEX ...


SparkQA · 2021-10-26T18:07:30Z

Test build #144628 has started for PR 34148 at commit fe0d4d3.

dbtsai · 2021-10-26T18:16:37Z

LGTM. Merged into master. Thank you!

SparkQA · 2021-10-26T18:51:07Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49098/

SparkQA · 2021-10-26T19:29:10Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49098/

### What changes were proposed in this pull request?
Use a property to specify the index type.

### Why are the changes needed?
To address this comment: #34148 (comment)

### Does this PR introduce _any_ user-facing change?
Yes.
```
void createIndex(String indexName, String indexType, NamedReference[] columns, Map<NamedReference, Map<String, String>> columnsProperties, Map<String, String> properties)
```
changed to
```
void createIndex(String indexName, NamedReference[] columns, Map<NamedReference, Map<String, String>> columnsProperties, Map<String, String> properties)
```

### How was this patch tested?
New test.

Closes #34486 from huaxingao/deleteIndexType.

Lead-authored-by: Huaxin Gao <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Huaxin Gao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
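After this follow-up, a source reads the index type from the properties map instead of a dedicated parameter, and per-column options are keyed by column. Here is a minimal Java sketch of a source written against that shape, with column references simplified to strings instead of `NamedReference`. The property key `"type"`, the `BTREE` default, and the class name `TypeFromPropertiesSource` are assumptions for illustration; the real key, if any, is defined by Spark's DSv2 index API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical source illustrating the follow-up signature: the index type is
 * no longer a parameter and is read from the properties map instead.
 */
final class TypeFromPropertiesSource {
    /** Assumed property key for the index type (hypothetical). */
    static final String PROP_TYPE = "type";
    /** Assumed default when the statement has no USING clause (hypothetical). */
    static final String DEFAULT_TYPE = "BTREE";

    final Map<String, String> createdTypes = new HashMap<>();

    void createIndex(String indexName,
                     String[] columns,
                     Map<String, Map<String, String>> columnsProperties,
                     Map<String, String> properties) {
        List<String> indexed = Arrays.asList(columns);
        // Per-column options are keyed by column, so each key must be an indexed column.
        for (String column : columnsProperties.keySet()) {
            if (!indexed.contains(column)) {
                throw new IllegalArgumentException("Options given for non-indexed column " + column);
            }
        }
        String type = properties.getOrDefault(PROP_TYPE, DEFAULT_TYPE);
        createdTypes.put(indexName, type);
    }

    public static void main(String[] args) {
        TypeFromPropertiesSource source = new TypeFromPropertiesSource();
        Map<String, String> props = new HashMap<>();
        props.put(PROP_TYPE, "BLOOM_FILTER");   // e.g. what USING bloom_filter might map to
        source.createIndex("i1", new String[] {"c1", "c2"},
            Map.of("c1", Map.of("k", "v")), props);
        source.createIndex("i2", new String[] {"c2"}, Map.of(), Map.of());
        System.out.println(source.createdTypes.get("i1") + " " + source.createdTypes.get("i2"));
    }
}
```

Keying column options by column (rather than an array parallel to `columns`) mirrors the `columnsProperties` change made in the commit list below.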

CreateIndex was added in [HUDI-4165](https://github.com/apache/hudi/pull/5761/files), and Spark 3.3 also includes it via [SPARK-36895](apache/spark#34148). Because `CreateIndex` lives in the same package, `org.apache.spark.sql.catalyst.plans.logical`, in both Hudi and Spark 3.3 but with different parameters, using it can cause class conflicts. This commit still keeps the same package path as Spark, but changes the following:

1. Use the same params as Spark, so there should be no class conflict.
2. Only support index-related commands from **Spark 3.2** onward, since Spark 2 doesn't have `org.apache.spark.sql.catalyst.analysis.FieldName`, which `CreateIndex` requires.
3. Resolve columns for CreateIndex during the analysis stage.

github-actions bot added the SQL label Sep 30, 2021

huaxingao commented Sep 30, 2021


sunchao reviewed Oct 2, 2021


viirya reviewed Oct 3, 2021


viirya reviewed Oct 3, 2021


cloud-fan reviewed Oct 5, 2021


huaxingao and others added 19 commits October 26, 2021 10:29

test IF NOT EXISTS

6f9347d

add end to end test for createIndex

d6ac2aa

remove index test from JDBCTableCatalogSuite

2e3788a

remove unused import

78f300f

change CREATE [index_type] INDEX index_name ON [TABLE] table_name to CREATE INDEX index_name ON [TABLE] table_name [USING index_type]

7531b87

rebase

b619f7a

fix index type syntax in test

a8382e5

address comments

a4fa7c8

rebase

6b84f83

fix test failure

9d0781c

address comments and change Properties to Map

03c046b

Update sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateIndexExec.scala

Co-authored-by: Wenchen Fan <[email protected]>

b7801e4

fix build failure

e962ecb

fix lint-java

a86503f

fix test failure

ab2908f

address comments

f871d26

address comments

2e066b3

change columnsProperties from Array[util.Map[NamedReference, util.Map[String, String]]] to util.Map[NamedReference, util.Map[String, String]]

a8cb845

address comments

fe0d4d3

huaxingao force-pushed the index_syntax branch from 57fbf67 to fe0d4d3 October 26, 2021 17:51

dbtsai closed this in 677aba2 Oct 26, 2021

huaxingao mentioned this pull request Nov 4, 2021

[SPARK-36895][SQL][FOLLOWUP] Use property to specify index type #34486

Closed

boneanxs mentioned this pull request Oct 20, 2023

[HUDI-6963] Fix class conflict of CreateIndex from Spark3.3 apache/hudi#9895

Merged


hudi-bot mentioned this pull request Nov 30, 2025

Fix class conflict of CreateIndex from Spark3.3 apache/hudi#16269

Open
