Skip to content

Conversation

@huaxingao
Copy link
Contributor

@huaxingao huaxingao commented Sep 30, 2021

What changes were proposed in this pull request?

This is the 2nd PR for DSv2 index support.

This PR adds the following:

  • create index syntax support in parser and analyzer
  • CreateIndex logic node
  • CreateIndexExec physical node

CreateIndex is not implemented yet in this PR. Calling CreateIndex will throw SQLFeatureNotSupportedException, and the parsed index information such as IndexName indexType columns and index properties will be included in the error message for now for testing purpose.

Why are the changes needed?

To support index in DSv2

Does this PR introduce any user-facing change?

Yes, the create table syntax as the following:

CREATE INDEX index_name ON [TABLE] table_name [USING index_type] (column_index_property_list)[OPTIONS indexPropertyList]

    column_index_property_list: column_name [OPTIONS(indexPropertyList)]  [ ,  . . . ]
    indexPropertyList: index_property_name [= index_property_value] [ ,  . . . ]

How was this patch tested?

add a UT

@github-actions github-actions bot added the SQL label Sep 30, 2021
@SparkQA
Copy link

SparkQA commented Sep 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48248/

@SparkQA
Copy link

SparkQA commented Sep 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48248/

@huaxingao
Copy link
Contributor Author

@cloud-fan @viirya
Could you please take a look?
SQLKeywordSuite failed because the new keywords BLOOM_FILTER_INDEX, BTREE_INDEX and Z_ORDERING_INDEX for index type are not in the documentation yet.

@SparkQA
Copy link

SparkQA commented Sep 30, 2021

Test build #143737 has finished for PR 34148 at commit f2d7e76.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao
Copy link
Contributor Author

I removed these specific index types BLOOM_FILTER_INDEX, BTREE_INDEX and Z_ORDERING_INDEX for now to pass the SQLKeywordSuite. We can discuss what index types to support and document in next PR after Wenchen comes back from the holiday break.

@SparkQA
Copy link

SparkQA commented Sep 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48282/

@huaxingao
Copy link
Contributor Author

@dongjoon-hyun @sunchao @viirya
This PR is ready for review. The failed test MariaDBKrbIntegrationSuite is not relevant.

@SparkQA
Copy link

SparkQA commented Sep 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48282/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ignoreIfExists. I will fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ignoreIfExists.

@SparkQA
Copy link

SparkQA commented Sep 30, 2021

Test build #143771 has finished for PR 34148 at commit 0024096.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove

@HyukjinKwon
Copy link
Member

@huaxingao just a quick question. do you have an implementation of the index support in DSv2? I am asking this because pandas API on Spark already implemented this actually. I would like to investigate the feasibility of migration in the future more just for curiosity .. If you don't have it now, it should be good to have one in test or documentation.

@huaxingao
Copy link
Contributor Author

@HyukjinKwon Thanks for taking a look! I am implementing this in JDBC right now, so I can test out the new code in index support.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unrelated change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we combine this with tablePropertyList?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to use UnresolvedDBObjectName here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we test more combinations? e.g., when there is no option given to a column, or no option is given to the index, or an index type is specified etc.

@SparkQA
Copy link

SparkQA commented Oct 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48320/

@SparkQA
Copy link

SparkQA commented Oct 3, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48320/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not table property anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is index_property_value required or optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's optional. I forgot to put this in [].
I double checked oracle and MySQL reference, the index properties could have name without value.

Comment on lines 4515 to 4517
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only can create index on a table, right? If so, this sounds to be UnresolvedTable and we can use createUnresolvedTable(...).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Should be UnresolvedTable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto. I think we cannot create index on arbitrary database object.

@SparkQA
Copy link

SparkQA commented Oct 3, 2021

Test build #143808 has finished for PR 34148 at commit 8248e60.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 3, 2021

Test build #143810 has finished for PR 34148 at commit bc71de3.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48322/

@SparkQA
Copy link

SparkQA commented Oct 3, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48322/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* CREATE [index_type] INDEX [index_name] ON [TABLE] table_name (column_index_property_list)
* CREATE [index_type] INDEX index_name ON [TABLE] table_name (column_index_property_list)

index name is not optional

Copy link
Contributor

@cloud-fan cloud-fan Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual implementation is as simple as calling v2 createIndex API, right? I think we should throw exception in the JDBC v2 source instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have JDBC v2 source createIndex implemented in #34164. We can probably merge that PR first, and then I can remove this Todo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also add parser UT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| CREATE indexType=STRING? INDEX (IF NOT EXISTS)? identifier ON TABLE?
| CREATE indexType=identifier? INDEX (IF NOT EXISTS)? identifier ON TABLE?

STRING only matches 'abc', but we want to support CREATE bloom_filter INDEX ...

@SparkQA
Copy link

SparkQA commented Oct 26, 2021

Test build #144628 has started for PR 34148 at commit fe0d4d3.

@dbtsai dbtsai closed this in 677aba2 Oct 26, 2021
@dbtsai
Copy link
Member

dbtsai commented Oct 26, 2021

LGTM. Merged into master. Thank you!

@SparkQA
Copy link

SparkQA commented Oct 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49098/

@SparkQA
Copy link

SparkQA commented Oct 26, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49098/

cloud-fan added a commit that referenced this pull request Nov 8, 2021
### What changes were proposed in this pull request?
use property to specify index type

### Why are the changes needed?
to address this comment #34148 (comment)

### Does this PR introduce _any_ user-facing change?
Yes
```
  void createIndex(String indexName,
      String indexType,
      NamedReference[] columns,
      Map<NamedReference, Map<String, String>> columnsProperties,
      Map<String, String> properties)
```
changed to
```
createIndex(String indexName,
      NamedReference[] columns,
      Map<NamedReference, Map<String, String>> columnsProperties,
      Map<String, String> properties
```

### How was this patch tested?
new test

Closes #34486 from huaxingao/deleteIndexType.

Lead-authored-by: Huaxin Gao <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Huaxin Gao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
yihua pushed a commit to apache/hudi that referenced this pull request Oct 27, 2023
CreateIndex is added in [HUDI-4165](https://github.com/apache/hudi/pull/5761/files), and spark 3.3 also include this in [SPARK-36895](apache/spark#34148). Since `CreateIndex` uses same package `org.apache.spark.sql.catalyst.plans.logical` in HUDI and Spark3.3, but params are not same. So it could introduce class conflict issues if we use it.

This commit still keeps the same package path with Spark, but changes to

1. Use the same params like Spark, so there should be no class conflict
2. Only support Index related commands from **Spark3.2**, since Spark2 doesn't have `org.apache.spark.sql.catalyst.analysis.FieldName` but `CreateIndex` requires
3. Resolve columns for CreateIndex during Analyze stage
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants