Skip to content

Comments

[SPARK-28974][SQL] centralize the Data Source V2 table capability checks#25679

Closed
cloud-fan wants to merge 2 commits intoapache:masterfrom
cloud-fan:dsv2-check
Closed

[SPARK-28974][SQL] centralize the Data Source V2 table capability checks#25679
cloud-fan wants to merge 2 commits intoapache:masterfrom
cloud-fan:dsv2-check

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

merge the V2WriteSupportCheck and V2StreamingScanSupportCheck to one rule: TableCapabilityCheck.

Why are the changes needed?

It's a little confusing to have 2 rules to check DS v2 table capability, while one rule says it checks write and another rule says it checks streaming scan. We can clearly tell it from the rule names that the batch scan check is missing.

It's better to have a centralized place for this check, with a name that clearly says it checks table capability.

Does this PR introduce any user-facing change?

No

How was this patch tested?

existing tests

OverwritePartitionsDynamicExec(r.table.asWritable, r.options, planLater(query)) :: Nil

case DeleteFromTable(r: DataSourceV2Relation, condition) =>
if (SubqueryExpression.hasSubquery(condition)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also see https://github.com/apache/spark/pull/25626/files#diff-fc4a83061d40a242e7f7b729cb8ee870R251

We think that the subquery check is for the current limitation, which should not be treated as a capability check. It's better to put it near the implementation, and should be updated together when we improve the implementation.

override def apply(plan: LogicalPlan): Unit = {
plan.foreach {
plan foreach {
case r: DataSourceV2Relation if !r.table.supports(BATCH_READ) =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I add the batch scan check. It's possible that a table implements SupportsRead without reporting BATCH_READ capability. For example, a steaming table which doesn't support batch scan. We must check the BATCH_READ capability here, instead of relying on the .isInstaceOf[SupportsRead] check at the planner side.


assert(exc.getMessage.contains(
"does not support overwrite expression (`x` = 5) in batch mode"))
assert(exc.getMessage.contains("does not support overwrite by filter in batch mode"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the error message a little bit. The previous message is a little misleading that only the specific expression is not supported.

@SparkQA
Copy link

SparkQA commented Sep 4, 2019

Test build #110124 has finished for PR 25679 at commit 03d9bcd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue
Copy link
Contributor

rdblue commented Sep 4, 2019

Looks fine to me, once tests are passing.

@SparkQA
Copy link

SparkQA commented Sep 5, 2019

Test build #110163 has finished for PR 25679 at commit a25f9df.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Sep 5, 2019

Test build #110168 has finished for PR 25679 at commit a25f9df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

thanks for the review, merging to master!

@cloud-fan cloud-fan closed this in c81fd0c Sep 5, 2019
PavithraRamachandran pushed a commit to PavithraRamachandran/spark that referenced this pull request Sep 15, 2019
### What changes were proposed in this pull request?

merge the `V2WriteSupportCheck` and `V2StreamingScanSupportCheck` to one rule: `TableCapabilityCheck`.

### Why are the changes needed?

It's a little confusing to have 2 rules to check DS v2 table capability, while one rule says it checks write and another rule says it checks streaming scan. We can clearly tell it from the rule names that the batch scan check is missing.

It's better to have a centralized place for this check, with a name that clearly says it checks table capability.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

existing tests

Closes apache#25679 from cloud-fan/dsv2-check.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants