
Conversation

@windpiger (Contributor) commented Mar 6, 2017

What changes were proposed in this pull request?

In SPARK-5068, we introduced a SQLConf, spark.sql.hive.verifyPartitionPath. If it is set to true, it avoids failing the task when a partition location does not exist in the filesystem.

This situation should always return an empty result rather than fail the task, so here we remove this conf.

Additionally, the function verifyPartitionPath has a bug: if the partition path is a custom path,

it will still filter all partition paths in the parameter partitionToDeserializer,
and it will scan paths which do not belong to the table. E.g., if the custom path is /root/a
and the partitionSpec is b=1/c=2, this will lead to scanning / because of getPathPatternByPath.
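
The failure mode can be illustrated with a simplified, hypothetical sketch of the pattern logic (this is not the actual getPathPatternByPath implementation; the helper name and the component-stripping logic are assumptions for illustration only):

```scala
// Hypothetical simplification of the pattern logic: replace the last
// `parNum` path components with "*" wildcards.
def pathPattern(parNum: Int, partitionPath: String): String = {
  var p = partitionPath
  for (_ <- 1 to parNum) {
    // strip one trailing path component per partition column
    p = p.substring(0, p.lastIndexOf('/'))
  }
  p + "/*" * parNum
}

// Normal layout: the pattern stays under the table root.
pathPattern(2, "/warehouse/tbl/b=1/c=2")  // "/warehouse/tbl/*/*"

// Custom location /root/a with partitionSpec b=1/c=2: stripping two
// components climbs above the custom path, so the pattern globs from "/".
pathPattern(2, "/root/a")  // "/*/*"
```

With a two-level partitionSpec but a one-level custom location, the wildcard pattern escapes the table's own directory and matches the entire filesystem root.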

How was this patch tested?

Modified an existing test case.

…eturn empty when the location does not exists
SparkQA commented Mar 6, 2017

Test build #73991 has finished for PR 17176 at commit 95aa931.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger force-pushed the removeHiveVerfiyPath branch from 4bb0e28 to 8128567 on March 6, 2017 12:05
}
// convert /demo/data/year/month/day to /demo/data/*/*/*/
def getPathPatternByPath(parNum: Int, tempPath: Path, partitionName: String): String = {
// if the partition path does not end with partition name, we should not
@windpiger (Contributor Author) commented:

If the partition location has been altered to another location, we should not apply this pattern, or we will list files matching the pattern which do not belong to the partition.

SparkQA commented Mar 6, 2017

Test build #73998 has finished for PR 17176 at commit 8128567.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 6, 2017

Test build #73992 has finished for PR 17176 at commit 4bb0e28.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger (Contributor Author) commented:

retest this please

SparkQA commented Mar 6, 2017

Test build #74016 has finished for PR 17176 at commit 8128567.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger (Contributor Author) commented:

Why did Jenkins fail?

SparkQA commented Mar 7, 2017

Test build #74072 has finished for PR 17176 at commit 22b1f53.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

def verifyPartitionPath(
    partitionToDeserializer: Map[HivePartition, Class[_ <: Deserializer]]):
    Map[HivePartition, Class[_ <: Deserializer]] = {
  if (!sparkSession.sessionState.conf.verifyPartitionPath) {
@windpiger (Contributor Author) commented:

After PR https://github.com/apache/spark/pull/17187, reading a Hive table that does not use `stored by` will no longer use HiveTableScanExec.

This function has a bug: if the partition path is a custom path,

  1. it will still filter all partition paths in the parameter partitionToDeserializer;
  2. it will scan paths which do not belong to the table. E.g., if the custom path is /root/a
    and the partitionSpec is b=1/c=2, this will lead to scanning / because of getPathPatternByPath.

SparkQA commented Mar 7, 2017

Test build #74106 has finished for PR 17176 at commit 262e2f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 7, 2017

Test build #74107 has finished for PR 17176 at commit 3a15e5d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member) commented:

@windpiger If you do not have the bandwidth to continue, how about closing it now?

case (partition, partDeserializer) =>
  val partPath = partition.getDataLocation
  val fs = partPath.getFileSystem(hadoopConf)
  fs.exists(partPath)

Review comment:

Sending an RPC request to the NameNode for each partition can result in poor performance.
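
One possible mitigation, sketched under assumptions (`existingPaths` is a hypothetical helper, and `java.nio.file` stands in for the Hadoop FileSystem API): group partition paths by parent directory and issue one listing call per parent, instead of one exists() call per partition.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Hypothetical sketch, not the PR's code: answer per-partition existence
// checks from one directory listing per parent, reducing the number of
// round trips to the (simulated) filesystem.
def existingPaths(partitionPaths: Seq[Path]): Set[Path] = {
  partitionPaths
    .groupBy(_.getParent)
    .flatMap { case (parent, children) =>
      if (parent == null || !Files.isDirectory(parent)) Nil
      else {
        // one listing call per parent directory instead of one exists() per child
        val stream = Files.list(parent)
        val listed = try stream.iterator().asScala.toSet finally stream.close()
        children.filter(listed.contains)
      }
    }
    .toSet
}
```

For a table whose partitions share a handful of parent directories, this turns O(partitions) existence RPCs into O(parents) listing RPCs.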

@asfgit closed this in a3ba3a8 on Nov 11, 2018
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#21766
Closes apache#21679
Closes apache#21161
Closes apache#20846
Closes apache#19434
Closes apache#18080
Closes apache#17648
Closes apache#17169

Add:
Closes apache#22813
Closes apache#21994
Closes apache#22005
Closes apache#22463

Add:
Closes apache#15899

Add:
Closes apache#22539
Closes apache#21868
Closes apache#21514
Closes apache#21402
Closes apache#21322
Closes apache#21257
Closes apache#20163
Closes apache#19691
Closes apache#18697
Closes apache#18636
Closes apache#17176

Closes apache#23001 from wangyum/CloseStalePRs.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>