[SPARK-3011][SQL] _temporary directory should be filtered out by sqlContext.parquetFile #1959

ghost · 2014-08-15T00:32:57Z

Fix compile error on Hadoop 0.23 for pull request #1924.

AmplabJenkins · 2014-08-15T00:33:20Z

Can one of the admins verify this patch?

marmbrus · 2014-08-15T00:44:30Z

ok to test

SparkQA · 2014-08-15T00:49:59Z

QA tests have started for PR 1959. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18586/consoleFull

SparkQA · 2014-08-15T01:58:03Z

QA results for PR 1959:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18586/consoleFull

srowen · 2014-08-15T08:31:26Z

sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala

How about ignoring any file starting with _? Hadoop (also) uses this convention, for things like the _SUCCESS file.

ghost · 2014-08-15

Unfortunately, that would ignore the metadata file "_metadata" as well.

chutium · 2014-08-21

Maybe we should rethink why we use filterNot here? A simple filter works fine, something like:

val children = fs.listStatus(path).filter { status =>
  val name = status.getPath.getName
  name == ParquetFileWriter.PARQUET_METADATA_FILE || (name(0) != '.' && name(0) != '_')
}

That way we ignore all hidden/tmp files except _metadata.

liancheng · 2014-08-23

I agree with this. Just remove .* and _* except _metadata.
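For readers who want to try the rule agreed on above without a Hadoop cluster, here is a minimal sketch that applies it to plain file names. The object name `HiddenFileFilter`, the method `keep`, and the sample listing are hypothetical and are not taken from the patch; the `"_metadata"` literal stands in for `ParquetFileWriter.PARQUET_METADATA_FILE`, as described in the comments above. It uses `startsWith` rather than `name(0)` so that an empty name does not throw.

```scala
object HiddenFileFilter {
  // Assumption: stands in for ParquetFileWriter.PARQUET_METADATA_FILE,
  // which the thread above describes as the "_metadata" file.
  val MetadataFile = "_metadata"

  // Keep the metadata file; drop every other name starting with '.' or '_'.
  def keep(name: String): Boolean =
    name == MetadataFile || !(name.startsWith(".") || name.startsWith("_"))

  def main(args: Array[String]): Unit = {
    // Hypothetical directory listing, for illustration only.
    val listing = Seq(
      "_temporary",
      "_SUCCESS",
      "_metadata",
      ".part-r-00001.parquet.crc",
      "part-r-00001.parquet"
    )
    // Only "_metadata" and "part-r-00001.parquet" pass the filter.
    println(listing.filter(keep))
  }
}
```

In the real code the same predicate would be applied to `status.getPath.getName` for each entry returned by `fs.listStatus(path)`, as in the snippet quoted in the review.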

SparkQA · 2014-08-23T06:55:43Z

QA tests have started for PR 1959 at commit be30793.

This patch merges cleanly.

SparkQA · 2014-08-23T08:18:20Z

QA tests have finished for PR 1959 at commit be30793.

This patch passes unit tests.
This patch merges cleanly.
This patch adds no public classes.

liancheng · 2014-08-24T06:08:56Z

LGTM, thanks.

marmbrus · 2014-08-26T01:20:58Z

Thanks! I've merged this into master and 1.1.

…ontext.parquetFile

fix compile error on hadoop 0.23 for the pull request #1924.

Author: Chia-Yung Su <[email protected]>

Closes #1959 from joesu/bugfix-spark3011 and squashes the following commits:

be30793 [Chia-Yung Su] remove .* and _* except _metadata
8fe2398 [Chia-Yung Su] add note to explain
40ea9bd [Chia-Yung Su] fix hadoop-0.23 compile error
c7e44f2 [Chia-Yung Su] match syntax
f8fc32a [Chia-Yung Su] filter out tmp dir

(cherry picked from commit 4243bb6)
Signed-off-by: Michael Armbrust <[email protected]>

josephsu added 4 commits August 13, 2014 23:52

filter out tmp dir

f8fc32a

match syntax

c7e44f2

fix hadoop-0.23 compile error

40ea9bd

add note to explain

8fe2398

ghost mentioned this pull request Aug 15, 2014

[SPARK-3011][SQL] _temporary directory should be filtered out by sqlContext.parquetFile #1924

Closed

srowen reviewed Aug 15, 2014

remove .* and _* except _metadata

be30793

asfgit closed this in 4243bb6 Aug 26, 2014

srowen Aug 15, 2014

ghost Aug 15, 2014

chutium Aug 21, 2014

liancheng Aug 23, 2014
