[SPARK-3011][SQL] _temporary directory should be filtered out by sqlContext.parquetFile #1959
Can one of the admins verify this patch?
ok to test
QA tests have started for PR 1959. This patch merges cleanly.
QA results for PR 1959:
How about ignoring any file starting with _? Hadoop also uses this convention, for things like the _SUCCESS file.
Unfortunately, that would ignore the metadata file "_metadata" as well.
Maybe we should rethink why we use filterNot here? A plain filter works fine, something like:
val children = fs.listStatus(path).filter { status =>
  val name = status.getPath.getName
  // keep the Parquet metadata file; drop every other name starting with '.' or '_'
  name == ParquetFileWriter.PARQUET_METADATA_FILE || (name(0) != '.' && name(0) != '_')
}
That way we ignore all hidden/tmp files except _metadata.
I agree with this. Just remove .* and _* except _metadata.
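For reference, the rule agreed on above can be sketched as a standalone predicate on file names. This is only an illustration, not the code that was merged: the object name `ParquetChildFilter`, the method `keep`, and the sample file names are hypothetical, and the metadata file name is hard-coded as "_metadata" (assumed to match `ParquetFileWriter.PARQUET_METADATA_FILE`) so the sketch runs without Hadoop or Parquet on the classpath.

```scala
object ParquetChildFilter {
  // Assumption: same value as ParquetFileWriter.PARQUET_METADATA_FILE,
  // hard-coded here to keep the sketch dependency-free.
  val MetadataFileName = "_metadata"

  // Keep the metadata file; otherwise drop any name starting with '.' or '_'.
  // startsWith is used instead of name(0) so an empty name does not throw.
  def keep(name: String): Boolean =
    name == MetadataFileName || !(name.startsWith(".") || name.startsWith("_"))

  def main(args: Array[String]): Unit = {
    // Hypothetical directory listing, for illustration only.
    val names = Seq(
      "_temporary",
      "_SUCCESS",
      "_metadata",
      ".part-r-1.parquet.crc",
      "part-r-1.parquet"
    )
    println(names.filter(keep)) // prints List(_metadata, part-r-1.parquet)
  }
}
```

In the real code path the same predicate would be applied to `status.getPath.getName` for each entry returned by `fs.listStatus(path)`, as in the snippet quoted in the review comment.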
QA tests have started for PR 1959 at commit
QA tests have finished for PR 1959 at commit
LGTM, thanks.
Thanks! I've merged this into master and 1.1.
…ontext.parquetFile

fix compile error on hadoop 0.23 for the pull request #1924.

Author: Chia-Yung Su <[email protected]>

Closes #1959 from joesu/bugfix-spark3011 and squashes the following commits:

be30793 [Chia-Yung Su] remove .* and _* except _metadata
8fe2398 [Chia-Yung Su] add note to explain
40ea9bd [Chia-Yung Su] fix hadoop-0.23 compile error
c7e44f2 [Chia-Yung Su] match syntax
f8fc32a [Chia-Yung Su] filter out tmp dir

(cherry picked from commit 4243bb6)
Signed-off-by: Michael Armbrust <[email protected]>
fix compile error on hadoop 0.23 for the pull request #1924.