[SPARK-11103][SQL] Filter applied on Merged Parquet shema with new column fail #9327
```diff
@@ -314,4 +314,24 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
       }
     }
   }
+
+  test("SPARK-11103: Filter applied on merged Parquet schema with new column fails") {
+    import testImplicits._
+
+    withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",
+      SQLConf.PARQUET_SCHEMA_MERGING_ENABLED.key -> "true") {
+      withTempPath { dir =>
+        var pathOne = s"${dir.getCanonicalPath}/table1"
+        (1 to 3).map(i => (i, i.toString)).toDF("a", "b").write.parquet(pathOne)
+        var pathTwo = s"${dir.getCanonicalPath}/table2"
+        (1 to 3).map(i => (i, i.toString)).toDF("c", "b").write.parquet(pathTwo)
+
+        // If the "c = 1" filter gets pushed down, this query will throw an exception which
+        // Parquet emits. This is a Parquet issue (PARQUET-389).
+        checkAnswer(
+          sqlContext.read.parquet(pathOne, pathTwo).filter("c = 1"),
```
Contributor
It would be great if we could specify the columns for this kind of case, because the ordering of the columns can change.
Member
Author
I see. I just wonder if the inconsistent order is another issue. I think users might think it is weird if they run the same script with Could I open an issue for this if you think it is a separate issue?
Contributor
@HyukjinKwon It will be weird if the column ordering of
Member
Author
I will try to check this. Thanks.
Member
Author
@yhuai This is because of So, after retrieving the list of leaf files including I think this can be resolved by using I will open an issue for this. I would like to work on this if this is really an issue. Filed here https://issues.apache.org/jira/browse/SPARK-11500
```diff
+          (1 to 1).map(i => Row(i, i.toString, null)))
+      }
+    }
+  }
 }
```
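For readers without a Spark checkout, the situation this test covers can be modelled in plain Scala. This is only a rough sketch under stated assumptions: `ParquetLikeFile`, `mergeSchemas`, `canPushDown`, and `filterEquals` are hypothetical names invented for illustration, not Spark or Parquet APIs, and the model does not reproduce the exception from PARQUET-389. It only shows why a filter on `c` cannot be safely pushed into a file whose own schema lacks `c`, and what rows the merged read is expected to return. Rows are compared by column name, so the sketch does not depend on the column ordering discussed in the review comments.

```scala
// Hypothetical, Spark-free model; none of these names exist in Spark or Parquet.
object MergedSchemaPushdownSketch {
  // A "file" is modelled as its own column list plus rows keyed by column name.
  final case class ParquetLikeFile(columns: Seq[String], rows: Seq[Map[String, Any]])

  // Merged schema: every column that appears in any file (order is not meaningful here).
  def mergeSchemas(files: Seq[ParquetLikeFile]): Seq[String] =
    files.flatMap(_.columns).distinct

  // A filter on `filterColumn` is only safe to push into a file that has that column.
  def canPushDown(filterColumn: String, file: ParquetLikeFile): Boolean =
    file.columns.contains(filterColumn)

  // Evaluate `column = value` after merging: a column missing from a file reads as
  // None (null in Spark terms), so rows from that file never match.
  def filterEquals(
      files: Seq[ParquetLikeFile],
      column: String,
      value: Any): Seq[Map[String, Option[Any]]] = {
    val merged = mergeSchemas(files)
    for {
      file <- files
      row <- file.rows
      if row.get(column).contains(value)
    } yield merged.map(c => c -> row.get(c)).toMap
  }

  // Same shape as the two tables written in the test: (a, b) and (c, b).
  val tableOne: ParquetLikeFile =
    ParquetLikeFile(Seq("a", "b"), (1 to 3).map(i => Map[String, Any]("a" -> i, "b" -> i.toString)))
  val tableTwo: ParquetLikeFile =
    ParquetLikeFile(Seq("c", "b"), (1 to 3).map(i => Map[String, Any]("c" -> i, "b" -> i.toString)))

  def main(args: Array[String]): Unit = {
    val files = Seq(tableOne, tableTwo)
    println(s"merged columns: ${mergeSchemas(files)}")
    println(s"push c into tableOne? ${canPushDown("c", tableOne)}")
    println(s"push c into tableTwo? ${canPushDown("c", tableTwo)}")
    println(s"rows where c = 1: ${filterEquals(files, "c", 1)}")
  }
}
```

In this model the only matching row comes from the second table, with `a` absent, which corresponds to the single `Row(i, i.toString, null)` the test expects.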
Several nits here, but I'm going to merge this one first since 1.5.2rc2 is being cut soon.
Please use `val` instead of `var` here.
To construct the test DF, the following way is preferable for better readability:
or
`val` was used by mistake... Thanks for the comments!