[SPARK-30826][SQL] Respect reference case in StringStartsWith pushed down to parquet #27574
Closed
MaxGekk wants to merge 3 commits into apache:master
Test build #118396 has finished for PR 27574 at commit
Member
Author
jenkins, retest this, please
Ngone51
approved these changes
Feb 14, 2020
Member
Ngone51
left a comment
LGTM. It'd be better if we can reduce the scope of PR title to make it more compatible with the fix.
StringStartsWith filter pushed down to parquet → StringStartsWith pushed down to parquet
cloud-fan
reviewed
Feb 14, 2020
      if pushDownStartWith && canMakeFilterOn(name, prefix) =>
        Option(prefix).map { v =>
    -     FilterApi.userDefined(binaryColumn(name),
    +     FilterApi.userDefined(binaryColumn(nameToParquetField(name).fieldName),
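The lookup that this change relies on can be illustrated without Spark or Parquet on the classpath. The following is only a minimal sketch of case-insensitive name resolution: `ParquetField`, `NameResolutionSketch`, `caseInsensitiveLookup`, and `resolve` are hypothetical names invented for the illustration, and dropping names that collide after lower-casing is an assumption of the sketch, not a statement about the actual `ParquetFilters` code.

```scala
import java.util.Locale

// Hypothetical, simplified stand-in for the per-field metadata behind nameToParquetField.
final case class ParquetField(fieldName: String)

object NameResolutionSketch {
  // Build a lookup keyed by the lower-cased name, so that a query-side "col"
  // finds the physical column "COL" stored in the file.
  // Assumption of this sketch: names that collide after lower-casing are
  // dropped as ambiguous instead of being resolved arbitrarily.
  def caseInsensitiveLookup(physicalNames: Seq[String]): Map[String, ParquetField] =
    physicalNames
      .groupBy(_.toLowerCase(Locale.ROOT))
      .collect { case (key, Seq(single)) => key -> ParquetField(single) }

  // Resolve the attribute name used in the filter to the name stored in the file.
  def resolve(lookup: Map[String, ParquetField], attribute: String): Option[String] =
    lookup.get(attribute.toLowerCase(Locale.ROOT)).map(_.fieldName)

  def main(args: Array[String]): Unit = {
    val lookup = caseInsensitiveLookup(Seq("COL"))
    println(resolve(lookup, "col"))     // Some(COL)
    println(resolve(lookup, "missing")) // None
  }
}
```

In this sketch the filter would be built on the resolved name (`COL`) rather than on the name written in the query (`col`), which is the kind of substitution the one-line change above makes for `StringStartsWith`.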
cloud-fan
approved these changes
Feb 14, 2020
wangyum
approved these changes
Feb 14, 2020
Test build #118401 has finished for PR 27574 at commit
Contributor
does 2.4 also have this bug?
Member
Author
@cloud-fan Yes, 2.4 has the bug too.
cloud-fan
pushed a commit
that referenced
this pull request
Feb 15, 2020
…d down to parquet

### What changes were proposed in this pull request?
In the PR, I propose to convert the attribute name of `StringStartsWith` pushed down to the Parquet datasource to column reference via the `nameToParquetField` map. Similar conversions are performed for other source filters pushed down to parquet.

### Why are the changes needed?
This fixes the bug described in [SPARK-30826](https://issues.apache.org/jira/browse/SPARK-30826). The query from an external table:
```sql
CREATE TABLE t1 (col STRING)
USING parquet
OPTIONS (path '$path')
```
created on top of written parquet files by `Seq("42").toDF("COL").write.parquet(path)` returns wrong empty result:
```scala
spark.sql("SELECT * FROM t1 WHERE col LIKE '4%'").show
+---+
|col|
+---+
+---+
```

### Does this PR introduce any user-facing change?
Yes. After the changes the result is correct for the example above:
```scala
spark.sql("SELECT * FROM t1 WHERE col LIKE '4%'").show
+---+
|col|
+---+
| 42|
+---+
```

### How was this patch tested?
Added a test to `ParquetFilterSuite`

Closes #27574 from MaxGekk/parquet-StringStartsWith-case-sens.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 8b73b92)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Contributor
thanks, merging to master/3.0/2.4!
cloud-fan
pushed a commit
that referenced
this pull request
Feb 15, 2020
…d down to parquet
sjincho
pushed a commit
to sjincho/spark
that referenced
this pull request
Apr 15, 2020
…d down to parquet
### What changes were proposed in this pull request?
In the PR, I propose to convert the attribute name of `StringStartsWith` pushed down to the Parquet datasource to column reference via the `nameToParquetField` map. Similar conversions are performed for other source filters pushed down to parquet.

### Why are the changes needed?
This fixes the bug described in SPARK-30826. The query `SELECT * FROM t1 WHERE col LIKE '4%'` from an external table `CREATE TABLE t1 (col STRING) USING parquet OPTIONS (path '$path')` created on top of written parquet files by `Seq("42").toDF("COL").write.parquet(path)` returns wrong empty result.

### Does this PR introduce any user-facing change?
Yes. After the changes the result is correct for the example above: the query returns the row `42`.

### How was this patch tested?
Added a test to `ParquetFilterSuite`.