Skip to content

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented Mar 20, 2019

#83 and #89 add case sensitivity flags, but filters pushed down to Parquet still do not support case sensitivity. This updates the Parquet row group filters to support a case sensitivity flag and updates Spark to pass the flag.

This also updates the old read path that uses ReadSupport when converting Iceberg expressions to Parquet filters.

@rdblue
Copy link
Contributor Author

rdblue commented Mar 20, 2019

@xabriel, can you review? We ran into this when testing case insensitive queries.

Copy link
Contributor

@xabriel xabriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also provide a repro SQL query that would trigger this issue as a comment to this PR? I guess my simple test cases didn't.

@rdblue
Copy link
Contributor Author

rdblue commented Mar 22, 2019

@xabriel, thanks for looking. You're right that I still need to add tests.

@rdblue rdblue force-pushed the fix-parquet-case-sensitivity branch from 5f9241c to a5ff4dc Compare April 7, 2019 20:17
@rdblue
Copy link
Contributor Author

rdblue commented Apr 7, 2019

@xabriel, can you take another look? I've fixed the problem you found and validated there are tests that actually exercise case insensitive binding for the Parquet classes.

To your question about how to reproduce this, you have to run a query that pushes filters down to Parquet. The filters you use can't be satisfied by partition filtering because Iceberg will drop those filters and won't push them down. I would recommend writing a dataset without partitioning and sorted by a categorical value, then test equality filters on that categorical value. That would hit this problem.

Copy link
Contributor

@xabriel xabriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @rdblue. LGTM.

@rdblue
Copy link
Contributor Author

rdblue commented Apr 11, 2019

Thanks for reviewing this, @xabriel! I'll merge it.

@rdblue rdblue merged commit 2fdc6f7 into apache:master Apr 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants