Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parquet: Add tests for pruning on Int8/Int16/Int64 columns #9778

Merged
merged 3 commits into from
Mar 31, 2024

Conversation

progval
Copy link
Contributor

@progval progval commented Mar 24, 2024

[Do not merge this PR, it highlights a bug in Int8 and Int16 columns, through correct_bloom_filters: false. See #9779 for a discussion]

Which issue does this PR close?

Closes #9777.

Rationale for this change

What changes are included in this PR?

Generalizes the Int32 tests, using a macro.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Mar 24, 2024
@progval progval changed the title parquet: Add tests for Bloom filters on Int8/Int16/Int64 columns parquet: Add tests for pruning on Int8/Int16/Int64 columns Mar 24, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @progval

[Do not merge this PR, it highlights a bug in Int8 and Int16 columns, through correct_bloom_filters: false. See https://github.com//issues/9779 for a discussion]

What would you think about merging this PR (but keeping #9779 open)?

Then the fix for #9779 could just update the tests to set correct_bloom_filters?

@progval
Copy link
Contributor Author

progval commented Mar 24, 2024

What would you think about merging this PR

Makes sense, I guess. I was hoping you would find how to solve #9779, because I can't :)

(but keeping #9779 open)?

Well, it should be removed when fixed, there is no point to keeping it.

@alamb
Copy link
Contributor

alamb commented Mar 24, 2024

What would you think about merging this PR

Makes sense, I guess. I was hoping you would find how to solve #9779, because I can't :)

(but keeping #9779 open)?

Well, it should be removed when fixed, there is no point to keeping it.

Agreed.

For others following along, the issue is upstream in parquet-rs: #9779 (comment)

I think what we should do with this PR is to update the comments explaining that the tests now demonstrate there is a bug in DataFusion and link to the upstream issue.

Once the upstream issue is fixed, we can then update the tests / close the false nagatives bug #9779

@alamb
Copy link
Contributor

alamb commented Mar 28, 2024

As a follow up here, I plan to add the comments explaining what is going on here (and that there are tests that show incorrect results, that are tracked by a ticket) and then merge it in.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @progval -- I somehow missed that you had updated this PR. It looks really good to me.

Thanks again. Very much appreciated

@alamb
Copy link
Contributor

alamb commented Mar 30, 2024

I merged up from main to make sure we have a clean CI run and then I think we can merge this one in

@alamb alamb merged commit 2cb6f73 into apache:main Mar 31, 2024
25 checks passed
@alamb
Copy link
Contributor

alamb commented Mar 31, 2024

Thanks again @progval

@progval progval deleted the int-bloom-tests branch March 31, 2024 09:12
Lordworms pushed a commit to Lordworms/arrow-datafusion that referenced this pull request Apr 1, 2024
* parquet: Add tests for Bloom filters on Int8/Int16/Int64 columns

* Document int_tests macro

---------

Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

core/tests/parquet/row_group_pruning.rs is missing tests for Int8/Int16/Int64 columns
2 participants