Incorrect statistics read for i8 i16 columns in parquet #10585

Closed
Tracked by #10453
alamb opened this issue May 20, 2024 · 3 comments · Fixed by #10629

@alamb
Contributor

alamb commented May 20, 2024

Describe the bug

As @NGA-TRAN found in #10537, when i8 and i16 values are written to parquet and the statistics are then extracted, the returned min/max values are incorrect.

This could lead to incorrect results when reading parquet files with filters on i8 and i16 columns.
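
To illustrate why wrong min/max values matter for filters, here is a minimal sketch of statistics-based row group pruning. `RowGroupStats` and `can_skip_for_eq` are hypothetical names made up for this example, not DataFusion or arrow-rs APIs, and the "wrong" statistics in the usage note are invented values, not the ones observed in #10537.

```rust
/// Min/max statistics for one row group.
/// Hypothetical struct for illustration only (not a DataFusion type).
struct RowGroupStats {
    min: i32,
    max: i32,
}

/// Returns true if the row group can be skipped for the predicate `col = value`.
/// A reader that trusts the statistics never decodes a skipped row group,
/// so wrong statistics can silently drop matching rows.
fn can_skip_for_eq(stats: &RowGroupStats, value: i32) -> bool {
    value < stats.min || value > stats.max
}
```

With correct statistics such as `RowGroupStats { min: -5, max: 5 }`, a filter `col = 3` keeps the row group. If the extracted statistics were wrong, for example `min: 0, max: 0`, the same filter would skip the row group even though it contains matching rows.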

To Reproduce

See tests added in #10537

Expected behavior

No response

Additional context

No response

@alamb added the bug (Something isn't working) label May 20, 2024
@alamb changed the title from "Incorrect statistics read for i8 i16" to "Incorrect statistics read for i8 i16 columns in parquet" May 20, 2024
@alamb
Contributor Author

alamb commented May 20, 2024

Possibly related to #9779

@Lordworms
Contributor

I have found the location that causes the bug. It seems that in arrow-rs, when getting DataType::Int8, we always return an Int32 array instead.
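
For context, Parquet has no 8-bit or 16-bit physical integer type: INT8 and INT16 logical types are stored as the INT32 physical type, so their statistics come back as 32-bit values. A minimal sketch of narrowing such a value to the column's logical type follows. `LogicalInt`, `StatValue`, and `narrow_stat` are hypothetical names for illustration; this is not the arrow-rs API and not necessarily how #10629 fixed the issue.

```rust
use std::convert::TryFrom;

/// Logical integer type of the column (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum LogicalInt {
    Int8,
    Int16,
    Int32,
}

/// A min or max statistic typed to match the column (hypothetical enum).
#[derive(Debug, PartialEq)]
enum StatValue {
    Int8(i8),
    Int16(i16),
    Int32(i32),
}

/// Narrows a statistic stored as a 32-bit physical value to the column's
/// logical type. Returns None if the value does not fit, instead of
/// silently truncating it.
fn narrow_stat(physical: i32, target: LogicalInt) -> Option<StatValue> {
    match target {
        LogicalInt::Int8 => i8::try_from(physical).ok().map(StatValue::Int8),
        LogicalInt::Int16 => i16::try_from(physical).ok().map(StatValue::Int16),
        LogicalInt::Int32 => Some(StatValue::Int32(physical)),
    }
}
```

Under these assumptions, `narrow_stat(-128, LogicalInt::Int8)` yields `Some(StatValue::Int8(-128))`, while `narrow_stat(300, LogicalInt::Int8)` yields `None` because 300 does not fit in an i8.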

@Lordworms
Contributor

take

This issue was closed.