
DataFusion reads Date32 and Date64 parquet statistics in as Int32Array #10587

Closed
Tracked by #10453
alamb opened this issue May 20, 2024 · 2 comments · Fixed by #10593
Labels
bug Something isn't working

Comments

@alamb
Contributor

alamb commented May 20, 2024

Describe the bug

When reading the statistics (min/max values) of a Date32 or Date64 column from a parquet file, DataFusion currently returns them as an Int32Array.
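Arrow's Date32 and Parquet's DATE logical type both store a date as a 32-bit count of days since 1970-01-01. An Int32Array of statistics therefore holds the right numbers, but it loses the information that they are dates. The helper below is a hypothetical, std-only illustration and is not part of DataFusion or arrow-rs. It decodes such a day count into a calendar date using Howard Hinnant's days-to-civil algorithm, which is the interpretation a Date32Array carries and a plain Int32Array does not.

```rust
/// Hypothetical helper: convert a day count since 1970-01-01 (the Date32 /
/// Parquet DATE representation) into a proleptic Gregorian (year, month, day).
/// Based on Howard Hinnant's `civil_from_days` algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    // Shift the epoch to 0000-03-01 so leap days fall at the end of each year.
    let z = days + 719_468;
    // 400-year era (146_097 days each); floor division handles negative days.
    let era = z.div_euclid(146_097);
    let doe = (z - era * 146_097) as u64; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let y = yoe as i64 + era * 400;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year (March-based), [0, 365]
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32; // day of month, [1, 31]
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32; // calendar month, [1, 12]
    // January and February belong to the next calendar year in the March-based scheme.
    (if m <= 2 { y + 1 } else { y }, m, d)
}
```

For example, an Int32 min statistic of `19863` decodes to `(2024, 5, 20)`. Reading it back as a Date32Array would let DataFusion treat it as that date directly.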

To Reproduce

You can see the issue in the following tests from #10537:

  • test_dates_32_diff_rg_sizes
  • test_dates_64_diff_rg_sizes

Expected behavior

I expect the statistics of a:

  1. Date32 column to be read as a Date32Array
  2. Date64 column to be read as a Date64Array
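The expected behavior amounts to choosing the output array type from the column's declared Arrow type rather than from the Parquet physical type (INT32). The sketch below models that dispatch with hypothetical, std-only stand-in types (`TargetType`, `StatsArray`, `int32_stats_to_array`). It is not DataFusion's statistics API and not the change made in #10593. Because the issue reports both Date32 and Date64 statistics surfacing as Int32Array, the sketch *assumes* the raw values are INT32 day counts in both cases. For Date64, which Arrow defines as milliseconds since the epoch, it widens each value to i64 and scales it by 86,400,000 ms per day.

```rust
/// Hypothetical stand-in for the Arrow type declared in the file's schema.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TargetType {
    Int32,
    Date32,
    Date64,
}

/// Hypothetical stand-in for a typed statistics array, one entry per row group.
/// `None` models a row group with no min/max statistic.
#[derive(Debug, PartialEq)]
enum StatsArray {
    Int32(Vec<Option<i32>>),
    /// Days since 1970-01-01.
    Date32(Vec<Option<i32>>),
    /// Milliseconds since 1970-01-01.
    Date64(Vec<Option<i64>>),
}

const MILLIS_PER_DAY: i64 = 86_400_000;

/// Build a statistics array from raw INT32 min (or max) values, picking the
/// array type from the target Arrow type instead of the physical type.
fn int32_stats_to_array(target: TargetType, raw: &[Option<i32>]) -> StatsArray {
    match target {
        TargetType::Int32 => StatsArray::Int32(raw.to_vec()),
        // Date32 and Parquet INT32 DATE share a representation (days since epoch),
        // so the values are kept and only the array type changes.
        TargetType::Date32 => StatsArray::Date32(raw.to_vec()),
        // Assumption: the Date64 column's statistics are INT32 day counts, so each
        // value is widened to i64 and converted from days to milliseconds.
        TargetType::Date64 => StatsArray::Date64(
            raw.iter()
                .map(|v| v.map(|days| i64::from(days) * MILLIS_PER_DAY))
                .collect(),
        ),
    }
}
```

Keeping the dispatch on the target type means the same raw INT32 values can produce an Int32Array, a Date32Array, or a Date64Array, depending on what the schema declares.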

Additional context

No response

@edmondop
Contributor

@alamb, the title here doesn't make much sense. Are you saying that the min and max are not extracted as Date32/Date64?

alamb changed the title from "DataFusion reads Date32 and Date64 parquet statistics in as" to "DataFusion reads Date32 and Date64 parquet statistics in as Int32Array" on May 20, 2024
@alamb
Contributor Author

alamb commented May 20, 2024

Thanks for pointing that out, @edmondop. Yes, the min/max values seem to be extracted as Int32Arrays.

This issue was closed.