Skip to content

Conversation

@nevi-me
Copy link
Contributor

@nevi-me nevi-me commented Dec 15, 2020

Contains the following:

  • Fixes the date64 writer by writing the array as a timestamp milli
  • Adds support for reading and writing interval types

Apache Spark doesn't read intervals (as they're fixed len binary)
It looks like the CPP impl at v2.0.0 just reads back the binary data.
The binary data read is however correct.

The result is that we can now correctly read and write the temporal types.
The Duration type remains unsupported.

@nevi-me
Copy link
Contributor Author

nevi-me commented Dec 15, 2020

CC @carols10cents @houqp

@codecov-io
Copy link

codecov-io commented Dec 15, 2020

Codecov Report

Merging #8926 (c760e88) into master (091df20) will decrease coverage by 0.03%.
The diff coverage is 64.62%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8926      +/-   ##
==========================================
- Coverage   83.22%   83.18%   -0.04%     
==========================================
  Files         196      196              
  Lines       48232    48341     +109     
==========================================
+ Hits        40142    40214      +72     
- Misses       8090     8127      +37     
Impacted Files Coverage Δ
rust/parquet/src/arrow/array_reader.rs 75.49% <48.43%> (-1.51%) ⬇️
rust/parquet/src/arrow/schema.rs 91.05% <56.52%> (-0.27%) ⬇️
rust/parquet/src/arrow/arrow_writer.rs 95.58% <78.57%> (-1.76%) ⬇️
rust/parquet/src/arrow/arrow_reader.rs 91.35% <100.00%> (+0.09%) ⬆️
rust/parquet/src/arrow/converter.rs 73.04% <100.00%> (+4.35%) ⬆️
rust/parquet/src/encodings/encoding.rs 95.24% <0.00%> (-0.20%) ⬇️
rust/arrow/src/compute/kernels/cast.rs 96.84% <0.00%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 091df20...c760e88. Read the comment docs.

@github-actions
Copy link

@github-actions github-actions bot added needs-rebase A PR that needs to be rebased by the author and removed needs-rebase A PR that needs to be rebased by the author labels Dec 16, 2020
Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not see any changes in the tests, which I would expect for a change in semantics. Aren't we not testing this yet, or how can I verify that it is now correct?

@github-actions github-actions bot added the needs-rebase A PR that needs to be rebased by the author label Dec 18, 2020
Contains the following:
- Fixes the date64 writer by writing the array as a timestamp milli
- Adds support for reading and writing interval types

> Apache Spark doesn't read intervals (as they're fixed len binary)
> It looks like the CPP impl at v2.0.0 just reads back the binary data.
The binary data read is however correct.
@github-actions github-actions bot removed the needs-rebase A PR that needs to be rebased by the author label Dec 19, 2020
@nevi-me
Copy link
Contributor Author

nevi-me commented Dec 19, 2020

I did not see any changes in the tests, which I would expect for a change in semantics. Aren't we not testing this yet, or how can I verify that it is now correct?

There were existing tests, but they were ignored. So I removed the #[ignore] so they run.

I was writing date64 as timestampmilli, but this wasn't per the spec, so I fixed that now. I also added an interval test to increase test coverage.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

alamb pushed a commit that referenced this pull request Jan 4, 2021
…nt types

This PR adds capabilities to read decimal columns in parquet files that store them as i32 or i64.

I tried to follow the approach in #8926 by using casts.

Closes #9047 from sweb/ARROW-11072/support-int-types

Authored-by: Florian Müller <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants