@pitrou pitrou commented Dec 13, 2022

When trying to implement CRC computation in Parquet C++, we found the wording to be ambiguous.

Clarify that CRC computation happens on the exact binary serialization (instead of a long-winded and confusing elaboration about v1 and v2 data page layout).

Also, clarify that CRC computation can apply to all page kinds, not only data pages (for reference, parquet-mr currently supports checksumming v1 data pages as well as dictionary pages).
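
To illustrate the intent, here is a minimal Python sketch of a checksum computed over the exact serialized bytes of a page, whatever the page kind. It is not taken from Parquet C++ or parquet-mr. The helper names `page_crc32` and `verify_page` are hypothetical, and the sketch assumes the checksum is the standard CRC-32 exposed by `zlib.crc32`, that the input is the page body exactly as it would be written to the file (after any compression), and that a stored value may have been read back as a signed 32-bit integer.

```python
import zlib


def page_crc32(serialized_page: bytes) -> int:
    """Return the CRC-32 of a page's bytes as an unsigned 32-bit value.

    `serialized_page` is treated as an opaque byte string: the function
    does not care whether it holds a v1 data page, a v2 data page or a
    dictionary page (assumption: the bytes are exactly what is on disk).
    """
    return zlib.crc32(serialized_page) & 0xFFFFFFFF


def verify_page(serialized_page: bytes, stored_crc: int) -> bool:
    """Compare a stored checksum against the recomputed one.

    The stored value is masked to 32 bits so that a checksum read back
    as a negative signed integer still compares equal (assumption).
    """
    return page_crc32(serialized_page) == (stored_crc & 0xFFFFFFFF)


if __name__ == "__main__":
    # Hypothetical page body: the checksum covers the compressed bytes,
    # not the uncompressed values they were produced from.
    raw_values = b"\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00"
    on_disk = zlib.compress(raw_values)

    crc = page_crc32(on_disk)
    print(verify_page(on_disk, crc))             # intact page
    print(verify_page(on_disk + b"\x00", crc))   # corrupted page
```

Because the function only ever sees bytes, the same call works for every page kind, which is the point of describing the checksum in terms of the binary serialization rather than per-layout rules.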

Also, see discussion on #126 (comment) and below.

@pitrou pitrou requested a review from gszadovszky December 13, 2022 14:43

pitrou commented Dec 13, 2022

@bbraams @gszadovszky @mapleFU thoughts?

mapleFU commented Dec 13, 2022

The change looks good to me! Thanks a lot!

@pitrou pitrou merged commit 613a1cf into apache:master Jan 3, 2023
@pitrou pitrou deleted the PARQUET-2218-crc-clarification branch January 3, 2023 14:28