Arrow: Fix for dictionary encoded fixed length binary decimals #5198

bryanck · 2022-07-04T17:59:03Z

This PR fixes the vectorized reader for decimals that are stored as fixed-length binary and dictionary-encoded. Previously, these decimals were downcast to 8-byte (long) precision. This only affects Parquet V2, as fixed-length binary decimals aren't dictionary-encoded in Parquet V1. The Spark vectorized reader test for Parquet V2 was modified so that this case is now tested.
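To illustrate why an 8-byte downcast is a problem for wide decimals, here is a minimal sketch of the general hazard. It is not the reader's actual code path. The class and method names (`LongDowncastDemo`, `fromFixedBytes`, `viaLong`) are hypothetical and exist only for this example. The sketch assumes the decimal's unscaled value is stored as big-endian two's-complement bytes.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Illustrative only: hypothetical helpers, not code from this PR.
final class LongDowncastDemo {
  // Decode big-endian two's-complement bytes at full width.
  static BigDecimal fromFixedBytes(byte[] bigEndian, int scale) {
    return new BigDecimal(new BigInteger(bigEndian), scale);
  }

  // Decode through a long, which keeps only the low 64 bits of the unscaled value.
  static BigDecimal viaLong(byte[] bigEndian, int scale) {
    long truncated = new BigInteger(bigEndian).longValue();
    return BigDecimal.valueOf(truncated, scale);
  }

  public static void main(String[] args) {
    // 28 digits of precision: the unscaled value does not fit in 64 bits.
    BigDecimal original = new BigDecimal("12345678901234567890123456.78");
    byte[] bytes = original.unscaledValue().toByteArray();

    System.out.println("full width: " + fromFixedBytes(bytes, 2));
    System.out.println("via long:   " + viaLong(bytes, 2));
  }
}
```

Values whose unscaled form fits in 64 bits survive both paths, which is why a bug of this kind only shows up for higher-precision decimals.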

Also included is a minor refactor of the big endian padding to share some common logic and make it easier to enhance later.
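For readers unfamiliar with big-endian padding, the idea can be sketched as follows. This is an assumption-laden illustration, not the refactored code from this PR. The names `BigEndianPadding` and `padBigEndian` are hypothetical. The sketch assumes the input is a big-endian two's-complement value that must be widened to a fixed length (16 bytes for a 128-bit decimal) without changing its numeric value, which means sign-extending negative values.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Illustrative only: hypothetical helper, not code from this PR.
final class BigEndianPadding {
  // Widen big-endian two's-complement bytes to newLength, preserving the value.
  static byte[] padBigEndian(byte[] bigEndian, int newLength) {
    if (bigEndian.length > newLength) {
      throw new IllegalArgumentException(
          "Value of " + bigEndian.length + " bytes does not fit in " + newLength + " bytes");
    }
    byte[] result = new byte[newLength];
    int offset = newLength - bigEndian.length;
    // A set high bit in the first byte means the value is negative,
    // so the leading pad bytes must be 0xFF instead of 0x00.
    if (bigEndian.length > 0 && bigEndian[0] < 0) {
      Arrays.fill(result, 0, offset, (byte) 0xFF);
    }
    System.arraycopy(bigEndian, 0, result, offset, bigEndian.length);
    return result;
  }

  public static void main(String[] args) {
    byte[] negative = BigInteger.valueOf(-12345).toByteArray();
    byte[] padded = padBigEndian(negative, 16);
    // The padded bytes decode to the same value as the input.
    System.out.println(new BigInteger(padded));
  }
}
```

Padding with zeros unconditionally would turn a negative value into a large positive one, so the sign check is the part worth keeping in one shared place.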

In addition, the Spark benchmark test for the vectorized reader was enhanced to include both long encoded and fixed width binary encoded decimals.

rdblue · 2022-07-05T23:36:15Z

This looks good to me. Thanks for fixing it, @bryanck!

bryanck added 2 commits July 4, 2022 09:33

Arrow: Fix for dictionary encoded fixed length binary decimals

dc09442

Add test for fix length binary decimal dictionary encoded

54b9771

github-actions bot added arrow spark labels Jul 4, 2022

bryanck mentioned this pull request Jul 4, 2022

Arrow: Pad decimal bytes before passing to decimal vector #5168

Merged

bryanck force-pushed the fixed-len-dec-dict branch from 3e6aa8f to c80f12b July 4, 2022 22:50

Add additional decimal type to test

596dd88

bryanck force-pushed the fixed-len-dec-dict branch from c80f12b to 596dd88 July 4, 2022 22:57

rdblue approved these changes Jul 5, 2022

rdblue merged commit c8b97c9 into apache:master Jul 5, 2022

namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022

Arrow: Fix for dictionary encoded fixed length binary decimals (apach…

2a3b3b7

…e#5198)

namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022

Arrow: Fix for dictionary encoded fixed length binary decimals (apach…

62bd5a5

…e#5198)
