Use decimal precision to determine variant decimal type #13692
Conversation
public void testDecimal4() {
  VariantPrimitive<?> value =
      SerializedPrimitive.from(
          new byte[] {primitiveHeader(8), 0x04, (byte) 0xD2, 0x02, (byte) 0x96, 0x49});
Why change the value here?
0x10,
0x22,
0x11
(byte) 0xFA,
And here?
Basically, I'm using the right decimal4 and decimal8 values for testing.
(byte) 0xD2, 0x02, (byte) 0x96, 0x49 is the unscaled value 1234567890, which should be decimal8 instead of decimal4.
Maybe I should add a check that errors out if the value is out of the decimal4 range?
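To make the byte values above concrete, here is a minimal sketch (plain JDK classes, not Iceberg's `SerializedPrimitive` API) that decodes a payload laid out as the test bytes are, i.e. a scale byte followed by a little-endian unscaled value:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalDecode {
  // Decode [scale, little-endian unscaled bytes] into a BigDecimal.
  static BigDecimal decode(int scale, byte[] littleEndianUnscaled) {
    // BigInteger expects big-endian two's-complement, so reverse the bytes.
    byte[] bigEndian = new byte[littleEndianUnscaled.length];
    for (int i = 0; i < bigEndian.length; i++) {
      bigEndian[i] = littleEndianUnscaled[bigEndian.length - 1 - i];
    }
    return new BigDecimal(new BigInteger(bigEndian), scale);
  }

  public static void main(String[] args) {
    // The bytes discussed above: scale 4, unscaled 0x499602D2 = 1234567890.
    BigDecimal v = decode(4, new byte[] {(byte) 0xD2, 0x02, (byte) 0x96, 0x49});
    System.out.println(v);             // 123456.7890
    System.out.println(v.precision()); // 10, which is outside decimal4's 1..9 range
  }
}
```

Even though the unscaled value 1234567890 fits in an int32, its precision of 10 is what pushes it to decimal8 under the spec's table.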
if (bitLength < 32) {
int precision = value.precision();
if (precision >= 1 && precision <= 9) {
Are these synonymous? I notice this matches our table in the spec, but I'm a little confused about whether the previous behavior we wrote here was a bug.
Is it possible to have an unscaledValue bitLength < 32 and a precision above 9?
For a value like 123456.7890, the unscaled value fits in int32 while the precision is 10, so they are not exactly the same.
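The counterexample can be checked directly with `java.math.BigDecimal` (this is plain JDK code, not Iceberg's internal classes): the unscaled value of 123456.7890 needs only 31 bits, yet its precision is 10.

```java
import java.math.BigDecimal;

public class BitLengthVsPrecision {
  public static void main(String[] args) {
    BigDecimal v = new BigDecimal("123456.7890");
    // Unscaled value is 1234567890, which fits comfortably in an int32.
    int bitLength = v.unscaledValue().bitLength(); // 31
    // But the precision (count of significant digits) is 10, above decimal4's limit of 9.
    int precision = v.precision();                 // 10
    System.out.println(bitLength + " " + precision);
  }
}
```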
Yeah, this change makes sense to me. My understanding is that the Parquet physical type must be determined based on precision, because the precision is ultimately what the physical value needs to store. I can see how there are cases where fewer bits are required to represent the unscaled value, but we still need a higher precision for storing the decimal and preserving the semantic value.
@aihuaxu Going through the tests, it makes me think we had a bug in the previous implementation. Is that correct?
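The precision-based selection being discussed can be sketched as follows. The thresholds come from the spec's table (1-9 maps to decimal4/int32, 10-18 to decimal8/int64, 19-38 to decimal16/int128); the method name and String return type here are illustrative, not Iceberg's actual API.

```java
import java.math.BigDecimal;

public class DecimalTypeByPrecision {
  // Pick the Variant decimal physical type from precision alone.
  static String typeFor(BigDecimal value) {
    int precision = value.precision();
    if (precision >= 1 && precision <= 9) {
      return "DECIMAL4";   // unscaled stored as int32
    } else if (precision <= 18) {
      return "DECIMAL8";   // unscaled stored as int64
    } else if (precision <= 38) {
      return "DECIMAL16";  // unscaled stored as int128
    }
    throw new IllegalArgumentException("precision out of range: " + precision);
  }

  public static void main(String[] args) {
    System.out.println(typeFor(new BigDecimal("12345.6789")));   // DECIMAL4 (precision 9)
    System.out.println(typeFor(new BigDecimal("123456.7890")));  // DECIMAL8 (precision 10)
  }
}
```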
new byte[] {primitiveHeader(8), 0x04, (byte) 0x15, (byte) 0xCD, (byte) 0x5B, 0x07});

assertThat(value.type()).isEqualTo(PhysicalType.DECIMAL4);
assertThat(value.get()).isEqualTo(new BigDecimal("123456.7890"));
So is the issue here that 123456.7890 has precision 10, so by the spec it shouldn't be Decimal4 but Decimal8?
That's correct. Other engines like Spark and Parquet-Java implement this properly.
I think you are right. This may be considered a bug, since values like 123456.7890 should be encoded as decimal8 but were encoded as decimal4.
Variants.of(new BigDecimal("1234567890.987654321")), // decimal8
Variants.of(new BigDecimal("-1234567890.987654321")), // decimal8
So these changed because they should be decimal16, right?
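This is the same bitLength-versus-precision gap one tier up, checkable with plain `BigDecimal` (not Iceberg code): the unscaled value of 1234567890.987654321 fits in an int64, but its 19 significant digits put it in decimal16 territory per the spec table.

```java
import java.math.BigDecimal;

public class Decimal16Case {
  public static void main(String[] args) {
    BigDecimal v = new BigDecimal("1234567890.987654321");
    // Unscaled value 1234567890987654321 needs only 61 bits, so it fits an int64...
    System.out.println(v.unscaledValue().bitLength()); // 61
    // ...but precision 19 exceeds decimal8's 10..18 range, requiring decimal16.
    System.out.println(v.precision());                 // 19
  }
}
```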
Thanks, @aihuaxu! I merged this to get it into 1.10.
Updated decimal encoding to follow Variant spec
Aligned the decimal encoding logic with the Variant spec, selecting the appropriate decimal type based on precision rather than unscaled value range. Updated corresponding tests to reflect the change.