
Conversation

@aihuaxu (Contributor) commented Jul 27, 2025

Updated decimal encoding to follow Variant spec

Aligned the decimal encoding logic with the Variant spec, selecting the appropriate decimal type based on precision rather than unscaled value range. Updated corresponding tests to reflect the change.

public void testDecimal4() {
  VariantPrimitive<?> value =
      SerializedPrimitive.from(
          new byte[] {primitiveHeader(8), 0x04, (byte) 0xD2, 0x02, (byte) 0x96, 0x49});
Member:

why change the value here?

    0x10,
    0x22,
    0x11
    (byte) 0xFA,
Member:

And here?

aihuaxu (Contributor, Author):

Basically, I'm using the right decimal4 and decimal8 values for testing.

(byte) 0xD2, 0x02, (byte) 0x96, 0x49 is the little-endian encoding of 1234567890, which should be decimal8 instead of decimal4 because its precision is 10.

Maybe I should add a check that errors out if the value is out of the decimal4 range?
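To make the discussion concrete, here is a standalone sketch (plain java.nio/java.math, not Iceberg's API) that decodes that payload by hand: scale byte 0x04 followed by a little-endian int32 unscaled value.

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Decimal4Payload {
    public static void main(String[] args) {
        // Payload from the test above: scale byte 0x04, then a little-endian int32.
        int scale = 0x04;
        int unscaled = ByteBuffer.wrap(new byte[] {(byte) 0xD2, 0x02, (byte) 0x96, 0x49})
            .order(ByteOrder.LITTLE_ENDIAN)
            .getInt();                                   // 0x499602D2 = 1234567890
        BigDecimal value = BigDecimal.valueOf(unscaled, scale);
        // The unscaled value fits in 32 bits, yet the decimal has precision 10,
        // which is why the spec puts it in decimal8 territory.
        System.out.println(value + " precision=" + value.precision());
        // -> 123456.7890 precision=10
    }
}
```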

if (bitLength < 32) {
int precision = value.precision();

if (precision >= 1 && precision <= 9) {
Member:

Are these synonymous? I notice this matches our table in the spec, but I'm a little confused about whether the previous behavior we wrote here was a bug.

Is it possible to have an unscaled value with bitLength < 32 and a precision above 9?

aihuaxu (Contributor, Author):

For a value like 123456.7890, the unscaled value fits in an int32 while the precision is 10, so they are not exactly the same.
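The mismatch is easy to see with java.math.BigDecimal alone (a standalone sketch, not the Variant code):

```java
import java.math.BigDecimal;

public class BitLengthVsPrecision {
    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("123456.7890");
        // The unscaled value 1234567890 needs only 31 bits, so it fits in int32...
        int bitLength = value.unscaledValue().bitLength();
        // ...but the decimal has 10 significant digits, so per the spec it is decimal8.
        int precision = value.precision();
        System.out.println("bitLength=" + bitLength + " precision=" + precision);
        // -> bitLength=31 precision=10
    }
}
```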

@amogh-jahagirdar (Contributor) commented Jul 31, 2025:

Yeah, this change makes sense to me. My understanding is that the Parquet physical type must be determined based on precision, because the precision is ultimately what the physical value needs to store. I can see how there are cases where fewer bits would suffice to represent the unscaled value, but we still need the higher-precision type to store the decimal and preserve its semantic value.

@RussellSpitzer (Member):

@aihuaxu Going through the tests, it makes me think we had a bug in the previous implementation. Is that correct?

new byte[] {primitiveHeader(8), 0x04, (byte) 0x15, (byte) 0xCD, (byte) 0x5B, 0x07});

assertThat(value.type()).isEqualTo(PhysicalType.DECIMAL4);
assertThat(value.get()).isEqualTo(new BigDecimal("123456.7890"));
Member:

So is the issue here that 123456.7890 has precision 10, so by the spec it shouldn't be decimal4, it should be decimal8?

aihuaxu (Contributor, Author):

That's correct. Other engines like Spark and Parquet-Java implement this properly.

@aihuaxu (Contributor, Author) commented Jul 28, 2025

@aihuaxu Going through the tests, it makes me think we had a bug in the previous implementation. Is that correct?

I think you are right. This may be considered a bug, since values like 123456.7890 should be encoded as decimal8 but were encoded as decimal4.

Comment on lines -127 to -128
Variants.of(new BigDecimal("1234567890.987654321")), // decimal8
Variants.of(new BigDecimal("-1234567890.987654321")), // decimal8
Contributor:

So these changed because they should be decimal16, right?
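Right: 1234567890.987654321 has 19 significant digits, which is past the decimal8 range. A sketch of the spec's precision-based selection, for reference (the helper name and returned labels are illustrative, not Iceberg's API):

```java
import java.math.BigDecimal;

public class VariantDecimalType {
    // Illustrative mapping from precision to the Variant spec's decimal types.
    static String decimalTypeFor(BigDecimal value) {
        int precision = value.precision();
        if (precision <= 9) {
            return "decimal4";   // unscaled value stored as int32
        } else if (precision <= 18) {
            return "decimal8";   // unscaled value stored as int64
        } else if (precision <= 38) {
            return "decimal16";  // unscaled value stored in 16 bytes
        }
        throw new IllegalArgumentException("precision > 38 is not supported");
    }

    public static void main(String[] args) {
        // 19 significant digits -> decimal16
        System.out.println(decimalTypeFor(new BigDecimal("1234567890.987654321")));
    }
}
```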


@rdblue merged commit 15351e6 into apache:main on Jul 31, 2025 (43 checks passed).
@rdblue (Contributor) commented Jul 31, 2025

Thanks, @aihuaxu! I merged this to get it into 1.10.
