-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support decimal scalar value #1394
Conversation
ce0150f
to
9aa8f11
Compare
9aa8f11
to
0c3506d
Compare
|
@Dandandan @alamb @houqp PTAL |
this is likely due to a new rust version being released with more stringent clippy lint rules Thankfully @xudong963 has a PR up to get it working again: #1395 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good @liukun4515 -- thank you. I had a few suggestions, but overall this looks just about perfect.
datafusion/src/scalar.rs
Outdated
@@ -43,6 +43,8 @@ pub enum ScalarValue { | |||
Float32(Option<f32>), | |||
/// 64bit float | |||
Float64(Option<f64>), | |||
/// 128bit decimal, using the i128 to represent the decimal | |||
Decimal128(Option<i128>, Option<usize>, Option<usize>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 it seems like we always need the precision and scale (to know what type the Decimal is).
Thus, perhaps this could be changed to be
Decimal128(Option<i128>, Option<usize>, Option<usize>), | |
Decimal128(Option<i128>, usize, usize), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice point.
I almost forgot this.
scale: usize, | ||
) -> Result<Self> { | ||
// make sure the precision and scale is valid | ||
// TODO const the max precision and min scale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arrow spec doesn't seem to say anything about min/max valid scales: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L181-L190
I poked a little around in arrow-rs and I also didn't find any limits on the precision or scale either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use 128bit to represent a decimal value, the max precision
is 38.
The scale
is always greater or equal to 0 and less than or equal to precision
.
In the arrow-rs, the decimal is represented by i128
with precision
and scale
.
So we can add the MAX_PRECISION = 38
in the datafusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I got this correctly.
The spec doesn't say anything about the the precision/scale of DECIMAL
, but we're implementing DECIMAL128
here so 38 should be the max precision.
After apache/arrow-rs#131 is implemented, we can implement DECIMAL256
type in datafusion and the max precision will be 76.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, now in the arrow-rs the decimal256 has not been implemented.
In the arrow-go, it is also not implemented too.
Just in the arrow-java and arrow-c++, the bitwith of 256 has been implemented.
@capkurmagati
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can make this const in other pull request
datafusion/src/scalar.rs
Outdated
@@ -100,6 +102,12 @@ impl PartialEq for ScalarValue { | |||
// any newly added enum variant will require editing this list | |||
// or else face a compile error | |||
match (self, other) { | |||
(Decimal128(v1, p1, s1), Decimal128(v2, p2, s2)) => { | |||
v1.eq(v2) && p1.eq(p2) && s1.eq(s2) | |||
// TODO how to handle this case: decimal(123,10,1) with decimal(1230,10,2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Int8(Some(100))
is not equal to Int64(Some(100))
with this implementation either , so I think it is consistent with the rest of this comparison of decimal(123,10,1) and
decimal(1230,10,2)` are different --
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep.
If we want to compare two values with the different data types, you should convert them to the same data type and then compare them.
datafusion/src/scalar.rs
Outdated
@@ -171,6 +179,17 @@ impl PartialOrd for ScalarValue { | |||
// any newly added enum variant will require editing this list | |||
// or else face a compile error | |||
match (self, other) { | |||
// TODO decimal type, we just compare the values which have the same precision and scale. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code in this PR is correct -- two decimal's can be compared if they have the same precision / scale and they can't be compared (returns None
) if they have different precision/scales).
p.hash(state); | ||
s.hash(state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could probably skip hashing the precision and scale - and just hash the value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If two decimal values have the same value and diff precision or scale, the hash value is the same.
But if we hash all data in the decimal, the above case will not happen.
It is better to hash all data in decimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point 👍
datafusion/src/scalar.rs
Outdated
ScalarValue::Decimal128(_, _, _) => { | ||
// TODO add the default precision and scale for this case | ||
// DataType::Decimal(38, 0) | ||
panic!("The Decimal Scalar value with invalid precision or scale."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you changed Decimal as suggested above to always have precision and scale, this code would not be possible
ScalarValue::Decimal128(v1, _, _) => v1, | ||
_ => unreachable!(), | ||
}) | ||
.collect::<Vec<Option<i128>>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using DecimalBuilder
in this way is awkward. I wonder if we could add a function to create Decimal values from an array of i128
(likely it would go in arrow-rs). Something like
let arr = DecimalArray::from_iter_and_scale(array.into_iter(), precision, scale)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can add the from_iter_and_scale
function in arrow-rs later and replace this in the followup pull request.
Do you agree this? @alamb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related pr: apache/arrow-rs#1009
datafusion/src/scalar.rs
Outdated
scale: &usize, | ||
) -> ScalarValue { | ||
let array = array.as_any().downcast_ref::<DecimalArray>().unwrap(); | ||
// TODO add checker: the precision and scale are same with array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be a good assert!
type check to add, though I don't think it would ever happen unless there is some bug
0c3506d
to
bbc0a8f
Compare
@alamb I have addressed all comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great -- thank you @liukun4515
@@ -43,6 +43,8 @@ pub enum ScalarValue { | |||
Float32(Option<f32>), | |||
/// 64bit float | |||
Float64(Option<f64>), | |||
/// 128bit decimal, using the i128 to represent the decimal | |||
Decimal128(Option<i128>, usize, usize), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
p.hash(state); | ||
s.hash(state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point 👍
Which issue does this PR close?
Closes #1393
From #122 In order to support the decimal data type in the datafusion, we should add decimal scalar value first.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?