feat: Support log for Decimal32 and Decimal64 #18999

Mark1626 · 2025-11-30T12:02:49Z

Which issue does this PR close?

Part of "Native decimal 32/64/256 bit support for log" (#17555).

Rationale for this change

Analysis

Other engines:

ClickHouse appears to accept only "(U)Int*", "Float*", and "Decimal*" as arguments for log: https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/log.cpp#L47-L63

Libraries

There is a C++ library, libdecimal, which internally uses the Intel Decimal Floating Point Library for its decimal32 operations. Intel's library itself converts the decimal32 to double and calls log. https://github.com/karlorz/IntelRDFPMathLib20U2/blob/main/LIBRARY/src/bid32_log.c
There is another C++ library based on IBM's decNumber decimal library: https://github.com/semihc/CppDecimal . Its implementation of log works entirely in decimal, but I don't think this would be a very performant way to do it.

I'm going to go with an approach similar to the one in Intel's decimal library. To begin with, the decimal32 -> double conversion is done by simple scaling.
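As a rough illustration of the scaling approach described above, the following sketch converts a Decimal32's unscaled integer to an f64 by dividing by 10^scale and then takes the logarithm. The function name `log_via_f64` and its signature are made up for this example and are not taken from the PR.

```rust
/// Hypothetical helper (not from the PR): interprets `unscaled` as the
/// decimal value `unscaled * 10^(-scale)`, converts it to f64 by simple
/// scaling, and returns its logarithm in the given base.
fn log_via_f64(unscaled: i32, scale: i8, base: f64) -> f64 {
    // Every i32 is exactly representable as an f64, so only the division
    // and the logarithm itself can introduce rounding error.
    let value = f64::from(unscaled) / 10f64.powi(i32::from(scale));
    value.log(base)
}
```

For example, `log_via_f64(1234567, 2, 2.0)` evaluates the base-2 log of 12345.67, which is roughly 13.5917.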

What changes are included in this PR?

Support Decimal32 for log

Are these changes tested?

Yes, unit tests have been added, and I've tested this from the datafusion cli for Decimal32

> select log(2.0, arrow_cast(12345.67, 'Decimal32(9, 2)'));
+-----------------------------------------------------------------------+
| log(Float64(2),arrow_cast(Float64(12345.67),Utf8("Decimal32(9, 2)"))) |
+-----------------------------------------------------------------------+
| 13.591717513271785                                                    |
+-----------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.021 seconds.

Are there any user-facing changes?

The precision of the result for Decimal32 will change. The precision loss reported in "Decimal128 implementation of log loses precision" (#18524) does not occur in this PR.

Mark1626 · 2025-11-30T12:06:12Z

I'm still working on Decimal64, but early feedback on this PR would be much appreciated.

Mark1626 · 2025-12-01T02:21:20Z

Interesting, I didn't realise negative scales aren't allowed. I assumed they were, since Arrow allows negative scales in decimals.
https://github.com/apache/arrow-rs/blob/main/arrow-schema/src/datatype.rs#L359-L372

Jefffrey · 2025-12-01T08:21:39Z

> Interesting, I didn't realise negative scales aren't allowed. I assumed they were as arrow allows negative scales in decimal. https://github.com/apache/arrow-rs/blob/main/arrow-schema/src/datatype.rs#L359-L372

Negative scales are allowed; I believe any places in our codebase where they are disallowed are mainly due to implementation limitations (i.e. not yet supported) rather than it being inherently impossible.

(Haven't had a chance to review this PR yet, hopefully soon)

Jefffrey

Now that I think about it, how would this implementation be different from having our coercion/casting logic convert the input decimal arrays to floats before applying the log, as opposed to doing that decimal -> float ourselves here? 🤔

Mark1626 · 2025-12-03T05:58:10Z

> Now that I think about it, how would this implementation be different from having our coercion/casting logic convert the input decimal arrays to floats before applying the log

My take is that it depends on the function. For something like round, ceil, or floor, a coercion is unacceptable, whereas for functions like log or sin the results are not exact anyway, so a coercion is acceptable.

We could do it entirely in decimal, like this implementation based on IBM's version: https://github.com/semihc/CppDecimal/blob/main/src/decNumber.c#L1384-L1518

But I think this would be a rather expensive version. Also, from #18524, if we want
select log(2.0, 100000000000000000000000000000000000::decimal(38,0)); to return 116.267483321058, the result of log needs to be a float (which it currently is). Otherwise I think the result has to be 116, since the input is (38, 0).
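To make the trade-off concrete, here is a small sketch comparing an integer logarithm with a floating-point logarithm for the value from #18524. Both helper names are hypothetical and only illustrate the two strategies; they are not the functions used in DataFusion.

```rust
/// Hypothetical helper: integer logarithm of `value`, rounded down and
/// returned as f64. Non-positive values or bases below 2 yield NaN.
fn integer_log(value: i128, base: i128) -> f64 {
    value.checked_ilog(base).map_or(f64::NAN, f64::from)
}

/// Hypothetical helper: floating-point logarithm after an i128 -> f64
/// cast. The cast rounds to the nearest representable f64.
fn float_log(value: i128, base: f64) -> f64 {
    (value as f64).log(base)
}
```

With `value = 10_i128.pow(35)`, `integer_log(value, 2)` yields 116.0, while `float_log(value, 2.0)` yields approximately 116.2675, matching the two results discussed above.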

Another thing is that Intel's decimal library converts decimal -> binary64 and then back to decimal. The conversion itself is a bit more sophisticated than a simple N / 10^scale: https://github.com/karlorz/IntelRDFPMathLib20U2/blob/main/LIBRARY/src/bid32_log.c

We can port their conversion logic here if needed, but I wanted to get some feedback on this PR before doing that. Let me know what you suggest.

Jefffrey · 2025-12-05T23:02:59Z

The original PR that kicked off this effort (#17023) converts the decimal128 to its native i128 representation before doing an integer log, as converting to f64 apparently causes some precision loss. I think we should follow suit, as otherwise there is little difference from the previous behaviour of casting to float64 before doing the log 🤔
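The precision concern can be seen in isolation with a round-trip check. This is a sketch with a made-up helper name, not code from either PR: it casts an i128 to f64 and back and reports whether the value survived.

```rust
/// Hypothetical helper: returns true if `value` is unchanged after being
/// cast to f64 and back to i128, i.e. the f64 represents it exactly.
fn survives_f64_round_trip(value: i128) -> bool {
    // `as` rounds to the nearest f64 on the way out and truncates
    // (saturating at the i128 bounds) on the way back.
    (value as f64) as i128 == value
}
```

Under this check, small values such as 12345 survive, while 10^35 (which fits in a Decimal128 with precision 38) does not, because an f64 carries only 53 bits of significand.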

It also makes me wonder if we should handle negative scale by just casting to float to do the log so we don't lose functionality.

Thanks for looking into the other solutions from IBM and Intel; I think we can avoid porting/copying their code unless there is a strong need for what they bring to the table for us.

Mark1626 · 2025-12-06T10:18:55Z

Ok, instead of converting to float I'll keep the values as integers and perform an integer log.

Just one thing, though: the log functions in DuckDB and ClickHouse return float64/double, so the behaviour might differ (but I personally think that's fine, as the user will not get an implicit type conversion from decimal to float). And if we are returning a decimal, then the following is expected across Decimal32, Decimal64, Decimal128, and Decimal256, right?

| Query | Result |
| --- | --- |
| select log(12345::decimal(38,0)) | 4.0 |
| select log(12345::decimal(38,2)) | 4.09 |

I went through the Intel and IBM solutions to understand potential edge cases that could affect precision. Point noted: I won't be porting anything from them unless it's truly needed.

Jefffrey · 2025-12-06T12:59:53Z

> Ok, instead of converting to float I'll keep it as integers and perform an integer log.
>
> Just one thing though the log function in DuckDB and Clickhouse return float64/double so the behaviour might differ (but I personally think that's fine, as the user will not have an implicit type conversion from decimal to float). And if we are returning as decimal then the following is expected right? across Decimal32, Decimal64, Decimal128, Decimal256
>
> | Query | Result |
> | --- | --- |
> | select log(12345::decimal(38,0)) | 4.0 |
> | select log(12345::decimal(38,2)) | 4.09 |
>
> I went through the Intel and IBM solution to understand potential edge cases which could affect precision. I won't be porting anything from them unless it's truly needed, point noted

I think we should follow what the decimal128 version is doing: it performs ilog on the scaled i128 and then converts the result to f64 to return. We aren't returning the log of a decimal as a decimal; we're still converting it to f64.

The idea is that by doing the log as ilog on the scaled decimal, we get more accurate results than by casting the decimal to float before performing the log on the float.

See the original decimal128 PR for reference: #17023
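A minimal sketch of that idea for Decimal32, assuming non-negative scales only. The names `integer_part` and `ilog_as_f64` are illustrative; the actual helpers and signatures in `datafusion/functions` may differ.

```rust
/// Hypothetical helper: integer part of a decimal stored as
/// `unscaled * 10^(-scale)`. Negative scales are rejected in this sketch.
fn integer_part(unscaled: i32, scale: i8) -> Result<i32, String> {
    if scale < 0 {
        return Err(format!("negative scale {scale} is not supported"));
    }
    // If 10^scale overflows i32, it exceeds every i32 magnitude,
    // so the integer part is zero.
    let divisor = 10_i32.checked_pow(scale as u32);
    Ok(divisor.map_or(0, |d| unscaled / d))
}

/// Hypothetical helper: integer logarithm of the integer part, returned
/// as f64. Non-positive inputs or bases below 2 yield NaN.
fn ilog_as_f64(unscaled: i32, scale: i8, base: i32) -> Result<f64, String> {
    let value = integer_part(unscaled, scale)?;
    Ok(value.checked_ilog(base).map_or(f64::NAN, f64::from))
}
```

With these helpers, `ilog_as_f64(12345, 0, 10)` gives 4.0, whereas a float log of the same value would give roughly 4.09. The result type is still f64 even though the computation is done on integers.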

Jefffrey · 2025-12-16T00:41:09Z

datafusion/functions/src/utils.rs

+    } else if scale as u8 > precision {
+        Err(ArrowError::ComputeError(format!(
+            "scale {scale} is greater than precision {precision}"
+        )))
+    } else if scale == 0 {
+        Ok(value)
+    } else {
+        validate_decimal32_precision(value, precision, scale)?;


Jefffrey

I'm curious why we differ from the decimal128 version above in having a precision parameter and doing these extra checks?

Mark1626

This was added to validate cases where precision > 9 and also check the max value (Decimal(32, 0) max value 999999999).

This conversion in the decimal128 testcase is actually incorrect as it's greater than the max decimal128 value
https://github.com/Mark1626/datafusion/blob/05eea8b1144487a6a698d3be8815c93b689d15a3/datafusion/functions/src/utils.rs#L411

Do you think it's redundant, in which case I'll remove it? ScalarValue does a validation but it doesn't validate on the max value
https://github.com/Mark1626/datafusion/blob/05eea8b1144487a6a698d3be8815c93b689d15a3/datafusion/common/src/scalar/mod.rs#L4446
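For reference, the bound being discussed can be sketched as follows. This is not the `validate_decimal32_precision` function referenced in the diff, just an illustrative stand-in with hypothetical names that shows where 999999999 comes from.

```rust
/// Hypothetical helper: largest magnitude that fits in `precision`
/// decimal digits, or None if 10^precision does not fit in an i32.
fn max_for_precision(precision: u8) -> Option<i32> {
    10_i32.checked_pow(u32::from(precision)).map(|p| p - 1)
}

/// Hypothetical helper: true if `value` has at most `precision` digits.
fn fits_precision(value: i32, precision: u8) -> bool {
    match max_for_precision(precision) {
        Some(max) => (-max..=max).contains(&value),
        // A precision above 9 cannot be backed by a 32-bit integer.
        None => false,
    }
}
```

Here `max_for_precision(9)` is `Some(999_999_999)`, and any precision of 10 or more returns `None` because 10^10 overflows i32.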

Jefffrey

I'm not sure the checks should be done in these functions. They are detached from the actual DataType::DecimalXX(_), so it looks weird that we check the precision here; at this level it doesn't feel like this function's responsibility 🤔

Same goes for checking that scale doesn't exceed precision; it seems like something that would be checked higher in the chain.

Mark1626

Sure, I'll remove these redundant checks.

Jefffrey · 2025-12-16T00:41:54Z

datafusion/functions/src/utils.rs

+                        "{value} and {precision} {scale} vs {expected:?}"
+                    );
+                }
+                Err(_) => assert!(expected.is_none()),


Jefffrey

Would be nice if we could assert the expected error.

Mark1626

Ok, I'll add an assertion on the expected error.
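One way to do this is to compare against the expected error text rather than only checking that an error occurred. The function under test and the message below are placeholders for illustration, not the PR's actual test.

```rust
/// Placeholder function under test: drops `scale` fractional digits.
fn to_integer(unscaled: i32, scale: i8) -> Result<i32, String> {
    if scale < 0 {
        return Err(format!("negative scale {scale} is not supported"));
    }
    let divisor = 10_i32.checked_pow(scale as u32);
    Ok(divisor.map_or(0, |d| unscaled / d))
}

/// Asserts the value on success and a message fragment on failure.
fn check(actual: Result<i32, String>, expected: Result<i32, &str>) {
    match (actual, expected) {
        (Ok(got), Ok(want)) => assert_eq!(got, want),
        (Err(got), Err(want)) => {
            assert!(got.contains(want), "unexpected error message: {got}")
        }
        // One side succeeded and the other failed.
        (got, want) => panic!("got {got:?}, expected {want:?}"),
    }
}
```

A test case then carries either the expected value or the expected message, for example `check(to_integer(1, -1), Err("negative scale -1"))`.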

Jefffrey · 2025-12-19T02:50:02Z

Thanks @Mark1626 @martin-g @kumarUjjawal

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels Nov 30, 2025

kumarUjjawal suggested changes Nov 30, 2025

datafusion/sqllogictest/test_files/decimal.slt (outdated)

datafusion/functions/src/utils.rs (outdated)

kumarUjjawal reviewed Nov 30, 2025

datafusion/functions/src/math/log.rs (outdated)

Mark1626 marked this pull request as ready for review November 30, 2025 16:59

martin-g reviewed Nov 30, 2025

datafusion/functions/src/utils.rs (outdated)

datafusion/functions/src/utils.rs (outdated)

datafusion/functions/src/utils.rs (outdated)

Mark1626 requested a review from martin-g December 1, 2025 02:27

martin-g approved these changes Dec 1, 2025

Jefffrey reviewed Dec 3, 2025

Jefffrey reviewed Dec 16, 2025

Mark1626 added 8 commits December 16, 2025 20:27

feat: Support log for Decimal32

5d9b573

feat: Support log for Decimal64, refactor Decimal32 log code

bd14ace

chore: Refactor changes

fa5784f

fix: Update tests to 12 decimal points

65ec6ca

fix: Perform integer log for decimal32 and decimal64

52d86b3

lint: Fix lint and fmt issue

8fb3a8d

test: Assert on error message

b2eed3c

refactor: Remove redundant checks

2b78589

Mark1626 force-pushed the decimal3264-log branch from ad73b04 to 2b78589 Compare December 16, 2025 15:02

Mark1626 added 2 commits December 16, 2025 20:36

lint: Fix formatting issue

332f0a3

fix: Fix failing test

0074c03

Mark1626 requested a review from Jefffrey December 16, 2025 16:00

Jefffrey approved these changes Dec 17, 2025

Merge branch 'main' into decimal3264-log

1b85516

alamb requested a review from kumarUjjawal December 18, 2025 20:56

kumarUjjawal approved these changes Dec 19, 2025

Jefffrey added this pull request to the merge queue Dec 19, 2025

Merged via the queue into apache:main with commit c2747eb Dec 19, 2025
27 checks passed

Jefffrey mentioned this pull request Dec 19, 2025

Allow log/pow on negative scale decimals #19250

Closed
