feat: add bounds for unary math scalar functions #11584

tshauck · 2024-07-21T15:35:47Z

Which issue does this PR close?

Closes #11583

Rationale for this change

Many unary math functions have bounded outputs, but the macro offers no way to set bounds on a per-function basis, so the trait default (unbounded) is always used.

What changes are included in this PR?

Change the make_math_unary_udf macro to take a bounds function.
Implement bounds functions for the math functions whose bounds are known a priori.
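To illustrate the idea, here is a minimal, self-contained sketch of a macro that takes a bounds function per generated function. It is not the PR's code: the `Interval` struct is a simplified stand-in, the macro name `make_unary_fn` and the generated `evaluate`/`evaluate_bounds` methods are hypothetical, and the ranges used for `sin` and `sqrt` are just their mathematical ranges, chosen for illustration. Only the `$EVALUATE_BOUNDS` parameter name is taken from the discussion.

```rust
/// Simplified stand-in for an interval type (hypothetical, not DataFusion's).
/// `None` means "unbounded on that side".
#[derive(Debug, Clone, PartialEq)]
pub struct Interval {
    pub lower: Option<f64>,
    pub upper: Option<f64>,
}

/// Fallback when nothing is known about the output range.
pub fn unbounded(_input: &[&Interval]) -> Interval {
    Interval { lower: None, upper: None }
}

/// sin(x) always lies in [-1, 1], whatever the input interval is.
pub fn sin_bounds(_input: &[&Interval]) -> Interval {
    Interval { lower: Some(-1.0), upper: Some(1.0) }
}

/// sqrt(x) is never negative, so its range is [0, +inf).
pub fn sqrt_bounds(_input: &[&Interval]) -> Interval {
    Interval { lower: Some(0.0), upper: None }
}

/// Hypothetical macro: generates a unit struct with an evaluation function
/// and a bounds function that simply delegates to `$EVALUATE_BOUNDS`.
macro_rules! make_unary_fn {
    ($NAME:ident, $EVAL:expr, $EVALUATE_BOUNDS:expr) => {
        pub struct $NAME;

        impl $NAME {
            pub fn evaluate(x: f64) -> f64 {
                $EVAL(x)
            }

            pub fn evaluate_bounds(input: &[&Interval]) -> Interval {
                $EVALUATE_BOUNDS(input)
            }
        }
    };
}

// Each function opts into the tightest bounds known for it.
make_unary_fn!(Sin, f64::sin, sin_bounds);
make_unary_fn!(Sqrt, f64::sqrt, sqrt_bounds);
make_unary_fn!(Cbrt, f64::cbrt, unbounded);
```

With this shape, `Sin::evaluate_bounds(&[&input])` returns `[-1, 1]` for any input interval, while `Cbrt::evaluate_bounds` keeps the unbounded default.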

Are these changes tested?

Yes, see the unit tests.

Are there any user-facing changes?

No

datafusion/functions/src/math/monotonicity.rs

datafusion/functions/src/math/bounds.rs

berkaysynnada · 2024-07-22T10:30:18Z

datafusion/functions/src/math/bounds.rs

+pub(super) fn unbounded(_input: &[&Interval]) -> crate::Result<Interval> {
+    // We cannot assume the input datatype is the same of output type.
+    Interval::make_unbounded(&DataType::Null)
+}


Would it be better to return not_impl_err here? The current version may cause type mismatch errors, and debugging them would take more time.

tshauck · 2024-07-22

I think that would cause an error when trying to use these functions? The updated macro is going to call whatever is passed as $EVALUATE_BOUNDS. Right now this implementation matches the default of the ScalarUDFImpl trait.

I think an alternative is to make something like Interval::make_symmetric_inf. I'm using unbounded here somewhat loosely, but probably in most of these cases it should be (-inf, inf), which could be given a type.

tshauck · 2024-07-22

Went ahead and did that here: cf2b023

berkaysynnada · 2024-07-23T07:02:32Z

datafusion/expr/src/interval_arithmetic.rs

@@ -332,6 +332,46 @@ impl Interval {
        Ok(Self::new(unbounded_endpoint.clone(), unbounded_endpoint))
    }

+    /// Creates an interval between -∞ to ∞.
+    pub fn make_infinity_interval(data_type: &DataType) -> Result<Self> {


By convention, we use ScalarValue::Float(None) for infinite bounds. Even if you pass Float(Some(f::INF)) or Float(Some(f::NaN)), they are converted to Float(None) during interval creation. We made this decision so that unboundedness has a unique representation. You can check the details here:

datafusion/datafusion/expr/src/interval_arithmetic.rs

Line 178 in deef834

macro_rules! handle_float_intervals {

I recommend using the make_unbounded API with floating-point types here so as not to break this convention.
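The convention described above can be sketched as follows. This is a simplified illustration, not the `handle_float_intervals` macro itself: `FloatInterval` and `normalize_endpoint` are hypothetical names, and the sketch assumes that every non-finite endpoint collapses to `None`, which may differ in detail from what the real macro does for each bound.

```rust
/// Hypothetical float interval where `None` is the only way to say "unbounded".
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FloatInterval {
    pub lower: Option<f64>,
    pub upper: Option<f64>,
}

/// Assumption for this sketch: infinities and NaN all become `None`,
/// so there is exactly one representation of an unbounded endpoint.
fn normalize_endpoint(value: Option<f64>) -> Option<f64> {
    value.filter(|v| v.is_finite())
}

impl FloatInterval {
    /// Endpoints are normalized at construction time, so callers cannot
    /// create two different-looking intervals that mean the same thing.
    pub fn new(lower: Option<f64>, upper: Option<f64>) -> Self {
        Self {
            lower: normalize_endpoint(lower),
            upper: normalize_endpoint(upper),
        }
    }

    /// The canonical (-inf, inf) interval.
    pub fn unbounded() -> Self {
        Self { lower: None, upper: None }
    }

    pub fn is_unbounded(&self) -> bool {
        self.lower.is_none() && self.upper.is_none()
    }
}
```

Because normalization happens in the constructor, an interval built from explicit infinities compares equal to `FloatInterval::unbounded()`, which is the property the convention is meant to guarantee.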

berkaysynnada · 2024-07-23T07:03:38Z

datafusion/expr/src/interval_arithmetic.rs

+    }
+
+    /// Create an interval from 0 to infinity.
+    pub fn make_non_negative_infinity_interval(data_type: &DataType) -> Result<Self> {


The same suggestion applies here as well.

berkaysynnada · 2024-07-23T07:11:06Z

Thanks @tshauck. This is a nice step towards having comprehensive interval analysis. I have left a small suggestion. Once that is addressed, the PR will be good to go 🚀

ozankabak · 2024-07-23T13:03:23Z

Apart from @berkaysynnada's outstanding suggestions, the only thing I see is that we need two different π values, depending on whether π appears in the lower bound or the upper bound of the resulting interval.

If π will be in the upper bound, we should use the smallest floating point number that is greater than the mathematical (precise) value of π. Let's call this pi_upper. Conversely, if π will be in the lower bound, we should use the largest floating point number that is less than the mathematical (precise) value of π.

Since π is not exactly representable, the pi you get from Rust is actually either pi_upper or pi_lower (you need to check). You can obtain the other one by using the nextafter function (let's just be careful to do this only once and use it as a constant everywhere).

Apart from this detail that is addressable by a small fix, this seems to be a great PR. Thank you.
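As a rough sketch of the rounding concern, the neighbouring float can be derived on stable Rust by stepping the bit pattern by one. The helper and function names below are hypothetical. The sketch relies on two facts: `std::f64::consts::PI` is slightly below the true value of π (so it serves as the lower constant), while `std::f32::consts::PI` is slightly above it (so it serves as the upper constant).

```rust
use std::f32::consts::PI as PI_F32;
use std::f64::consts::PI as PI_F64;

/// Next representable f64 above `x`.
/// Only valid for positive, finite values below f64::MAX, which is all we need here.
fn next_up_f64(x: f64) -> f64 {
    assert!(x.is_finite() && x > 0.0 && x < f64::MAX);
    f64::from_bits(x.to_bits() + 1)
}

/// Next representable f32 below `x`.
/// Only valid for positive, finite values.
fn next_down_f32(x: f32) -> f32 {
    assert!(x.is_finite() && x > 0.0);
    f32::from_bits(x.to_bits() - 1)
}

/// (pi_lower, pi_upper) for f64: the std constant rounds down,
/// so the upper bound is its successor.
pub fn pi_bounds_f64() -> (f64, f64) {
    (PI_F64, next_up_f64(PI_F64))
}

/// (pi_lower, pi_upper) for f32: the std constant rounds up,
/// so the lower bound is its predecessor.
pub fn pi_bounds_f32() -> (f32, f32) {
    (next_down_f32(PI_F32), PI_F32)
}
```

An interval whose upper endpoint is π would use the second element of the pair and one whose lower endpoint is π would use the first, so the floating-point interval always contains the exact mathematical one.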

tshauck · 2024-07-23T17:19:45Z

Thanks to both of you for the thoughtful feedback.

@berkaysynnada, I updated infinity to use unbounded in 231a717.

@ozankabak, for handling π, do you mean libm::nextafter? There's also a .next_up, but unfortunately, it's in nightly. Did I overlook the actual function, or if not, any opinion on bringing in libm as dependency vs something w/i datafusion, albeit less robust?

ozankabak · 2024-07-23T17:27:27Z

IIRC nextafter is the IEEE standard name for this function, and Rust's next_up does something similar (in the upward direction). If it is in nightly, we probably should just create constants with these values explicitly given for each float type.

tshauck · 2024-07-24T03:59:19Z

Thanks @ozankabak, would you please have a look at the latest changes and lmk what you think?

berkaysynnada · 2024-07-24T07:06:49Z

I checked the new version for both unbounded creation and rounded PI usage. Everything looks good and seems ready to merge. Thanks @tshauck!

ozankabak

All looks good to me -- thanks for this great work @tshauck!

tshauck · 2024-07-24T17:09:51Z

Thanks! -- it looks like signum was changed in the meantime, so removing that for now as it doesn't use make_math_unary_udf anymore. The rebase seems ok locally, but I'll check to see if things don't go through for w/e reason here.

alamb · 2024-07-24T19:21:11Z

I took a quick look and this looks good to me. Thank you @tshauck @berkaysynnada and @ozankabak 🚀

github-actions bot added the logical-expr (Logical plan and expressions) and core (Core DataFusion crate) labels and removed the core (Core DataFusion crate) label on Jul 21, 2024

tshauck marked this pull request as ready for review July 21, 2024 17:12

tshauck requested a review from berkaysynnada July 22, 2024 15:00

tshauck requested a review from berkaysynnada July 24, 2024 00:04

ozankabak approved these changes Jul 24, 2024

tshauck added 12 commits July 24, 2024 10:01

5f16d8b feat: unary udf function bounds
46fc574 feat: add bounds for more types
d44dea1 feat: remove eprint
0db9c1f fix: add missing bounds file
5188415 tests: add tests for unary udf bounds
86b680d tests: test f32 and f64
7919ab9 build: remove unrelated changes
c420f56 refactor: better unbounded func name
9dd2ebd tests: fix tests
f191a93 refactor: use data_type method
e54c5f6 refactor: add more useful intervals to Interval
3114c12 refactor: use typed bounds for (-inf, inf)

tshauck added 6 commits July 24, 2024 10:02

7a377e6 refactor: inf to unbounded
c2c9221 refactor: add lower/upper pi bounds
b2ff865 refactor: consts to consts module
4a1b64f fix: add missing file
ecba0cb fix: docstring typo
a10a150 refactor: remove unused signum bounds

tshauck force-pushed the add-unary-udf-bounds branch from 4337de8 to a10a150 Compare July 24, 2024 17:05

alamb merged commit 5901df5 into apache:main Jul 24, 2024
24 checks passed

tshauck deleted the add-unary-udf-bounds branch July 24, 2024 19:22
