Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add log1p function #273

Merged
merged 2 commits into from
Sep 2, 2022
Merged

feat: add log1p function #273

merged 2 commits into from
Sep 2, 2022

Conversation

thisisnic
Copy link
Contributor

This PR adds the log1p function.

options: [ NAN, ERROR ]
required: false
- value: fp32
return: fp32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize the surrounding functions don't do this but do these arguments need a nullability parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which arguments; as in - value: fp32? And by the same logic, would we also want a lot of the arithmetic functions to have a nullability parameter then? Or did you mean something else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpcloud why? The default is MIRROR (see table here), which is correct here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this signature handle the case of NULL inputs? It looks like it only accepts non-nullable inputs.

Copy link
Contributor

@jvanstraten jvanstraten Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because MIRROR and DECLARED_OUTPUT are IMO weird special cases simply because the return type derivation expressions lack the generality to describe them. See https://substrait.io/expressions/scalar_functions/#nullability-handling. MIRROR and DECLARED_OUTPUT ignore the nullability flag of the arguments and accept any combination of both nullable and non-nullable arguments, and MIRROR ignores and overrides the nullability flag of the return type with the boolean or of all argument nullability flags.

Copy link
Contributor

@jvanstraten jvanstraten Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, for the validator I'll be implementing a generalization of the type derivation grammar that uses type?? to specify "either nullable or non-nullable" in a type pattern, and type?parameter to bind the nullability to the name parameter, so variadics aside, you could describe MIRROR and DECLARED_OUTPUT more flexibly and explicitly if you want. I want to support that if only to avoid adding code for all the special cases, even if Substrait itself won't ever support that syntax.

ETA: heck, I could do away with the special cases even for variadics if I add a type?parameter? syntax for patterns, where parameter is bound to true iff any matched patterns were nullable :D But I realize that the syntax is arcane at best. Ultimately I just want to be able to round-trip the grammar with my IR for these things while being 100% compatible with Substrait grammar (for as far as that's possible at all due to all the conflicting descriptions in Substrait itself).

jvanstraten
jvanstraten previously approved these changes Aug 15, 2022
Copy link
Contributor

@jvanstraten jvanstraten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function definition looks good to me, though I'm not sure I would consider this function sufficiently primitive to include in core. After all, it's literally just (a probably optimized form of) log(add(literal(1), x)). Then again... the same could be said about all but one of log2, log10, natural log, and log with a specified base though, since they can all be rewritten to any other using the right identities. And then cos(x) is just sin(add(literal(pi/2), x)), and tan(x) is just divide(sin(x), cos(x)), and so on...

I don't know.

@julianhyde
Copy link

@jvanstraten For the record,log1p(x) is the same as log(1 + x) in math but not in floating point arithmetic. For values of x that are close to 0, log1p(x) will give a much more accurate result than log(x + 1). There is a good reason that many numeric libraries (e.g. NumPy and java.math.Math) have log1p and expm1.

@jvanstraten
Copy link
Contributor

I didn't know about log1p specifically, but figured as much. There's lots of functions like that to trade off performance, accuracy, and stability, like inverse sqrt to name one. I'm just not sure where we should draw the line for Substrait core. A calculator doesn't have a dedicated button for log1p or invsqrt either, because when you're thinking at the abstraction level of doing abstract math those are pretty meaningless operations. I doubt SQL has it either for the same reason, but correct me if I'm wrong. You can make the case that Substrait explicitly operates on fp32 or fp64 though, not on abstract real numbers. So again, I don't feel strongly either way in this case. I'm just worried this kind of thing will explode without bound if we don't draw the line somewhere.

@jvanstraten jvanstraten merged commit 55e8275 into substrait-io:main Sep 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants