Add IGNORE NULLS clause to various Window functions #10568

Merged
rongrong merged 1 commit into prestodb:master from ptkool:lag_lead_ignore_nulls on Dec 3, 2019

@ptkool
Contributor

@ptkool ptkool commented May 7, 2018

See #4554.

This PR adds the IGNORE/RESPECT NULLS clause to window functions LAG, LEAD, FIRST_VALUE, LAST_VALUE, and NTH_VALUE.
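
To illustrate what the clause changes, here is a minimal sketch in plain Java of LAG with an offset of 1 over a single, already ordered partition, with and without null skipping. It is not Presto code: the class name NullTreatmentDemo and the method lag are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Presto's implementation.
final class NullTreatmentDemo {
    // LAG(value, 1) over an already ordered partition.
    // RESPECT NULLS returns the previous row's value, even if it is null.
    // IGNORE NULLS returns the nearest preceding non-null value instead.
    static List<Integer> lag(List<Integer> values, boolean ignoreNulls) {
        List<Integer> result = new ArrayList<>();
        Integer lastNonNull = null; // most recent non-null value before the current row
        for (int i = 0; i < values.size(); i++) {
            if (ignoreNulls) {
                result.add(lastNonNull);
            } else {
                result.add(i == 0 ? null : values.get(i - 1));
            }
            if (values.get(i) != null) {
                lastNonNull = values.get(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> column = Arrays.asList(1, null, null, 4);
        System.out.println(lag(column, false)); // [null, 1, null, null]
        System.out.println(lag(column, true));  // [null, 1, 1, 1]
    }
}
```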

@ptkool
Contributor Author

ptkool commented May 7, 2018

@electrum @martint @maciejgrzybek I ended up having to recreate this pull request after inadvertently wiping out all of my changes in the original PR. Please have a look at this.

@ptkool force-pushed the lag_lead_ignore_nulls branch from e5fcc25 to 5790346 on May 8, 2018 12:57
@kbkrieb

kbkrieb commented Jun 7, 2018

Hey @ptkool - excited to see all the work you've done on this, I have a use case & was looking for it, so just stopping by to say thanks!

@johnistan

I was just looking for this functionality. I would give a +1 to the feature.

@facebook-github-bot
Collaborator

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot
Collaborator

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@mrbrahman

This will be a really nice-to-have feature! It will avoid all the workarounds currently needed. Hope this PR is very close to acceptance!

@Bennyelg

Bennyelg commented Dec 4, 2018

Anything new about this?

@Bennyelg

@electrum? This is a very important feature!!

@jinma1978

Any updates here? This is a must-have feature!!

@rongrong
Contributor

@ptkool Could you please squash the commits into logical features? If there's only one you can just squash all into one commit. Thanks!

@Bennyelg

Workaround for now:
just force-sort the partition by the column (e.g. partition by v1 order by t1, case when t1 is null then 1 else 0 end)

@stale

stale bot commented Jun 15, 2019

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!

@stale stale bot unassigned electrum Jun 15, 2019
@stale stale bot added the stale label Jun 15, 2019
@ryan-benty

@ptkool bumping this PR - this would be a huge help!!

@stale stale bot removed the stale label Jun 20, 2019
@ptkool force-pushed the lag_lead_ignore_nulls branch 3 times, most recently from 83c8050 to 40338d3, on September 22, 2019 23:02
@ptkool
Contributor Author

ptkool commented Sep 23, 2019

@rongrong Can I get this reviewed again?

@ptkool force-pushed the lag_lead_ignore_nulls branch 2 times, most recently from 57bcb64 to 1105314, on October 22, 2019 18:34
@ptkool force-pushed the lag_lead_ignore_nulls branch from 199a254 to e0bf716 on November 7, 2019 17:20
@ben-gilbert

Also bumping, would love this added.

For those stuck, I was able to use this solution on Stack Overflow to come up with a stopgap in Presto.

https://stackoverflow.com/a/19012333

Contributor

@rongrong rongrong left a comment

Thanks for working on this. Sorry about the late review; I somehow missed it earlier. A few high-level comments:

  • We structure commits logically. So when we fix things, instead of creating a commit as "fix..." we just modify the commit directly. For this PR, I think you can merge all commits into 1. You can force push to the branch afterwards.
  • Since RESPECT NULLS and IGNORE NULLS are common functionality for every ValueWindowFunction, you might want to restructure the code so this is handled in the parent class to avoid duplicate logic.

@ptkool force-pushed the lag_lead_ignore_nulls branch 2 times, most recently from b86cbcb to a94efcc, on November 13, 2019 14:44
Contributor

@rongrong rongrong left a comment

Thanks for updating. Here is a more detailed review. Please also update the commit message to have a meaningful summary.

Contributor

I don't think you need to add ignoreNulls to Aggregation. RESPECT NULLS | IGNORE NULLS is only relevant for ValueWindowFunction. It's probably better to add a semantic check in ExpressionAnalyzer to make sure this is only specified for related window functions, and throw an explicit SemanticException otherwise.

Contributor Author

@ptkool ptkool Nov 14, 2019

I have removed ignoreNulls from Aggregation.

Contributor Author

As for adding a semantic check in ExpressionAnalyzer, there doesn't appear to be a nice way of determining whether a function is a value window function (other than checking the function name against a list of names).
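
For reference, the name-list fallback mentioned here could look roughly like the following. This is a sketch under stated assumptions, not Presto's ExpressionAnalyzer: the class NullTreatmentCheck and its methods are hypothetical, the list holds the five functions named in the PR description, and IllegalArgumentException stands in for the SemanticException the reviewer asked for.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; the names below are assumptions, not Presto's analyzer API.
final class NullTreatmentCheck {
    // The five functions the PR description lists as accepting IGNORE/RESPECT NULLS.
    private static final Set<String> VALUE_WINDOW_FUNCTIONS = new HashSet<>(
            Arrays.asList("lag", "lead", "first_value", "last_value", "nth_value"));

    static boolean supportsNullTreatment(String functionName) {
        return VALUE_WINDOW_FUNCTIONS.contains(functionName.toLowerCase(Locale.ENGLISH));
    }

    // Stand-in for the analyzer check; a real implementation would throw
    // the engine's own semantic exception type instead.
    static void validate(String functionName, boolean ignoreNullsSpecified) {
        if (ignoreNullsSpecified && !supportsNullTreatment(functionName)) {
            throw new IllegalArgumentException(
                    "IGNORE NULLS is not supported for function: " + functionName);
        }
    }

    public static void main(String[] args) {
        validate("lag", true); // accepted
        try {
            validate("sum", true); // rejected: sum is not a value window function
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The drawback is the one raised in the comment: the list has to be kept in sync by hand whenever a new value window function is added.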

Contributor

I think ignoreNulls should be a variable in ValueWindowFunction. You can have a separate setIgnoreNulls in ValueWindowFunction; that way you don't need to mess with the constructors. ignoreNulls is not really part of the function inputs anyway. If you keep a separate currentNonNullPosition in ValueWindowFunction, I believe the null handling can be done in ValueWindowFunction.processRow.
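
The shape of this suggestion can be sketched as follows. The classes are simplified stand-ins, not Presto's actual ValueWindowFunction: SketchValueWindowFunction, SketchFirstValue, and the processRow signature used here are all hypothetical, and the frame is assumed to be the whole partition for brevity.

```java
// Hypothetical sketch; simplified stand-ins, not Presto's actual classes.
abstract class SketchValueWindowFunction {
    protected boolean ignoreNulls; // false means RESPECT NULLS, the default

    // Set once after construction, so subclass constructors stay unchanged.
    public void setIgnoreNulls(boolean ignoreNulls) {
        this.ignoreNulls = ignoreNulls;
    }

    // Returns the output for the row at `position` within an ordered partition.
    abstract Integer processRow(Integer[] partition, int position);
}

final class SketchFirstValue extends SketchValueWindowFunction {
    @Override
    Integer processRow(Integer[] partition, int position) {
        // Assumption: the frame is the whole partition, so `position` is unused.
        for (Integer value : partition) {
            if (value != null || !ignoreNulls) {
                return value;
            }
        }
        return null; // empty partition, or only nulls with IGNORE NULLS
    }

    public static void main(String[] args) {
        Integer[] partition = {null, 7, 8};
        SketchFirstValue firstValue = new SketchFirstValue();
        System.out.println(firstValue.processRow(partition, 0)); // null
        firstValue.setIgnoreNulls(true);
        System.out.println(firstValue.processRow(partition, 0)); // 7
    }
}
```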

Contributor Author

@ptkool ptkool Nov 14, 2019

I have made the suggested change with ignoreNulls.

Contributor Author

@ptkool ptkool Nov 14, 2019

I don't believe moving the null handling to ValueWindowFunction.processRow will simplify the code - the logic isn't duplicated in any of the window functions.

Contributor

You probably still need special handling in each function, but the number of non-null values since frameStart can be tracked. The current implementation is really inefficient: at every position it iterates from the beginning of the frame to count whether it has seen enough non-null values. This is O(n^2), while you could have tracked this in O(n).
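
The complexity point can be illustrated with a small standalone sketch. It uses LAG with IGNORE NULLS over a whole partition rather than the frame-based functions, and it is not the code under review: the class LagIgnoreNullsSketch and both methods are hypothetical, and the offset is assumed to be at least 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the O(n^2) versus O(n) point; not the PR's code.
final class LagIgnoreNullsSketch {
    private static void checkOffset(int offset) {
        if (offset < 1) {
            throw new IllegalArgumentException("offset must be at least 1");
        }
    }

    // Quadratic: for every row, rescan earlier rows until `offset` non-nulls are seen.
    static Integer[] lagNaive(Integer[] values, int offset) {
        checkOffset(offset);
        Integer[] out = new Integer[values.length];
        for (int i = 0; i < values.length; i++) {
            int seen = 0;
            for (int j = i - 1; j >= 0; j--) {
                if (values[j] != null && ++seen == offset) {
                    out[i] = values[j];
                    break;
                }
            }
        }
        return out;
    }

    // Linear: remember the non-null values seen so far and index into them.
    static Integer[] lagTracked(Integer[] values, int offset) {
        checkOffset(offset);
        Integer[] out = new Integer[values.length];
        List<Integer> nonNulls = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            int index = nonNulls.size() - offset;
            if (index >= 0) {
                out[i] = nonNulls.get(index);
            }
            if (values[i] != null) {
                nonNulls.add(values[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Integer[] column = {1, null, 2, null, null, 3};
        System.out.println(Arrays.toString(lagNaive(column, 2)));   // [null, null, null, 1, 1, 1]
        System.out.println(Arrays.toString(lagTracked(column, 2))); // [null, null, null, 1, 1, 1]
    }
}
```

Both methods return the same result; the tracked version trades a little extra memory for a single pass over the partition.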

@ptkool force-pushed the lag_lead_ignore_nulls branch 2 times, most recently from 19d5c09 to 94ee06f, on November 14, 2019 10:39
Add parser null treatment clause test

Fix compilation error

Clean up

Remove ignoreNulls attribute from Aggregation class

Fix documentation
@ptkool force-pushed the lag_lead_ignore_nulls branch from 94ee06f to 4f1ee84 on November 14, 2019 14:16