Skip to content

RowExpression duplicate isNull on same variable issue#16336

Merged
zhenxiao merged 1 commit intoprestodb:masterfrom
chliang71:isnull-rowexpression
Jul 2, 2021
Merged

RowExpression duplicate isNull on same variable issue#16336
zhenxiao merged 1 commit intoprestodb:masterfrom
chliang71:isnull-rowexpression

Conversation

@chliang71
Copy link
Contributor

@chliang71 chliang71 commented Jun 25, 2021

Part of simplifyRowExpressionOptimizer execution is eliminates duplicates. Part of execution is the the code below

    public RowExpression minimalNormalForm(RowExpression expression)
    {
        RowExpression conjunctiveNormalForm = convertToConjunctiveNormalForm(expression);
        RowExpression disjunctiveNormalForm = convertToDisjunctiveNormalForm(expression);
        return numOfClauses(conjunctiveNormalForm) > numOfClauses(disjunctiveNormalForm) ? disjunctiveNormalForm : conjunctiveNormalForm;
    }

One scenario is that there can be OR(IS_NULL(a), IS_NULL(a)), i.e. two duplicate IS_NULL on same variable. In this case, the two convertTo*NormalForm call will both simplify it to just one IS_NULL(a) instance, and with form set to IS_NULL (replacing original form = OR). However, this causes trouble for the numOfClauses call later, because current code assumes specialFormExpression must have 2 arguments, while here it is just IS_NULL(a) with only one argument.

It is quite tricky to get to this corner case scenario though, just having a = null or a = null is not enough. I only got one query hitting this, and have been trying to hand craft a query that can cause this. Still have little clue so far, but the query that hit this issue has the core part below hitting the issue:

with x as (
  select a from xx
), z as (
  select case when (x.a is null or y.a is null) then 'AAA' else 'BBB' as col
  from y left join x on x.a = y.a
)
select * from z
where col != 'AAA'

Interestingly, even slightly tweaking the conditions (e.g. change to col = 'AAA' instead) this issue would not happen.

Test plan - (Please fill in how you tested your changes)

Added unit test.

== RELEASE NOTES ==

General Changes
* Add support for expressions with multiple ``IS NULL`` on the same variable.

@chliang71 chliang71 force-pushed the isnull-rowexpression branch from e382e0b to 7d13b8e Compare June 25, 2021 05:48
@chliang71
Copy link
Contributor Author

cc @vkorukanti @zhenxiao any thoughts?

Copy link
Collaborator

@zhenxiao zhenxiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. one minor issue

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about having meaningful names:
s/aIsNull/isNullExpression/g
s/args/arguments/g
s/xx/duplicateIsNullExpression/g

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

@chliang71 chliang71 force-pushed the isnull-rowexpression branch from 7d13b8e to dfeed97 Compare July 2, 2021 04:09
@chliang71
Copy link
Contributor Author

@zhenxiao I think the checks are passing, mind taking another look? Thanks!

@zhenxiao zhenxiao merged commit 4de2228 into prestodb:master Jul 2, 2021
@ajaygeorge
Copy link
Contributor

Hi @zhenxiao / @chliang71
Release notes are missing for this PR.
Please add them according to the guidelines mentioned here

@zhenxiao
Copy link
Collaborator

zhenxiao commented Jul 7, 2021

Support expressions with multiple IS NULL on the same variable @chliang71 what do you think?

@ajaygeorge ajaygeorge mentioned this pull request Jul 7, 2021
1 task
@chliang71
Copy link
Contributor Author

Thanks @zhenxiao. Seems @ajaygeorge already edited (thanks!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants