Skip to content

A simple sql to sql rewrite for cardinality()#17198

Merged
highker merged 1 commit intoprestodb:masterfrom
nlaptev:simplify_cardinality_rewrite
Jan 21, 2022
Merged

A simple sql to sql rewrite for cardinality()#17198
highker merged 1 commit intoprestodb:masterfrom
nlaptev:simplify_cardinality_rewrite

Conversation

@nlaptev
Copy link
Copy Markdown
Contributor

@nlaptev nlaptev commented Jan 18, 2022

Simplify cardinality on map keys and values functions

A new optimizer rule is added to simplify expressions like
cardinality(map_keys(m)) into cardinality((m)). Same for
map_values function.

@kaikalur kaikalur requested review from highker and rongrong January 18, 2022 22:09
@nlaptev nlaptev force-pushed the simplify_cardinality_rewrite branch 8 times, most recently from 5014403 to c905243 Compare January 19, 2022 21:47
@highker highker requested a review from yuanzhanhku January 20, 2022 01:56
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to avoid creating a new object when there is no change to the expression.
For example, we can check if the returned pointer of the rewrite is the same as the argument pointer and keep track if any argument is changed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! We're no longer creating a new object every time

Copy link
Copy Markdown

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the commit message, we will need to cap each line within 72 characters (inclusive). So something like

Simplify cardinality on map keys and values functions

A new optimizer rule is added to simplify expressions like
`cardinality(map_keys(m))` into `cardinality((m))`. Same for
`map_values` function.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ImmutableList.Builder<Expression> builder = ImmutableList.builder();

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first condition of if should be outside of the for loop.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done -- although it makes the code a little bit longer

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assign (FunctionCall) argument) to some variable so it can be reused in the following 3 lines

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map_keys and map_values can take and only take 1 argument right?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define a const at the beginning of the class

private static final Set<QualifiedName> MAP_FUNCTIONS = ImmutableSet.of(QualifiedName.of("map_values"), QualifiedName.of("map_keys"));

and use it here

MAP_FUNCTIONS.contains(((FunctionCall) argument).getName())

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will always create new object. We should return new object only when we hit the above written criteria, which I assume most of the queries will not be rewritten with this rule.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add map_keys tests?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 41 to 57
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one argument per line and leave the first line empty

assertRewritten(
    "cardinality(map(ARRAY[cardinality(map_values(m_1)),3], ARRAY[2,cardinality(map_values(m_2))]))",
    "cardinality(map(ARRAY[cardinality(m_1),3], ARRAY[2,cardinality(m_2)]))");

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 48 to 65
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add capital cases and mixed cases like CaDInaliTy(....) ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@nlaptev nlaptev force-pushed the simplify_cardinality_rewrite branch 2 times, most recently from f2e3cb4 to b72aede Compare January 21, 2022 00:11
@nlaptev nlaptev force-pushed the simplify_cardinality_rewrite branch 4 times, most recently from 60d1e3e to c7cdf9b Compare January 21, 2022 00:56
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Object/Void

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/tmpFn/functionCall

In presto codebase, we don't usually use abbreviations or acronyms

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/newFnIfRewritten/newFunctionIfRewritten

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 70 to 72
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, this part is tricky. It's recursive so that we have to hold on a lot on building new objects before we can tell if we need rewrite or not. cc: @yuanzhanhku

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is a valid concern. The current implementation makes a copy of all function argument pointers recursively even if we don't need to change it. It may cause regressions for queries with lots of expressions. That being said, I don't have an easy way to address this. One idea is to have a tree node level bitmap encode what type of nodes are contained in the tree so that we could have a O(1) function to tell if a tree contains a given node type. But this requires some large refactor.

@highker highker self-assigned this Jan 21, 2022
@nlaptev nlaptev force-pushed the simplify_cardinality_rewrite branch from c7cdf9b to 61b2955 Compare January 21, 2022 16:22
A new optimizer rule is added to simplify expressions like
`cardinality(map_keys(m))` into `cardinality((m))`. Same for
`map_values` function.
@nlaptev nlaptev force-pushed the simplify_cardinality_rewrite branch from 61b2955 to 9874bde Compare January 21, 2022 17:14
@highker highker merged commit af84cc8 into prestodb:master Jan 21, 2022
@neeradsomanchi neeradsomanchi mentioned this pull request Feb 8, 2022
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants