Fix wrong results in MetadataQueryOptimizer with subfield filters #18704

rschlussel · 2022-11-18T18:30:21Z

Queries with equality filters on row subfields that had optimize_metadata_queries could return wrong results.

The MetadataQueryOptimizer rewrite should not take effect if there is a filter on a non-partition column that was pushed inside the scan. The way we check for these filters is by looking at the predicate and remainingPredicate in ConnectorTableLayout

When converting a HiveTableLayoutHandle to ConnectorTableLayout, equality filters on row subfields were being left out. As a result, we were incorrectly applying the MetadataQueryOptimizer rewrite when there was an equality filter on a row subfiled.

An example affected query shape is:
SELECT max(partition_column)
FROM my_table
WHERE struct.subfield IS NOT NULL

Test plan - new unit test

== RELEASE NOTES ==


Hive Changes
* Fix a bug when ``optimize_metadata_queries`` was set to true where queries with aggregations on partition columns and filters on row subfields could return wrong results

arunthirupathi

I assume, we were previously only working on non sub field predicate and ignored the sub field predicate.

the current code correctly ands the sub field predicate in the remaining predicate.

Is my understanding right ?

arunthirupathi · 2022-11-18T18:54:36Z

presto-hive/src/main/java/com/facebook/presto/hive/HiveMetadata.java

Can you please add a comment here on why this code is required ?

rschlussel · 2022-11-18T19:02:15Z

I assume, we were previously only working on non sub field predicate and ignored the sub field predicate.

the current code correctly ands the sub field predicate in the remaining predicate.

Is my understanding right ?

yes, exactly. https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/HiveMetadata.java#L2851 is where we add all the non-subfield equality functions to predicate.

kaikalur · 2022-11-18T20:38:17Z

presto-hive/src/main/java/com/facebook/presto/hive/HiveMetadata.java

I thought there was a separate flag for subfield pushown. Do we need to check that here? Also, if it happens after the planning/optimization these info should be already available here?

There are two different optimizations

pushing down filters - this includes filters on subfields

subfield pushdown - this means telling the reader to prune out all the fields from the struct that don't get used by the query

the relevant optimization here is filter pushdown. I'm actually not sure why we check that filter pushdown is enabled here, though. if there were no filter pushdown, there shouldn't be any filters in this field, and if something else put filters on this field we should want to know about them too. It seems to me that filter pushdown should just create the HiveTableLayoutHandle that it wants and nobody else should need to know how it got to that state. If that's not how it works, there's something off about the abstraction.

However, as I don't understand why the check is there in the first place, I don't feel confident removing it without more testing. Since this is just a targeted bug fix, I kept the current logic.

My question was more like if that's turned off will we still get this fix? Maybe just add a test by turning off subfield pushdown as well and make sure the fix still works with that. Also those two are independent so let's make sure this fix works even if aria scan is turned off.

@kaikalur updated the test to include all permutations of these two properties.

v-jizhang · 2022-11-21T20:54:40Z

@bot kick off tests

Previously, queries with aggregations on partition columns and equality filters on row subfields that had optimize_metadata_queries could return wrong results. The MetadataQueryOptimizer rewrite should not take effect if there is a filter on a non-partition column that was pushed inside the scan. The way we check for these filters is by looking at the predicate and remainingPredicate in ConnectorTableLayout When converting a HiveTableLayoutHandle to ConnectorTableLayout, equality filters on row subfields were being left out. As a result, we were incorrectly applying the MetadataQueryOptimizer rewrite when there was an equality filter on a row subfield. An example affected query shape is: SELECT max(partition_column) FROM my_table WHERE struct.subfield IS NOT NULL

kaikalur · 2022-11-28T20:04:17Z

presto-hive/src/test/java/com/facebook/presto/hive/TestHiveLogicalPlanner.java

+
+    @Test(dataProvider = "optimize_metadata_filter_properties")
+    public void testMetadataAggregationFoldingWithFilters(boolean pushdownSubfieldsEnabled, boolean pushdownFilterEnabled)
    {


kaikalur

LGTM

rschlussel force-pushed the fix-metadata-query-optimizer branch from 784787b to 8e0876d Compare November 18, 2022 18:32

rschlussel marked this pull request as ready for review November 18, 2022 18:43

rschlussel requested a review from a team as a code owner November 18, 2022 18:43

rschlussel requested review from arunthirupathi, kaikalur and presto-oss November 18, 2022 18:43

arunthirupathi approved these changes Nov 18, 2022

View reviewed changes

rschlussel force-pushed the fix-metadata-query-optimizer branch 2 times, most recently from 997ba35 to 5dccb1f Compare November 18, 2022 19:55

arunthirupathi approved these changes Nov 18, 2022

View reviewed changes

rschlussel force-pushed the fix-metadata-query-optimizer branch from 5dccb1f to fd95ead Compare November 18, 2022 20:23

kaikalur reviewed Nov 18, 2022

View reviewed changes

rschlussel force-pushed the fix-metadata-query-optimizer branch from fd95ead to fc6a827 Compare November 28, 2022 18:52

kaikalur reviewed Nov 28, 2022

View reviewed changes

kaikalur approved these changes Nov 28, 2022

View reviewed changes

rschlussel merged commit 82441f5 into prestodb:master Nov 28, 2022

wanglinsong mentioned this pull request Jan 12, 2023

Add release notes for 0.279 #18920

Merged

30 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix wrong results in MetadataQueryOptimizer with subfield filters #18704

Fix wrong results in MetadataQueryOptimizer with subfield filters #18704

Uh oh!

rschlussel commented Nov 18, 2022 •

edited

Loading

Uh oh!

arunthirupathi left a comment

Uh oh!

arunthirupathi Nov 18, 2022

Uh oh!

rschlussel commented Nov 18, 2022

Uh oh!

kaikalur Nov 18, 2022

Uh oh!

rschlussel Nov 20, 2022

Uh oh!

kaikalur Nov 20, 2022 •

edited

Loading

Uh oh!

rschlussel Nov 28, 2022

Uh oh!

v-jizhang commented Nov 21, 2022

Uh oh!

kaikalur Nov 28, 2022

Uh oh!

kaikalur left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Fix wrong results in MetadataQueryOptimizer with subfield filters #18704

Fix wrong results in MetadataQueryOptimizer with subfield filters #18704

Uh oh!

Conversation

rschlussel commented Nov 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arunthirupathi left a comment

Choose a reason for hiding this comment

Uh oh!

arunthirupathi Nov 18, 2022

Choose a reason for hiding this comment

Uh oh!

rschlussel commented Nov 18, 2022

Uh oh!

kaikalur Nov 18, 2022

Choose a reason for hiding this comment

Uh oh!

rschlussel Nov 20, 2022

Choose a reason for hiding this comment

Uh oh!

kaikalur Nov 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rschlussel Nov 28, 2022

Choose a reason for hiding this comment

Uh oh!

v-jizhang commented Nov 21, 2022

Uh oh!

kaikalur Nov 28, 2022

Choose a reason for hiding this comment

Uh oh!

kaikalur left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

rschlussel commented Nov 18, 2022 •

edited

Loading

kaikalur Nov 20, 2022 •

edited

Loading