Skip to content

Fix infinite loop during planning caused by PickTableLayout#16627

Closed
LiBrian415 wants to merge 1 commit intoprestodb:masterfrom
LiBrian415:array-predicate-fix
Closed

Fix infinite loop during planning caused by PickTableLayout#16627
LiBrian415 wants to merge 1 commit intoprestodb:masterfrom
LiBrian415:array-predicate-fix

Conversation

@LiBrian415
Copy link
Copy Markdown
Contributor

@LiBrian415 LiBrian415 commented Aug 19, 2021

This diff is a fix for an edge case that causes infinite loops in the PickTableLayout$pushPredicateIntoTableScan.
Specifically, if a presto query contains a predicate that compares a complex field (array, map, ...) with a value of the
same type e.g. some_array_field <> array[], pushPredicateIntoTableScan will infinitely loop until it times out or until
a stackoverflowerror. The fact that these are complex fields might not matter, it's just that the value of these
types are generally already blocks.

While the planning timing out makes sense, the stackoverflowerror needs a bit of explaining. In the constructor
of TableScanNode, it wraps the output and assignment collections in unmodifiablecollections. While this is fine
on it's own, it becomes a problem when you have an infinite loop in pushPredicateIntoTableScan. Looking at the
implementation of pushPredicateIntoTableScan, each the it's called, it creates a new TableScanNode by passing in
values for the previous TableScanNodes. This creates N-depth wrapped unmodifiableCollections where N is the number
of pushPredicateIntoTableScan called on the same node. Because of how unmodifiablecollections are implemented,
this then causes N calls to stream(), toArray(), iterator(), ... each time one is called.

Now, that we know the end result is an infinite loop that causes a timeout or a stackoverflow, the remaining
question is why does this happen in the first place. After some investigation, this seems to happen because of
how domains are gathered during the pushPredicateIntoTableScan. Specifically, using a BASIC_COLUMN_EXTRACTOR,
predicates are extracted as domains if they take the from of a simple <VariableReferenceExpression, Anything>.
This also explains why predicates like map_key(<some_map>) = array[] still work since it'll be more complex than
the pattern above so they won't be pulled in as a domain. When creating a domain, the values are then converted to
their native block types. However, since these fields are generally not hive partitions, when a layout is generated,
they'll be considered as unenforcedConstraints. However, due to the processing of predicate -> domain -> predicate,
the resultingPredicates will have the same value but they're actually different objects. Since these Block subclasses don't
override equals, when the PickTableLayout checks if there has been some transformation (the optimization produced
a different plan structure), it'll always think something has changed. However, the only thing that changed was the
object, but the information contained by the blocks are still the same.

The fix here is just to override equals and hashcode of blocks.

tldr; Marker creation in Domain creation constructs new blocks from old blocks. This causes issues when checking for
block equality as blocks don't override equals.

Note: this wasn't tested against the latest version of Presto.

@LiBrian415 LiBrian415 changed the title Fix infinite loop in planning stage caused by predicates involving array[] Fix infinite loop during planning caused by PickTableLayout Aug 19, 2021
@v-jizhang v-jizhang self-requested a review August 27, 2021 22:18
@v-jizhang
Copy link
Copy Markdown
Contributor

Hi @LiBrian415, I'm trying to understand how an infinite loop in pushPredicateIntoTableScan. Do you have repro steps to share? Thanks

@LiBrian415
Copy link
Copy Markdown
Contributor Author

Hi @v-jizhang. Thanks for taking a look. I was able to reproduce it locally on the latest version of presto by following: https://stackoverflow.com/questions/48932907/setup-standalone-hive-metastore-service-for-presto-and-aws-s3. For visibility, I've added the configs and commands that I used below.
hive.properties

connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=file:///tmp/hive_catalog
hive.metastore.user=<user_name>
create schema hive.default;
use hive.default;
create table t (a array<varchar>);
insert into t (values array['hi']);
select * from t where a <> array[]; 

After running select * from t where a <> array[];, I'll get a statement is too large (stack overflow during analysis) message and when I use the debugger on intellij, the planner seems to be stuck in the infinite mentioned before. I've only included a table with an array but using a map and comparing the col to map() would also cause the same infinite loop.

@LiBrian415 LiBrian415 force-pushed the array-predicate-fix branch from 3e76154 to 35a94ce Compare August 31, 2021 01:38
@v-jizhang
Copy link
Copy Markdown
Contributor

@LiBrian415 Thanks, I can confirm the infinite loop exists and your fix look good to me.

Copy link
Copy Markdown
Contributor

@v-jizhang v-jizhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should GroupByIdBlock$hashCode and LazyBlock$hashCode be implemented also? Thanks

@kaikalur
Copy link
Copy Markdown
Contributor

kaikalur commented Sep 8, 2021

I'm not able to repro with the latest version.

@v-jizhang
Copy link
Copy Markdown
Contributor

I'm not able to repro with the latest version.

I've double checked. It is still in the current release https://prestodb.io/download.html and the current code.
I also tested this PR and I can confirm the bug is fixed by this PR.

Copy link
Copy Markdown
Contributor

@v-jizhang v-jizhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should GroupByIdBlock$hashCode and LazyBlock$hashCode be implemented also? Thanks

@LiBrian415
Copy link
Copy Markdown
Contributor Author

Hi @v-jizhang, thanks for double checking that the issue is in the current release. I don't think overriding the methods for LazyBlock and GroupByIdBlock is necessary. LazyBlock seems to be only used around page sources, which is after the plan/optimization steps, and it's unlikely that there exists a native type that will be converted to a GroupByIdBlock.

tldr; Since the issue is caused by comparing blocks created from the conversation of native types to blocks during the optimization step, we'll only need to override hashCode and equals for blocks that can be potentially created when converting native values to block.

Copy link
Copy Markdown
Contributor

@v-jizhang v-jizhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LiBrian415
Copy link
Copy Markdown
Contributor Author

Thanks for the review. :D

@rongrong
Copy link
Copy Markdown
Contributor

rongrong commented Oct 1, 2021

@LiBrian415 can you add a test case to AbstractTestQueries? Thanks!

@highker
Copy link
Copy Markdown

highker commented Oct 7, 2021

+1, @LiBrian415, could you add a test case that could repro the issue without your patch?

@highker highker self-assigned this Oct 7, 2021
@highker highker removed their assignment Oct 22, 2021
@rohanpednekar
Copy link
Copy Markdown
Contributor

@LiBrian415 Hope all is well with you. Just checking in to see if you are still planning to work on this PR?
@v-jizhang Do you think you can pick this up further to get it merged if needed? Thanks!

@LiBrian415
Copy link
Copy Markdown
Contributor Author

@rohanpednekar Hi, sorry about that. I've been busy with school for the last few months and I'm not sure when I'll have time to finish this up. I'd like to have someone else pick it up if it's still useful. Thanks.

@v-jizhang
Copy link
Copy Markdown
Contributor

Replaced by #17720

@v-jizhang v-jizhang closed this May 3, 2022
v-jizhang added a commit to v-jizhang/presto that referenced this pull request May 10, 2022
Replaces prestodb#16627

Co-authored-by: Brian Li <librian415@gmail.com>
rschlussel pushed a commit that referenced this pull request May 10, 2022
Replaces #16627

Co-authored-by: Brian Li <librian415@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants