Fix infinite loop during planning caused by PickTableLayout#16627
Fix infinite loop during planning caused by PickTableLayout#16627LiBrian415 wants to merge 1 commit intoprestodb:masterfrom
Conversation
|
Hi @LiBrian415, I'm trying to understand how an infinite loop in pushPredicateIntoTableScan. Do you have repro steps to share? Thanks |
|
Hi @v-jizhang. Thanks for taking a look. I was able to reproduce it locally on the latest version of presto by following: https://stackoverflow.com/questions/48932907/setup-standalone-hive-metastore-service-for-presto-and-aws-s3. For visibility, I've added the configs and commands that I used below. After running |
3e76154 to
35a94ce
Compare
|
@LiBrian415 Thanks, I can confirm the infinite loop exists and your fix look good to me. |
v-jizhang
left a comment
There was a problem hiding this comment.
Should GroupByIdBlock$hashCode and LazyBlock$hashCode be implemented also? Thanks
|
I'm not able to repro with the latest version. |
I've double checked. It is still in the current release https://prestodb.io/download.html and the current code. |
v-jizhang
left a comment
There was a problem hiding this comment.
Should GroupByIdBlock$hashCode and LazyBlock$hashCode be implemented also? Thanks
|
Hi @v-jizhang, thanks for double checking that the issue is in the current release. I don't think overriding the methods for LazyBlock and GroupByIdBlock is necessary. LazyBlock seems to be only used around page sources, which is after the plan/optimization steps, and it's unlikely that there exists a native type that will be converted to a GroupByIdBlock. tldr; Since the issue is caused by comparing blocks created from the conversation of native types to blocks during the optimization step, we'll only need to override hashCode and equals for blocks that can be potentially created when converting native values to block. |
|
Thanks for the review. :D |
|
@LiBrian415 can you add a test case to |
|
+1, @LiBrian415, could you add a test case that could repro the issue without your patch? |
|
@LiBrian415 Hope all is well with you. Just checking in to see if you are still planning to work on this PR? |
|
@rohanpednekar Hi, sorry about that. I've been busy with school for the last few months and I'm not sure when I'll have time to finish this up. I'd like to have someone else pick it up if it's still useful. Thanks. |
|
Replaced by #17720 |
Replaces prestodb#16627 Co-authored-by: Brian Li <librian415@gmail.com>
Replaces #16627 Co-authored-by: Brian Li <librian415@gmail.com>
This diff is a fix for an edge case that causes infinite loops in the PickTableLayout$pushPredicateIntoTableScan.
Specifically, if a presto query contains a predicate that compares a complex field (array, map, ...) with a value of the
same type e.g. some_array_field <> array[], pushPredicateIntoTableScan will infinitely loop until it times out or until
a stackoverflowerror. The fact that these are complex fields might not matter, it's just that the value of these
types are generally already blocks.
While the planning timing out makes sense, the stackoverflowerror needs a bit of explaining. In the constructor
of TableScanNode, it wraps the output and assignment collections in unmodifiablecollections. While this is fine
on it's own, it becomes a problem when you have an infinite loop in pushPredicateIntoTableScan. Looking at the
implementation of pushPredicateIntoTableScan, each the it's called, it creates a new TableScanNode by passing in
values for the previous TableScanNodes. This creates N-depth wrapped unmodifiableCollections where N is the number
of pushPredicateIntoTableScan called on the same node. Because of how unmodifiablecollections are implemented,
this then causes N calls to stream(), toArray(), iterator(), ... each time one is called.
Now, that we know the end result is an infinite loop that causes a timeout or a stackoverflow, the remaining
question is why does this happen in the first place. After some investigation, this seems to happen because of
how domains are gathered during the pushPredicateIntoTableScan. Specifically, using a BASIC_COLUMN_EXTRACTOR,
predicates are extracted as domains if they take the from of a simple <VariableReferenceExpression, Anything>.
This also explains why predicates like map_key(<some_map>) = array[] still work since it'll be more complex than
the pattern above so they won't be pulled in as a domain. When creating a domain, the values are then converted to
their native block types. However, since these fields are generally not hive partitions, when a layout is generated,
they'll be considered as unenforcedConstraints. However, due to the processing of predicate -> domain -> predicate,
the resultingPredicates will have the same value but they're actually different objects. Since these Block subclasses don't
override equals, when the PickTableLayout checks if there has been some transformation (the optimization produced
a different plan structure), it'll always think something has changed. However, the only thing that changed was the
object, but the information contained by the blocks are still the same.
The fix here is just to override equals and hashcode of blocks.
tldr; Marker creation in Domain creation constructs new blocks from old blocks. This causes issues when checking for
block equality as blocks don't override equals.
Note: this wasn't tested against the latest version of Presto.