Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better processing and optimization if IN <list> predicates #7707

Merged
merged 6 commits into from
Sep 4, 2023
Merged

Conversation

dyemanov
Copy link
Member

This PR implements native IN processing rather than converting it into a binary tree of ORed predicates. Details are described below.

  1. Processing is now linear rather than recursive, thus no runtime stack limitations. The artificial limit of 1500 items is gone. The current implementation has a hard-limit of 64K items just because BLR needs some limit and because this value looks OK from the sanity POV.

  2. Lists that are known to be constant are pre-evaluated as invariants and cached as a binary search tree, thus making comparisons faster if condition needs to be checked for many rows or if the value list is long enough.

  3. If the list is very long or if the IN predicate is not selective, the index scan supports searching groups using the sibling pointer (i.e. horizontally) rather than searching every group from root (i.e. vertically). This corresponds to Improve the index scan to match multiple values at once [CORE4600] #4915.

Performance results in TPC-R queries containing IN predicates:

Q12: ~ 2x improvement (list scan is used instead of full-scan with bitmap)
Q16: no difference (the plan is the same)
Q19: ~ 10x improvement (list scan is used instead of many range scans)
Q22: ~ 2x improvement (the plan is the same, but expression evaluation is more optimal)

The patch (in its original incarnation, based on v3 and v4 codebases) was tested by our customer in production since 2022 and looked stable, although they didn't see such a noticeable performance boost. But they needed longish lists (> 1500 items) more than a better performance ;-) When porting this into master I improved some things I didn't like originally.

@dyemanov dyemanov self-assigned this Aug 10, 2023
@@ -138,6 +141,41 @@ class ComparativeBoolNode final : public TypedNode<BoolExprNode, ExprNode::TYPE_
};


class InListBoolNode : public TypedNode<BoolExprNode, ExprNode::TYPE_IN_LIST_BOOL>
{
const static UCHAR blrOp = blr_in_list;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why make a constant instead of use the blr directly in gen function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in two methods (also in internalPrint) and I don't like duplicating things ;-)


static DmlNode* parse(thread_db* tdbb, MemoryPool& pool, CompilerScratch* csb, const UCHAR blrOp);

virtual void getChildren(NodeRefsHolder& holder, bool dsql) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is new code, worth change the virtual redeclarations to use override instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.


if (!(impure->vlu_flags & VLU_computed))
{
delete impure->vlu_misc.vlu_sortedList;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is good to initialize the field to nullptr just after the delete.
Because if the new constructor fail, it could be left with garbage.
This is done after delete impure->vlu_misc.vlu_invariant; in SubstringSimilarNode::execute.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

return MOV_compare(JRD_get_thread_data(), desc1, desc2);
}

LookupValueList::LookupValueList(MemoryPool& pool, const ValueListNode* values, ULONG impure)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be possible to avoid const_casts replacing some constructors to not use pointer to consts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try.

@dyemanov dyemanov merged commit 0493422 into master Sep 4, 2023
@dyemanov dyemanov deleted the in-list branch September 4, 2023 06:31
dyemanov added a commit that referenced this pull request Sep 13, 2023
* WIP

* Original (circa 2022) implementation of the IN LIST optimization, with some post-fixes and minor adjustments

* Make it possible to optimize IN <list> for middle segments in compund indices

* Avoid modifying the retrieval structure at runtime, it may be shared among concurrent requests

* Simplify the code a little. Better cost calculation. Support both root-based and sibling-based list scans inside the same plan node.

* Removed the unneeded const casts and other changed as suggested by Adriano
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants