[WIP] Merge NestedLoopJoin Build vectors by karteekmurthys · Pull Request #10048 · facebookincubator/velox

karteekmurthys · 2024-06-04T22:25:53Z

The current implementation of NestedLoopJoin produces ungrouped output, which breaks StreamingAggregation: that operator expects its input to be pre-grouped, and it relies on this guarantee to dedup the rows.

For example, if the NestedLoopJoin probe-side input is

1, 1
2, 2

and the build side has 2 vectors

Vector 1 : 1
Vector 2: 2

then the output is currently

1, 1, 1
2, 2, 1
1, 1, 2
2, 2, 2

which is ungrouped.

With this PR change, the above output would look like:

1, 1, 1
1, 1, 2
2, 2, 1
2, 2, 2
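
The two orderings can be reproduced outside Velox with a small sketch. This is not the operator's implementation: it uses plain std::vector in place of RowVector, and the names crossJoinBuildOuter and crossJoinProbeOuter are made up for illustration. It only shows that the choice of outermost loop decides whether rows for the same probe key stay adjacent.

```cpp
#include <array>
#include <vector>

// One probe row has two columns; one output row is the probe row plus one
// build value. These aliases are illustrative stand-ins for RowVector.
using ProbeRow = std::array<int, 2>;
using Row = std::array<int, 3>;

// Build batch in the outer loop: every probe row is emitted once per build
// batch, so rows with the same probe key are scattered across the output.
std::vector<Row> crossJoinBuildOuter(
    const std::vector<ProbeRow>& probe,
    const std::vector<std::vector<int>>& buildBatches) {
  std::vector<Row> out;
  for (const auto& batch : buildBatches) {
    for (const auto& p : probe) {
      for (int b : batch) {
        out.push_back(Row{p[0], p[1], b});
      }
    }
  }
  return out;
}

// Probe row in the outer loop: all matches of one probe row are emitted
// before moving to the next probe row, so the output stays grouped whenever
// the probe input is grouped.
std::vector<Row> crossJoinProbeOuter(
    const std::vector<ProbeRow>& probe,
    const std::vector<std::vector<int>>& buildBatches) {
  std::vector<Row> out;
  for (const auto& p : probe) {
    for (const auto& batch : buildBatches) {
      for (int b : batch) {
        out.push_back(Row{p[0], p[1], b});
      }
    }
  }
  return out;
}
```

With probe rows (1, 1), (2, 2) and build batches [1], [2], the first function returns the ungrouped order shown above and the second returns the grouped order.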

Yuhta · 2024-06-04T22:34:56Z

velox/exec/NestedLoopJoinBuild.cpp

  }

+  while (dataVectors_.size() > 1) {
+    dataVectors_[0]->append(dataVectors_[dataVectors_.size()-1].get());


No need to merge the vectors. You just need to change the loop order in the probe so that the probe side is iterated first.

@karteekmurthys : Agree with Jimmy. We need to change the probe loop to generate the indices differently.

One approach I tried was to generate all the probe-side indices against the total build size (the sum of the sizes of all build vectors at probe time). Since buildSize is computed as the sum over all build vectors, this also takes care of generating the build-side indices. What remains is the projections, which expect a single RowVector each for the probe and build side. At this point it seems we are forced to merge all our build vectors so they can be projected at once, which is why I chose to merge the vectors on the NestedLoopJoinBuild side. Please correct me if I'm wrong; there could be a better way to map indices to vectors.

projectChildren(
    projectedChildren,
    buildVectors_.value()[buildIndex_], // <- Single RowVector, but we need to project multiple into a single output.
    buildProjections,
    numOutputRows,
    buildIndices_);
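
On the question of mapping indices to vectors, here is a minimal sketch of one possible lookup, assuming the build side is kept as separate batches. It is not Velox code and the helper names batchOffsets and locate are hypothetical. It turns a row index in the range [0, buildSize) into a (batch index, row within batch) pair using prefix sums of the batch sizes. It does not address the projection problem above, since projectChildren still takes a single RowVector.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Prefix sums of the batch sizes: offsets[i] is the number of rows in all
// batches before batch i, and offsets.back() is the total build size.
std::vector<std::size_t> batchOffsets(
    const std::vector<std::size_t>& batchSizes) {
  std::vector<std::size_t> offsets(batchSizes.size() + 1, 0);
  for (std::size_t i = 0; i < batchSizes.size(); ++i) {
    offsets[i + 1] = offsets[i] + batchSizes[i];
  }
  return offsets;
}

// Maps a global build row to {batch index, row within that batch}.
// Precondition: globalRow < offsets.back().
std::pair<std::size_t, std::size_t> locate(
    const std::vector<std::size_t>& offsets,
    std::size_t globalRow) {
  // The first offset strictly greater than globalRow marks the end of the
  // batch that contains it; empty batches are skipped automatically.
  auto it = std::upper_bound(offsets.begin(), offsets.end(), globalRow);
  const auto batch = static_cast<std::size_t>(it - offsets.begin()) - 1;
  return {batch, globalRow - offsets[batch]};
}
```

The lookup costs a binary search per row, so in practice it would make more sense to walk the batches in order than to call it for every output row.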

Synced with @Yuhta will try another approach and update this PR.

Discussed offline: merging build vectors helps avoid small output batches, so it might be worth doing here. But BaseVector::append is not the right implementation because it is too slow. Can you create a static method in BaseVector like this?

VectorPtr BaseVector::merge(
    const std::vector<VectorPtr>& vectors,
    memory::MemoryPool* pool) {
  VELOX_CHECK(!vectors.empty());
  auto& type = vectors[0]->type();
  // Validate the types and compute the total size up front.
  vector_size_t size = 0;
  for (auto& vector : vectors) {
    VELOX_CHECK_EQ(*vector->type(), *type);
    size += vector->size();
  }
  auto result = create(type, size, pool);
  // Copy each input into place; 'size' now tracks the write offset.
  size = 0;
  for (auto& vector : vectors) {
    result->copy(vector.get(), size, 0, vector->size());
    size += vector->size();
  }
  return result;
}

Thinking about it again, merging will produce batches larger than the ones we currently use, so it carries a memory risk. @mbasmanova Do you have an opinion here? Would merging the build side in nested loop join cause any potential issues?

@Yuhta, @mbasmanova : There is another option here. This grouping behavior is needed from NestedLoopJoinNode only because the original query uses StreamingAggregation.

For correctness, we could change the LocalPlanner to use HashAggregation instead of StreamingAggregation when the aggregation sits over a NestedLoopJoin.

To elaborate, the original issue came from incorrect results in the following query:

SELECT count(*) FROM orders o WHERE EXISTS(SELECT 1 FROM orders i WHERE o.orderkey < i.orderkey AND i.orderkey % 1000 = 0);

count(*) needed aggregation and the subquery needed NestedLoopJoin in Prestissimo.

In Presto Java, the same JoinNode translates to a LookupJoin, which imposes a sort on the HashBuild side in this code: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/sql/planner/LocalExecutionPlanner.java#L2436. That justifies the use of StreamingAggregation there.

We don't have an equivalent of such a LookupJoin in Velox, so NestedLoopJoin was used.

Since StreamingAggregation requires pre-grouped input and NestedLoopJoin does not provide it, the results were incorrect.

As demonstrated in @karteekmurthys's example:

In NestedLoop Probe side if the input is
1, 1
2, 2
and build side you have 2 vectors
Vector 1 : 1
Vector 2: 2
The output is going to be
1, 1, 1
2, 2, 1
1, 1, 2
2, 2, 2
which is ungrouped.

This double counts the groups for both keys 1 and 2.
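
A minimal sketch of why this happens, using plain integers as grouping keys. The functions streamingCount and hashCount are illustrative stand-ins, not the Velox operators: the first assumes equal keys are adjacent and starts a new group whenever the key changes, the second accumulates per key regardless of order.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// A grouping key together with its row count.
using GroupCount = std::pair<int, std::int64_t>;

// Streaming-style count: relies on pre-grouped input and opens a new output
// group every time the key differs from the previous row's key.
std::vector<GroupCount> streamingCount(const std::vector<int>& keys) {
  std::vector<GroupCount> out;
  for (int key : keys) {
    if (out.empty() || out.back().first != key) {
      out.emplace_back(key, 1);
    } else {
      ++out.back().second;
    }
  }
  return out;
}

// Hash-style count: order independent. std::map is used instead of a hash
// table only to make the output order deterministic.
std::vector<GroupCount> hashCount(const std::vector<int>& keys) {
  std::map<int, std::int64_t> counts;
  for (int key : keys) {
    ++counts[key];
  }
  return std::vector<GroupCount>(counts.begin(), counts.end());
}
```

For the ungrouped key sequence 1, 2, 1, 2, streamingCount emits four groups, each with count 1, while hashCount emits two groups, each with count 2. For the grouped sequence 1, 1, 2, 2 both agree.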

We could change local planning to use HashAggregation instead of StreamingAggregation above NestedLoopJoin to avoid the miscount.

wdyt?

@amitkdutta

aditi-pandit · 2024-06-05T16:05:41Z

velox/vector/ComplexVector.cpp

      // the length.
      if (newSize > oldSize) {
-        VELOX_CHECK(child.unique(), "Resizing shared child vector");
+        // VELOX_CHECK(child.unique(), "Resizing shared child vector");


Why have you commented out this line?

This check makes sure that we don't resize vectors that are shared. I will try an alternative approach, as you and @Yuhta suggested.

mbasmanova · 2024-06-07T17:34:36Z

@karteekmurthys @aditi-pandit Catching up on this issue. It sounds like you found that the Presto optimizer assumes that NLJ produces output in a particular order. Is this the case? Would you share a pointer to that assumption? I remember seeing that the Presto optimizer has some assumptions about the LEFT JOIN, but I'm not sure about NLJ.

Before we proceed with a solution, let's investigate the optimizer a bit more to come up with a full list of assumptions like this. Then let's create a GitHub issue to describe these assumptions and discuss how we can satisfy them in Velox.

aditi-pandit · 2024-06-08T20:22:08Z

@mbasmanova : Well, the Presto Java JoinNode translates to NestedLoopJoin only for cross joins. This case, a LEFT OUTER join with a non-equality filter condition, translates to a LookupJoin during physical planning in LocalExecutionPlanner: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/sql/planner/LocalExecutionPlanner.java#L1958 and the LookupJoinSourceFactory has a means of accessing sortChannels: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/sql/planner/LocalExecutionPlanner.java#L2390

The EXPLAIN plan snippet for LeftJoin also shows the SortExpression property:

 - Aggregate(STREAMING)[orderkey, unique][PlanNodeId 263] => [orderkey:bigint, unique:bigint, count_21:bigint]
    Estimates: {source: CostBasedSourceInfo, rows: ? (?), cpu: ?, memory: ?, network: 11.00}
    count_21 := "presto.default.count"((non_null))
     - LeftJoin[PlanNodeId 262][(orderkey) < (orderkey_0)] => [orderkey:bigint, unique:bigint, non_null:boolean]
        Estimates: {source: CostBasedSourceInfo, rows: ? (?), cpu: ?, memory: 11.00, network: 11.00}
          Distribution: REPLICATED
          **SortExpression[orderkey_0]**
      - AssignUniqueId[PlanNodeId 261] => [orderkey:bigint, unique:bigint]

There isn't an equivalent of LookupJoin in Velox, unless you have other ideas for it.

Prestissimo translates JoinNode to NestedLoopJoin here, and there is no concept of sortChannels or pre-grouping for it: https://github.com/prestodb/presto/blob/master/presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp#L1122

The Velox plan fragment had already been resolved to a StreamingAggregation with pre-grouped channels. That might be because of the SortExpression carried in the contexts.

mbasmanova · 2024-06-10T14:36:08Z

@aditi-pandit Aditi, looks like the plan translation in Prestissimo has a bug. It ignores 'sortChannels' information in the join node incorrectly. An immediate fix would be to change the translation logic to fail to translate join nodes with sorting requirements. The next step would be to design a proper solution to support this use case.

mbasmanova · 2024-06-10T14:40:49Z

We may need to study this PR prestodb/presto#8614 to understand this use case.

aditi-pandit · 2024-06-10T16:37:15Z

@mbasmanova : The JoinNode serialized to Prestissimo doesn't have a sortChannels field, as far as I can see:

struct JoinNode : public PlanNode {
  JoinType type = {};
  std::shared_ptr<PlanNode> left = {};
  std::shared_ptr<PlanNode> right = {};
  List<EquiJoinClause> criteria = {};
  List<VariableReferenceExpression> outputVariables = {};
  std::shared_ptr<std::shared_ptr<RowExpression>> filter = {};
  std::shared_ptr<VariableReferenceExpression> leftHashVariable = {};
  std::shared_ptr<VariableReferenceExpression> rightHashVariable = {};
  std::shared_ptr<JoinDistributionType> distributionType = {};
  Map<String, VariableReferenceExpression> dynamicFilters = {};

  JoinNode() noexcept;
};

Is it something else you are referring to?

karteekmurthys · 2024-06-14T06:45:59Z

We are not taking the approach of modifying NestedLoopJoin for now. Instead, we will avoid using pre-grouped keys when the source is a NestedLoopJoin. The fix is here: prestodb/presto#22998

Merge NestedLoopJoin Build vectors

5999b9e

karteekmurthys requested a review from aditi-pandit June 4, 2024 22:25

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 4, 2024

Yuhta requested a review from mbasmanova June 4, 2024 22:33

Yuhta reviewed Jun 4, 2024

karteekmurthys self-assigned this Jun 4, 2024

aditi-pandit reviewed Jun 5, 2024

karteekmurthys closed this Jun 14, 2024

aditi-pandit mentioned this pull request Jul 31, 2024

[native] Disable Nested Loop Join in Prestissimo. prestodb/presto#23341

Open
