Skip to content

Conversation

@wankunde
Copy link

@wankunde wankunde commented Dec 29, 2024

What changes were proposed in this pull request?

Bug fix for MergeJoin which the join filter is not null. For example, in the tpcds 95, the subquery ws_wh contains a join filter ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk

SELECT
    ws1.ws_order_number,
    ws1.ws_warehouse_sk wh1,
    ws2.ws_warehouse_sk wh2
  FROM web_sales ws1, web_sales ws2
  WHERE ws1.ws_order_number = ws2.ws_order_number
    AND ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk

How to reproduce this issue:

In tpc95 query, if the right matches contains more than one batch, and these batches are DictionaryVector.
When we addOutputRow() from those batches, SMJ join will only copy the Dictionary indices.
And then we decode the incorrect value from the last DictionaryVector.

image

void copyRow(
    const RowVectorPtr& source,
    vector_size_t sourceIndex,
    const RowVectorPtr& target,
    vector_size_t targetIndex,
    const std::vector<IdentityProjection>& projections) {
  for (const auto& projection : projections) {
    const auto& sourceChild = source->childAt(projection.inputChannel);
    const auto& targetChild = target->childAt(projection.outputChannel);
    targetChild->copy(sourceChild.get(), targetIndex, sourceIndex, 1);
  }
}

Why are the changes needed?

Fix data quality issue.

How was this patch tested?

Test with TPCDS 95 query and local query

SELECT
    ws1.ws_order_number,
    ws1.ws_warehouse_sk wh1,
    ws2.ws_warehouse_sk wh2,
    ws2.ws_warehouse_sk - ws1.ws_warehouse_sk as diff
  FROM web_sales ws1, web_sales ws2
  WHERE ws1.ws_order_number = ws2.ws_order_number
    AND ws1.ws_order_number is not null
    AND ws2.ws_order_number is not null
    AND ws1.ws_warehouse_sk - ws2.ws_warehouse_sk = 0;

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 29, 2024
@netlify
Copy link

netlify bot commented Dec 29, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit c9c651a
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67728f55e95d5b00080ad29d

@wankunde wankunde changed the title Force output rows for new right batch Force output rows for new right batch in MergeJoin Dec 29, 2024
@wankunde wankunde changed the title Force output rows for new right batch in MergeJoin fix: Force output rows for new right batch in MergeJoin Dec 29, 2024
@JkSelf
Copy link
Collaborator

JkSelf commented Dec 30, 2024

@wankunde Thank you for addressing this issue. It is a known problem and can be resolved in #11771

@wankunde
Copy link
Author

@wankunde Thank you for addressing this issue. It is a known problem and can be resolved in #11771

Hi, @JkSelf , in #11771 only semi join and anti join are fixed with JoinTracker, but inner join with a filter has not been fixed.
In my case, the filter exprSet will eval on the incorrect input DictionaryVector.

@wankunde wankunde changed the title fix: Force output rows for new right batch in MergeJoin fix: Update the filterInput when flatten the right projections Dec 30, 2024
@JkSelf
Copy link
Collaborator

JkSelf commented Jan 14, 2025

@wankunde Thank you for addressing this issue. It is a known problem and can be resolved in #11771

Hi, @JkSelf , in #11771 only semi join and anti join are fixed with JoinTracker, but inner join with a filter has not been fixed. In my case, the filter exprSet will eval on the incorrect input DictionaryVector.

@wankunde Based on our offline discussion, can #11771 resolve your issue?

@wankunde
Copy link
Author

@wankunde Thank you for addressing this issue. It is a known problem and can be resolved in #11771

Hi, @JkSelf , in #11771 only semi join and anti join are fixed with JoinTracker, but inner join with a filter has not been fixed. In my case, the filter exprSet will eval on the incorrect input DictionaryVector.

@wankunde Based on our offline discussion, can #11771 resolve your issue?

Thanks, @JkSelf , #11771 already fixed the Q95 issue in
https://github.com/facebookincubator/velox/pull/11771/files#diff-8279da91a596a4b97a73bc4d9aa635da1baaf254dcdb2f7f8049bd21e7a9551eR416

@stale
Copy link

stale bot commented Apr 15, 2025

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

@stale stale bot added the stale label Apr 15, 2025
@stale stale bot closed this Apr 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants