feat: partition by left most cluster key when building merge into filter #13547
# Conflicts: # src/query/service/src/interpreters/interpreter_merge_into.rs
Tested 3000 times in standalone mode.
Tested 3000 times in cluster mode using https://github.com/JackTan25/test-scripts/tree/distributed_test
Ready for review?
The testing is not finished, but this PR has already been merged. We still need to test delete and, most importantly, cloud performance. cc @BohuTANG, this was merged too fast.
@JackTan25 my fault, I put it into the merge queue... Let's wait for the results of the stress test with deletion scenarios; if anything goes wrong, revert it.
Can we merge the basic long run (@dantengsky's) with the merge-into long run (JackTan25/distributed_test), and run all of them each time?
The test result that @SkyFan2002 gave in #13547 (comment) is based on the "merged" version (the long-run script, adapted to the "merge-into" scenario). Later, @JackTan25 improved the test script to cover the "deletion" operation of the merge-into statement. Currently, @SkyFan2002 is stress testing this PR again, using the improved version of the test script.
Tested 3000 times in cluster mode using: https://github.com/JackTan25/test-scripts/tree/test_delete
…ter (databendlabs#13547)
* chore: support collect statistics of multi join expr in merge into
* partition by first join condition
* fix filter
* update
* fix clippy
* fix
* fix bind
* fix and add log
* fix stackoverflow
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
For example:
Then the leftmost cluster key of the target table, `to_yyyymmdd(insert_time)`, will be used as the partition key of the source table when collecting statistics, by:
* Projecting `to_yyyymmdd(target.insert_time)` to `to_yyyymmdd(source.insert_time)`, which is inferred from the join condition `t1.insert_time = t2.insert_time`.
* Rewriting the logical plan of the source table to query the statistics, like this:
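As a rough illustration of the idea only (this is not the PR's actual plan rewrite, which lives in the Rust query service), the per-partition statistics can be sketched in Python. The sketch assumes the statistics are the min and max of the join key `insert_time` for each partition; the helper names `to_yyyymmdd` and `collect_partition_stats` and the sample rows are hypothetical.

```python
from datetime import datetime


def to_yyyymmdd(ts: datetime) -> int:
    """Hypothetical stand-in for the SQL function: 2023-11-02 -> 20231102."""
    return ts.year * 10000 + ts.month * 100 + ts.day


def collect_partition_stats(source_rows):
    """Group source rows by to_yyyymmdd(insert_time).

    Returns {partition_key: (min_insert_time, max_insert_time)}.
    Tracking min/max of the join key is an assumption about which
    statistics are collected.
    """
    stats = {}
    for row in source_rows:
        ts = row["insert_time"]
        key = to_yyyymmdd(ts)
        lo, hi = stats.get(key, (ts, ts))
        stats[key] = (min(lo, ts), max(hi, ts))
    return stats


# Made-up source rows, for illustration only.
source_rows = [
    {"id": 1, "insert_time": datetime(2023, 11, 1, 1, 0)},
    {"id": 2, "insert_time": datetime(2023, 11, 1, 23, 0)},
    {"id": 3, "insert_time": datetime(2023, 11, 2, 8, 0)},
]

stats = collect_partition_stats(source_rows)
for key in sorted(stats):
    print(key, stats[key])
```

Each entry gives a value range per source partition. Because the target table is clustered by the same expression, such ranges could be compared against target block ranges to skip blocks that cannot match the join condition.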
close #13567