feat: support distributed merge_into #13151
This PR is not a complete implementation of distributed merge into. We should also hash-shuffle by row id and make apply_delete_and_update distributed, but that will take more time, so let's use this version first. I will do that in the next PR.
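The row-id hash shuffle mentioned above is not part of this PR. As a rough illustration of the idea only: if every matched row id is routed to a node chosen by a deterministic hash, all mutations for the same row land on one node, which is what would let the apply step run on several nodes at once. The function name, the CRC32 hash, and the integer row ids below are assumptions made for the sketch, not Databend's implementation.

```python
import zlib
from collections import defaultdict


def shuffle_by_row_id(row_ids, num_nodes):
    """Assign each row id to exactly one node via a deterministic hash."""
    partitions = defaultdict(list)
    for row_id in row_ids:
        # CRC32 over the 8-byte encoding stands in for whatever hash the
        # engine would really use (assumption).
        h = zlib.crc32(row_id.to_bytes(8, "little"))
        partitions[h % num_nodes].append(row_id)
    return dict(partitions)


# Hypothetical matched row ids produced on two nodes; 7 and 42 appear on both.
node_a = [7, 42, 1001]
node_b = [42, 7, 2002]
parts = shuffle_by_row_id(node_a + node_b, num_nodes=3)

for node, ids in sorted(parts.items()):
    print(node, ids)
```

Because the hash depends only on the row id, both copies of 7 (and of 42) end up in the same partition, so a single node sees every mutation for that row.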
Thanks to Winter (@zhang2014) for the good advice.
Distributed mode:
test>> DATABEND_DSN="databend://root:@localhost:8118/?sslmode=disable&enable_experimental_merge_into=1&enable_distributed_merge_into=1" cargo run -r -- 3000 2>&1 | tee distributed_rr.log
[2023-10-30T15:53:14Z INFO test_replace_recluster] executing table maintenance batch : 19638
[2023-10-30T15:53:14Z INFO test_replace_recluster] Ok. maintenance batch : 19638
[2023-10-30T15:53:14Z INFO test_replace_recluster] Ok. maintenance batch : 19638
[2023-10-30T15:53:14Z INFO test_replace_recluster] Ok. maintenance batch : 19638
[2023-10-30T15:53:14Z INFO test_replace_recluster] Ok. merge-into batch : 2999
[2023-10-30T15:53:14Z INFO test_replace_recluster] Ok. maintenance batch : 19638
[2023-10-30T15:53:14Z INFO test_replace_recluster] ==========================
[2023-10-30T15:53:14Z INFO test_replace_recluster] ====verify table state====
[2023-10-30T15:53:14Z INFO test_replace_recluster] ==========================
[2023-10-30T15:53:14Z INFO test_replace_recluster]
[2023-10-30T15:53:14Z INFO test_replace_recluster]
[2023-10-30T15:53:14Z INFO test_replace_recluster] number of successfully executed merge-into statements : 3000
[2023-10-30T15:53:14Z INFO test_replace_recluster]
[2023-10-30T15:53:14Z INFO test_replace_recluster]
[2023-10-30T15:53:14Z INFO test_replace_recluster] CHECK: value of successfully executed merge-into statements
[2023-10-30T15:53:14Z INFO test_replace_recluster] CHECK: value of successfully executed merge-into statements: client 3000000, server 3000000
[2023-10-30T15:53:14Z INFO test_replace_recluster] CHECK: distinct ids: client 3000, server 3000
[2023-10-30T15:53:14Z INFO test_replace_recluster] CHECK: value of correlated column
[2023-10-30T15:53:14Z INFO test_replace_recluster] CHECK: full table scanning
[2023-10-30T15:53:14Z INFO test_replace_recluster] ===========================
[2023-10-30T15:53:14Z INFO test_replace_recluster] ====== PASSED ====
[2023-10-30T15:53:14Z INFO test_replace_recluster] ===========================
[2023-10-30T15:53:14Z INFO test_replace_recluster]
[2023-10-30T15:53:14Z INFO test_replace_recluster]
[2023-10-30T15:53:14Z INFO test_replace_recluster] ========METRICS============
[2023-10-30T15:53:14Z INFO test_replace_recluster] fuse_commit_mutation_unresolvable_conflict_total : 2328.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] fuse_remote_io_read_bytes_after_merged_total : 17238874464.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] fuse_remote_io_read_bytes_after_merged_total : 33190666751.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] fuse_remote_io_seeks_after_merged_total : 29753.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] fuse_remote_io_seeks_after_merged_total : 33448.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_accumulate_milliseconds : [{"less_than":10.0,"count":2732.0},{"less_than":50.0,"count":2732.0},{"less_than":100.0,"count":2732.0},{"less_than":250.0,"count":2732.0},{"less_than":500.0,"count":2732.0},{"less_than":1000.0,"count":2732.0},{"less_than":2500.0,"count":2732.0},{"less_than":5000.0,"count":2732.0},{"less_than":10000.0,"count":2732.0},{"less_than":20000.0,"count":2732.0},{"less_than":30000.0,"count":2732.0},{"less_than":60000.0,"count":2732.0},{"less_than":300000.0,"count":2732.0},{"less_than":600000.0,"count":2732.0},{"less_than":1800000.0,"count":2732.0},{"less_than":null,"count":2732.0}]
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_accumulate_milliseconds : [{"less_than":10.0,"count":5940.0},{"less_than":50.0,"count":5940.0},{"less_than":100.0,"count":5940.0},{"less_than":250.0,"count":5940.0},{"less_than":500.0,"count":5940.0},{"less_than":1000.0,"count":5940.0},{"less_than":2500.0,"count":5940.0},{"less_than":5000.0,"count":5940.0},{"less_than":10000.0,"count":5940.0},{"less_than":20000.0,"count":5940.0},{"less_than":30000.0,"count":5940.0},{"less_than":60000.0,"count":5940.0},{"less_than":300000.0,"count":5940.0},{"less_than":600000.0,"count":5940.0},{"less_than":1800000.0,"count":5940.0},{"less_than":null,"count":5940.0}]
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_accumulate_milliseconds_count : 2732.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_accumulate_milliseconds_count : 5940.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_accumulate_milliseconds_sum : 0.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_accumulate_milliseconds_sum : 0.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_append_blocks_counter_total : 5732.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_append_blocks_counter_total : 5940.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_append_blocks_rows_counter_total : 3510381.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_append_blocks_rows_counter_total : 773619.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_apply_milliseconds : [{"less_than":10.0,"count":3001.0},{"less_than":50.0,"count":3006.0},{"less_than":100.0,"count":3012.0},{"less_than":250.0,"count":3018.0},{"less_than":500.0,"count":3032.0},{"less_than":1000.0,"count":3066.0},{"less_than":2500.0,"count":3419.0},{"less_than":5000.0,"count":3427.0},{"less_than":10000.0,"count":3427.0},{"less_than":20000.0,"count":3427.0},{"less_than":30000.0,"count":3427.0},{"less_than":60000.0,"count":3427.0},{"less_than":300000.0,"count":3427.0},{"less_than":600000.0,"count":3427.0},{"less_than":1800000.0,"count":3427.0},{"less_than":null,"count":3427.0}]
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_apply_milliseconds : [{"less_than":10.0,"count":3028.0},{"less_than":50.0,"count":3092.0},{"less_than":100.0,"count":3108.0},{"less_than":250.0,"count":3114.0},{"less_than":500.0,"count":3115.0},{"less_than":1000.0,"count":3123.0},{"less_than":2500.0,"count":3428.0},{"less_than":5000.0,"count":3428.0},{"less_than":10000.0,"count":3428.0},{"less_than":20000.0,"count":3428.0},{"less_than":30000.0,"count":3428.0},{"less_than":60000.0,"count":3428.0},{"less_than":300000.0,"count":3428.0},{"less_than":600000.0,"count":3428.0},{"less_than":1800000.0,"count":3428.0},{"less_than":null,"count":3428.0}]
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_apply_milliseconds_count : 3427.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_apply_milliseconds_count : 3428.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_apply_milliseconds_sum : 542425.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_apply_milliseconds_sum : 440406.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_deleted_blocks_counter_total : 7.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_deleted_blocks_counter_total : 58.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_deleted_blocks_rows_counter_total : 7000.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_deleted_blocks_rows_counter_total : 58000.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_matched_operation_milliseconds : [{"less_than":10.0,"count":2732.0},{"less_than":50.0,"count":2732.0},{"less_than":100.0,"count":2732.0},{"less_than":250.0,"count":2732.0},{"less_than":500.0,"count":2732.0},{"less_than":1000.0,"count":2732.0},{"less_than":2500.0,"count":2732.0},{"less_than":5000.0,"count":2732.0},{"less_than":10000.0,"count":2732.0},{"less_than":20000.0,"count":2732.0},{"less_than":30000.0,"count":2732.0},{"less_than":60000.0,"count":2732.0},{"less_than":300000.0,"count":2732.0},{"less_than":600000.0,"count":2732.0},{"less_than":1800000.0,"count":2732.0},{"less_than":null,"count":2732.0}]
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_matched_operation_milliseconds : [{"less_than":10.0,"count":5940.0},{"less_than":50.0,"count":5940.0},{"less_than":100.0,"count":5940.0},{"less_than":250.0,"count":5940.0},{"less_than":500.0,"count":5940.0},{"less_than":1000.0,"count":5940.0},{"less_than":2500.0,"count":5940.0},{"less_than":5000.0,"count":5940.0},{"less_than":10000.0,"count":5940.0},{"less_than":20000.0,"count":5940.0},{"less_than":30000.0,"count":5940.0},{"less_than":60000.0,"count":5940.0},{"less_than":300000.0,"count":5940.0},{"less_than":600000.0,"count":5940.0},{"less_than":1800000.0,"count":5940.0},{"less_than":null,"count":5940.0}]
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_matched_operation_milliseconds_count : 2732.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_matched_operation_milliseconds_count : 5940.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_matched_operation_milliseconds_sum : 630.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_matched_operation_milliseconds_sum : 890.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_matched_rows_total : 510381.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_matched_rows_total : 773619.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_not_matched_operation_milliseconds : [{"less_than":10.0,"count":3000.0},{"less_than":50.0,"count":3000.0},{"less_than":100.0,"count":3000.0},{"less_than":250.0,"count":3000.0},{"less_than":500.0,"count":3000.0},{"less_than":1000.0,"count":3000.0},{"less_than":2500.0,"count":3000.0},{"less_than":5000.0,"count":3000.0},{"less_than":10000.0,"count":3000.0},{"less_than":20000.0,"count":3000.0},{"less_than":30000.0,"count":3000.0},{"less_than":60000.0,"count":3000.0},{"less_than":300000.0,"count":3000.0},{"less_than":600000.0,"count":3000.0},{"less_than":1800000.0,"count":3000.0},{"less_than":null,"count":3000.0}]
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_not_matched_operation_milliseconds_count : 3000.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_not_matched_operation_milliseconds_sum : 0.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_replace_blocks_counter_total : 566.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_replace_blocks_counter_total : 1265.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_replace_blocks_rows_counter_total : 153647863.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_replace_blocks_rows_counter_total : 327488439.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_split_milliseconds : [{"less_than":10.0,"count":6160.0},{"less_than":50.0,"count":6160.0},{"less_than":100.0,"count":6160.0},{"less_than":250.0,"count":6160.0},{"less_than":500.0,"count":6160.0},{"less_than":1000.0,"count":6160.0},{"less_than":2500.0,"count":6160.0},{"less_than":5000.0,"count":6160.0},{"less_than":10000.0,"count":6160.0},{"less_than":20000.0,"count":6160.0},{"less_than":30000.0,"count":6160.0},{"less_than":60000.0,"count":6160.0},{"less_than":300000.0,"count":6160.0},{"less_than":600000.0,"count":6160.0},{"less_than":1800000.0,"count":6160.0},{"less_than":null,"count":6160.0}]
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_split_milliseconds : [{"less_than":10.0,"count":9368.0},{"less_than":50.0,"count":9368.0},{"less_than":100.0,"count":9368.0},{"less_than":250.0,"count":9368.0},{"less_than":500.0,"count":9368.0},{"less_than":1000.0,"count":9368.0},{"less_than":2500.0,"count":9368.0},{"less_than":5000.0,"count":9368.0},{"less_than":10000.0,"count":9368.0},{"less_than":20000.0,"count":9368.0},{"less_than":30000.0,"count":9368.0},{"less_than":60000.0,"count":9368.0},{"less_than":300000.0,"count":9368.0},{"less_than":600000.0,"count":9368.0},{"less_than":1800000.0,"count":9368.0},{"less_than":null,"count":9368.0}]
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_split_milliseconds_count : 6160.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_split_milliseconds_count : 9368.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_split_milliseconds_sum : 0.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_split_milliseconds_sum : 1.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_unmatched_rows_total : 3510381.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] query_merge_into_unmatched_rows_total : 3773619.0
[2023-10-30T15:53:14Z INFO test_replace_recluster] ===========================
[2023-10-30T15:53:14Z INFO test_replace_recluster]
[2023-10-30T15:53:14Z INFO test_replace_recluster]
[2023-10-30T15:53:14Z INFO test_replace_recluster] ======CLUSTERING INFO======
[2023-10-30T15:53:14Z INFO test_replace_recluster] cluster_key : (to_yyyymmdd(insert_time), id)
[2023-10-30T15:53:14Z INFO test_replace_recluster] block_count: 13
[2023-10-30T15:53:14Z INFO test_replace_recluster] constant_block_count: 0
[2023-10-30T15:53:14Z INFO test_replace_recluster] unclustered_block_count: 0
[2023-10-30T15:53:14Z INFO test_replace_recluster] average_overlaps: 8.7692
[2023-10-30T15:53:14Z INFO test_replace_recluster] average_depth: 7
[2023-10-30T15:53:14Z INFO test_replace_recluster] block_depth_histogram: {"00007":13}
[2023-10-30T15:53:14Z INFO test_replace_recluster] ===========================
Correctness passed, but I hit this panic once: "panicked at 'assertion failed: block_idx < segment_info.blocks.len()', src/query/storages/fuse/src/operations/merge_into/mutator/…
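The histogram lines in the log above are easier to read as per-bucket counts. The counts look cumulative (they never decrease, and the last bucket equals the matching `_count` line), so subtracting neighbours gives the number of calls that fell into each latency range. A minimal sketch under that assumption; the helper name is made up, and the sample is the first eight buckets plus the final catch-all bucket of the first `query_merge_into_apply_milliseconds` line.

```python
import json


def per_bucket_counts(histogram_json):
    """Turn cumulative {"less_than", "count"} buckets into per-bucket counts."""
    buckets = json.loads(histogram_json)
    result, previous = [], 0.0
    for bucket in buckets:
        # "less_than": null in the log becomes None (the catch-all bucket).
        result.append((bucket["less_than"], bucket["count"] - previous))
        previous = bucket["count"]
    return result


sample = (
    '[{"less_than":10.0,"count":3001.0},{"less_than":50.0,"count":3006.0},'
    '{"less_than":100.0,"count":3012.0},{"less_than":250.0,"count":3018.0},'
    '{"less_than":500.0,"count":3032.0},{"less_than":1000.0,"count":3066.0},'
    '{"less_than":2500.0,"count":3419.0},{"less_than":5000.0,"count":3427.0},'
    '{"less_than":null,"count":3427.0}]'
)

for upper_bound, count in per_bucket_counts(sample):
    print(upper_bound, count)
```

Read this way, most apply calls on that line finished in under 10 ms, and the slow tail sits in the 1000 to 2500 ms bucket.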
Standalone mode:
test>> DATABEND_DSN="databend://root:@localhost:8118/?sslmode=disable&enable_experimental_merge_into=1&enable_distributed_merge_into=0" cargo run -r -- 3000 2>&1 | tee rr.log
[2023-10-30T18:05:24Z INFO test_replace_recluster] ==========================
[2023-10-30T18:05:24Z INFO test_replace_recluster] ====verify table state====
[2023-10-30T18:05:24Z INFO test_replace_recluster] ==========================
[2023-10-30T18:05:24Z INFO test_replace_recluster]
[2023-10-30T18:05:24Z INFO test_replace_recluster]
[2023-10-30T18:05:24Z INFO test_replace_recluster] number of successfully executed merge-into statements : 3000
[2023-10-30T18:05:24Z INFO test_replace_recluster]
[2023-10-30T18:05:24Z INFO test_replace_recluster]
[2023-10-30T18:05:24Z INFO test_replace_recluster] CHECK: value of successfully executed merge-into statements
[2023-10-30T18:05:24Z INFO test_replace_recluster] CHECK: value of successfully executed merge-into statements: client 3000000, server 3000000
[2023-10-30T18:05:25Z INFO test_replace_recluster] CHECK: distinct ids: client 3000, server 3000
[2023-10-30T18:05:25Z INFO test_replace_recluster] CHECK: value of correlated column
[2023-10-30T18:05:25Z INFO test_replace_recluster] CHECK: full table scanning
[2023-10-30T18:05:25Z INFO test_replace_recluster] ===========================
[2023-10-30T18:05:25Z INFO test_replace_recluster] ====== PASSED ====
[2023-10-30T18:05:25Z INFO test_replace_recluster] ===========================
[2023-10-30T18:05:25Z INFO test_replace_recluster]
[2023-10-30T18:05:25Z INFO test_replace_recluster]
[2023-10-30T18:05:25Z INFO test_replace_recluster] ========METRICS============
[2023-10-30T18:05:25Z INFO test_replace_recluster] fuse_commit_mutation_unresolvable_conflict_total : 1528.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] fuse_remote_io_read_bytes_after_merged_total : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] fuse_remote_io_read_bytes_after_merged_total : 50356283966.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] fuse_remote_io_seeks_after_merged_total : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] fuse_remote_io_seeks_after_merged_total : 61773.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_accumulate_milliseconds : [{"less_than":10.0,"count":0.0},{"less_than":50.0,"count":0.0},{"less_than":100.0,"count":0.0},{"less_than":250.0,"count":0.0},{"less_than":500.0,"count":0.0},{"less_than":1000.0,"count":0.0},{"less_than":2500.0,"count":0.0},{"less_than":5000.0,"count":0.0},{"less_than":10000.0,"count":0.0},{"less_than":20000.0,"count":0.0},{"less_than":30000.0,"count":0.0},{"less_than":60000.0,"count":0.0},{"less_than":300000.0,"count":0.0},{"less_than":600000.0,"count":0.0},{"less_than":1800000.0,"count":0.0},{"less_than":null,"count":0.0}]
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_accumulate_milliseconds : [{"less_than":10.0,"count":8250.0},{"less_than":50.0,"count":8250.0},{"less_than":100.0,"count":8250.0},{"less_than":250.0,"count":8250.0},{"less_than":500.0,"count":8250.0},{"less_than":1000.0,"count":8250.0},{"less_than":2500.0,"count":8250.0},{"less_than":5000.0,"count":8250.0},{"less_than":10000.0,"count":8250.0},{"less_than":20000.0,"count":8250.0},{"less_than":30000.0,"count":8250.0},{"less_than":60000.0,"count":8250.0},{"less_than":300000.0,"count":8250.0},{"less_than":600000.0,"count":8250.0},{"less_than":1800000.0,"count":8250.0},{"less_than":null,"count":8250.0}]
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_accumulate_milliseconds_count : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_accumulate_milliseconds_count : 8250.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_accumulate_milliseconds_sum : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_accumulate_milliseconds_sum : 14.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_append_blocks_counter_total : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_append_blocks_counter_total : 11250.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_append_blocks_rows_counter_total : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_append_blocks_rows_counter_total : 4284000.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_apply_milliseconds : [{"less_than":10.0,"count":0.0},{"less_than":50.0,"count":0.0},{"less_than":100.0,"count":0.0},{"less_than":250.0,"count":0.0},{"less_than":500.0,"count":0.0},{"less_than":1000.0,"count":0.0},{"less_than":2500.0,"count":0.0},{"less_than":5000.0,"count":0.0},{"less_than":10000.0,"count":0.0},{"less_than":20000.0,"count":0.0},{"less_than":30000.0,"count":0.0},{"less_than":60000.0,"count":0.0},{"less_than":300000.0,"count":0.0},{"less_than":600000.0,"count":0.0},{"less_than":1800000.0,"count":0.0},{"less_than":null,"count":0.0}]
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_apply_milliseconds : [{"less_than":10.0,"count":3000.0},{"less_than":50.0,"count":3001.0},{"less_than":100.0,"count":3001.0},{"less_than":250.0,"count":3003.0},{"less_than":500.0,"count":3010.0},{"less_than":1000.0,"count":3016.0},{"less_than":2500.0,"count":3160.0},{"less_than":5000.0,"count":3202.0},{"less_than":10000.0,"count":3428.0},{"less_than":20000.0,"count":3428.0},{"less_than":30000.0,"count":3428.0},{"less_than":60000.0,"count":3428.0},{"less_than":300000.0,"count":3428.0},{"less_than":600000.0,"count":3428.0},{"less_than":1800000.0,"count":3428.0},{"less_than":null,"count":3428.0}]
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_apply_milliseconds_count : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_apply_milliseconds_count : 3428.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_apply_milliseconds_sum : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_apply_milliseconds_sum : 1902364.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_deleted_blocks_counter_total : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_deleted_blocks_counter_total : 56.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_deleted_blocks_rows_counter_total : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_deleted_blocks_rows_counter_total : 56000.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_matched_operation_milliseconds : [{"less_than":10.0,"count":0.0},{"less_than":50.0,"count":0.0},{"less_than":100.0,"count":0.0},{"less_than":250.0,"count":0.0},{"less_than":500.0,"count":0.0},{"less_than":1000.0,"count":0.0},{"less_than":2500.0,"count":0.0},{"less_than":5000.0,"count":0.0},{"less_than":10000.0,"count":0.0},{"less_than":20000.0,"count":0.0},{"less_than":30000.0,"count":0.0},{"less_than":60000.0,"count":0.0},{"less_than":300000.0,"count":0.0},{"less_than":600000.0,"count":0.0},{"less_than":1800000.0,"count":0.0},{"less_than":null,"count":0.0}]
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_matched_operation_milliseconds : [{"less_than":10.0,"count":8223.0},{"less_than":50.0,"count":8250.0},{"less_than":100.0,"count":8250.0},{"less_than":250.0,"count":8250.0},{"less_than":500.0,"count":8250.0},{"less_than":1000.0,"count":8250.0},{"less_than":2500.0,"count":8250.0},{"less_than":5000.0,"count":8250.0},{"less_than":10000.0,"count":8250.0},{"less_than":20000.0,"count":8250.0},{"less_than":30000.0,"count":8250.0},{"less_than":60000.0,"count":8250.0},{"less_than":300000.0,"count":8250.0},{"less_than":600000.0,"count":8250.0},{"less_than":1800000.0,"count":8250.0},{"less_than":null,"count":8250.0}]
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_matched_operation_milliseconds_count : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_matched_operation_milliseconds_count : 8250.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_matched_operation_milliseconds_sum : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_matched_operation_milliseconds_sum : 8983.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_matched_rows_total : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_matched_rows_total : 1284000.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_not_matched_operation_milliseconds : [{"less_than":10.0,"count":3000.0},{"less_than":50.0,"count":3000.0},{"less_than":100.0,"count":3000.0},{"less_than":250.0,"count":3000.0},{"less_than":500.0,"count":3000.0},{"less_than":1000.0,"count":3000.0},{"less_than":2500.0,"count":3000.0},{"less_than":5000.0,"count":3000.0},{"less_than":10000.0,"count":3000.0},{"less_than":20000.0,"count":3000.0},{"less_than":30000.0,"count":3000.0},{"less_than":60000.0,"count":3000.0},{"less_than":300000.0,"count":3000.0},{"less_than":600000.0,"count":3000.0},{"less_than":1800000.0,"count":3000.0},{"less_than":null,"count":3000.0}]
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_not_matched_operation_milliseconds_count : 3000.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_not_matched_operation_milliseconds_sum : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_replace_blocks_counter_total : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_replace_blocks_counter_total : 1683.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_replace_blocks_rows_counter_total : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_replace_blocks_rows_counter_total : 470291517.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_split_milliseconds : [{"less_than":10.0,"count":0.0},{"less_than":50.0,"count":0.0},{"less_than":100.0,"count":0.0},{"less_than":250.0,"count":0.0},{"less_than":500.0,"count":0.0},{"less_than":1000.0,"count":0.0},{"less_than":2500.0,"count":0.0},{"less_than":5000.0,"count":0.0},{"less_than":10000.0,"count":0.0},{"less_than":20000.0,"count":0.0},{"less_than":30000.0,"count":0.0},{"less_than":60000.0,"count":0.0},{"less_than":300000.0,"count":0.0},{"less_than":600000.0,"count":0.0},{"less_than":1800000.0,"count":0.0},{"less_than":null,"count":0.0}]
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_split_milliseconds : [{"less_than":10.0,"count":11249.0},{"less_than":50.0,"count":11250.0},{"less_than":100.0,"count":11250.0},{"less_than":250.0,"count":11250.0},{"less_than":500.0,"count":11250.0},{"less_than":1000.0,"count":11250.0},{"less_than":2500.0,"count":11250.0},{"less_than":5000.0,"count":11250.0},{"less_than":10000.0,"count":11250.0},{"less_than":20000.0,"count":11250.0},{"less_than":30000.0,"count":11250.0},{"less_than":60000.0,"count":11250.0},{"less_than":300000.0,"count":11250.0},{"less_than":600000.0,"count":11250.0},{"less_than":1800000.0,"count":11250.0},{"less_than":null,"count":11250.0}]
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_split_milliseconds_count : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_split_milliseconds_count : 11250.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_split_milliseconds_sum : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_split_milliseconds_sum : 30.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_unmatched_rows_total : 3000000.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] query_merge_into_unmatched_rows_total : 0.0
[2023-10-30T18:05:25Z INFO test_replace_recluster] ===========================
[2023-10-30T18:05:25Z INFO test_replace_recluster]
[2023-10-30T18:05:25Z INFO test_replace_recluster]
[2023-10-30T18:05:25Z INFO test_replace_recluster] ======CLUSTERING INFO======
[2023-10-30T18:05:25Z INFO test_replace_recluster] cluster_key : (to_yyyymmdd(insert_time), id)
[2023-10-30T18:05:25Z INFO test_replace_recluster] block_count: 12
[2023-10-30T18:05:25Z INFO test_replace_recluster] constant_block_count: 0
[2023-10-30T18:05:25Z INFO test_replace_recluster] unclustered_block_count: 0
[2023-10-30T18:05:25Z INFO test_replace_recluster] average_overlaps: 5
[2023-10-30T18:05:25Z INFO test_replace_recluster] average_depth: 4
[2023-10-30T18:05:25Z INFO test_replace_recluster] block_depth_histogram: {"00004":12}
[2023-10-30T18:05:25Z INFO test_replace_recluster] ===========================
Passed 3000 times.
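One way to compare the two runs above is the mean time per apply call, computed from the `query_merge_into_apply_milliseconds_sum` and `_count` lines. This sketch assumes that each repeated metric line is one query node's reading (in the standalone run one of the two lines is all zeros); the helper name is hypothetical, and the result is a rough indicator rather than a benchmark.

```python
def mean_ms(sum_count_pairs):
    """Mean milliseconds per call across several (sum, count) readings."""
    total_ms = sum(s for s, _ in sum_count_pairs)
    total_calls = sum(c for _, c in sum_count_pairs)
    return total_ms / total_calls if total_calls else 0.0


# (sum, count) pairs copied from the two logs above.
distributed = mean_ms([(542425.0, 3427.0), (440406.0, 3428.0)])
standalone = mean_ms([(0.0, 0.0), (1902364.0, 3428.0)])

print(f"distributed: {distributed:.1f} ms per apply call")
print(f"standalone:  {standalone:.1f} ms per apply call")
```

With these numbers the mean apply time is roughly 143 ms in distributed mode against roughly 555 ms in standalone mode.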
Cloud test:
test> select count(*) from target_table;
SELECT
count(*)
FROM
target_table
-[ RECORD 1 ]-----------------------------------
count(*): 1200575805
1 row read in 0.757 sec. Processed 1 row, 1 B (1.32 row/s, 1 B/s)
test> select count(*) from source_table;
SELECT
count(*)
FROM
source_table
-[ RECORD 1 ]-----------------------------------
count(*): 500000
1 row read in 0.909 sec. Processed 1 row, 1 B (1.1 row/s, 1 B/s)
test> set enable_distributed_merge_into = 0;
SET
enable_distributed_merge_into = 0
0 row read in 0.622 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)
test> merge into target_table as t1 using (select * from source_table) as t2 on t1.l_partkey = t2.l_partkey and t1.l_orderkey = t2.l_orderkey and t1.l_suppkey = t2.l_suppkey and t1.l_linenumber = t2.l_linenumber when matched then update * when not matched then insert *;
MERGE INTO target_table AS t1 USING (
SELECT
*
FROM
source_table
) AS t2 ON t1.l_partkey = t2.l_partkey
AND t1.l_orderkey = t2.l_orderkey
AND t1.l_suppkey = t2.l_suppkey
AND t1.l_linenumber = t2.l_linenumber
WHEN matched THEN
UPDATE
*
WHEN NOT matched THEN
INSERT
*
0 row read in 76.038 sec. Processed 1.2 billion row, 217.76 GiB (15.8 million row/s, 2.86 GiB/s)
test> set enable_distributed_merge_into = 1;
SET
enable_distributed_merge_into = 1
0 row read in 0.663 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)
test> merge into target_table as t1 using (select * from source_table) as t2 on t1.l_partkey = t2.l_partkey and t1.l_orderkey = t2.l_orderkey and t1.l_suppkey = t2.l_suppkey and t1.l_linenumber = t2.l_linenumber when matched then update * when not matched then insert *;
MERGE INTO target_table AS t1 USING (
SELECT
*
FROM
source_table
) AS t2 ON t1.l_partkey = t2.l_partkey
AND t1.l_orderkey = t2.l_orderkey
AND t1.l_suppkey = t2.l_suppkey
AND t1.l_linenumber = t2.l_linenumber
WHEN matched THEN
UPDATE
*
WHEN NOT matched THEN
INSERT
*
0 row read in 40.058 sec. Processed 1.2 billion row, 217.76 GiB (29.98 million row/s, 5.44 GiB/s)
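The speedup implied by the two timings above can be checked with a few lines of arithmetic. The row counts and elapsed times are copied from the session output; treating "rows processed" as target rows plus source rows is an assumption that happens to reproduce the reported throughput figures.

```python
def speedup(baseline_seconds, new_seconds):
    """How many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds


def rows_per_second(rows, seconds):
    return rows / seconds


TARGET_ROWS = 1_200_575_805
SOURCE_ROWS = 500_000
STANDALONE_SECONDS = 76.038   # enable_distributed_merge_into = 0
DISTRIBUTED_SECONDS = 40.058  # enable_distributed_merge_into = 1

ratio = speedup(STANDALONE_SECONDS, DISTRIBUTED_SECONDS)
standalone_rps = rows_per_second(TARGET_ROWS + SOURCE_ROWS, STANDALONE_SECONDS)
distributed_rps = rows_per_second(TARGET_ROWS + SOURCE_ROWS, DISTRIBUTED_SECONDS)

print(f"speedup: {ratio:.2f}x")
print(f"standalone:  {standalone_rps / 1e6:.2f} million rows/s")
print(f"distributed: {distributed_rps / 1e6:.2f} million rows/s")
```

On this single pair of runs the distributed setting is about 1.9 times faster.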
LGTM. Trivial comments can be addressed in a follow-up PR.
* add settings
* right join for merge into first
* add distribution optimization for merge into join
* split merge into plan
* fix update identify error
* finish distibuted baisc codes
* fix typo
* uniform row_kind and mutation_log
* fix MixRowKindAndLog serialize and deserialize
* add tests
* fix check
* fix check
* fix check
* fix test
* fix test
* fix
* remove memory size limit
* optmizie merge source and add row_number processor
* fix delete bug
* add row number plan
* fix row number
* refactor merge into pipeline
* split row_number and log, try to get hash table source data
* finish distributed codes, need to get data from hashtable
* finish not macthed append data
* fix filter
* fix filter
* fix distributed bugs,many bugs, need to support insert
* fix bugs
* fix check and clean codes
* fix check
* add more tests
* fix flaky
* fix test result
* fix order
* clean codes
* remove local builder branch
* refactor logic
* clean codes
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
Summary about this PR
1. Switch merge into to a right join.
2. Implement distributed merge into.
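As a toy illustration of why a right join fits merge into: if the source is the preserved (right) side, every source row survives the join, so rows that found a target partner can feed `when matched` and rows that did not can feed `when not matched`. The sketch below assumes the target is the left input and the source the right input; the function name and row shapes are hypothetical, and this is not the planner's code.

```python
def right_join_split(target_rows, source_rows, key):
    """Right outer join (target LEFT, source RIGHT) split into merge branches."""
    target_index = {}
    for row in target_rows:
        target_index.setdefault(key(row), []).append(row)

    matched, not_matched = [], []
    for src in source_rows:
        hits = target_index.get(key(src))
        if hits:
            # Source row has a target partner: candidate for "when matched".
            matched.extend((tgt, src) for tgt in hits)
        else:
            # Target side would be NULL: candidate for "when not matched".
            not_matched.append(src)
    return matched, not_matched


target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
source = [{"id": 2, "v": "B"}, {"id": 3, "v": "C"}]

matched, not_matched = right_join_split(target, source, key=lambda r: r["id"])
print("update:", matched)
print("insert:", not_matched)
```

Target row 1 has no source partner and is dropped by the join, which is fine because merge into leaves such rows untouched.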
Old shuffle join execution flow: [diagram not captured]
New physical plan design: [diagram not captured]
Feature: Merge Into Optimizations #12595