[Enhancement] Optimize memory usage of primary key table large load (#12068) (#13744)

Currently, `RowsetUpdateState::load` preloads the primary keys of all segments into memory. If the load (rowset) is very large, this consumes a lot of memory during the commit or apply phase. For a large load (rowset), we no longer preload all segments' primary keys; instead, we process the rowset segment by segment, which reduces memory usage during apply. Note that the limit is a soft limit: because a failed apply cannot be tolerated, memory usage may still exceed it.

Test environment: one BE with two HDDs, using Broker Load on a table with a persistent index. The table is the TPC-DS `store_sales` schema:

```sql
CREATE TABLE `store_sales` (
  `ss_item_sk` bigint(20) NOT NULL COMMENT "",
  `ss_ticket_number` bigint(20) NOT NULL COMMENT "",
  `ss_sold_date_sk` bigint(20) NULL COMMENT "",
  `ss_sold_time_sk` bigint(20) NULL COMMENT "",
  `ss_customer_sk` bigint(20) NULL COMMENT "",
  `ss_cdemo_sk` bigint(20) NULL COMMENT "",
  `ss_hdemo_sk` bigint(20) NULL COMMENT "",
  `ss_addr_sk` bigint(20) NULL COMMENT "",
  `ss_store_sk` bigint(20) NULL COMMENT "",
  `ss_promo_sk` bigint(20) NULL COMMENT "",
  `ss_quantity` bigint(20) NULL COMMENT "",
  `ss_wholesale_cost` decimal64(7, 2) NULL COMMENT "",
  `ss_list_price` decimal64(7, 2) NULL COMMENT "",
  `ss_sales_price` decimal64(7, 2) NULL COMMENT "",
  `ss_ext_discount_amt` decimal64(7, 2) NULL COMMENT "",
  `ss_ext_sales_price` decimal64(7, 2) NULL COMMENT "",
  `ss_ext_wholesale_cost` decimal64(7, 2) NULL COMMENT "",
  `ss_ext_list_price` decimal64(7, 2) NULL COMMENT "",
  `ss_ext_tax` decimal64(7, 2) NULL COMMENT "",
  `ss_coupon_amt` decimal64(7, 2) NULL COMMENT "",
  `ss_net_paid` decimal64(7, 2) NULL COMMENT "",
  `ss_net_paid_inc_tax` decimal64(7, 2) NULL COMMENT "",
  `ss_net_profit` decimal64(7, 2) NULL COMMENT ""
) ENGINE=OLAP
PRIMARY KEY(`ss_item_sk`, `ss_ticket_number`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`ss_item_sk`, `ss_ticket_number`) BUCKETS 2
PROPERTIES (
  "replication_num" = "1",
  "in_memory" = "false",
  "storage_format" = "DEFAULT",
  "enable_persistent_index" = "true",
  "compression" = "LZ4"
);
```

Results:

| PrimaryKey Length | RowNum | BucketNum | Load time(s) | Apply time(ms) | Peak Memory usage(GB) | Note |
| --- | --- | --- | --- | --- | --- | --- |
| 16 Bytes | 864001869 | 2 | 7643 | 355200 | 25.03 | branch-opt |
| 16 Bytes | 864001869 | 2 | 7591 | 348465 | 46.45 | branch-main |
| 16 Bytes | 864001869 | 100 | 7194 | 32705 | 25.11 | branch-opt |
| 16 Bytes | 864001869 | 100 | 7104 | 30705 | 43.14 | branch-main |

Note: there are still some scenarios this PR does not resolve:
- In partial update, the read column data may be very large; this is not addressed in this PR.
- We still need to load all primary keys into L0 of the persistent index first, which may cause OOM.
Showing 6 changed files with 249 additions and 89 deletions.