[Enhancement] Optimize mem usage of partial update (#14187)
PR #12068 partially optimized memory usage for large imports into the primary key model, but that enhancement does not take effect when the load is a partial update. Doing a large number of partial updates in one transaction also requires a lot of memory. This PR tries to reduce the memory usage of large partial updates.

There are two reasons for the large memory usage during partial column updates:

1. Updating only a few columns may increase the segment file size, and all data of a segment has to be loaded into memory, which costs a lot of memory.
2. A partial update requires reading the data of the other columns into memory, which can take up a lot of memory if the table has many columns.

To reduce memory usage, the following two adjustments are made:

1. Estimate the length of the updated partial columns in each row when importing data, thus reducing the size of the segment file.
2. Do not load all the data of the rowset into memory at once; load it one segment at a time instead.
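The effect of the second adjustment can be illustrated with a small sketch. This is not the BE implementation (which lives in the C++ storage layer); it is a toy Python model in which a "segment" is just a list of row payloads, and every name in it (`Segment`, `peak_load_all`, `peak_load_per_segment`) is hypothetical. It only shows why peak memory is bounded by the largest segment, rather than by the whole rowset, when segments are loaded and released one by one.

```python
from typing import Callable, List

# Hypothetical stand-in: a segment is modeled as a list of row payloads.
Segment = List[bytes]


def segment_bytes(segment: Segment) -> int:
    """Bytes held in memory while this segment is loaded."""
    return sum(len(row) for row in segment)


def peak_load_all(segments: List[Segment]) -> int:
    """Peak bytes when every segment of the rowset is resident at once."""
    return sum(segment_bytes(s) for s in segments)


def peak_load_per_segment(
    segments: List[Segment], apply: Callable[[Segment], None]
) -> int:
    """Peak bytes when segments are loaded, applied, and released in turn."""
    peak = 0
    for segment in segments:
        loaded = list(segment)  # simulate reading one segment into memory
        peak = max(peak, segment_bytes(loaded))
        apply(loaded)           # apply the partial update for this segment
        del loaded              # release it before touching the next one
    return peak


# 20 segments, each with 1000 rows of 100 bytes (100,000 bytes per segment).
rowset = [[b"x" * 100] * 1000 for _ in range(20)]
applied_rows: List[int] = []

all_at_once = peak_load_all(rowset)
one_by_one = peak_load_per_segment(
    rowset, lambda seg: applied_rows.append(len(seg))
)
print(all_at_once, one_by_one)
```

In this toy model the per-segment peak is 1/20 of the all-at-once peak because the segments are equally sized; in a real rowset the bound is set by the largest segment, which is why the first adjustment (keeping segment files small) matters as well.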
In my test environment (one BE with two HDDs, loading via Stream Load), I created a table with 65 columns and 20 buckets:

```
CREATE TABLE `partial_test` (
  `col_1` bigint(20) NOT NULL COMMENT "",
  `col_2` bigint(20) NOT NULL COMMENT "",
  `col_3` bigint(20) NOT NULL COMMENT "",
  `col_4` varchar(150) NOT NULL COMMENT "",
  `col_5` varchar(150) NOT NULL COMMENT "",
  `col_6` varchar(150) NULL COMMENT "",
  `col_7` varchar(150) NULL COMMENT "",
  `col_8` varchar(1024) NULL COMMENT "",
  `col_9` varchar(120) NULL COMMENT "",
  `col_10` varchar(60) NULL COMMENT "",
  `col_11` varchar(10) NULL COMMENT "",
  `col_12` varchar(120) NULL COMMENT "",
  `col_13` varchar(524) NULL COMMENT "",
  `col_14` varchar(100) NULL COMMENT "",
  `col_15` varchar(150) NULL COMMENT "",
  `col_16` varchar(150) NULL COMMENT "",
  `col_17` varchar(150) NULL COMMENT "",
  `col_18` bigint(20) NULL COMMENT "",
  `col_19` varchar(500) NULL COMMENT "",
  `col_20` varchar(150) NULL COMMENT "",
  `col_21` tinyint(4) NULL COMMENT "",
  `col_22` int(11) NULL COMMENT "",
  `col_23` varchar(524) NULL COMMENT "",
  `col_24` bigint(20) NULL COMMENT "",
  `col_25` bigint(20) NULL COMMENT "",
  `col_26` varchar(8) NULL COMMENT "",
  `col_27` decimal64(18, 6) NULL COMMENT "",
  `col_28` decimal64(18, 6) NULL COMMENT "",
  `col_29` decimal64(18, 6) NULL COMMENT "",
  `col_30` decimal64(18, 6) NULL COMMENT "",
  `col_31` decimal64(18, 6) NULL COMMENT "",
  `col_32` decimal64(18, 6) NULL COMMENT "",
  `col_33` bigint(20) NULL COMMENT "",
  `col_34` decimal64(18, 6) NULL COMMENT "",
  `col_35` varchar(8) NULL COMMENT "",
  `col_36` decimal64(18, 6) NULL COMMENT "",
  `col_37` decimal64(18, 6) NULL COMMENT "",
  `col_38` varchar(8) NULL COMMENT "",
  `col_39` decimal64(18, 6) NULL COMMENT "",
  `col_40` decimal64(18, 6) NULL COMMENT "",
  `col_41` varchar(8) NULL COMMENT "",
  `col_42` decimal64(18, 6) NULL COMMENT "",
  `col_43` decimal64(18, 6) NULL COMMENT "",
  `col_44` decimal64(18, 6) NULL COMMENT "",
  `col_45` decimal64(18, 6) NULL COMMENT "",
  `col_46` int(11) NULL COMMENT "",
  `col_47` int(11) NOT NULL COMMENT "",
  `col_48` tinyint(4) NULL COMMENT "",
  `col_49` varchar(200) NULL COMMENT "",
  `col_50` tinyint(4) NULL COMMENT "",
  `col_51` varchar(200) NULL COMMENT "",
  `col_52` varchar(10) NULL COMMENT "",
  `col_53` tinyint(4) NULL COMMENT "",
  `col_54` tinyint(4) NULL COMMENT "",
  `col_55` varchar(150) NULL COMMENT "",
  `col_56` varchar(150) NULL COMMENT "",
  `col_57` varchar(500) NULL COMMENT "",
  `col_58` tinyint(4) NULL COMMENT "",
  `col_59` varchar(100) NULL COMMENT "",
  `col_60` varchar(150) NULL COMMENT "",
  `col_61` varchar(150) NULL COMMENT "",
  `col_62` varchar(150) NULL COMMENT "",
  `col_63` varchar(150) NULL COMMENT "",
  `col_64` datetime NULL COMMENT "",
  `col_65` datetime NULL COMMENT ""
) ENGINE=OLAP
PRIMARY KEY(`col_1`, `col_2`, `col_3`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`col_1`, `col_2`) BUCKETS 20
PROPERTIES (
  "replication_num" = "1",
  "in_memory" = "false",
  "storage_format" = "V2",
  "enable_persistent_index" = "true",
  "compression" = "LZ4"
);
```

| PrimaryKey Length | RowNum | BucketNum | Column Num | Partial ColumnNum | PartialUpdate RowsNum | Load time(s) | Apply time(ms) | Peak Update Memory usage | Note |
|---|---|---|---|---|---|---|---|---|---|
| 12 Bytes | 300M | 20 | 65 | 5 | 100M | 135261 | 106693 | 78.9G | branch-main |
| 12 Bytes | 300M | 20 | 65 | 5 | 100M | 166449 | 149870 | 10.3G | branch-opt |
| 12 Bytes | 300M | 20 | 65 | 5 | 100K | 2078 | 529 | 60.1M | branch-main |
| 12 Bytes | 300M | 20 | 65 | 5 | 100K | 2211 | 541 | 60.2M | branch-opt |

(cherry picked from commit 545b7be)

# Conflicts:
#	be/src/storage/memtable.h
#	be/src/storage/rowset_update_state.cpp
#	be/src/storage/rowset_update_state.h
#	be/src/storage/tablet_updates.cpp