Skip to content

[Scaling Writers] Close Idle Writers#19649

Merged
sopel39 merged 2 commits intotrinodb:masterfrom
gaurav8297:flush_memory_scaling_writers
Dec 13, 2023
Merged

[Scaling Writers] Close Idle Writers#19649
sopel39 merged 2 commits intotrinodb:masterfrom
gaurav8297:flush_memory_scaling_writers

Conversation

@gaurav8297
Copy link
Copy Markdown
Member

@gaurav8297 gaurav8297 commented Nov 7, 2023

Description

In this PR there are following changes:

  1. Closing of idle writers in a single writer thread based on some threshold. This change will help reduce memory footprint in long-running partitioned writes. Thus, we can scale more to increase the write performance.
  2. Remove max scaling from SkewedPartitionRebalancer.

Additional context and related issues

Benchmarks:

Before:
With memory constraints:

trino:lstec> insert into trips_data_big_gaurav_orc_test_3 select * from trips_data_big_gaurav;

Query 20231114_024556_00005_tc9mv, FAILED, 9 nodes
Splits: 28,887 total, 27,454 done (95.04%)
126:08 [21.9B rows, 968GB] [2.89M rows/s, 131MB/s]

Query aborted by user

peak memory used: ~128GB

Without memory constraints:

trino:lstec> set session max_memory_per_partition_writer='1MB';
trino:lstec> insert into trips_data_big_gaurav_orc_test_3 select * from trips_data_big_gaurav;
INSERT: 27676827574 rows

Query 20231114_052350_00009_tc9mv, FINISHED, 9 nodes
Splits: 35,725 total, 35,725 done (100.00%)
39:14 [27.7B rows, 1.2TB] [11.8M rows/s, 532MB/s]

peak memory used: ~700GB

After:

trino:lstec> insert into trips_data_big_gaurav_orc_test_3 select * from trips_data_big_gaurav;
INSERT: 27676827574 rows

Query 20231113_181736_00006_wkhru, FINISHED, 9 nodes
Splits: 35,739 total, 35,739 done (100.00%)
23:15 [27.7B rows, 1.2TB] [19.8M rows/s, 898MB/s]

peak memory used: ~450GB

Another benchmark with limited max memory per node:

trino:insert_demo> set session query_max_memory_per_node = '8GB';
SET SESSION
trino:insert_demo> insert into lineitem_few_skewed_partitions select orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, returnflag, linestatus, c
ommitdate, receiptdate, shipinstruct, comment, shipdate, shipmode from tpch_sf1000_orc_part.lineitem;
INSERT: 5999989709 rows

Query 20231113_193710_00012_wkhru, FINISHED, 9 nodes
Splits: 5,402 total, 5,402 done (100.00%)
7:33 [6B rows, 159GB] [13.3M rows/s, 359MB/s]

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Nov 7, 2023
@github-actions github-actions bot added tests:hive iceberg Iceberg connector delta-lake Delta Lake connector hive Hive connector labels Nov 7, 2023
@gaurav8297 gaurav8297 force-pushed the flush_memory_scaling_writers branch 6 times, most recently from 8369ff3 to 118ec2c Compare November 13, 2023 02:19
@gaurav8297 gaurav8297 requested a review from sopel39 November 13, 2023 02:19
@gaurav8297 gaurav8297 marked this pull request as ready for review November 13, 2023 02:19
@gaurav8297 gaurav8297 force-pushed the flush_memory_scaling_writers branch 5 times, most recently from 6623979 to 4db8545 Compare November 20, 2023 14:23
@gaurav8297 gaurav8297 requested a review from sopel39 November 20, 2023 14:23
@gaurav8297 gaurav8297 force-pushed the flush_memory_scaling_writers branch 4 times, most recently from 74a19dd to 6785d7e Compare November 22, 2023 03:07
Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some commits lgtm. Consider extracting it as separate PRs

@gaurav8297 gaurav8297 force-pushed the flush_memory_scaling_writers branch 2 times, most recently from 97e3530 to 0786ee0 Compare November 30, 2023 16:20
@gaurav8297 gaurav8297 force-pushed the flush_memory_scaling_writers branch 8 times, most recently from c6891a8 to 6b26c28 Compare December 7, 2023 09:44
Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@sopel39
Copy link
Copy Markdown
Member

sopel39 commented Dec 7, 2023

cc @findepi @alexjo2144

@findepi
Copy link
Copy Markdown
Member

findepi commented Dec 8, 2023

a high level question at this point

Closing of idle writers in a single writer thread based on some threshold. This change will help reduce memory footprint in long-running partitioned writes. Thus, we can scale more to increase the write performance.

This is always a good idea to reclaim unused resources!
What about peak memory needs? Does this PR change number of open partitions per writing node?
how does this relate to iceberg.max-partitions-per-writer = 100?

cc @sopel39 @gaurav8297 @alexjo2144

@gaurav8297
Copy link
Copy Markdown
Member Author

gaurav8297 commented Dec 10, 2023

What about peak memory needs?

Peak memory usage will reduce in the case of idle writers since we are closing the writers and reclaiming their resources early on.

Does this PR change number of open partitions per writing node? how does this relate to iceberg.max-partitions-per-writer = 100?

We do not change anything wrt the maximum number of open partitions (or writers). Currently, when we close the idle writer, we set its index to null in HivePageSink.writers. However, the check for the number of open partitions consider a null writer as an open writer (code). Maybe it's okay for now. But we can improve this by only considering non-null writers during this check.

cc @sopel39 @findepi

@gaurav8297 gaurav8297 force-pushed the flush_memory_scaling_writers branch 3 times, most recently from 4bdc124 to 3a4254e Compare December 10, 2023 04:43
@gaurav8297 gaurav8297 force-pushed the flush_memory_scaling_writers branch from 3a4254e to 61f7243 Compare December 12, 2023 12:23
Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments

This change will help in reducing memory
footprint in long-running partitioned
writes. Thus, we can scale more to increase
the write performance.
@gaurav8297 gaurav8297 force-pushed the flush_memory_scaling_writers branch from 61f7243 to 52c0c96 Compare December 13, 2023 12:29
@sopel39 sopel39 merged commit 7e708ba into trinodb:master Dec 13, 2023
@github-actions github-actions bot added this to the 435 milestone Dec 13, 2023
@sopel39 sopel39 mentioned this pull request Dec 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector

Development

Successfully merging this pull request may close these issues.

3 participants