Improve memory usage in TableFinishOperator #16036

Merged
arhimondr merged 3 commits into prestodb:master from viczhang861:track_stats_memory
May 10, 2021

Conversation

viczhang861 (Contributor) commented May 3, 2021

Fixes #16022

  • The commit "Track column statistics only in recoverable mode" reduces driver memory usage in Presto on Spark (POS), because POS uses the TASK_COMMIT strategy.

Test plan

  • Built a custom package and deployed to a real cluster with shadowed queries.
  • Enabled large batch mode and set enable_stats_collection_for_temporary_table=true and hash_partition_count=16384 to increase the memory used for stats collection. Ran shadow queries for 24 hours; no full GC was observed on the coordinator.
== RELEASE NOTES ==

General Changes
* Track system memory used by column statistics in TableFinishOperator.

Data in statistics pages is final for any completed
task in non-recoverable mode.
The total memory for all statistics pages could be
large and cause full GC issues. Remove pages that
will not be accessed.

Column statistics and PartitionUpdate could be
memory-intensive, for example, in grouped execution.
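
The idea in the commit messages above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `StatisticsPageBuffer`, its methods, and the use of `long[]` as a stand-in for a statistics page are all hypothetical names introduced here. In non-recoverable mode a page is folded into the running aggregate and dropped right away; in recoverable mode pages are retained per task and their size is counted, so that the total could be reported as system memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these names are not taken from the Presto codebase.
final class StatisticsPageBuffer
{
    private final boolean recoverable;
    // Pages retained per task; a long[] stands in for a statistics page.
    private final Map<String, List<long[]>> retainedPages = new HashMap<>();
    private long retainedBytes;
    private long aggregatedSum;

    StatisticsPageBuffer(boolean recoverable)
    {
        this.recoverable = recoverable;
    }

    void addPage(String taskId, long[] page)
    {
        if (!recoverable) {
            // Output of a completed task is final here, so aggregate it and drop the page.
            aggregate(page);
            return;
        }
        // A task may be retried, so keep its pages until the task is committed or discarded.
        retainedPages.computeIfAbsent(taskId, id -> new ArrayList<>()).add(page);
        retainedBytes += sizeOf(page);
    }

    // Fold the retained pages of a successful task into the aggregate and release them.
    void commitTask(String taskId)
    {
        release(taskId, true);
    }

    // Drop the retained pages of a failed or superseded task without aggregating them.
    void discardTask(String taskId)
    {
        release(taskId, false);
    }

    private void release(String taskId, boolean keep)
    {
        List<long[]> pages = retainedPages.remove(taskId);
        if (pages == null) {
            return;
        }
        for (long[] page : pages) {
            if (keep) {
                aggregate(page);
            }
            retainedBytes -= sizeOf(page);
        }
    }

    private void aggregate(long[] page)
    {
        // A plain sum stands in for real column statistics aggregation.
        for (long value : page) {
            aggregatedSum += value;
        }
    }

    private static long sizeOf(long[] page)
    {
        // Payload size only; array header overhead is ignored in this sketch.
        return (long) page.length * Long.BYTES;
    }

    // The value an operator would report as tracked system memory.
    long getRetainedBytes()
    {
        return retainedBytes;
    }

    long getAggregatedSum()
    {
        return aggregatedSum;
    }
}
```

With `recoverable` set to false, `getRetainedBytes()` stays at zero no matter how many pages arrive, which is the memory saving the first commit message describes. With it set to true, the counter grows until each task is committed or discarded, which is the quantity the second commit message proposes to track.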
viczhang861 changed the title from "Improve memory usage in TableFinishOperator" to "[WIP]Improve memory usage in TableFinishOperator" May 3, 2021
viczhang861 changed the title from "[WIP]Improve memory usage in TableFinishOperator" to "Improve memory usage in TableFinishOperator" May 3, 2021
viczhang861 requested review from a team and mayankgarg1990 May 3, 2021 21:27
Development

Successfully merging this pull request may close these issues.

Coordinator full GC due to untracked memory in TableFinishOperator

2 participants