Skip to content

Conversation

@wendigo
Copy link
Contributor

@wendigo wendigo commented Jun 14, 2025

This prevents dictionary blocks to expand to a single huge segment that most clients won't be able to read due to its size.

This now partitions these kind of pages to smaller ones and allows the operator to output multiple spooled/inlined pages in a single getOutput call.

Fixes #25999

Description

Additional context and related issues

Release notes

## Client protocol
* Partition large pages to avoid OOM while serializing data in the spooling protocol. ({issue}`25999`)

@wendigo wendigo requested a review from Copilot June 14, 2025 12:05
@cla-bot cla-bot bot added the cla-signed label Jun 14, 2025

This comment was marked as outdated.

@wendigo wendigo force-pushed the serafin/partition-spooling-pages branch from d687e96 to c7b6bd0 Compare June 14, 2025 14:08
@wendigo wendigo requested review from Copilot and losipiuk June 14, 2025 14:08
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR partitions large spooled metadata pages into smaller segments to prevent oversized segments that clients may have difficulty reading. Key changes include updating the metadata serialization/deserialization API to work with lists of blocks, revising tests accordingly, and updating the operator logic to handle both spooled and inlined pages with segment counters.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
core/trino-main/src/test/java/io/trino/server/protocol/spooling/TestSpooledMetadataBlockSerde.java Updated tests to compare list-based metadata serialization/deserialization.
core/trino-main/src/test/java/io/trino/operator/TestSpoolingPagePartitioner.java Added tests for partitioning pages based on size thresholds.
core/trino-main/src/sql/planner/LocalExecutionPlanner.java Modified validation to support spooled metadata pages with multiple positions.
core/trino-main/src/server/protocol/spooling/SpoolingQueryDataProducer.java Updated to iterate over multiple metadata blocks from deserialization.
core/trino-main/src/server/protocol/spooling/SpooledMetadataBlockSerde.java Revised serialize/deserialize API to support multiple metadata blocks.
core/trino-main/src/server/protocol/spooling/SpooledMetadataBlock.java Removed the default serialize method to rely on the updated API.
core/trino-main/src/operator/SpoolingPagePartitioner.java Introduced page partitioning logic to split large pages, ensuring each partition meets defined thresholds.
core/trino-main/src/operator/OutputSpoolingOperatorFactory.java Adjusted operator behavior to use the new API, added segment counters, and routed both spooled and inlined outputs through the serialization process.
Comments suppressed due to low confidence (2)

core/trino-main/src/main/java/io/trino/server/protocol/spooling/SpooledMetadataBlockSerde.java:94

  • Update the method documentation for serialize/deserialize to clearly state that they now support multiple SpooledMetadataBlock instances, and outline any backwards compatibility considerations.
public static List<SpooledMetadataBlock> deserialize(Page page)

core/trino-main/src/main/java/io/trino/operator/OutputSpoolingOperatorFactory.java:328

  • Ensure downstream consumers expecting a serialized Page instance are updated to handle the new inline API output (i.e. a List wrapped by the serialize method). Clarifying this in both the code and documentation can prevent integration issues.
private List<SpooledMetadataBlock> inline(List<Page> pages)

Comment on lines +152 to +153
private final AtomicLong spooledSegmentsCount = new AtomicLong();
private final AtomicLong inlinedSegmentsCount = new AtomicLong();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those could be added in separate commit

@wendigo wendigo force-pushed the serafin/partition-spooling-pages branch 2 times, most recently from dcdd278 to 5116d3a Compare June 26, 2025 16:57
This prevents dictionary blocks to expand to a single huge segment
that most clients won't be able to read due to its size.

This now partitions these kind of pages to smaller ones and allows
the operator to output multiple spooled/inlined pages in a single getOutput call.
@wendigo wendigo force-pushed the serafin/partition-spooling-pages branch from 5116d3a to 4deff20 Compare July 1, 2025 08:33
@wendigo wendigo merged commit 61db4ea into master Jul 1, 2025
201 of 206 checks passed
@wendigo wendigo deleted the serafin/partition-spooling-pages branch July 1, 2025 12:31
@github-actions github-actions bot added this to the 477 milestone Jul 1, 2025
@ebyhr
Copy link
Member

ebyhr commented Aug 6, 2025

@wendigo Could you update the release note entry?

@wendigo
Copy link
Contributor Author

wendigo commented Aug 6, 2025

@ebyhr done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Trino worker dies while processing the query from #21443

5 participants