Skip to content

Conversation

@owenowenisme
Copy link
Member

@owenowenisme owenowenisme commented Nov 17, 2025

Description

Previously we will try slice the block when self._total_pending_rows >= self._target_num_rows or flush_remaining is True, but flush_remaining doesn't mean self._total_pending_rows >= self._target_num_rows so it could make the slicing failed because our slicing logic is based on assumption there should be at least one full block.

This PR fix the logic and added test for such case.

Signed-off-by: You-Cheng Lin <[email protected]>
@owenowenisme owenowenisme requested a review from a team as a code owner November 17, 2025 09:58
@owenowenisme owenowenisme changed the title [Data] Fix streaming repartition slicing block when flushing [Data] Avoid slicing block when total_pending_rows < target Nov 17, 2025
@owenowenisme owenowenisme added the go add ONLY when ready to merge, run all tests label Nov 17, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to fix a crash in streaming repartition that occurs when flushing remaining blocks that have fewer rows than the target block size. The change correctly separates the flushing logic from the main bundling logic, which resolves the crash. However, the implementation has a subtle bug where it may not flush all pending data if there are enough rows for at least one full block when done_adding_bundles is called, potentially leading to data loss. I've suggested a more robust implementation that ensures all full blocks are created before flushing the remainder.

Signed-off-by: You-Cheng Lin <[email protected]>
@ray-gardener ray-gardener bot added the data Ray Data-related issues label Nov 17, 2025
@raulchen raulchen merged commit dad8002 into ray-project:master Nov 17, 2025
6 checks passed
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…ect#58699)

## Description
Previously we will try slice the block when `self._total_pending_rows >=
self._target_num_rows` or `flush_remaining` is True, but flush_remaining
doesn't mean `self._total_pending_rows >= self._target_num_rows ` so it
could make the slicing failed because our slicing logic is based on
assumption there should be at least one full block.

This PR fix the logic and added test for such case.

---------

Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: Aydin Abiar <[email protected]>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
…ect#58699)

## Description
Previously we will try slice the block when `self._total_pending_rows >=
self._target_num_rows` or `flush_remaining` is True, but flush_remaining
doesn't mean `self._total_pending_rows >= self._target_num_rows ` so it
could make the slicing failed because our slicing logic is based on
assumption there should be at least one full block.

This PR fix the logic and added test for such case.

---------

Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: YK <[email protected]>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
…ect#58699)

## Description
Previously we will try slice the block when `self._total_pending_rows >=
self._target_num_rows` or `flush_remaining` is True, but flush_remaining
doesn't mean `self._total_pending_rows >= self._target_num_rows ` so it
could make the slicing failed because our slicing logic is based on
assumption there should be at least one full block.

This PR fix the logic and added test for such case.

---------

Signed-off-by: You-Cheng Lin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants