Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Shuffle] isolate mappers in different subtasks for fetch_by_index mode #3239

Merged

Conversation

chaokunyang
Copy link
Contributor

@chaokunyang chaokunyang commented Aug 31, 2022

What do these changes do?

Mars coloring-based fusion algorithm may fuse multiple mapper subtasks of different shuffles into one subtask, which conflict with ShuffleFetchType.FETCH_BY_INDEX shuffle block resolution mode. In ShuffleFetchType.FETCH_BY_INDEX shuffle block resolution mode, shuffle data is resolved by index. If one subtask contains shuffle mapper of different shuffle, current resolution mode won't work.

One solution is adding second-level index to ShuffleManager to work around this. It's complicated and not compatible with push-based shuffle and other shuffle system such as CloudShuffleServicealibaba/RemoteShuffleServiceFirestorm.

Considering very little shuffle operands will be fused into the same subtask(currently only DataFrameIndexAlign mappers of different shuffle are fused into one subtask). We can just change the fuse algorithm to make those mappers isolated with each other into different subtasks.
image

For other shuffle operands, there won’t be any change in the generated subtask graph.

Related issue number

Fixes #3226
#2719
#2916

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

@chaokunyang chaokunyang requested a review from a team as a code owner August 31, 2022 03:36
@chaokunyang chaokunyang changed the title [Shuffle] ensure every subtask contains only at most one mapper [Shuffle] isolate mappers in different subtasks for fetch_by_index mode Aug 31, 2022
Copy link
Contributor

@fyrestone fyrestone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We can find a better way to avoid multiple graph iterations in the future, e.g. normalize shuffle operands to remove many special cases.

Copy link
Contributor

@zhongchun zhongchun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This may cause more subtasks. Maybe we can optimize graph later to reduce subtasks.

@chaokunyang chaokunyang merged commit ee152b6 into mars-project:master Sep 5, 2022
@fyrestone fyrestone added this to In Progress in Ray DAG Execution Backend via automation Sep 5, 2022
aresnow1 pushed a commit to aresnow1/mars that referenced this pull request Sep 7, 2022
…de (mars-project#3239)

* ensure every subtask contains only at most one mapper

* enable test_align_execution

* fix isolate mappers for `DataFrameIndexAlign` which are not mappers

(cherry picked from commit ee152b6)
qianduoduo0904 pushed a commit to qianduoduo0904/mars that referenced this pull request Sep 22, 2022
…de (mars-project#3239)

* ensure every subtask contains only at most one mapper

* enable test_align_execution

* fix isolate mappers for `DataFrameIndexAlign` which are not mappers

(cherry picked from commit ee152b6)
UranusSeven pushed a commit to xorbitsai/mars that referenced this pull request Sep 22, 2022
…de (mars-project#3239)

* ensure every subtask contains only at most one mapper

* enable test_align_execution

* fix isolate mappers for `DataFrameIndexAlign` which are not mappers

(cherry picked from commit ee152b6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

Successfully merging this pull request may close these issues.

[BUG] DataFrameIndexAlign failed on ray shuffle with AssertionError
3 participants