[Shuffle] Skip store shuffle object refs to reduce meta overhead #3209

chaokunyang · 2022-08-08T11:47:41Z

What do these changes do?

This PR skips storing shuffle object refs, which reduces meta overhead and the supervisor serialization bottleneck. It does so by disabling autoscale-in while a shuffle is executing.
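As a rough illustration of the idea only (this is not the code in the PR), scale-in can be gated on a count of running shuffles. `ScaleInGate` and its method names are hypothetical and are not Mars APIs.

```python
class ScaleInGate:
    """Hypothetical helper: refuses scale-in while any shuffle is running."""

    def __init__(self):
        self._running_shuffles = 0

    def shuffle_started(self):
        # Called when a shuffle stage begins executing.
        self._running_shuffles += 1

    def shuffle_finished(self):
        # Called when a shuffle stage completes or is cancelled.
        if self._running_shuffles == 0:
            raise RuntimeError("shuffle_finished called without a matching start")
        self._running_shuffles -= 1

    def can_scale_in(self):
        # Scale-in is allowed only when no shuffle is in flight.
        return self._running_shuffles == 0
```

A counter rather than a boolean flag is used in this sketch so that overlapping shuffles do not re-enable scale-in early.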

Related issue number

#2916

Check code requirements

tests added / passed (if needed)
Ensure all linting tests pass, see here for how to run them

mars/services/scheduling/supervisor/autoscale.py

zhongchun

LGTM.

fyrestone · 2022-08-12T03:13:09Z

mars/services/scheduling/api/oscar.py

+        cluster_api = await ClusterAPI.create(address)
+        supervisor_address = (await cluster_api.get_supervisors())[0]
+        autoscaler = await mo.actor_ref(
+            AutoscalerActor.default_uid(), address=supervisor_address


Why not use address=address? The address is the current supervisor address.

If the supervisor has created subpools and the current process is a subpool, then address won't be the supervisor address.

But SubtaskQueueingActor, SubtaskManagerActor and AutoscalerActor are all created at the same address, so their actor refs should be created from the same address too. Please refer to: https://github.com/mars-project/mars/blob/master/mars/services/scheduling/supervisor/service.py#L122

Can you unify the address used to create the actor refs of SubtaskQueueingActor, SubtaskManagerActor and AutoscalerActor?

SubtaskQueueingActor and SubtaskManagerActor may be created in a subpool. The AutoscalerActor will always be created in the supervisor main pool, similar to GlobalResourceManager.

The addresses differ: SubtaskQueueingActor and SubtaskManagerActor may use a subpool address, while AutoscalerActor will always use the main pool address.

I changed the actor_ref creation of AutoscalerActor to:

    cluster_api = await ClusterAPI.create(address)
    [autoscaler] = await cluster_api.get_supervisor_refs(
        [AutoscalerActor.default_uid()]
    )
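To make the point of this thread concrete, here is a small self-contained sketch of why resolving the ref through the cluster API gives the same result from a main pool and from a subpool. `FakeClusterAPI`, the two addresses, and the tuple used as a "ref" are all stand-ins invented for illustration; they only mimic the shape of the calls quoted above and are not the Mars implementation.

```python
import asyncio

MAIN_POOL = "127.0.0.1:7000"  # hypothetical supervisor main pool address
SUB_POOL = "127.0.0.1:7001"   # hypothetical supervisor subpool address


class FakeClusterAPI:
    """Stand-in for a cluster API; not the Mars class."""

    def __init__(self, address):
        self.address = address

    @classmethod
    async def create(cls, address):
        return cls(address)

    async def get_supervisor_refs(self, uids):
        # Resolve every uid against the main pool, whatever address the
        # caller happened to be running at.
        return [(uid, MAIN_POOL) for uid in uids]


async def resolve_autoscaler(address):
    cluster_api = await FakeClusterAPI.create(address)
    [autoscaler] = await cluster_api.get_supervisor_refs(["autoscaler"])
    return autoscaler


def demo():
    from_main = asyncio.run(resolve_autoscaler(MAIN_POOL))
    from_sub = asyncio.run(resolve_autoscaler(SUB_POOL))
    return from_main, from_sub
```

In this toy model a caller in the subpool and a caller in the main pool end up with the same ref, which is the property the comment above asks for.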

mars/services/subtask/core.py

fyrestone · 2022-08-12T03:27:28Z

mars/services/task/analyzer/analyzer.py

+        if self._has_shuffle:
+            mapper_chunks, proxy_chunks = [], []
+            for c in result_chunks:
+                if (


Is this logic only for ShuffleFetchType.FETCH_BY_INDEX? Recording the shuffle proxy subtasks avoids repeatedly searching for shuffle proxies, but it introduces overhead for ShuffleFetchType.FETCH_BY_KEY.

It's used by TaskStageProcessor
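The trade-off raised in this thread can be pictured with a toy graph: recording proxy subtasks while the graph is built adds a little bookkeeping to every build, but saves a scan later. `ToySubtaskGraph` and the `is_proxy` flag are hypothetical and only stand in for whatever the analyzer and TaskStageProcessor actually use.

```python
class ToySubtaskGraph:
    """Hypothetical graph that records proxy subtasks as they are added."""

    def __init__(self):
        self.subtasks = []        # every subtask as (name, is_proxy)
        self.proxy_subtasks = []  # names of proxy subtasks, recorded at build time

    def add(self, name, is_proxy=False):
        self.subtasks.append((name, is_proxy))
        if is_proxy:
            # Extra bookkeeping paid on every build, even if nobody reads it.
            self.proxy_subtasks.append(name)

    def find_proxies_by_scan(self):
        # The alternative: walk the whole graph each time proxies are needed.
        return [name for name, is_proxy in self.subtasks if is_proxy]
```

Both approaches return the same proxies; they differ only in when the cost is paid.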

mars/services/task/analyzer/analyzer.py

fyrestone

LGTM

zhongchun

LGTM.

…s-project#3209)

* disable shuffle in autoscale to skip shuffle meta
* fix remove mapper data
* refine autoscale in
* fix SubtaskGraph add proxy chunks
* add shuffle tests to autoscale
* fxi mapper chunks check
* remove unnecessary event
* refine proxy_subtasks check
* workaround versionner compatibility with PEP660
* fix get autoscaler
* fix subtask graph building

(cherry picked from commit b2d658e)

chaokunyang requested a review from a team as a code owner August 8, 2022 11:47

chaokunyang marked this pull request as draft August 8, 2022 11:48

chaokunyang force-pushed the skip_store_shuffle_object_refs_when_autoscale_disabled branch 2 times, most recently from 1f99c3d to a037db0 Compare August 9, 2022 04:57

chaokunyang marked this pull request as ready for review August 9, 2022 06:56

chaokunyang changed the title ~~[Shuffle] Skip store shuffle object refs when autoscale disabled~~ [Shuffle] Skip store shuffle object refs to reduce meta overhead Aug 9, 2022

zhongchun reviewed Aug 11, 2022

mars/services/scheduling/supervisor/autoscale.py (outdated)

zhongchun previously approved these changes Aug 12, 2022

fyrestone reviewed Aug 12, 2022

chaokunyang dismissed zhongchun’s stale review via 3694c88 August 12, 2022 04:39

chaokunyang force-pushed the skip_store_shuffle_object_refs_when_autoscale_disabled branch from 05c8b0b to 6170d0b Compare August 12, 2022 09:04

chaokunyang force-pushed the skip_store_shuffle_object_refs_when_autoscale_disabled branch from 6170d0b to 4d4a4b4 Compare September 5, 2022 06:59

chaokunyang added 10 commits September 5, 2022 16:06

disable shuffle in autoscale to skip shuffle meta

3a7a5f5

fix remove mapper data

d35b21d

refine autoscale in

1147919

fix SubtaskGraph add proxy chunks

e411b5e

add shuffle tests to autoscale

a7cb234

fxi mapper chunks check

3008d7b

remove unnecessary event

6dda110

refine proxy_subtasks check

799a929

workaround versionner compatibility with PEP660

969d64f

fix get autoscaler

3a93dc1

chaokunyang force-pushed the skip_store_shuffle_object_refs_when_autoscale_disabled branch from 4d4a4b4 to 3a93dc1 Compare September 5, 2022 08:06

fix subtask graph building

970bf80

fyrestone approved these changes Sep 6, 2022

zhongchun approved these changes Sep 6, 2022

chaokunyang merged commit b2d658e into mars-project:master Sep 6, 2022
