
[BOUNTY - $500] Better PartitioningStrategy #284

Open
AlexCheema opened this issue Oct 3, 2024 · 2 comments


AlexCheema commented Oct 3, 2024

Introduction

exo currently implements Pipeline Parallel inference. This splits up layers of a model over multiple devices and executes them sequentially, device-by-device.

There are different ways we can split up the model layers. For this purpose, exo defines something called a PartitioningStrategy:

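from abc import ABC, abstractmethod
from typing import List
# Topology and Partition are defined elsewhere in exo

class PartitioningStrategy(ABC):
    @abstractmethod
    def partition(self, topology: Topology) -> List[Partition]:
        pass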

It takes a Topology and returns a List[Partition].
A Partition consists of a node_id, a start and an end. Partitions must be contiguous, half-open ranges [start, end) over the fraction of the model's layers: the first partition's start must be 0 and the last partition's end must be 1.
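To make those constraints concrete, here is a minimal checker a new strategy's unit tests could call. The `Partition` dataclass below is a stand-in that only mirrors the three fields named above, and `validate_partitions` is a hypothetical helper, not part of exo.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Partition:
    # Stand-in mirroring the fields described above (not exo's own class)
    node_id: str
    start: float
    end: float


def validate_partitions(partitions: List[Partition], tol: float = 1e-9) -> None:
    """Raise ValueError unless the partitions tile [0, 1) contiguously, in order."""
    if not partitions:
        raise ValueError("at least one partition is required")
    if abs(partitions[0].start) > tol:
        raise ValueError("first partition must start at 0")
    if abs(partitions[-1].end - 1.0) > tol:
        raise ValueError("last partition must end at 1")
    for p in partitions:
        if p.end < p.start:
            raise ValueError(f"{p.node_id} has end < start")
    # Each range must begin exactly where the previous one ended: no gaps, no overlaps
    for prev, nxt in zip(partitions, partitions[1:]):
        if abs(prev.end - nxt.start) > tol:
            raise ValueError(f"gap or overlap between {prev.node_id} and {nxt.node_id}")
```

For example, `[Partition("a", 0.0, 0.4), Partition("b", 0.4, 1.0)]` passes, while changing the second start to 0.5 raises `ValueError`.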

There are two things going on here:

  1. It decides the order that nodes execute and send messages between each other. For example, if you return [node1, node2, node3], then node1 will execute first, followed by node2, followed by node3, which will then send an output token to node1 to continue the cycle. However, if you return [node2, node1, node3] then node2 will execute first, followed by node1, followed by node3, which will then send an output token to node2 to continue the cycle.
  2. It decides how many layers each node gets. Each node gets a number of layers proportional to end - start. For example, if start=0, end=1, that node gets all the layers. If node1 has start=0, end=0.5 and node2 has start=0.5, end=1, then node1 and node2 each get 50% of the layers.
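Both effects can be sketched on plain `(node_id, start, end)` tuples: the list order fixes who sends to whom (the last node wraps back to the first), and the fractional ranges map onto layer indices. The helper names are hypothetical, and rounding each boundary with `round` is an assumption; exo's own mapping from partitions to layers may round differently.

```python
from typing import List, Tuple

# Hypothetical stand-in: (node_id, start, end) tuples in execution order
PartitionTuple = Tuple[str, float, float]


def next_node(partitions: List[PartitionTuple], node_id: str) -> str:
    """Return the node that receives this node's output; the last node wraps to the first."""
    ids = [p[0] for p in partitions]
    i = ids.index(node_id)
    return ids[(i + 1) % len(ids)]


def layer_ranges(partitions: List[PartitionTuple], n_layers: int) -> List[Tuple[str, int, int]]:
    """Map fractional [start, end) ranges onto half-open layer index ranges.

    Adjacent partitions share the same boundary value, so rounding each
    boundary once keeps the integer ranges contiguous.
    """
    return [
        (node_id, round(start * n_layers), round(end * n_layers))
        for node_id, start, end in partitions
    ]


ring = [("node2", 0.0, 0.25), ("node1", 0.25, 0.75), ("node3", 0.75, 1.0)]
print(layer_ranges(ring, 32))  # → [('node2', 0, 8), ('node1', 8, 24), ('node3', 24, 32)]
print(next_node(ring, "node3"))  # → node2
```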

The default and only PartitioningStrategy right now is RingMemoryWeightedPartitioningStrategy:

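from typing import List

class RingMemoryWeightedPartitioningStrategy(PartitioningStrategy):
    def partition(self, topology: Topology) -> List[Partition]:
        # (node_id, capabilities) pairs for every node in the topology
        nodes = list(topology.all_nodes())
        # Descending by memory, then by node_id, so every node derives the same order
        nodes.sort(key=lambda x: (x[1].memory, x[0]), reverse=True)
        total_memory = sum(node[1].memory for node in nodes)
        partitions = []
        start = 0
        for node in nodes:
            # Each node's share of the layers is proportional to its memory
            end = round(start + (node[1].memory / total_memory), 5)
            partitions.append(Partition(node[0], start, end))
            start = end
        return partitions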

It sorts nodes primarily by memory and secondarily by node_id, both in descending order. Each partition's size is proportional to the device's memory: if deviceA has 4GB of memory and deviceB has 6GB, deviceA gets 40% of the layers and deviceB gets 60% (modulo rounding to five decimal places). Sorting secondarily by node_id is important: it keeps the order deterministic and consistent across nodes when two devices report the same memory.
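The worked example can be checked with a standalone re-implementation that applies the same ordering and rounding rules to a plain `{node_id: memory}` dict instead of a `Topology`. The function name and the dict input are assumptions made to keep the sketch self-contained.

```python
from typing import Dict, List, Tuple


def memory_weighted_partition(memory_gb: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Mirror RingMemoryWeightedPartitioningStrategy's logic on a plain dict (hypothetical helper)."""
    # Descending by (memory, node_id): larger devices first, ties broken by node_id
    nodes = sorted(memory_gb.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    total = sum(memory_gb.values())
    partitions = []
    start = 0.0
    for node_id, mem in nodes:
        end = round(start + mem / total, 5)
        partitions.append((node_id, start, end))
        start = end
    return partitions


print(memory_weighted_partition({"deviceA": 4, "deviceB": 6}))
# → [('deviceB', 0.0, 0.6), ('deviceA', 0.6, 1.0)]
print(memory_weighted_partition({"a": 8, "b": 8}))
# → [('b', 0.0, 0.5), ('a', 0.5, 1.0)]
```

The second call shows why the node_id tie-break matters: with equal memory, the order depends only on the IDs, not on the order in which a node happened to discover its peers.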

The task

The task is to implement a new, improved PartitioningStrategy that takes into account more than just memory. This may require augmenting the Topology class with more information than it currently has, which will require changes across the codebase. Some things you might want to consider are device FLOPS and inter-node latency. There are many other factors you could take into account; which ones is up to you.
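As one illustration of using more than memory, here is a sketch that splits layers in proportion to each node's FLOPS while capping every node at the fraction of the model its memory can hold, redistributing any overflow to the remaining nodes (a water-filling split). Everything here is an assumption: `NodeProfile`, its `flops` field, and the `model_bytes` parameter are hypothetical and not something exo's Topology exposes today. It also pins the last end to exactly 1.0 so cumulative rounding cannot leave it slightly off.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NodeProfile:
    # Hypothetical per-node record; exo's Topology may not carry these fields
    node_id: str
    memory_bytes: float
    flops: float


def flops_weighted_partition(nodes: List[NodeProfile], model_bytes: float) -> List[Tuple[str, float, float]]:
    """FLOPS-proportional split with per-node memory caps (illustrative sketch)."""
    if any(n.flops <= 0 for n in nodes):
        raise ValueError("every node needs a positive FLOPS estimate")
    caps = {n.node_id: min(1.0, n.memory_bytes / model_bytes) for n in nodes}
    if sum(caps.values()) < 1.0:
        raise ValueError("cluster memory cannot hold the model")

    shares = {}
    active = list(nodes)
    remaining = 1.0
    while active:
        total_flops = sum(n.flops for n in active)
        # Nodes whose FLOPS-proportional share would exceed their memory cap are pinned at the cap
        over_ids = {n.node_id for n in active if remaining * n.flops / total_flops > caps[n.node_id]}
        if not over_ids:
            for n in active:
                shares[n.node_id] = remaining * n.flops / total_flops
            break
        for node_id in over_ids:
            shares[node_id] = caps[node_id]
            remaining -= caps[node_id]
        active = [n for n in active if n.node_id not in over_ids]

    # Deterministic execution order: fastest first, ties broken by node_id
    order = sorted(nodes, key=lambda n: (n.flops, n.node_id), reverse=True)
    partitions = []
    start = 0.0
    for i, n in enumerate(order):
        end = 1.0 if i == len(order) - 1 else round(start + shares[n.node_id], 5)
        partitions.append((n.node_id, start, end))
        start = end
    return partitions


fast_small = NodeProfile("B", memory_bytes=50, flops=3)
slow_big = NodeProfile("A", memory_bytes=100, flops=1)
print(flops_weighted_partition([slow_big, fast_small], model_bytes=100))
# → [('B', 0.0, 0.5), ('A', 0.5, 1.0)]
```

Without the memory cap, B would get 75% of the layers; because B can only hold half the model, the overflow moves to A. Splitting in proportion to FLOPS roughly equalises per-node compute time under a compute-bound assumption, which is only one of several reasonable objectives.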

I have some ideas for how to do this, and there are many potential approaches; however, I'm looking for out-of-the-box ideas here.

I'll leave it up to you to reason about how to lay this out, but there are two high-level metrics that would make sense to optimise for (should they be optimised separately or together?):

  • Time-to-first token (latency)
  • Tokens per second (throughput)
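To reason about the two metrics, a crude cost model helps. The sketch below assumes compute-only stage times (share of layers × FLOPs per token ÷ node FLOPS), one fixed network hop per node around the ring, a prefill that processes the whole prompt in one pass, and a single sequence being decoded. The function and its parameters are hypothetical. Real decoding is often memory-bandwidth-bound rather than compute-bound, so treat this as a starting point for simulation tests, not a faithful model of exo.

```python
from typing import Dict, List, Tuple


def estimate_metrics(
    partitions: List[Tuple[str, float, float]],
    flops: Dict[str, float],
    flops_per_token: float,
    prompt_tokens: int,
    hop_latency_s: float,
) -> Tuple[float, float]:
    """Return (time_to_first_token_s, tokens_per_s) under a crude compute-only ring model."""
    # Each node does (end - start) of one token's work at its own speed
    per_token_compute = sum(
        (end - start) * flops_per_token / flops[node_id] for node_id, start, end in partitions
    )
    # One network hop per node around the ring; a single node sends nothing
    hops = len(partitions) if len(partitions) > 1 else 0
    ring_latency = hops * hop_latency_s
    # Prefill: the whole prompt passes through every stage once
    ttft = prompt_tokens * per_token_compute + ring_latency
    # Decode: every new token needs a full trip around the ring
    tokens_per_s = 1.0 / (per_token_compute + ring_latency)
    return ttft, tokens_per_s


two_nodes = [("a", 0.0, 0.5), ("b", 0.5, 1.0)]
speeds = {"a": 1e12, "b": 1e12}
print(estimate_metrics(two_nodes, speeds, 2e9, 100, 0.01))  # ≈ (0.22, 45.45)
print(estimate_metrics([("a", 0.0, 1.0)], speeds, 2e9, 100, 0.01))  # ≈ (0.2, 500.0)
```

In this toy setting hop latency dominates decoding, so a single node that can hold the model wins on both metrics. With several requests in flight, a pipeline's steady-state throughput is instead bounded by its slowest stage, which this single-sequence estimate ignores.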

Deliverables

  1. A set of unit tests for your new PartitioningStrategy that show it works in different cases.
  2. A set of unit tests that "simulate" different scenarios and show that this PartitioningStrategy achieves the optimal solution in each scenario.
  3. An option added to the main script to enable this PartitioningStrategy (you decide whether other parameters should be added to configure the new PartitioningStrategy).
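For the third deliverable, an argparse-based option could look like the sketch below. The flag names `--partitioning-strategy` and `--latency-weight`, and the strategy names, are hypothetical and not exo's actual CLI; `--latency-weight` stands in for whatever tuning parameter a new strategy might expose.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="exo node (illustrative subset of options)")
    parser.add_argument(
        "--partitioning-strategy",
        choices=["ring-memory-weighted", "flops-weighted"],
        default="ring-memory-weighted",
        help="how to split model layers across nodes",
    )
    parser.add_argument(
        "--latency-weight",
        type=float,
        default=0.5,
        help="hypothetical knob: 0 favours tokens/s, 1 favours time-to-first-token",
    )
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject out-of-range weights with argparse's usual error message and exit code
    if not 0.0 <= args.latency_weight <= 1.0:
        parser.error("--latency-weight must be between 0 and 1")
    return args


args = parse(["--partitioning-strategy", "flops-weighted", "--latency-weight", "0.2"])
print(args.partitioning_strategy, args.latency_weight)  # → flops-weighted 0.2
```

Keeping the existing strategy as the default means current deployments behave the same unless a user opts in.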

AlexCheema changed the title from "[BOUNTY - $200] Better Partitioning Strategy" to "[BOUNTY - $500] Better Partitioning Strategy" on Oct 3, 2024
AlexCheema changed the title from "[BOUNTY - $500] Better Partitioning Strategy" to "[BOUNTY - $500] Better PartitioningStrategy" on Oct 3, 2024
ivanmladek commented

Will be testing the strategy over the weekend on my cluster of MacBooks.

exo-explore deleted a comment from DevEmilio96 on Oct 4, 2024

nabby27 commented Oct 6, 2024

Hi @AlexCheema, I've seen that you usually create bounties on issues; maybe you'd be interested in using Opire. You don't pay until someone claims the bounties with a PR.

PS: I'm the cofounder, so if you need anything, feel free to contact me.
