exo currently implements Pipeline Parallel inference. This splits up layers of a model over multiple devices and executes them sequentially, device-by-device.
There are different ways we can split up the model layers. For this purpose, exo defines a `PartitioningStrategy` (see `exo/topology/partitioning_strategy.py`, lines 16-19 at commit `5e0db20`). It takes a `Topology` and returns a `List[Partition]`.
A `Partition` consists of `node_id`, `start` and `end`. Partitions must be contiguous ranges `[start, end)`: the first `start` must be 0 and the last `end` must be 1.
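The interface described above looks roughly like this. This is a sketch paraphrased from the description, not the real definitions in `exo/topology/partitioning_strategy.py`; the stub `Topology` stands in for exo's actual class:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


class Topology:
    """Stand-in for exo's Topology (nodes, capabilities, connections)."""
    pass


@dataclass
class Partition:
    node_id: str
    start: float  # inclusive, as a fraction of the model's layers
    end: float    # exclusive, as a fraction of the model's layers


class PartitioningStrategy(ABC):
    @abstractmethod
    def partition(self, topology: Topology) -> List[Partition]:
        ...
```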
There are two things going on here:
1. It decides the order in which nodes execute and send messages to each other. For example, if you return `[node1, node2, node3]`, then `node1` executes first, followed by `node2`, then `node3`, which sends an output token back to `node1` to continue the cycle. If you instead return `[node2, node1, node3]`, then `node2` executes first, followed by `node1`, then `node3`, which sends an output token back to `node2` to continue the cycle.
2. It decides how many layers each node gets. Each node gets a number of layers proportional to `end - start`. For example, if `start=0, end=1`, that node gets all the layers. If `node1` has `start=0, end=0.5` and `node2` has `start=0.5, end=1`, then each node gets 50% of the layers.
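As a concrete sanity check, mapping a fractional `[start, end)` range onto whole layer indices might look like the following (a hypothetical helper for illustration, not exo's actual code):

```python
def layer_range(start: float, end: float, n_layers: int) -> range:
    """Map a fractional [start, end) partition onto whole layer indices."""
    return range(round(start * n_layers), round(end * n_layers))


# With a 32-layer model split 50/50, each node gets 16 layers.
first = layer_range(0.0, 0.5, 32)   # layers 0..15
second = layer_range(0.5, 1.0, 32)  # layers 16..31
```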
The default and only `PartitioningStrategy` right now is `RingMemoryWeightedPartitioningStrategy` (see `exo/topology/ring_memory_weighted_partitioning_strategy.py`, lines 7-18 at commit `5e0db20`).
It sorts primarily by memory and secondarily by `node_id`. Each partition's size is proportional to the device's memory: if deviceA has 4GB and deviceB has 6GB, deviceA gets 40% of the layers and deviceB gets 60% (modulo some rounding). The secondary sort by `node_id` is important to keep the ordering deterministic and consistent when two devices have the same memory.
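The memory-weighted scheme can be sketched in a few lines. This is an illustration of the idea, not exo's implementation; the descending sort direction and the `(node_id, start, end)` tuple layout are assumptions:

```python
from typing import Dict, List, Tuple


def memory_weighted_partitions(devices: Dict[str, int]) -> List[Tuple[str, float, float]]:
    """devices: node_id -> memory in bytes. Returns (node_id, start, end) tuples.

    Sort primarily by memory (descending, assumed), secondarily by node_id
    for determinism, then size each partition proportionally to memory.
    """
    ordered = sorted(devices.items(), key=lambda kv: (-kv[1], kv[0]))
    total = sum(devices.values())
    partitions, start = [], 0.0
    for node_id, mem in ordered:
        end = start + mem / total
        partitions.append((node_id, start, end))
        start = end
    # Pin the last end to exactly 1.0 to absorb floating-point drift.
    if partitions:
        node_id, s, _ = partitions[-1]
        partitions[-1] = (node_id, s, 1.0)
    return partitions
```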
The task
The task is to implement a new, improved `PartitioningStrategy` that takes into account more than just memory. This may require augmenting the `Topology` class with more information than it currently has, which will require changes across the codebase. Some things you might want to consider: device FLOPS and inter-node latency. There are many other factors you could take into account, which I will leave to you to decide.
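An augmented topology might carry per-node capabilities and pairwise link measurements. Every name below is a hypothetical sketch of what "more information" could look like, not existing exo code:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DeviceProfile:
    """Hypothetical per-node capabilities an augmented Topology could carry."""
    memory_bytes: int
    flops: float  # e.g. peak FP16 FLOPS


@dataclass
class AugmentedTopology:
    devices: Dict[str, DeviceProfile] = field(default_factory=dict)
    # Measured one-way latency in seconds between ordered node pairs.
    latency_s: Dict[Tuple[str, str], float] = field(default_factory=dict)
```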
I have some ideas for how to do this, and there are many potential approaches; however, I'm looking for out-of-the-box ideas here.
I'll leave it up to you to reason about how to lay this out, but there are two high-level metrics that would make sense to optimise for (should they be optimised separately or together?):
- Time-to-first-token (latency)
- Tokens per second (throughput)
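A toy cost model makes it concrete how throughput falls out of a given ring order and layer split. Everything here is an assumption for illustration, not exo code; it models single-stream decoding where one token must traverse the whole ring, and ignores prompt processing (which is what time-to-first-token would additionally capture):

```python
from typing import Dict, List, Tuple


def tokens_per_second(
    order: List[str],
    layer_counts: Dict[str, int],
    flops_per_layer: float,
    node_flops: Dict[str, float],
    hop_latency_s: Dict[Tuple[str, str], float],
) -> float:
    """Estimate decode throughput for one ring traversal per token."""
    # Each decode step runs every node's layers in sequence...
    compute = sum(layer_counts[n] * flops_per_layer / node_flops[n] for n in order)
    # ...plus every inter-node hop; the ring closes back to the first node.
    comms = sum(
        hop_latency_s.get((order[i], order[(i + 1) % len(order)]), 0.0)
        for i in range(len(order))
    )
    return 1.0 / (compute + comms)
```

Under this model, slow links argue for fewer hops through them and fast devices argue for more layers, so order and split interact; a strategy could search over both.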
Deliverables
- A set of unit tests for your new `PartitioningStrategy` that show it works in different cases.
- A set of unit tests that "simulate" different scenarios and show that this `PartitioningStrategy` achieves the optimal solution in each scenario.
- An option added to the main script to enable this `PartitioningStrategy` (you decide if other parameters should be added to configure it).
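Whatever strategy you implement, the spec above implies invariants that every test can assert: the first `start` is 0, the last `end` is 1, and the ranges are contiguous. A minimal checker, using a `(node_id, start, end)` tuple layout that is just a convention for this sketch:

```python
from typing import List, Tuple


def check_partition_invariants(
    partitions: List[Tuple[str, float, float]], tol: float = 1e-6
) -> None:
    """Assert the [start, end) coverage invariants from the spec."""
    assert partitions, "at least one partition is required"
    assert abs(partitions[0][1] - 0.0) < tol, "first start must be 0"
    assert abs(partitions[-1][2] - 1.0) < tol, "last end must be 1"
    # Adjacent partitions must meet exactly: no gaps, no overlaps.
    for (_, _, prev_end), (_, start, _) in zip(partitions, partitions[1:]):
        assert abs(prev_end - start) < tol, "partitions must be contiguous"
```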
Hi @AlexCheema, I've seen that you usually create bounties on issues; maybe you're interested in using Opire. You don't pay until someone claims the bounty with a PR.
PS: I'm the cofounder, so if you need anything, feel free to contact me.