Ants

Rust CI Badge

A naive implementation of a distributed system for doing arbitrary work.

In short, this is a cluster of API servers hosting an identical image, each capable of handling requests; if a node is busy, it forwards the request to another node. The list of nodes is stored as a min heap, prioritizing the node that has not been called for the longest time. There is no leader in this system; each node is equal.
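
The heap can be pictured roughly as follows - a minimal sketch using the Rust standard library, where the peer entries and addresses are illustrative rather than the crate's actual API:

use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::Instant;

fn main() {
    // Peers keyed by the instant they were last called. `Reverse` turns the
    // default max-heap into a min-heap, so `pop` yields the longest-idle peer.
    let mut peers: BinaryHeap<Reverse<(Instant, String)>> = BinaryHeap::new();
    peers.push(Reverse((Instant::now(), "worker://0.0.0.0:50052".to_string())));
    peers.push(Reverse((Instant::now(), "worker://0.0.0.0:50053".to_string())));

    // Take the peer that has been idle the longest, relay the work to it,
    // then push it back with a fresh "last called" timestamp.
    if let Some(Reverse((_, addr))) = peers.pop() {
        println!("relaying to {addr}");
        peers.push(Reverse((Instant::now(), addr)));
    }
}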

Motivation

The motivation for this project is self-hosted LLMs, which tend to occupy a whole consumer GPU for a long time. They also tend to be memory intensive, and their failure modes under concurrent calls are not consistent - out of memory, segfaults, CUDA errors, etc. This project aims to create a simple system that balances the load between nodes, reducing the chance of a node being overwhelmed into recovery mode and causing even more downtime.

The architecture is also designed to be homelab friendly - it is assumed that each node already hosts some sort of IoT service, which will ping a loopback address first to keep things local. This is why multiple API endpoints are used instead of a single entrypoint behind a load balancer, which would be a single point of failure.

Aspirations

  • Implement a distributed system that has no leaders; each node is equal.
  • Any node can receive a request; if it cannot handle it, it forwards the request to another node. Each time a forwarding attempt fails, the node tries itself again before moving on to the next peer, since we assume the node the user called is the closest to the user (see the sketch after this list).
  • Communicate using the gRPC protocol, with a preset enum of message types.
  • The system should be able to handle node failures, and continue to work.
  • The list of nodes should be stored as a min heap using the last called time as the key. This way, the node that has been idle the longest will be the first to receive a request. This could be tweaked to take into account the distance between nodes; but currently all nodes are assumed to be clustered in the same subnet and have the same latency.
  • The nodes should discover each other using multicast, rather than a static list.
  • The nodes should keep track of each other's correctness, and remove a node from the list if it regularly fails to reserve, times out, or returns corrupted results.
  • The system should be able to ask multiple nodes to do the same work, until some results agree with each other. This is to prevent a node from returning corrupted results.
  • The above simple consensus is working, but it places a heavy bias on the host node that was called first, because the current reservation system favours the local node before handing work off to another node. The reservation system will need to be reworked to be fairer.
  • Integration and unit tests.
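
The forwarding order described above can be sketched as follows - `try_reserve` is a stand-in for the real gRPC reservation call, not part of the crate's API:

fn main() {
    let peers = ["worker://0.0.0.0:50052", "worker://0.0.0.0:50053"];
    println!("{:?}", forward("test 0", "worker://0.0.0.0:50051", &peers));
}

// Illustrative forwarding order only: the node the user called is assumed to
// be the closest, so it gets another chance before each remote peer is tried.
fn forward(work: &str, local: &str, peers: &[&str]) -> Option<String> {
    for &peer in peers {
        for candidate in [local, peer] {
            if let Some(result) = try_reserve(candidate, work) {
                return Some(result);
            }
        }
    }
    None
}

// Stand-in for a gRPC reservation attempt; here only the last peer is free.
fn try_reserve(node: &str, work: &str) -> Option<String> {
    (node == "worker://0.0.0.0:50053").then(|| format!("{node} did: {work}"))
}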

Pre-requisites

The protobuf crate requires protoc to be installed. See Protocol Buffer Compiler Installation for more information.
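
For example, on Debian-based systems or macOS (package names may differ on your platform):

sudo apt install protobuf-compiler    # Debian/Ubuntu
brew install protobuf                 # macOS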

Simple demo

In two separate terminals, run the following commands:

cargo run --bin serve --features=example -- --port 5355 --grpc-port 50051
cargo run --bin serve --features=example -- --port 5356 --grpc-port 50052

This will host two identical nodes, each listening on a different port. In practice, these would be container images hosted within the same cluster on different machines. In the following example, we will have 3 nodes set up.
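
The sample output further below also includes a worker on gRPC port 50053; assuming the same port numbering continues, such a third node could be started in another terminal with:

cargo run --bin serve --features=example -- --port 5357 --grpc-port 50053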

Upon start-up, each node sends out a multicast to announce its presence, to which any existing nodes reply in kind. This allows each node to build up a list of nodes that it can relay work to.
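
The announcement is essentially a UDP multicast; a minimal sketch of the idea using the Rust standard library, where the group address, port and payload are illustrative rather than what the crate actually sends:

use std::net::{Ipv4Addr, UdpSocket};

fn main() -> std::io::Result<()> {
    // Illustrative multicast group and port, not the crate's real values.
    let group = Ipv4Addr::new(239, 0, 0, 1);
    let port: u16 = 9999;

    // Join the group and announce this node's gRPC address, so that existing
    // peers can add it to their lists and reply with their own addresses.
    let socket = UdpSocket::bind((Ipv4Addr::UNSPECIFIED, port))?;
    socket.join_multicast_v4(&group, &Ipv4Addr::UNSPECIFIED)?;
    socket.send_to(b"worker://0.0.0.0:50051", (group, port))?;

    // Listen for announcements from other nodes.
    let mut buf = [0u8; 256];
    let (len, from) = socket.recv_from(&mut buf)?;
    println!("{from} announced: {}", String::from_utf8_lossy(&buf[..len]));
    Ok(())
}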

Each of these nodes can receive requests on its respective port; the grpc-port is used for gRPC communication between the nodes. If a node is occupied with a request, it will forward incoming requests to the other nodes.

In Python, or whatever flavour of cURL you desire, send multiple concurrent requests to any of the nodes asking for work to be done:

from concurrent.futures import ThreadPoolExecutor
import requests

with ThreadPoolExecutor() as executor:
    responses = list(executor.map(
        lambda x: requests.post(
            "http://localhost:5355/send",
            json={"body": f"test {x}"}
        ).json(),
        range(5)
    ))
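
The same request can also be sent individually with cURL, using the same endpoint and payload as above:

curl -X POST http://localhost:5355/send -H 'Content-Type: application/json' -d '{"body": "test 0"}'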

By sending the same node multiple requests, you can see that the first request is handled by the first node while the second is forwarded to the second node, resulting in a different worker tag in each response:

[{'success': True,
  'worker': 'worker://0.0.0.0:50051',
  'body': 'Work done: test 0'},
 {'success': True,
  'worker': 'worker://0.0.0.0:50052',
  'body': 'Work done: test 1'},
 {'success': True,
  'worker': 'worker://0.0.0.0:50053',
  'body': 'Work done: test 2'},
 {'success': True,
  'worker': 'worker://0.0.0.0:50052',
  'body': 'Work done: test 3'},
 {'success': True,
  'worker': 'worker://0.0.0.0:50053',
  'body': 'Work done: test 4'}]

Since we have more work than we have nodes, some work will have to wait until a node can be recycled. This is done transparently in the background - the only drawback is a longer latency for those calls.
