Preble

Preble is a load balancer for efficient prefix caching systems. A preprint is available at https://arxiv.org/abs/2407.00023
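To make the idea of prefix-aware load balancing concrete, here is a toy sketch. It is not Preble's scheduling algorithm: the function name pick_runtime, the in-memory history list, and the character-level prefix comparison are all assumptions made for illustration (real prefix caches typically match on tokens, and a real balancer would also weigh load).

```python
from os.path import commonprefix


def pick_runtime(prompt, history):
    """Toy policy: route to the runtime that has seen the longest shared prefix.

    `history` holds one list of previously routed prompts per runtime.
    Ties are broken in favour of the runtime with fewer routed prompts.
    """
    def score(i):
        # Longest character-level prefix shared with any prompt this runtime has seen.
        longest = max((len(commonprefix([prompt, p])) for p in history[i]), default=0)
        return (longest, -len(history[i]))

    best = max(range(len(history)), key=score)
    history[best].append(prompt)
    return best


history = [[], []]  # two hypothetical runtimes
system = "You are a helpful assistant. "
first = pick_runtime(system + "Summarize A.", history)
second = pick_runtime("Translate this text.", history)
third = pick_runtime(system + "Summarize B.", history)
```

In this sketch the first and third prompts share a system prompt, so they land on the same runtime, while the unrelated second prompt goes to the idle one. A plain round-robin policy would have split the two related prompts across runtimes and lost the cache reuse.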

Installation

You can install the package using pip, either as an editable install or as a regular install (both are described below).

Code Structure

The multi_node directory contains the code that runs Preble as a separate abstraction layer on top of SGLang/vLLM in a distributed setting. It coordinates and manages the execution of the distributed system.

Editable Installation

pip3 install -e .
pip install -e "python[all]"
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3/

Regular Pip Installation:

pip3 install preble
pip install git+https://github.com/wuklab/preble.git#egg=preble[all]
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3/

We release a custom version of SGLang that supports chunked prefill.

Programmatically starting the server

The server can be started with a list of runtime URLs:

from preble.main import start_server

start_server(
    runtime_selection_policy="custom",
    runtime_urls="http://127.0.0.1:30000/generate,http://127.0.0.1:30001/generate",
    host='127.0.0.1',
    port=8000,
    model="mistralai/Mistral-7B-v0.1"
)
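Once the server is listening, a client could talk to it roughly as sketched below. This is an assumption-heavy example: the source does not document the balancer's HTTP API, so the sketch assumes it accepts the same JSON body as an SGLang runtime's /generate endpoint (the runtime URLs above end in /generate). The helper name build_generate_request is hypothetical.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, max_new_tokens=32):
    # Assumption: the balancer mirrors the SGLang-style /generate request body.
    payload = {
        "text": prompt,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Host and port match the start_server example above.
request = build_generate_request("http://127.0.0.1:8000", "The capital of France is")

# To actually send it once the server is running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The request is only built here, not sent, so the snippet runs without a live server. Uncomment the last lines to issue the call.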

The server can also dynamically load the model onto separate CUDA devices:

from preble.main import start_server_and_load_models

start_server_and_load_models(
    model_name="mistralai/Mistral-7B-v0.1",
    devices=[0, 1],
    host="127.0.0.1",
    port=8000
)

The server can be run via:

python3 multi_node/server/server.py <server/deploy_and_run>

    server: runs the server against a given list of runtime URLs.
    deploy_and_run: generates two endpoints.

CLI Configuration

    runtime_selection_policy: The policy to select the runtime (e.g., custom, round_robin).
    runtime_urls: Comma-separated list of runtime URLs.
    host: The host address for the server.
    port: The port number for the server.
    model: The model to be used (e.g., mistralai/Mistral-7B-v0.1).
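The options above could be parsed along the following lines. This is a minimal sketch, not the parser shipped in multi_node/server/server.py: the flag spellings and the default values are assumptions chosen to mirror the start_server example, and only the option names themselves come from the list above.

```python
import argparse


def split_urls(value):
    # Turn "url1, url2" into ["url1", "url2"], ignoring stray spaces and empty items.
    return [url.strip() for url in value.split(",") if url.strip()]


def parse_args(argv=None):
    # Hypothetical flag names and defaults; only the option names are from the README.
    parser = argparse.ArgumentParser(description="Hypothetical launcher options")
    parser.add_argument("--runtime-selection-policy", default="round_robin")
    parser.add_argument("--runtime-urls", required=True, type=split_urls)
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--model", default="mistralai/Mistral-7B-v0.1")
    return parser.parse_args(argv)


args = parse_args([
    "--runtime-selection-policy", "custom",
    "--runtime-urls", "http://127.0.0.1:30000/generate, http://127.0.0.1:30001/generate",
])
```

Splitting the comma-separated string at parse time means the rest of the program can treat runtime_urls as a list rather than re-parsing the string wherever it is used.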

Citation And Acknowledgment

This code is forked from SGLang.

PyPI build and install instructions

Currently uploaded to TestPyPI. To build, upload, and install:

python setup.py bdist_wheel
twine upload --repository testpypi dist/* --verbose
python3 -m pip install --index-url https://test.pypi.org/simple/ preble

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
