MachineLearningSystem/DynaPipe

 
 

DynaPipe: Dynamic Layer Redistribution for Efficient Serving of LLMs with Pipeline Parallelism


What is DynaPipe?

DynaPipe targets a critical yet underexplored issue in pipeline-parallel LLM serving: the inter-stage pipeline bubbles introduced by sampling operations. To address it, DynaPipe redistributes model layers across pipeline stages dynamically at runtime. By adaptively adjusting the computational load of each stage, it balances work across the pipeline, keeping the stages aligned and mitigating inter-stage imbalance. Compared with state-of-the-art pipeline inference frameworks, DynaPipe achieves notable performance gains.
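To make the bubble problem concrete, here is a minimal toy model in Python. It is not DynaPipe's algorithm: it assumes every layer costs the same, that sampling runs only on the last pipeline stage, and that the layer split is chosen once rather than adapted at runtime. The function names and the numbers (32 layers, 4 stages, 1 ms per layer, 4 ms sampling) are hypothetical and chosen only for illustration.

```python
def stage_times(layers_per_stage, layer_ms, sampling_ms):
    """Per-stage latency for one micro-batch.

    Assumption: sampling is executed only on the last stage.
    """
    times = [n * layer_ms for n in layers_per_stage]
    times[-1] += sampling_ms
    return times


def bubble_ms(times):
    """Idle time per stage while waiting for the slowest stage."""
    slowest = max(times)
    return [slowest - t for t in times]


def rebalance(total_layers, num_stages, layer_ms, sampling_ms):
    """Greedy split: give each layer to the currently least-loaded stage."""
    layers = [0] * num_stages
    for _ in range(total_layers):
        times = stage_times(layers, layer_ms, sampling_ms)
        # Ties go to the lowest stage index.
        target = min(range(num_stages), key=lambda i: (times[i], i))
        layers[target] += 1
    return layers


if __name__ == "__main__":
    even = [8, 8, 8, 8]
    print("even split times:", stage_times(even, 1.0, 4.0))
    print("even split bubbles:", bubble_ms(stage_times(even, 1.0, 4.0)))

    balanced = rebalance(32, 4, 1.0, 4.0)
    print("rebalanced layers:", balanced)
    print("rebalanced times:", stage_times(balanced, 1.0, 4.0))
```

Under these assumptions, an even split leaves the first three stages idle while the last stage finishes sampling, whereas moving layers away from the sampling stage equalizes the stage times. In a real system the sampling cost varies with the batch, which is why the description above calls for adjusting the split at runtime.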

Install DynaPipe

pip install --verbose -e .

Launch online serving

# To enable prefix caching, add "--enable-prefix-caching"
# To enable pipeline parallelism, add "--pp $PP_DEGREE"
python -m gllm.entrypoints.api_server --port $PORT --model-path $MODEL_PATH --enable-adjust-layers

Online benchmark with gllm or vllm

python benchmarks/benchmark_serving.py --backend $BACKEND --model $MODEL \
        --dataset-name $DATASET_NAME --dataset-path $DATASET_PATH \
        --num-prompts $NUM_PROMPTS --port $PORT --trust-remote-code \
        --request-rate $REQUEST_RATE
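To compare backends across several load levels, the benchmark command above can be generated for a list of request rates. The sketch below only builds the argument lists, using the flags shown above and nothing else. The helper names and all example values (model path, dataset name and path, port, rates) are hypothetical placeholders, so substitute your own.

```python
import shlex


def benchmark_command(backend, model, dataset_name, dataset_path,
                      num_prompts, port, request_rate):
    """Build the argv list for one benchmark run (flags as in the command above)."""
    return [
        "python", "benchmarks/benchmark_serving.py",
        "--backend", backend,
        "--model", model,
        "--dataset-name", dataset_name,
        "--dataset-path", dataset_path,
        "--num-prompts", str(num_prompts),
        "--port", str(port),
        "--trust-remote-code",
        "--request-rate", str(request_rate),
    ]


def sweep_commands(request_rates, **common):
    """One command per request rate, sharing all other settings."""
    return [benchmark_command(request_rate=r, **common) for r in request_rates]


if __name__ == "__main__":
    # All values below are placeholders, not recommended settings.
    commands = sweep_commands(
        [1, 2, 4],
        backend="gllm",
        model="/path/to/model",
        dataset_name="DATASET_NAME",
        dataset_path="/path/to/dataset",
        num_prompts=100,
        port=8000,
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the lists can be passed to `subprocess.run(cmd, check=True)` once the server is up.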

Acknowledgements

This project builds upon the foundational work of gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling.

