v0.1.20
v0.1.20 - SGLang integration
SGLang is a fast serving framework for large language models and vision language models.
Starting
# [Optional] Pre-pull the image
harbor pull sglang
# Download with HF CLI
harbor hf download google/gemma-2-2b-it
# Set the model to run using HF specifier
harbor sglang model google/gemma-2-2b-it
# See original CLI help for available options
harbor run sglang --help
# Set the extra arguments via "harbor args"
harbor sglang args --context-length 2048 --disable-cuda-graph
Full Changelog: v0.1.19...v0.1.20