
5.2. Harbor Boost


Handle: boost
URL: http://localhost:34131/

Screenshot of boost bench g1 and rcn optimizer modules compared to original LLMs. BBH256 task, run with Harbor Bench

boost is a service that acts as an optimizing LLM proxy. It takes your inputs and pre-processes them before sending them to the downstream API.

It allows implementing workflows like the ones below:

  • When "random" is mentioned in the message, klmbr will rewrite 35% of the message characters to increase entropy and produce a more diverse completion
  • Launch self-reflection reasoning chain when the message ends with a question mark
  • Expand the conversation context with the "inner monologue" of the model, where it can iterate over your question a few times before giving the final answer

Pre-processing can include:

  • prompt re-writing
  • reasoning chains
  • context injection
  • any other input-only transformation

boost operates at the OpenAI-compatible API level, so it can be used with any LLM backend that accepts OpenAI API requests.
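For example, any OpenAI-compatible client can be pointed at the boost endpoint instead of the backend directly. A minimal sketch, assuming the default port above and an Ollama-served llama3.1:8b with the klmbr module enabled:

# Request a completion from the "boosted" version of the model
curl http://localhost:34131/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "klmbr-llama3.1:8b",
    "messages": [{ "role": "user", "content": "Tell me a random fact" }]
  }'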

You don't have to use Harbor to run boost. See the Standalone Usage section for more information.


Modules

The service includes a set of modules that can be enabled/disabled and configured via the Harbor CLI or the .env file.

Starting

# [Optional] pre-build the image
harbor build boost

# Start the service
harbor up boost

boost is automatically connected to the LLM backends integrated with Harbor. It has its own API which will serve "boosted" models.

# Get the URL for the boost service
harbor url boost

# Open the default boost endpoint in the browser
harbor open boost

When running with Harbor's Open WebUI, "boosted" models will be available there automatically.
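You can also verify which models boost serves by querying its API directly (a quick check, assuming harbor url prints a plain base URL):

# Should include the "boosted" versions of the connected models
curl "$(harbor url boost)/v1/models"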

Configuration

Configuration is done via the Harbor CLI, harbor config, or the .env file. All three ways are interchangeable; you can read more about them in the User Guide.

# Enable/Disable a module
harbor boost modules add <module>
harbor boost modules rm <module>

# Get / set a parameter
harbor boost <module> <parameter>
harbor boost <module> <parameter> <value>

# See boost/module help entries
# for more info
harbor boost --help
harbor boost klmbr --help
harbor boost rcn --help
harbor boost g1 --help

Boost configuration

You can adjust certain aspects of the boost service that are shared between all the modules. This includes the API behavior and specifics of the module execution. Please find supported configuration options below.

# Adjust the host port that boost will be bound to
harbor config set boost.host.port 34131

# Additional OpenAI-compatible APIs to boost
harbor boost urls add http://localhost:11434/v1
harbor boost urls rm http://localhost:11434/v1
harbor boost urls rm 0 # by index
harbor boost urls ls

# Keys for the OpenAI-compatible APIs to boost. Semicolon-separated list.
# ⚠️ These are index-matched with the URLs. Even if the API doesn't require a key,
# you still need to provide a placeholder for it.
harbor boost keys add sk-ollama
harbor boost keys rm sk-ollama
harbor boost keys rm 0 # by index
harbor boost keys ls

Below are additional configuration options that do not have an alias in the Harbor CLI (so you need to use harbor config directly). For example harbor config set boost.intermediate_output true.

boost.intermediate_output

When set to true, boost will output the intermediate steps of the module, not only the final result, providing more dynamic feedback to the user.

Intermediate output includes status messages, internal monologue, and other non-final completions. Note that this doesn't mean "all output" from the module: the module source can still decide not to emit specific things at all, or conversely, to emit them even when this setting is off.
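For example, to keep only the final result in the responses:

harbor config set boost.intermediate_output false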

Example of the intermediate output from the g1 module - underlying reasoning steps:

example of intermediate output from g1 boost module

boost.status.style

A module can call llm.emit_status during its processing, which will be streamed as a "status" or "progress" message to the user. This setting controls the format of that message; choose whichever is supported by the frontend where the boost response is displayed.

Options:

md:codeblock "\n```boost\n{status}\n```\n"
md:h1        "\n\n# {status}\n\n"
md:h2        "\n\n## {status}\n\n"
md:h3        "\n\n### {status}\n\n"
plain        "\n\n{status}\n\n"
none         ""

The default is md:codeblock and looks like this in the WebUI:

screenshot of status in the webui
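To switch to another style, set the option via harbor config, for example:

# Render statuses as H3 headers instead of a code block
harbor config set boost.status.style md:h3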

boost.base_models

Depending on your setup, your LLM backend might or might not be connected to the UI directly. If it isn't (or when using boost as a standalone service), you can toggle this option on so that boost also serves the base models as-is.

# Now "unboosted" models will also be available
# via the boost API
harbor config set boost.base_models true

boost.model_filter

When specified, boost will only serve models matching the filter. The filter is a key/value expression that'll be matched against the model metadata. See examples below:

# Only boost models with "llama" in the name
harbor config set boost.model_filter id.contains=llama
# Only boost models matching the regex
harbor config set boost.model_filter id.regex=.+q8_0$
# Only boost a model with the exact ID
harbor config set boost.model_filter id=llama3.1:8b

This filter runs after the boosted models (per module) are added, so you can filter them out as well.

Modules configuration

You can configure modules using either the harbor boost modules CLI alias or by editing the HARBOR_BOOST_MODULES variable in the .env file.

# Enable the module
harbor boost modules add <module>
# Disable the module
harbor boost modules rm <module>
# List enabled modules
harbor boost modules ls
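Editing the HARBOR_BOOST_MODULES variable in the .env file directly is equivalent; the value is a semicolon-separated list of module names:

# .env
HARBOR_BOOST_MODULES="klmbr;rcn;g1"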

Note that new Harbor releases might introduce new modules, so the default value of this setting could change in the future. Check out Harbor Profiles for a way to save and restore your configuration.

Boost Modules & Configuration

boost is built from modules implementing specific optimization workflows. These aren't limited to reasoning or prompt re-writing and can include any transformation that helps the downstream model perform better.

Modules can be enabled/disabled and configured via the Harbor CLI or by editing the .env file manually. You'll need to restart the boost service for the changes to take effect.

# Enable/Disable a module
harbor boost modules add <module>
harbor boost modules rm <module>

Tip

You can use Harbor profiles to quickly rollback to the default configuration.

# Save current changes, if needed
harbor profile save <name>
# Rollback to the default configuration
harbor profile use default

rcn - recursive certainty validation

RCN is an original technique based on two principles: context expansion and self-validation. It works by first expanding the context of the input: the model is asked to explain the meaning of every word in the prompt. A completion is then generated, and the model is asked to validate how sure it is that the answer is correct. After two such iterations, the model is asked to give a final answer.

# Enable the module
harbor boost modules add rcn

Parameters

  • strat - strategy for selecting the messages the module will be applied to. Default is match
    • all - match all messages
    • first - match first message regardless of the role
    • last - match last message regardless of the role
    • any - match one random message
    • percentage - match a percentage of random messages from the conversation
    • user - match all user messages
    • match - use a filter to match messages
  • strat_params - parameters (filter) for the selection strategy. Default matches all user messages
    • percentage - for percentage strat - the percentage of messages to match, default is 50
    • index - for match strat - the index of the message to match
    • role - for match strat - the role of the message to match
    • substring - for match strat - will match messages containing the substring

Example

# Configure message selection
# to match last user message
harbor boost rcn strat match
harbor boost rcn strat_params set role user
harbor boost rcn strat_params set index -1

klmbr - boost LLM creativity

Handle: klmbr

klmbr screenshot

Boosts model creativity by applying character-level random rewrites to the input. Read a full overview of the technique in the source repo.

Every LLM will respond to rewrites in a different way. Some models will generate more diverse completions, while others might start producing completely random sequences. The default parameters are tuned for Llama 3.1 8B; you might want to adjust them when running with a different model.

Parameters

  • percentage - amount of characters to rewrite in the input. Default is 35
  • mods - types of rewrites to apply. Default is all, available options:
    • capitalize - swaps character capitalization
    • diacritic - adds a random diacritic to the character
    • leetspeak - replaces characters with leetspeak equivalents
    • remove_vowel - removes vowels from the input
  • strat - strategy for selection of the messages to rewrite. Default is match
    • all - match all messages
    • first - match first message regardless of the role
    • last - match last message regardless of the role
    • any - match one random message
    • percentage - match a percentage of random messages from the conversation
    • user - match all user messages
    • match - use a filter to match messages
  • strat_params - parameters (filter) for the selection strategy. Default matches all user messages
    • percentage - for percentage strat - the percentage of messages to match, default is 50
    • index - for match strat - the index of the message to match
    • role - for match strat - the role of the message to match
    • substring - for match strat - will match messages containing the substring

Examples

# Reduce the rewrite percentage
harbor boost klmbr percentage 20

# Enable/disable rewrite modules
harbor boost klmbr mods rm all
harbor boost klmbr mods add capitalize
harbor boost klmbr mods add diacritic
harbor boost klmbr mods add leetspeak
harbor boost klmbr mods add remove_vowel

# Change the selection strategy
# 1. Match all user messages
harbor boost klmbr strat match
harbor boost klmbr strat_params role user
# 2. Match the last message (regardless of the role)
harbor boost klmbr strat match
harbor boost klmbr strat_params index -1
# 3. Match messages containing a substring
harbor boost klmbr strat match
harbor boost klmbr strat_params substring "random"

g1 - o1-like reasoning chains

Dynamic Chain-of-Thought pattern.

See the original implementation for Groq. Harbor also has a dedicated ol1 service (UI only) that implements the same technique.

# Enable the module
harbor boost modules add g1

Parameters

  • max_steps - Maximum number of iterations for self-reflection, default is 15 (see the example after this list)
  • strat - strategy for selecting the messages the module will be applied to. Default is match
    • all - match all messages
    • first - match first message regardless of the role
    • last - match last message regardless of the role
    • any - match one random message
    • percentage - match a percentage of random messages from the conversation
    • user - match all user messages
    • match - use a filter to match messages
  • strat_params - parameters (filter) for the selection strategy. Default matches all user messages
    • percentage - for percentage strat - the percentage of messages to match, default is 50
    • index - for match strat - the index of the message to match
    • role - for match strat - the role of the message to match
    • substring - for match strat - will match messages containing the substring
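Parameters follow the same harbor boost <module> <parameter> <value> pattern as the other modules, for example:

# Cap the self-reflection chain at 5 iterations
harbor boost g1 max_steps 5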

API

boost works as an OpenAI-compatible API proxy. It queries the configured downstream services for the models they serve and provides "boosted" wrappers for them in its own API.

See the http catalog entry for some sample requests.

GET /v1/models

List boosted models. boost will serve additional models according to the enabled modules. For example:

[
  {
    // Original, unmodified model proxy
    "id": "llama3.1:8b",
    // ...
  },
  {
    // LLM with klmbr technique applied
    "id": "klmbr-llama3.1:8b",
    // ...
  },
  {
    // LLM with rcn technique applied
    "id": "rcn-llama3.1:8b",
    // ...
  }
]

POST /v1/chat/completions

Chat completions endpoint.

  • Supports all parameters from the downstream API, for example the json format for Ollama
  • Supports streaming completions
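For example, a streaming request to the rcn-boosted version of a model (assuming llama3.1:8b is served by a connected backend and the rcn module is enabled):

curl http://localhost:34131/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rcn-llama3.1:8b",
    "messages": [{ "role": "user", "content": "How far is the Moon?" }],
    "stream": true
  }'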

Custom Modules

It's possible to create custom modules for boost, using the Chat abstractions. Here's an example of how that can look:

# Simulated conversation chain, where the final
# tail is served back to the user. Chat, ChatNode and
# LLM are boost's own chat abstractions, and this
# snippet runs inside a custom module's entry point.
original_question = "How far is the moon? (from Alpha Centauri)"
llm = LLM()

# We simulate a multi-step conversation
# with the user, to improve the quality of the answer
tip_chat = Chat(
  llm=llm,
  tail=ChatNode(
    role="system",
    content="You are a space expert. No, you really are!".strip()
  )
)
tip_chat.user(original_question)
tip_chat.user("I will tip you $5 for every correct answer you give me.")
# Advance by 1 assistant message
await tip_chat.advance()
tip_chat.user("I can fine you $500 if the answer above is wrong. Are you sure?")
await tip_chat.advance()
tip_chat.user("Ok, could you please write a summary answer, so I can understand it better? Reply with the summary only and nothing else.")

# The final result can be served via API
return llm.stream_chat_completion(tip_chat)

Standalone usage

You can run boost as a standalone Docker container. See harbor-boost package in GitHub Container Registry.

# [Optional] pre-pull the image
docker pull ghcr.io/av/harbor-boost:latest

Configuration

boost can be configured via environment variables. Here's a reference of what's currently supported, with the respective defaults.

# OpenAI-compatible APIs to boost. Semicolon-separated list
# Example: "http://localhost:11434/v1;http://localhost:8014/openai"
# ⚠️ Even if the API doesn't require a key, you still need to provide
# a placeholder in "BOOST_OPENAI_KEYS" for it
HARBOR_BOOST_OPENAI_URLS              ""

# Keys for the OpenAI-compatible APIs to boost. Semicolon-separated list,
# must be index-matched with the URLs.
# Example: "key1;key2"
# ⚠️ You need to provide placeholder keys even if the API doesn't require them
HARBOR_BOOST_OPENAI_KEYS              ""

# Boost modules to enable. Semicolon-separated list
# Example: "klmbr;rcn;g1"
HARBOR_BOOST_MODULES                  ""

# Base models to serve via the boost API
HARBOR_BOOST_BASE_MODELS              false

# Filter models that will be served by the boost API
# Runs after boost's own models are added, so you can filter them as well
# Examples: "id.contains=llama", "id.regex=.+q8_0$", "id=llama3.1:8b"
HARBOR_BOOST_MODEL_FILTER             ""

# Enable intermediate output for the boost modules
# "Intermediate" means everything except the final result.
# For example, status messages or internal monologue
# Note that it doesn't mean "all output" from the module,
# as the module source can still decide not to emit specific things at all,
# or conversely - emit them even if this setting is off
HARBOR_BOOST_INTERMEDIATE_OUTPUT      true

# Module specific configs:
# Klmbr
HARBOR_BOOST_KLMBR_PERCENTAGE         35
HARBOR_BOOST_KLMBR_MODS               all
HARBOR_BOOST_KLMBR_STRAT              match
HARBOR_BOOST_KLMBR_STRAT_PARAMS       role=user

# RCN
HARBOR_BOOST_RCN_STRAT                match
HARBOR_BOOST_RCN_STRAT_PARAMS         role=user,index=-1

# G1
HARBOR_BOOST_G1_STRAT                 match
HARBOR_BOOST_G1_STRAT_PARAMS          role=user,index=-1
HARBOR_BOOST_G1_MAX_STEPS             15

See the main portion of the guide for a detailed explanation of these variables. You can also find the most complete overview of the supported variables in the source.

Example

# Start the container.
# 172.17.0.1 is the default IP of the host when running on Linux,
# so the example below points boost at a local Ollama instance;
# the HARBOR_BOOST_MODULES and HARBOR_BOOST_KLMBR_PERCENTAGE
# variables configure the boost modules.
docker run \
  -e "HARBOR_BOOST_OPENAI_URLS=http://172.17.0.1:11434/v1" \
  -e "HARBOR_BOOST_OPENAI_KEYS=sk-ollama" \
  -e "HARBOR_BOOST_MODULES=klmbr;rcn;g1" \
  -e "HARBOR_BOOST_KLMBR_PERCENTAGE=60" \
  -p 8004:8000 \
  ghcr.io/av/harbor-boost:latest

# In a separate terminal (or after detaching the container)
curl http://localhost:8004/health
curl http://localhost:8004/v1/models