-
-
Notifications
You must be signed in to change notification settings - Fork 71
5.2. Harbor Boost
Handle:
boost
URL: http://localhost:34131/
boost
is a service that acts as optimizing LLM proxy. It takes your inputs, and pre-processes them before sending them to the downstream API.
It allows implementing workflows like below:
- When "random" is mentioned in the message,
klmbr
will rewrite 35% of message characters to increase the entropy and produce more diverse completion - Launch self-reflection reasoning chain when the message ends with a question mark
- Expand the conversation context with the "inner monologue" of the model, where it can iterate over your question a few times before giving the final answer
Pre-processing can include:
- prompt re-writing
- reasoning chains
- context injection
- any other input-only transformation
boost
operates at the OpenAI-compatible API level, so can be used with any LLM backend that accepts OpenAI API requests.
The service includes a set of modules that can be enabled/disabled and configured via the Harbor CLI or the .env
file.
# [Optional] pre-build the image
harbor build boost
# Start the service
harbor up boost
boost
is automatically connected to the LLM backends integrated with Harbor. It has its own API which will serve "boosted" models.
# Get the URL for the boost service
harbor url boost
# Open default boost enpdoint in the browser
harbor open boost
When running with Harbor's Open WebUI, "boosted" models will be available there automatically.
Configuration is done via the Harbor CLI or the .env
file. You can enable/disable modules and configure their parameters.
# Enable/Disable a module
harbor boost modules add <module>
harbor boost modules rm <module>
# Set a parameter
harbor boost <module> <parameter>
harbor boost <module> <parameter> <value>
# See boost/module help entries
# for more info
harbor boost --help
harbor boost klmbr --help
harbor boost rcn --help
harbor boost g1 --help
boost
is built from modules implementing specific optmisation workflows. Those aren't limited to the reasoning or prompt re-writing, but can include any transformation that can help the downstream model to perform better.
Modules can be enabled/disabled and configured via the Harbor CLI or the .env
file manually. You'll need
to restart the boost
service for the changes to take effect.
# Enable/Disable a module
harbor boost modules add <module>
harbor boost modules rm <module>
Tip
You can use Harbor profiles to quickly rollback to the default configuration.
# Save current changes, if needed
harbor profile save <name>
# Rollback to the default configuration
harbor profile use default
RCN is an original technique based on two principles: context expansion and self-validation. It works by first expanding the context of the input by asking the model to explain the meaning of the every word in the prompt. Then, a completion is generated, and then model is asked to validte how sure it is that the answer is correct. After two iterations, model is asked to give a final answer.
# Enable the module
harbor boost modules add rcn
Parameters
-
strat
- strategy for selection of the messages to rewrite. Default ismatch
-
all
- match all messages -
first
- match first message regardless of the role -
last
- match last message regardless of the role -
any
- match one random message -
percentage
- match a percentage of random messages from the conversation -
user
- match all user messages -
match
- use a filter to match messages
-
-
strat_params
- parameters (filter) for the selection strategy. Default matches all user messages-
percentage
- forpercentage
strat - the percentage of messages to match, default is50
-
index
- formatch
strat - the index of the message to match -
role
- formatch
strat - the role of the message to match -
substring
- formatch
strat - will match messages containing the substring
-
Example
# Configure message selection
# to match last user message
harbor boost rcn strat match
harbor boost rcn strat_params set role user
harbor boost rcn strat_params set index -1
Handle:
klmbr
Boosts model creativity by applying character-level random rewrites to the input. Read a full overview of the technique in the source repo.
Every LLM will respond to rewrites in a different way. Some models will generate more diverse completions, while others might start generating completely random sequences. Default parameters are tuned for Llama 3.1 8B, you might want to adjust them when running with a different model.
Parameters
-
percentage
- amount of characters to rewrite in the input. Default is35
-
mods
- types of rewrites to apply. Default isall
, available options:-
capitalize
- swaps character capitalization -
diacritic
- adds a random diacritic to the character -
leetspeak
- replaces characters with leetspeak equivalents -
remove_vowel
- removes vowels from the input
-
-
strat
- strategy for selection of the messages to rewrite. Default ismatch
-
all
- match all messages -
first
- match first message regardless of the role -
last
- match last message regardless of the role -
any
- match one random message -
percentage
- match a percentage of random messages from the conversation -
user
- match all user messages -
match
- use a filter to match messages
-
-
strat_params
- parameters (filter) for the selection strategy. Default matches all user messages-
percentage
- forpercentage
strat - the percentage of messages to match, default is50
-
index
- formatch
strat - the index of the message to match -
role
- formatch
strat - the role of the message to match -
substring
- formatch
strat - will match messages containing the substring
-
Examples
# Reduce the rewrite percentage
harbor boost klmbr percentage 20
# Enable/disable rewrite modules
harbor boost klmbr mods rm all
harbor boost klmbr mods add capitalize
harbor boost klmbr mods add diacritic
harbor boost klmbr mods add leetspeak
harbor boost klmbr mods add remove_vowel
# Change the selection strategy
# 1. Match all user messages
harbor boost klmbr strat match
harbor boost klmbr strat_params role user
# 2. Match the last message (regardless of the role)
harbor boost klmbr strat match
harbor boost klmbr strat_params index -1
# 3. Match messages containing a substring
harbor boost klmbr strat match
harbor boost klmbr strat_params substring "random"
Dynamic Chain-of-Thought pattern.
See original implementation for Grok. Harbor also has a dedicated ol1
service (UI only) that implements the same technique.
# Enable the module
harbor boost modules add g1
Parameters
-
max_steps
- Maximum amount of iterations for self-reflection, default is 15 -
strat
- strategy for selection of the messages to rewrite. Default ismatch
-
all
- match all messages -
first
- match first message regardless of the role -
last
- match last message regardless of the role -
any
- match one random message -
percentage
- match a percentage of random messages from the conversation -
user
- match all user messages -
match
- use a filter to match messages
-
-
strat_params
- parameters (filter) for the selection strategy. Default matches all user messages-
percentage
- forpercentage
strat - the percentage of messages to match, default is50
-
index
- formatch
strat - the index of the message to match -
role
- formatch
strat - the role of the message to match -
substring
- formatch
strat - will match messages containing the substring
-
boost
works as an OpenAI-compatible API proxy. It'll query configured downstream services for which models they serve and provide "boosted" wrappers in its own API.
See the http catalog entry for some sample requests.
GET /v1/models
List boosted models. boost
will serve additional models as per enabled modules. For example:
[
{
// Original, unmodified model proxy
"id": "llama3.1:8b",
// ...
},
{
// LLM with klmbr technique applied
"id": "klmbr-llama3.1:8b",
// ...
},
{
// LLM with rcn technique applied
"id": "rcn-llama3.1:8b",
// ...
}
]
POST /v1/chat/completions
Chat completions endpoint.
- Supports all paramaters from the downstream API, for example
json
format for Ollama - Supports streaming completions
It's possible to create custom modules for boost
, using the Chat
abstractions. Here's an example of how that can look:
# Simulated converstaion chain, where
# the final tail is served back to the user
original_question = "How far is the moon? (from Alpha Centauri)"
llm = LLM()
# We simulate a multi-step conversation
# with the user, to improve the quality of the answer
tip_chat = Chat(
llm=llm,
tail=ChatNode(
role="system",
content="You are a space expert. No, you really are!".strip()
)
)
tip_chat.user(original_question)
tip_chat.user("I will tip you $5 for every correct answer you give me.")
# Advance by 1 assistant message
await tip_chat.advance()
tip_chat.user("I can fine you $500 if the answer above is wrong. Are you sure?")
await tip_chat.advance()
tip_chat.user("Ok, could you please write a summary answer, so I can understand it better? Reply with the summary only and nothing else.")
# The final result can be served via API
return llm.stream_chat_completion(tip_chat)