27 changes: 23 additions & 4 deletions docs/lemonade-cli.md
@@ -451,6 +451,8 @@ lemonade launch AGENT [--model MODEL_NAME] [options]
| `--model MODEL_NAME` | Model name to launch with. If omitted, you will be prompted to select one. | No |
Member

server log

2026-04-01 15:16:07.111 [Info] (Router) Model loaded successfully. Total loaded: 1
2026-04-01 15:16:07.111 [Info] (Server) Model loaded successfully: user.Qwen3.5-35B-A3B-NoThinking
2026-04-01 15:16:07.111 [Info] (Server) POST /api/v1/responses - Streaming
2026-04-01 15:16:07.119 [Error] (Process) srv    operator(): got exception: {"error":{"code":500,"message":"\n------------\nWhile executing CallExpression at line 85, column 32 in source:\n...first %}↵            {{- raise_exception('System message must be at the beginnin...\n                                           ^\nError: Jinja Exception: System message must be at the beginning.","type":"server_error"}}
2026-04-01 15:16:07.119 [Info] (Process) srv  log_server_r: done request: POST /v1/responses 127.0.0.1 500
2026-04-01 15:16:07.119 [Error] (StreamingProxy) Backend returned error: 500
(The same Jinja template error and 500 response repeat on five subsequent retries, from 15:16:07.333 through 15:16:13.652.)

Codex isn't working for me. Any tips?

Collaborator Author

@sawansri sawansri Apr 1, 2026

Appears to be related to: ggml-org/llama.cpp#20733. There's a draft PR open to address this: ggml-org/llama.cpp#21174.

Looks like an upstream issue with the Qwen3.5 model family. I've been testing with GLM 4.7 Flash and Nemotron 3 Nano, which probably explains why I haven't hit it.
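For anyone hitting this before the upstream fix lands: the template raises whenever a system message is not the first entry in the request, so one client-side workaround is to reorder the messages before sending. A minimal sketch, assuming a simple (role, content) representation (`Message` and `hoist_system_messages` are illustrative names, not Lemonade APIs):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// A chat message as a (role, content) pair, in request order.
using Message = std::pair<std::string, std::string>;

// Move any system messages to the front of the request.
// std::stable_partition keeps the relative order of all other messages.
void hoist_system_messages(std::vector<Message>& messages) {
    std::stable_partition(messages.begin(), messages.end(),
                          [](const Message& m) { return m.first == "system"; });
}
```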

Member

@jeremyfowers jeremyfowers Apr 2, 2026

Thanks! I'll try again with GLM. Should Qwen3.5 not be a recommended recipe for Codex then?

Member

Works for me with GLM! So yeah the recommended recipe list just needs to be adjusted.

Collaborator Author

> Works for me with GLM! So yeah the recommended recipe list just needs to be adjusted.

Interestingly, Qwen 3 Coder Next works fine for me; the issue seems limited to the Qwen 3.5 family. I'll remove those models from the recommended list.

For those who have already downloaded Qwen 3.5 models, do you think we should add a warning as well?
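One way to implement that exclusion is a case-insensitive prefix match on the model id. A rough sketch, with illustrative helper names and sample ids (the real list comes from the server's model registry):

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// True if `value` starts with `prefix`, ignoring ASCII case.
bool starts_with_ci(const std::string& value, const std::string& prefix) {
    if (prefix.size() > value.size()) return false;
    for (size_t i = 0; i < prefix.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(value[i])) !=
            std::tolower(static_cast<unsigned char>(prefix[i]))) return false;
    }
    return true;
}

// Drop Qwen 3.5 family ids from a Codex recommendation list.
std::vector<std::string> filter_for_codex(const std::vector<std::string>& ids) {
    std::vector<std::string> kept;
    for (const auto& id : ids) {
        if (!starts_with_ci(id, "Qwen3.5")) kept.push_back(id);
    }
    return kept;
}
```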

Collaborator Author

Added warnings for Codex users.

| `--directory DIR` | Remote recipes directory used only if you choose recipe import at prompt | No |
| `--recipe-file FILE` | Remote recipe JSON filename used only if you choose recipe import at prompt | No |
| `--provider,-p [PROVIDER]` | Codex only: select provider name for Codex config; Lemonade does not read or modify `config.toml` (defaults to `lemonade`) | No |
| `--agent-args ARGS` | Custom arguments to pass directly to the launched agent process | `""` |
| `--ctx-size SIZE` | Context size for the model | `4096` |
| `--llamacpp BACKEND` | LlamaCpp backend to use | Auto-detected |
| `--llamacpp-args ARGS` | Custom arguments to pass to llama-server (must not conflict with managed args) | `""` |
@@ -461,22 +463,39 @@ lemonade launch AGENT [--model MODEL_NAME] [options]
- `--directory` and `--recipe-file` are only used for remote recipe import at prompt time.
- For local recipe files, run `lemonade import <LOCAL_RECIPE_JSON>` first, then launch with the imported model id.
- `--api-key` is propagated to the launched agent process.
- For `codex`, launch now injects a Lemonade model provider by default so host/port settings are honored.
- `--provider` is passed directly to Codex as `model_provider`; provider resolution/errors are handled by Codex.
- `--agent-args` is parsed and appended to the launched agent command.
- Supported agents: `claude`, `codex`
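For reference, the injected provider roughly corresponds to the following `config.toml` entry (a sketch assembled from the `-c` overrides the launcher passes; the base URL is a placeholder for your actual host/port):

```toml
[model_providers.lemonade]
name = "Lemonade"
base_url = "http://localhost:8000/v1"
wire_api = "responses"
env_key = "OPENAI_API_KEY"
requires_openai_auth = false
supports_websockets = false
```

Passing `--provider my-provider` skips this injection and lets Codex resolve `my-provider` from your own `config.toml`.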

**Examples:**

```bash
# Launch an agent with default model settings
lemonade launch claude --model Qwen3-0.6B-GGUF
lemonade launch claude --model Qwen3.5-0.8B-GGUF

# Launch an agent with custom context size
lemonade launch claude --model Qwen3-0.6B-GGUF --ctx-size 8192
lemonade launch claude --model Qwen3.5-0.8B-GGUF --ctx-size 32768

# Launch an agent with a specific llama.cpp backend
lemonade launch codex --model Qwen3-0.6B-GGUF --llamacpp vulkan
lemonade launch codex --model Qwen3.5-0.8B-GGUF --llamacpp vulkan

# Launch codex using the provider from your Codex config.toml (default provider name: lemonade)
lemonade launch codex --model Qwen3.5-0.8B-GGUF -p

# Launch codex using a custom provider name from your Codex config.toml
lemonade launch codex --model Qwen3.5-0.8B-GGUF --provider my-provider

# Launch an agent with custom llama.cpp arguments
lemonade launch claude --model Qwen3-0.6B-GGUF --ctx-size 4096 --llamacpp-args "--flash-attn on --no-mmap"
lemonade launch claude --model Qwen3.5-0.8B-GGUF --ctx-size 32768 --llamacpp-args "--flash-attn on --no-mmap"

# Pass additional arguments directly to the agent
lemonade launch claude --model Qwen3.5-0.8B-GGUF --agent-args "--approval-mode never"

# Resume a previous session (codex)
lemonade launch codex --model Qwen3.5-0.8B-GGUF --agent-args "resume SESSION_ID"

# Resume a previous session (claude)
lemonade launch claude --model Qwen3.5-0.8B-GGUF --agent-args "--resume SESSION_ID"

# Launch and allow optional prompt-driven recipe import using prefilled remote recipe flags
lemonade launch claude --directory coding-agents --recipe-file Qwen3.5-35B-A3B-NoThinking.json
48 changes: 40 additions & 8 deletions src/cpp/cli/agent_launcher.cpp
@@ -91,6 +91,18 @@ std::string build_server_base_url(const std::string& host, int port) {
return "http://" + normalize_server_host(host) + ":" + std::to_string(port);
}

void append_codex_config_arg(std::vector<std::string>& args, const std::string& config_value) {
args.push_back("-c");
args.push_back(config_value);
}

void append_codex_config_args(std::vector<std::string>& args,
const std::vector<std::string>& config_values) {
for (const auto& config_value : config_values) {
append_codex_config_arg(args, config_value);
}
}

void configure_claude_agent(const std::string& base_url,
const std::string& model,
const std::string& api_key,
@@ -130,6 +142,7 @@ void configure_claude_agent(const std::string& base_url,
void configure_codex_agent(const std::string& base_url,
const std::string& model,
const std::string& api_key,
const AgentLaunchOptions& launch_options,
AgentConfig& config) {
const std::string resolved_api_key = api_key.empty() ? kDefaultAgentApiKey : api_key;

@@ -146,17 +159,35 @@
add_windows_npm_fallbacks(config.fallback_paths, "codex");

config.env_vars = {
{"OPENAI_BASE_URL", base_url + "/v1/"},
{"OPENAI_API_KEY", resolved_api_key},
{"LEMONADE_API_KEY", resolved_api_key}
};
config.extra_args = {
"--oss",
"-m",
model,
"--config",
"web_search=\"disabled\""

const std::string responses_base_url = base_url + "/v1";
const std::string provider_name = launch_options.codex_model_provider.empty()
? "lemonade"
: launch_options.codex_model_provider;


std::vector<std::string> codex_config_values = {
"model_provider=\"" + provider_name + "\"",
"show_raw_agent_reasoning=true",
"web_search=\"disabled\"",
"analytics.enabled=false",
"feedback.enabled=false"
};

if (!launch_options.codex_use_user_config) {
codex_config_values.insert(codex_config_values.begin(),
"model_providers." + provider_name + "={ name='Lemonade', base_url='" + responses_base_url +
"', wire_api='responses', env_key='OPENAI_API_KEY', requires_openai_auth=false, supports_websockets=false }");
}

config.extra_args = {};
append_codex_config_args(config.extra_args, codex_config_values);
config.extra_args.push_back("-m");
config.extra_args.push_back(model);

config.install_instructions = "Install Codex CLI and ensure 'codex' is on PATH.";
}

@@ -167,6 +198,7 @@ bool build_agent_config(const std::string& agent,
int port,
const std::string& model,
const std::string& api_key,
const AgentLaunchOptions& launch_options,
AgentConfig& config,
std::string& error_message) {
const std::string base = build_server_base_url(host, port);
@@ -177,7 +209,7 @@
}

if (agent == "codex") {
configure_codex_agent(base, model, api_key, config);
configure_codex_agent(base, model, api_key, launch_options, config);
return true;
}

34 changes: 32 additions & 2 deletions src/cpp/cli/main.cpp
@@ -7,6 +7,7 @@
#include <lemon/utils/process_manager.h>
#include <lemon/utils/path_utils.h>
#include <lemon/utils/network_beacon.h>
#include <lemon/utils/custom_args.h>
#include <CLI/CLI.hpp>
#include <iostream>
#include <string>
@@ -36,7 +37,6 @@

#include "lemon/utils/aixlog.hpp"


static const std::vector<std::string> VALID_LABELS = {
"coding",
"embeddings",
@@ -107,6 +107,9 @@ struct CliConfig {
bool yes = false;
int scan_duration = 30;
bool json_output = false;
bool codex_use_user_config = false;
std::string codex_model_provider = "lemonade";
std::string agent_args;
};

// Open a URL via the OS without invoking a shell (avoids shell injection).
@@ -335,11 +338,22 @@ static int handle_launch_command(lemonade::LemonadeClient& client, CliConfig& co
}

lemon_tray::AgentConfig agent_config;
lemon_tray::AgentLaunchOptions launch_options;
std::string config_error;

if (config.codex_use_user_config && config.agent != "codex") {
LOG(ERROR, "AgentBuilder") << "--provider is only supported for the codex agent." << std::endl;
return 1;
}

launch_options.codex_use_user_config = config.codex_use_user_config;
launch_options.codex_model_provider = config.codex_model_provider;

// Build agent config
if (!lemon_tray::build_agent_config(config.agent, config.host, config.port, config.model,
config.api_key,
config.api_key, launch_options,
agent_config, config_error)) {
LOG(ERROR, "AgentBuilder") << "Failed to build agent config: " << config_error << std::endl;
return 1;
@@ -351,6 +365,11 @@ static int handle_launch_command(lemonade::LemonadeClient& client, CliConfig& co
std::cout << "Launch auth: API key provided and propagated to the launched agent." << std::endl;
}

if (!config.agent_args.empty()) {
std::vector<std::string> user_args = lemon::utils::parse_custom_args(config.agent_args);
agent_config.extra_args.insert(agent_config.extra_args.end(), user_args.begin(), user_args.end());
}

// Find agent binary
const std::string agent_binary = lemon_tray::find_agent_binary(agent_config);
if (agent_binary.empty()) {
Expand Down Expand Up @@ -884,6 +903,7 @@ int main(int argc, char* argv[]) {
export_cmd->add_option("--output", config.output_file, "Output file path (prints to stdout if not specified)")->type_name("PATH");

// Launch options
CLI::Option* provider_opt = nullptr;
launch_cmd->add_option("agent", config.agent, "Agent name to launch")
->type_name("AGENT")
->check(CLI::IsMember(SUPPORTED_AGENTS));
@@ -893,13 +913,23 @@
->type_name("DIR");
launch_cmd->add_option("--recipe-file", config.recipe_file,
"Remote recipe JSON filename used only if you choose recipe import at prompt")->type_name("FILE");
provider_opt = launch_cmd->add_option("--provider,-p", config.codex_model_provider,
"Use model provider name for Codex instead of Lemonade-injected provider definition")
->type_name("PROVIDER")
->default_val(config.codex_model_provider)
->expected(0, 1);
launch_cmd->add_option("--agent-args", config.agent_args,
"Custom arguments to pass directly to the launched agent process")
->type_name("ARGS")
->default_val(config.agent_args);
lemon::RecipeOptions::add_cli_options(*launch_cmd, config.recipe_options);

// Scan options
scan_cmd->add_option("--duration", config.scan_duration, "Scan duration in seconds")->default_val(config.scan_duration)->type_name("SECONDS");

// Parse arguments
CLI11_PARSE(app, argc, argv);
config.codex_use_user_config = (provider_opt != nullptr && provider_opt->count() > 0);

// Auto-discover local server via UDP beacon if the default connection fails
// Skip when: no command given, scan command, or user explicitly set --host/--port
53 changes: 49 additions & 4 deletions src/cpp/cli/model_selection.cpp
@@ -110,6 +110,26 @@ std::string normalize_agent_key(const std::string& agent_name) {
return key;
}

bool starts_with_case_insensitive(const std::string& value, const std::string& prefix) {
if (prefix.size() > value.size()) {
return false;
}

for (size_t i = 0; i < prefix.size(); ++i) {
const char lhs = static_cast<char>(std::tolower(static_cast<unsigned char>(value[i])));
const char rhs = static_cast<char>(std::tolower(static_cast<unsigned char>(prefix[i])));
if (lhs != rhs) {
return false;
}
}

return true;
}

bool is_qwen35_family_model(const lemonade::ModelInfo& model) {
return starts_with_case_insensitive(model.id, "Qwen3.5");
}

std::vector<std::string> preferred_recipe_directories_for_agent(const std::string& agent_name) {
const std::string agent = normalize_agent_key(agent_name);
if (agent == "claude" || agent == "codex") {
@@ -144,13 +164,21 @@ bool prompt_model_name_input(std::string& model_out) {
}

std::vector<const lemonade::ModelInfo*> filter_recommended_launch_models(
const std::vector<lemonade::ModelInfo>& models) {
const std::vector<lemonade::ModelInfo>& models,
const std::string& agent_name) {
std::vector<const lemonade::ModelInfo*> filtered;
filtered.reserve(models.size());
const bool exclude_qwen35_for_codex = normalize_agent_key(agent_name) == "codex";

for (const auto& model : models) {
if (is_recommended_for_launch(model)) {
filtered.push_back(&model);
if (!is_recommended_for_launch(model)) {
continue;
}
if (exclude_qwen35_for_codex && is_qwen35_family_model(model)) {
continue;
}

filtered.push_back(&model);
}
return filtered;
}
@@ -167,6 +195,7 @@ bool prompt_launch_recipe_first(lemonade::LemonadeClient& client,

MenuState state = MenuState::RecipeDirectories;
std::string selected_recipe_dir;
const bool is_codex_agent = normalize_agent_key(agent_name) == "codex";
bool use_preferred_recipe_dir = false;
std::string preferred_recipe_dir;
bool remote_dirs_loaded = false;
@@ -296,6 +325,14 @@
<< "' to import and use:" << std::endl;
}

if (is_codex_agent) {
std::cout
<< "\nWarning: Qwen 3.5 family models currently do not work with Codex due to "
<< "a llama.cpp incompatibility. Track upstream: "
<< "https://github.com/ggml-org/llama.cpp/issues/20733\n"
<< std::endl;
}

if (in_preferred_recipe_dir) {
std::cout << " 0) Browse downloaded models" << std::endl;
} else {
@@ -371,6 +408,14 @@
}
}

if (is_codex_agent) {
std::cout
<< "\nWarning: Qwen 3.5 family models currently do not work with Codex due to "
<< "a llama.cpp incompatibility. Track upstream: "
<< "https://github.com/ggml-org/llama.cpp/issues/20733\n"
<< std::endl;
}

std::cout << "Browse downloaded llamacpp models:" << std::endl;
std::cout << " 0) Browse recommended models (download may be required)" << std::endl;
for (size_t i = 0; i < downloaded_llamacpp_models.size(); ++i) {
@@ -422,7 +467,7 @@
}

std::vector<const lemonade::ModelInfo*> recommended_all =
filter_recommended_launch_models(all_models);
filter_recommended_launch_models(all_models, agent_name);

std::cout << "Browse recommended models (llamacpp + hot + tool-calling):" << std::endl;
std::cout << " 0) Back to downloaded models" << std::endl;
Expand Down
6 changes: 6 additions & 0 deletions src/cpp/include/lemon_cli/agent_launcher.h
@@ -15,13 +15,19 @@ struct AgentConfig {
std::string install_instructions;
};

struct AgentLaunchOptions {
bool codex_use_user_config = false;
std::string codex_model_provider = "lemonade";
};

// Build launcher configuration for a supported agent.
// Returns true on success, false if agent is unknown.
bool build_agent_config(const std::string& agent,
const std::string& host,
int port,
const std::string& model,
const std::string& api_key,
const AgentLaunchOptions& launch_options,
AgentConfig& config,
std::string& error_message);
