Merged
66 commits
9f91446
Local models
Feb 3, 2026
10011e7
Update crates/goose-server/src/routes/local_inference.rs
DOsinga Feb 4, 2026
3a06f66
fix
Feb 4, 2026
b66e38f
fmt
Feb 4, 2026
c6af773
Merge branch 'local-models-candle' of https://github.com/block/goose …
Feb 4, 2026
193bb0f
WIP: local inference debugging - tokenization fix
Feb 5, 2026
51c73b2
Merge origin/main to get candle 0.9
Feb 5, 2026
e8bfddf
Make streaming work. tiny model is kinda broken
Feb 6, 2026
848b8b0
Simplify?
Feb 6, 2026
94f1203
Only show the "download models" bubble if none have been downloaded
jh-block Feb 6, 2026
293def7
Handle EOS during streaming using all EOS tokens from the template
jh-block Feb 6, 2026
a62b0d8
Merge remote-tracking branch 'origin/main' into local-models-candle
jh-block Feb 9, 2026
d4f8a20
Update fetch_supported_models for new trait method signature
jh-block Feb 9, 2026
2477a18
Fix duplicated output in Code Mode by filtering content by audience
jh-block Feb 10, 2026
bf3193d
Switched local inference to use llama.cpp and added HF model download
jh-block Feb 10, 2026
1aee23e
Merge remote-tracking branch 'origin/main' into local-models-candle
jh-block Feb 10, 2026
821402c
merge fixes
jh-block Feb 10, 2026
c702360
Generate session names for local inference conversations
jh-block Feb 12, 2026
7f7a449
Merge remote-tracking branch 'origin/main' into local-models-candle
jh-block Feb 12, 2026
0327007
fmt
jh-block Feb 12, 2026
328084c
Merge remote-tracking branch 'origin/main' into local-models-candle
jh-block Feb 12, 2026
fb35c6d
Fix for API changes from main
jh-block Feb 12, 2026
ac24160
feat(local-inference): UI improvements for featured models (#7179)
spencrmartin Feb 13, 2026
90477d2
Merge remote-tracking branch 'origin/main' into local-models-candle
jh-block Feb 13, 2026
efb2582
api autogen
jh-block Feb 13, 2026
ccc594d
Add support for models that use Jinja chat templates (e.g. GLM-4.7)
jh-block Feb 13, 2026
8c1e343
Merge remote-tracking branch 'origin/main' into local-models-candle
jh-block Feb 13, 2026
3fbc494
Improve GLM4.7 tool call parsing, and record local "LLM requests" diag
jh-block Feb 13, 2026
81d8d91
fix: updating to main and starter guard screen (#7241)
michaelneale Feb 16, 2026
6687f66
Merge remote-tracking branch 'origin/main' into local-models-candle
jh-block Feb 18, 2026
459e2f5
Merge remote-tracking branch 'origin/main' into local-models-candle
jh-block Feb 18, 2026
a391529
Revert "feat(local-inference): UI improvements for featured models (#…
jh-block Feb 18, 2026
0c42188
Improve local inference settings UI
jh-block Feb 18, 2026
235f7d4
Simplify local model selection by deriving state from context
jh-block Feb 18, 2026
8d952b1
Merge remote-tracking branch 'origin/main' into local-models-candle
jh-block Feb 18, 2026
49c6882
refactor: extract shared helpers from stream() in local_inference
jh-block Feb 18, 2026
67968b6
Fix effective_context_size using max instead of min to cap context
jh-block Feb 18, 2026
bdb3a58
Replace fragile hand-rolled strip_xml_tags with regex-based implement…
jh-block Feb 18, 2026
6c9398a
Remove extra blank line in model.rs
jh-block Feb 18, 2026
ad5e125
Replace hardcoded tool names with constants in local_inference.rs
jh-block Feb 18, 2026
a46ad2e
Add unit and integration tests for local inference provider
jh-block Feb 18, 2026
876160b
Refactor quant_info() to use a static data table instead of verbose m…
jh-block Feb 18, 2026
0ce8e61
Refactor: extract emulator and native tool paths from stream()
jh-block Feb 18, 2026
276f8b2
Extract tool_parsing submodule from local_inference
jh-block Feb 18, 2026
2662dc6
Extract inference_context submodule from local_inference
jh-block Feb 18, 2026
aeff502
Extract emulator submodule from local_inference
jh-block Feb 18, 2026
3d95500
Extract native_path submodule and emulator tool descriptions
jh-block Feb 18, 2026
39dfe47
removed accidentally added file
jh-block Feb 18, 2026
bdebd61
Use prompt for no tools (#7349)
DOsinga Feb 19, 2026
b453822
feat: improve download progress display (#7351)
jh-block Feb 19, 2026
2ddc657
Merge remote-tracking branch 'origin/main' into local-models-candle
jh-block Feb 19, 2026
18e86c7
Changes from Pi's code review
jh-block Feb 19, 2026
c06d8ac
Consolidate duplicate context-size logic into shared context_cap func…
jh-block Feb 19, 2026
d4d7b70
Use FEATURED_MODELS constant in metadata() instead of duplicating the…
jh-block Feb 19, 2026
6d61d37
Use atomic writes and advisory file locks for model registry
jh-block Feb 19, 2026
a476834
Use blocking_lock() instead of block_on(lock()) in generation path
jh-block Feb 19, 2026
296290a
Move download_manager to top-level module shared by dictation and loc…
jh-block Feb 19, 2026
d9cc3c4
rustfmt
jh-block Feb 19, 2026
9a650b7
Move config setting out of DownloadManager into caller
jh-block Feb 19, 2026
c2ca822
Remove unused GOOSE_CONTEXT_SIZE env var from subprocess setup
jh-block Feb 19, 2026
1d76d99
Clarify MOIM context limit check with named constant and comment
jh-block Feb 19, 2026
5f84d04
Remove unused small_model field from SystemPromptContext
jh-block Feb 19, 2026
d19ff34
Use string literal .len() for hold-back constants
jh-block Feb 19, 2026
2678765
Fix TypeScript errors in LocalModelSetup and McpAppRenderer
jh-block Feb 19, 2026
9f62853
Fix MOIM context limit threshold and model list test
jh-block Feb 19, 2026
634ee59
cargo fmt
jh-block Feb 19, 2026
59 changes: 59 additions & 0 deletions Cargo.lock

205 changes: 205 additions & 0 deletions crates/goose-cli/src/cli.rs
@@ -854,6 +854,13 @@ enum Command {
#[command(subcommand)]
command: TermCommand,
},
/// Manage local inference models
#[command(about = "Manage local inference models", visible_alias = "lm")]
LocalModels {
#[command(subcommand)]
command: LocalModelsCommand,
},

/// Generate completions for various shells
#[command(about = "Generate the autocompletion script for the specified shell")]
Completion {
@@ -875,6 +882,38 @@ enum Command {
},
}

#[derive(Subcommand)]
enum LocalModelsCommand {
/// Search HuggingFace for GGUF models
#[command(about = "Search HuggingFace for GGUF models")]
Search {
/// Search query
query: String,

/// Maximum number of results
#[arg(short, long, default_value = "10")]
limit: usize,
},

/// Download a model from HuggingFace
#[command(about = "Download a GGUF model (e.g. bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M)")]
Download {
/// Model spec in user/repo:quantization format
spec: String,
},

/// List downloaded local models
#[command(about = "List downloaded local models")]
List,

/// Delete a downloaded model
#[command(about = "Delete a downloaded local model")]
Delete {
/// Model ID to delete
id: String,
},
}
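
The `Download` subcommand takes a spec in `user/repo:quantization` format. As a rough illustration of that format only, here is a minimal sketch of splitting such a spec. The helper name `split_model_spec` and its validation rules are assumptions made for this example; the PR's `hf_models::resolve_model_spec` is async and returns a resolved file, and its actual parsing is not shown in this diff.

```rust
/// Split a model spec of the form `user/repo:quantization` into
/// `(repo_id, quantization)`.
///
/// Hypothetical helper for illustration only: it does no network lookup,
/// unlike the PR's `resolve_model_spec`.
fn split_model_spec(spec: &str) -> Result<(&str, &str), String> {
    // Split on the last ':' so the quantization is whatever follows it.
    let (repo_id, quant) = spec
        .rsplit_once(':')
        .ok_or_else(|| format!("expected user/repo:quantization, got '{spec}'"))?;

    // Assumed rule: a repo id is `owner/name` with both parts non-empty.
    match repo_id.split_once('/') {
        Some((owner, name))
            if !owner.is_empty() && !name.is_empty() && !name.contains('/') => {}
        _ => return Err(format!("invalid repo id '{repo_id}'")),
    }

    if quant.is_empty() {
        return Err("missing quantization after ':'".to_string());
    }
    Ok((repo_id, quant))
}
```

Under these assumptions, `split_model_spec("bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")` yields the repo id `bartowski/Llama-3.2-1B-Instruct-GGUF` and the quantization `Q4_K_M`, while a spec without a `:` is rejected.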

#[derive(Subcommand)]
enum TermCommand {
/// Print shell initialization script
@@ -964,6 +1003,7 @@ fn get_command_name(command: &Option<Command>) -> &'static str {
Some(Command::Recipe { .. }) => "recipe",
Some(Command::Web { .. }) => "web",
Some(Command::Term { .. }) => "term",
Some(Command::LocalModels { .. }) => "local-models",
Some(Command::Completion { .. }) => "completion",
Some(Command::ValidateExtensions { .. }) => "validate-extensions",
None => "default_session",
@@ -1400,6 +1440,170 @@ async fn handle_term_subcommand(command: TermCommand) -> Result<()> {
}
}

async fn handle_local_models_command(command: LocalModelsCommand) -> Result<()> {
use goose::providers::local_inference::hf_models;
use goose::providers::local_inference::local_model_registry::{
display_name_from_repo, get_registry, model_id_from_repo, LocalModelEntry,
};

match command {
LocalModelsCommand::Search { query, limit } => {
println!("Searching HuggingFace for '{}'...", query);
let results = hf_models::search_gguf_models(&query, limit).await?;

if results.is_empty() {
println!("No GGUF models found.");
return Ok(());
}

for model in &results {
println!(
"\n{} (by {}) — {} downloads",
model.model_name, model.author, model.downloads
);
for file in &model.gguf_files {
let size = if file.size_bytes > 0 {
format!(
"{:.1}GB",
file.size_bytes as f64 / (1024.0 * 1024.0 * 1024.0)
)
} else {
"unknown".to_string()
};
println!(" {} — {}", file.quantization, size);
}
println!(
" Download: goose local-models download {}:<quantization>",
model.repo_id
);
}
}
LocalModelsCommand::Download { spec } => {
println!("Resolving {}...", spec);
let (repo_id, file) = hf_models::resolve_model_spec(&spec).await?;
let model_id = model_id_from_repo(&repo_id, &file.quantization);
let display_name = display_name_from_repo(&repo_id, &file.quantization);
let local_path =
goose::config::paths::Paths::in_data_dir("models").join(&file.filename);

println!(
"Downloading {} ({})...",
display_name,
if file.size_bytes > 0 {
format!(
"{:.1}GB",
file.size_bytes as f64 / (1024.0 * 1024.0 * 1024.0)
)
} else {
"unknown size".to_string()
}
);

// Register
let entry = LocalModelEntry {
id: model_id.clone(),
display_name: display_name.clone(),
repo_id: repo_id.clone(),
filename: file.filename.clone(),
quantization: file.quantization.clone(),
local_path: local_path.clone(),
source_url: file.download_url.clone(),
settings: Default::default(),
size_bytes: file.size_bytes,
};

{
let mut registry = get_registry()
.lock()
.map_err(|_| anyhow::anyhow!("Failed to acquire registry lock"))?;
registry.add_model(entry)?;
}

// Download
let manager = goose::download_manager::get_download_manager();
manager
.download_model(
format!("{}-model", model_id),
file.download_url,
local_path,
None,
)
.await?;

// Poll progress
loop {
if let Some(progress) = manager.get_progress(&format!("{}-model", model_id)) {
match progress.status {
goose::download_manager::DownloadStatus::Downloading => {
print!(
"\r {:.1}% ({:.0}MB / {:.0}MB)",
progress.progress_percent,
progress.bytes_downloaded as f64 / (1024.0 * 1024.0),
progress.total_bytes as f64 / (1024.0 * 1024.0),
);
use std::io::Write;
std::io::stdout().flush().ok();
}
goose::download_manager::DownloadStatus::Completed => {
println!("\nDownloaded: {} (id: {})", display_name, model_id);
break;
}
goose::download_manager::DownloadStatus::Failed => {
let err = progress.error.unwrap_or_default();
anyhow::bail!("Download failed: {}", err);
}
goose::download_manager::DownloadStatus::Cancelled => {
println!("\nDownload cancelled.");
break;
}
}
}
tokio::time::sleep(std::time::Duration::from_millis(500)).await;
}
}
LocalModelsCommand::List => {
let registry = get_registry()
.lock()
.map_err(|_| anyhow::anyhow!("Failed to acquire registry lock"))?;
let models = registry.list_models();

if models.is_empty() {
println!("No local models downloaded.");
return Ok(());
}

println!("{:<40} {:<20} {:<10} Downloaded", "ID", "Name", "Quant");
println!("{}", "-".repeat(80));
for m in models {
println!(
"{:<40} {:<20} {:<10} {}",
m.id,
m.display_name,
m.quantization,
if m.is_downloaded() { "✓" } else { "✗" }
);
}
}
LocalModelsCommand::Delete { id } => {
let mut registry = get_registry()
.lock()
.map_err(|_| anyhow::anyhow!("Failed to acquire registry lock"))?;

if let Some(entry) = registry.get_model(&id) {
if entry.local_path.exists() {
std::fs::remove_file(&entry.local_path)?;
}
registry.remove_model(&id)?;
println!("Deleted model: {}", id);
} else {
println!("Model not found: {}", id);
}
}
}

Ok(())
}
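
The `Search` and `Download` arms above repeat the same bytes-to-GB formatting with different fallback strings. One possible way to share it is sketched below. The name `format_size_gb` and the `u64` parameter type are assumptions for this example (the type of `size_bytes` is not visible in this diff), and this is not code from the PR.

```rust
/// Format a byte count as GB (1024^3 bytes) with one decimal place,
/// falling back to `unknown_label` when the size is zero.
///
/// Hypothetical helper; `u64` is an assumed type for `size_bytes`.
fn format_size_gb(size_bytes: u64, unknown_label: &str) -> String {
    if size_bytes > 0 {
        // Same arithmetic as the inline expressions in the handler.
        format!("{:.1}GB", size_bytes as f64 / (1024.0 * 1024.0 * 1024.0))
    } else {
        unknown_label.to_string()
    }
}
```

With this sketch, the search listing would call `format_size_gb(file.size_bytes, "unknown")` and the download message would call `format_size_gb(file.size_bytes, "unknown size")`, keeping both outputs identical to the current ones.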

async fn handle_default_session() -> Result<()> {
if !Config::global().exists() {
return handle_configure().await;
@@ -1530,6 +1734,7 @@ pub async fn cli() -> anyhow::Result<()> {
no_auth,
}) => crate::commands::web::handle_web(port, host, open, auth_token, no_auth).await,
Some(Command::Term { command }) => handle_term_subcommand(command).await,
Some(Command::LocalModels { command }) => handle_local_models_command(command).await,
Some(Command::ValidateExtensions { file }) => {
use goose::agents::validate_extensions::validate_bundled_extensions;
match validate_bundled_extensions(&file) {
20 changes: 19 additions & 1 deletion crates/goose-server/src/openapi.rs
@@ -4,7 +4,7 @@ use goose::agents::ExtensionConfig;
use goose::config::permission::PermissionLevel;
use goose::config::ExtensionEntry;
use goose::conversation::Conversation;
-use goose::dictation::download_manager::{DownloadProgress, DownloadStatus};
+use goose::download_manager::{DownloadProgress, DownloadStatus};
use goose::model::ModelConfig;
use goose::permission::permission_confirmation::{Permission, PrincipalType};
use goose::providers::base::{ConfigKey, ModelInfo, ProviderMetadata, ProviderType};
@@ -424,6 +424,15 @@ derive_utoipa!(Icon as IconSchema);
super::routes::dictation::get_download_progress,
super::routes::dictation::cancel_download,
super::routes::dictation::delete_model,
super::routes::local_inference::list_local_models,
super::routes::local_inference::search_hf_models,
super::routes::local_inference::get_repo_files,
super::routes::local_inference::download_hf_model,
super::routes::local_inference::get_local_model_download_progress,
super::routes::local_inference::cancel_local_model_download,
super::routes::local_inference::delete_local_model,
super::routes::local_inference::get_model_settings,
super::routes::local_inference::update_model_settings,
),
components(schemas(
super::routes::config_management::UpsertConfigQuery,
@@ -592,6 +601,15 @@ derive_utoipa!(Icon as IconSchema);
goose::dictation::providers::DictationProvider,
super::routes::dictation::DictationProviderStatus,
super::routes::dictation::WhisperModelResponse,
super::routes::local_inference::LocalModelResponse,
super::routes::local_inference::ModelDownloadStatus,
super::routes::local_inference::DownloadModelRequest,
goose::providers::local_inference::hf_models::HfModelInfo,
goose::providers::local_inference::hf_models::HfGgufFile,
goose::providers::local_inference::hf_models::HfQuantVariant,
super::routes::local_inference::RepoVariantsResponse,
goose::providers::local_inference::local_model_registry::ModelSettings,
goose::providers::local_inference::local_model_registry::SamplingConfig,
DownloadProgress,
DownloadStatus,
))