download: add option to skip_download #23059

Merged
ngxson merged 5 commits into ggml-org:master from ngxson:xsn/opt_skip_download
May 29, 2026

@ngxson ngxson commented May 14, 2026

Overview

Add a new flag skip_download to the common_params_handle_models function. This is a cleanup in preparation for the upcoming model download / management API (cc @allozaur). It is useful to know whether a download is required before running a model.

Its meaning:

  • offline = false --> normal case: the ETag is validated and, on mismatch, the GGUF is re-downloaded
  • offline = false and skip_download = true --> validation is performed, but the download is skipped on ETag mismatch
  • offline = true --> no validation or download is performed (also implies skip_download)
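
The three cases above can be sketched as a small decision function. This is only an illustration of the semantics described in this PR, not the code in common/download.cpp: the names `cache_action` and `decide_cache_action` are hypothetical, and the sketch assumes that a missing file is treated like an ETag mismatch and that offline mode uses an existing file as-is. What the caller does with a skipped download is not specified here.

```cpp
// Sketch only. `cache_action` and `decide_cache_action` are hypothetical names
// for illustration; they do not appear in common/download.cpp.
enum class cache_action {
    use_cached, // keep using the file already on disk
    download,   // fetch (or re-fetch) the GGUF
    skipped,    // a download would be needed, but none is performed
};

constexpr cache_action decide_cache_action(bool offline, bool skip_download,
                                           bool file_exists, bool etag_matches) {
    // offline: no validation and no download, so etag_matches is ignored
    // (assumption: an existing file is used as-is)
    if (offline) {
        return file_exists ? cache_action::use_cached : cache_action::skipped;
    }
    // online: a missing file is treated like an ETag mismatch (assumption)
    if (file_exists && etag_matches) {
        return cache_action::use_cached;
    }
    return skip_download ? cache_action::skipped : cache_action::download;
}
```

For example, `decide_cache_action(false, true, false, false)` (online, skip_download set, file missing) evaluates to `cache_action::skipped`, while the same call with `skip_download = false` evaluates to `cache_action::download`.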

Note:

  • I wanted to expose this as a new need_download field in /v1/models, but it takes too much time to validate the ETag, so in the end I did not add it. Probably will revisit this in another PR.

Requirements

@ngxson ngxson requested a review from angt May 14, 2026 15:15
@ngxson ngxson requested review from a team as code owners May 14, 2026 15:16

ngxson commented May 19, 2026

hey @angt , can you have a quick look at this PR? thanks!

angt commented May 19, 2026

I think skip_download should also skip the download if the file doesn't exist?

ngxson commented May 19, 2026

hmm yes, that was my intention: file not found is the same as the ETag mismatch case, right? (or maybe I missed something?)

angt commented May 20, 2026

Maybe I tested it badly 🤔

I did this:

$ git df
diff --git a/common/arg.cpp b/common/arg.cpp
index 85ef58296..61e7dfa8c 100644
--- a/common/arg.cpp
+++ b/common/arg.cpp
@@ -347,7 +347,7 @@ static handle_model_result common_params_handle_model(struct common_params_model
     common_download_opts opts;
     opts.bearer_token  = params.hf_token;
     opts.offline       = params.offline;
-    opts.skip_download = params.skip_download;
+    opts.skip_download = true; //params.skip_download;

     if (!model.docker_repo.empty()) {
         model.path = common_docker_resolve_model(model.docker_repo);
diff --git a/common/download.cpp b/common/download.cpp
index 5a5704fe1..aa50a8337 100644
--- a/common/download.cpp
+++ b/common/download.cpp
@@ -287,6 +287,10 @@ static int common_download_file_single_online(const std::string & url,
                                               const std::string & path,
                                               const common_download_opts & opts,
                                               bool skip_etag) {
+    if (opts.skip_download) {
+        LOG_WRN("SKIP DOWNLOAD\n");
+    }
+
     static const int max_attempts        = 3;
     static const int retry_delay_seconds = 2;

and got:

$ LLAMA_CACHE=nothing ./build/bin/llama-server -hf unsloth/Qwen3.5-0.8B-GGUF
0.00.104.406 W SKIP DOWNLOAD
0.00.104.481 W SKIP DOWNLOAD
Downloading mmproj-BF16.gguf ─────────────────────────────────────── 100%
Downloading Qwen3.5-0.8B-Q4_K_M.gguf ─────────────────────────────── 100%

ngxson commented May 23, 2026

@angt I addressed the problem in ec6a687 , could you take a look? Thanks!

ngxson commented May 29, 2026

I need to merge this to unblock some other task. @angt could you review & give the 2nd approval?

@ngxson ngxson merged commit 06d26df into ggml-org:master May 29, 2026
44 of 49 checks passed
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026
* download: add option to skip_download

* fix

* fix 2

* if file doesn't exist, respect skip_download flag

ggerganov commented Jun 4, 2026

@ngxson This change caused a regression in the use case where we load a separate MTP draft model file. I use this command:

./bin/llama serve \
  -hf ggml-org/Qwen3.6-35B-A3B-GGUF:Q8_0 \
  --spec-type draft-mtp \
  --spec-draft-hf ggml-org/Qwen3.6-35B-A3B-GGUF \
  --spec-draft-model mtp-Qwen3.6-35B-A3B-Q4_0.gguf

After this change, something causes the mtp- model file to be downloaded two times at the same time:

image

Which later leads to corruption of the data:

image
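
The root cause of the double download is not identified in this thread. Purely as an illustration of the failure mode (two writers targeting the same destination path at once), here is a minimal sketch of a per-path in-flight guard. The class `in_flight_paths` and its methods are hypothetical and not part of llama.cpp.

```cpp
#include <mutex>
#include <set>
#include <string>

// Hypothetical helper, not part of llama.cpp: tracks destination paths that
// currently have a download in progress.
class in_flight_paths {
public:
    // Returns true if the caller now owns the download for `path`,
    // false if a download to the same path is already registered.
    bool try_begin(const std::string & path) {
        std::lock_guard<std::mutex> lock(mtx);
        return active.insert(path).second;
    }

    // Must be called once the download for `path` has finished or failed.
    void end(const std::string & path) {
        std::lock_guard<std::mutex> lock(mtx);
        active.erase(path);
    }

private:
    std::mutex            mtx;
    std::set<std::string> active;
};
```

A second caller that gets `false` from `try_begin` could wait for the first download to finish instead of writing to the same file. Whether that is the right fix for this regression depends on why the file is requested twice in the first place.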
