Ollama: Roo overrides Modelfile num_ctx by forcing options.num_ctx from modelInfo.contextWindow
App Version
v3.27.0 (main 0ce4e89)
API Provider
Ollama
Model Used
Any Ollama model with a Modelfile num_ctx smaller than Roo’s inferred context (e.g., llama3.1 with num_ctx 4096)
Steps to Reproduce
- Create an Ollama model with a reduced context. Modelfile:

  ```
  FROM llama3.1
  PARAMETER num_ctx 4096
  ```

  Build and start:

  ```
  ollama create my-small-ctx -f Modelfile
  ```

  then ensure Ollama is running.
- In Roo Code, set the provider to “Ollama” and select the model my-small-ctx.
- Start a chat and send a prompt large enough to exceed the Modelfile’s num_ctx (e.g., >4k tokens).
- Observe GPU/CPU memory usage and/or server behavior.
Outcome Summary
Roo sends options.num_ctx equal to its inferred modelInfo.contextWindow, overriding the Modelfile’s num_ctx. This can allocate a much larger context window than intended (e.g., 128k from the model’s reported context, or the 200k default fallback), causing allocation failures or out-of-memory errors on budget GPUs.
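For illustration, the body Roo ends up posting to Ollama’s /api/chat looks roughly like this (a paraphrased TypeScript sketch, not copied from the repo; the 131072 value assumes the my-small-ctx example above, where llama3.1 reports a 128k context):

```typescript
// Approximate request body sent to Ollama's /api/chat. The Modelfile set
// num_ctx 4096, but Roo's inferred contextWindow is sent instead:
const requestBody = {
	model: "my-small-ctx",
	messages: [{ role: "user", content: "..." }],
	stream: true,
	options: {
		num_ctx: 131072, // modelInfo.contextWindow, not the Modelfile's 4096
	},
}
```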
Technical Evidence
- Roo sets num_ctx on chat requests in the native Ollama handler (a paraphrased sketch follows this list):
  - Request creation at src/api/providers/native-ollama.ts:186
  - num_ctx assignment at src/api/providers/native-ollama.ts:193
- modelInfo.contextWindow is inferred from Ollama’s /api/show via parseOllamaModel():
  - contextWindow assignment at src/api/providers/fetchers/ollama.ts:47
  - maxTokens mirror at src/api/providers/fetchers/ollama.ts:51
- The default fallback is ollamaDefaultModelInfo with contextWindow 200,000
- The non‑streaming path completePrompt() does not include num_ctx
- Provider selection routes “ollama” → NativeOllamaHandler
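A minimal sketch of that code path, paraphrased from the file references above (variable names and exact shapes are approximations, using the ollama npm client):

```typescript
import { Ollama, type Message } from "ollama"

const client = new Ollama({ host: "http://localhost:11434" })
const messages: Message[] = [{ role: "user", content: "Hello" }]

// contextWindow comes from parseOllamaModel()
// (src/api/providers/fetchers/ollama.ts:47) or the
// ollamaDefaultModelInfo fallback (200,000).
const contextWindow = 131072

const stream = await client.chat({
	model: "my-small-ctx",
	messages,
	stream: true,
	options: {
		// The bug: num_ctx is always included, so the Modelfile's
		// "PARAMETER num_ctx 4096" can never take effect.
		num_ctx: contextWindow,
	},
})
for await (const chunk of stream) {
	process.stdout.write(chunk.message.content)
}
```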
Expected Behavior
- Roo should not force num_ctx unless the user explicitly configures it.
- By default, Ollama should honor the Modelfile’s num_ctx or server-side defaults.
Actual Behavior
- Roo overrides the Modelfile by sending options.num_ctx = modelInfo.contextWindow on each chat request, ignoring user-defined num_ctx in the Modelfile.
Proposed Fix (non‑breaking)
- Remove num_ctx from default chat options in NativeOllamaHandler.createMessage() so Ollama respects the Modelfile.
- Add an explicit opt‑in override (e.g., ollamaNumCtx?: number on ApiHandlerOptions) and include num_ctx in the request only when it is set (see the sketch after this list).
- Optionally improve display-only inference in parseOllamaModel() to prefer a configured num_ctx (e.g., from parameters or model_info.num_ctx), but do not enforce it in requests.
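A minimal sketch of the opt‑in behavior, assuming the hypothetical ollamaNumCtx setting proposed above (client as in the previous sketch; none of these names are confirmed repo APIs):

```typescript
// Hypothetical user setting; in the repo this would live on ApiHandlerOptions.
const ollamaNumCtx: number | undefined = undefined // unset by default

// Only include num_ctx when explicitly configured. An empty options object
// lets Ollama honor the Modelfile's num_ctx or its server-side default.
const options: { num_ctx?: number } = {}
if (ollamaNumCtx !== undefined) {
	options.num_ctx = ollamaNumCtx
}

const stream = await client.chat({
	model: "my-small-ctx",
	messages: [{ role: "user", content: "Hello" }],
	stream: true,
	options,
})
```

With options left empty, the my-small-ctx model from the reproduction keeps its 4096‑token window; a power user can still set ollamaNumCtx to pin a larger one.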
Notes
- Matches reports of unexpected large context allocation on consumer GPUs.
- Opt‑in num_ctx keeps power users’ control while respecting defaults for everyone else.