feat: Add local LLM improvements for reasoning models and Docker startup by csfet9 · Pull Request #88 · vectorize-io/hindsight

csfet9 · 2026-01-02T00:27:07Z

Summary

Reasoning model support: Strip thinking tags from local LLM responses, enabling Qwen3, DeepSeek, and other reasoning models
Docker retry-start script: Wait for dependencies (LLM Studio, database) before starting Hindsight

Changes

Reasoning Model Support (`llm_wrapper.py`)

Strips thinking tags from local LLM responses:

<think>...</think>
<thinking>...</thinking>
<reasoning>...</reasoning>
|startthink|...|endthink|

This enables reasoning models like Qwen3 to work with Hindsight's JSON extraction pipeline. Non-breaking change - only affects responses that contain these tags.

Docker Retry Start Script (`retry-start.sh`)

New startup script that waits for dependencies:

Checks LLM Studio at /v1/models endpoint
Checks database connectivity (skipped for embedded pg0)
Configurable retries via HINDSIGHT_RETRY_MAX (default: infinite)
Configurable interval via HINDSIGHT_RETRY_INTERVAL (default: 10s)

Prevents startup failures when LLM Studio takes time to load models.

Test plan

Tested reasoning model support with Qwen3 8B on LM Studio
Verified thinking tags are stripped correctly
Tested retry-start with LLM Studio dependency
Verified embedded pg0 detection works
Health endpoint returns healthy after startup

## Reasoning Model Support - Strip thinking tags from local LLM responses (<think>, <thinking>, <reasoning>, |startthink|/|endthink|) - Enables Qwen3, DeepSeek, and other reasoning models to work with JSON extraction - Non-breaking: only affects responses that contain thinking tags ## Docker Retry Start Script - New retry-start.sh waits for dependencies before starting Hindsight - Checks LLM Studio availability at /v1/models endpoint - Checks database connectivity (skipped for embedded pg0) - Configurable via HINDSIGHT_RETRY_MAX and HINDSIGHT_RETRY_INTERVAL env vars - Prevents startup failures when LLM Studio isn't ready yet Tested on Apple Silicon M4 Max with Qwen3 8B via LM Studio.

nicoloboschi · 2026-01-02T15:26:31Z

docker/standalone/retry-start.sh

@@ -0,0 +1,78 @@
+#!/bin/bash
+# Retry wrapper - waits for dependencies before starting hindsight


can you modify the existing file start-all (instead of creating a new one) and indicate what is the problem you are trying to solve?

nicoloboschi · 2026-01-02T15:27:15Z

hindsight-api/hindsight_api/engine/llm_wrapper.py

+                        # Strip reasoning model thinking tags (various formats)
+                        # Supports: <think>, <thinking>, <reasoning>, |startthink|/|endthink|
+                        if content:
+                            original_len = len(content)


can we use the flag in the llm call to not include thinking tokens in the output? this way the user will not pay for those output tokens and we don't have to leverage on this heuristic and problematic algorithm

…AIT_FOR_DEPS)

- Remove stale pg0 instance data after pre-caching binaries to avoid port conflicts (was using hardcoded port 5555 from build time) - Remove unused cache copy logic from start-all.sh - Add database backup instructions to CLAUDE.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

nicoloboschi requested changes Jan 2, 2026

View reviewed changes

csfet9 and others added 4 commits January 3, 2026 01:45

refactor: make thinking token stripping opt-in via env var

e65a645

refactor: merge retry logic into start-all.sh (opt-in via HINDSIGHT_W…

4cc8e2a

…AIT_FOR_DEPS)

Merge main into feature/local-llm-improvements

c7eee98

nicoloboschi approved these changes Jan 5, 2026

View reviewed changes

nicoloboschi merged commit eea0f27 into vectorize-io:main Jan 5, 2026
16 of 23 checks passed

salmanmkc mentioned this pull request Mar 13, 2026

Upgrade GitHub Actions to latest versions #553

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add local LLM improvements for reasoning models and Docker startup#88

feat: Add local LLM improvements for reasoning models and Docker startup#88
nicoloboschi merged 5 commits intovectorize-io:mainfrom
csfet9:feature/local-llm-improvements

csfet9 commented Jan 2, 2026 •

edited

Loading

Uh oh!

nicoloboschi Jan 2, 2026

Uh oh!

nicoloboschi Jan 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		@@ -0,0 +1,78 @@
		#!/bin/bash
		# Retry wrapper - waits for dependencies before starting hindsight

Conversation

csfet9 commented Jan 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Reasoning Model Support (llm_wrapper.py)

Docker Retry Start Script (retry-start.sh)

Test plan

Uh oh!

nicoloboschi Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

nicoloboschi Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

csfet9 commented Jan 2, 2026 •

edited

Loading

Reasoning Model Support (`llm_wrapper.py`)

Docker Retry Start Script (`retry-start.sh`)