feat: Add local LLM improvements for reasoning models and Docker startup#88
Merged
nicoloboschi merged 5 commits intovectorize-io:mainfrom Jan 5, 2026
Merged
Conversation
## Reasoning Model Support - Strip thinking tags from local LLM responses (<think>, <thinking>, <reasoning>, |startthink|/|endthink|) - Enables Qwen3, DeepSeek, and other reasoning models to work with JSON extraction - Non-breaking: only affects responses that contain thinking tags ## Docker Retry Start Script - New retry-start.sh waits for dependencies before starting Hindsight - Checks LLM Studio availability at /v1/models endpoint - Checks database connectivity (skipped for embedded pg0) - Configurable via HINDSIGHT_RETRY_MAX and HINDSIGHT_RETRY_INTERVAL env vars - Prevents startup failures when LLM Studio isn't ready yet Tested on Apple Silicon M4 Max with Qwen3 8B via LM Studio.
nicoloboschi
requested changes
Jan 2, 2026
docker/standalone/retry-start.sh
Outdated
| @@ -0,0 +1,78 @@ | |||
| #!/bin/bash | |||
| # Retry wrapper - waits for dependencies before starting hindsight | |||
Collaborator
There was a problem hiding this comment.
can you modify the existing file start-all (instead of creating a new one) and indicate what is the problem you are trying to solve?
| # Strip reasoning model thinking tags (various formats) | ||
| # Supports: <think>, <thinking>, <reasoning>, |startthink|/|endthink| | ||
| if content: | ||
| original_len = len(content) |
Collaborator
There was a problem hiding this comment.
can we use the flag in the llm call to not include thinking tokens in the output? this way the user will not pay for those output tokens and we don't have to leverage on this heuristic and problematic algorithm
- Remove stale pg0 instance data after pre-caching binaries to avoid port conflicts (was using hardcoded port 5555 from build time) - Remove unused cache copy logic from start-all.sh - Add database backup instructions to CLAUDE.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
nicoloboschi
approved these changes
Jan 5, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Changes
Reasoning Model Support (
llm_wrapper.py)Strips thinking tags from local LLM responses:
<think>...</think><thinking>...</thinking><reasoning>...</reasoning>|startthink|...|endthink|This enables reasoning models like Qwen3 to work with Hindsight's JSON extraction pipeline. Non-breaking change - only affects responses that contain these tags.
Docker Retry Start Script (
retry-start.sh)New startup script that waits for dependencies:
/v1/modelsendpointHINDSIGHT_RETRY_MAX(default: infinite)HINDSIGHT_RETRY_INTERVAL(default: 10s)Prevents startup failures when LLM Studio takes time to load models.
Test plan