
Feature/LLM-Providers-UI-Polished#736

Merged
coleam00 merged 31 commits into coleam00:main from
Chillbruhhh:feature/LLM-Providers-Ui-Polished
Oct 5, 2025

Conversation


@Chillbruhhh Chillbruhhh commented Sep 25, 2025

Pull Request

Summary

Changes Made

  • Enhanced the UI in RAG Settings by updating and polishing the LLM provider section: split it into separate chat and embedding provider sections, enabling full cross-provider selection for chat and embedding models across local and remote API providers.
  • Fixed minor issues where the active provider did not update when an API key was added, changed, or deleted.
  • Fixed a progress-reporting issue noticed during crawling.
  • Verified that credentials are initialized on startup.

(Screenshot: 2025-09-25 064028)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance improvement

Affected Services

  • Frontend (React UI)
  • Server (FastAPI backend)

Testing

  • All existing tests pass
  • Manually tested affected user flows
  • Docker builds succeed for all services

Test Evidence

# Example: python -m pytest tests/
# Example: cd archon-ui-main && npm run test

Checklist

  • My code follows the service architecture patterns
  • If using an AI coding assistant, I used the CLAUDE.md rules
  • All new and existing tests pass locally
  • My changes generate no new warnings
  • I have verified no regressions in existing features

Breaking Changes

Additional Notes

(Screenshots: 2025-09-25 080436, 080503, 080448, 080431)

Summary by CodeRabbit

  • New Features

    • Per-context Chat vs Embedding provider and model selection UI with provider health, metrics, Ollama tooling, and new embedding provider adapters (including Google).
  • Bug Fixes

    • Improved progress reporting during extraction and more reliable crawl cancellation via async orchestration; cancellation and lifecycle handling tightened.
  • Chores

    • Added react-icons dependency; credential changes now emit update notifications to refresh UI.
  • Migrations

    • Added idempotent provider API key placeholders.
  • Tests

    • Tightened assertions for storage call argument expectations.

Chillbruhhh and others added 18 commits September 19, 2025 15:02
- Add requires_max_completion_tokens() function for GPT-5, o1, o3, Grok-3 series
- Add prepare_chat_completion_params() for reasoning model compatibility
- Implement max_tokens → max_completion_tokens conversion for reasoning models
- Add temperature handling for reasoning models (must be 1.0 default)
- Enhanced provider validation and API key security in provider endpoints
- Streamlined retry logic (3→2 attempts) for faster issue detection
- Add failure tracking and circuit breaker analysis for debugging
- Support OpenRouter format detection (openai/gpt-5-nano, openai/o1-mini)
- Improved Grok provider empty response handling with structured fallbacks
- Enhanced contextual embedding with provider-aware model selection

Core provider functionality:
- OpenRouter, Grok, Anthropic provider support with full embedding integration
- Provider-specific model defaults and validation
- Secure API connectivity testing endpoints
- Provider context passing for code generation workflows

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…, need to fix health check, asyncio task errors, and contextual embeddings error
…es the underlying AsyncOpenAI / httpx.AsyncClient while the loop is still alive, with defensive logging if shutdown happens late (python/src/server/services/llm_provider_service.py:14, python/src/server/services/llm_provider_service.py:520).
…eparate try/finally blocks; fallback clients now close without logging a spurious "Error creating LLM client" when downstream code raises (python/src/server/services/llm_provider_service.py:335-556).
- Close logic now sanitizes provider names consistently and awaits whichever aclose/close coroutine the SDK exposes, keeping the loop shut down cleanly (python/src/server/services/llm_provider_service.py:530-559).
Robust JSON parsing:
- Added _extract_json_payload to strip code fences / extra text returned by Ollama before json.loads runs, averting the markdown-induced decode errors seen in logs (python/src/server/services/storage/code_storage_service.py:40-63).
- Swapped the direct parse call for the sanitized payload and emit a debug preview when cleanup alters the content (python/src/server/services/storage/code_storage_service.py:858-864).
…k to Ollama (matching existing tests) so unsupported providers still raise the legacy ValueError the suite expects. The fallback now reuses _get_optimal_ollama_instance and rethrows ValueError("OpenAI API key not found and Ollama fallback failed") when it can't connect. Adjusted test_code_extraction_source_id.py to accept the new optional argument on the mocked extractor (and confirm it's None when present).
…API key validation to call create_embedding with the provider argument and removed the hard-coded OpenAI fallback (python/src/server/api_routes/knowledge_api.py).
- Broadened embedding provider detection so prefixed OpenRouter/OpenAI model names route through the correct client (python/src/server/services/embeddings/embedding_service.py, python/src/server/services/llm_provider_service.py).
- Removed the duplicate helper definitions from llm_provider_service.py, eliminating the stray docstring that was causing the import-time syntax error.
…ues and no nitpicks with the updates! What was done:
- Credential service now persists the provider under the uppercase key LLM_PROVIDER, matching the read path (no new EMBEDDING_PROVIDER usage introduced).
- Embedding batch creation stops inserting blank strings, logging failures and skipping invalid items before they ever hit the provider (python/src/server/services/embeddings/embedding_service.py).
- Contextual embedding prompts use real newline characters everywhere, both when constructing the batch prompt and when parsing the model's response (python/src/server/services/embeddings/contextual_embedding_service.py).
- Embedding provider routing already recognizes OpenRouter-prefixed OpenAI models via is_openai_embedding_model; no further change needed there.
- Embedding insertion now skips unsupported vector dimensions instead of forcing them into the 1536-column, and the backoff loop uses await asyncio.sleep so we no longer block the event loop (python/src/server/services/storage/code_storage_service.py).
- RAG settings props were extended to include LLM_INSTANCE_NAME and OLLAMA_EMBEDDING_INSTANCE_NAME, and the debug log no longer prints API-key prefixes (the rest of the TanStack refactor/EMBEDDING_PROVIDER support remains deferred).
…models and parse output whether or not it is JSON. This commit makes Archon's parsing work thoroughly with OpenRouter automatically, regardless of the model you're using, ensuring proper functionality without breaking any generation capabilities!
… system fully capable of mixing and matching LLM providers (local and non-local) for chat & embeddings. Updated mainly the RAGSettings.tsx UI, along with core functionality.
@coderabbitai

coderabbitai Bot commented Sep 25, 2025

Walkthrough

Adds explicit embedding-provider support across frontend and backend: new dependency, per-context chat vs embedding provider/model UI and credential notifications; adapter-based embedding providers (OpenAI/Google); async, lock-guarded crawling orchestration; and threading embedding_provider through extraction, storage, and progress reporting.

Changes

Cohort / File(s) Summary
Frontend: dependency
archon-ui-main/package.json
Added react-icons@^5.5.0.
Frontend: RAG settings & creds
archon-ui-main/src/components/settings/RAGSettings.tsx, archon-ui-main/src/services/credentialsService.ts
Add EMBEDDING_PROVIDER to RAG settings and credentials mapping; per-context chat vs embedding provider/model state; Ollama URL normalization, health/metrics, context-aware UI/actions; add notifyCredentialUpdate/archon:credentials-updated; default EMBEDDING_PROVIDER: "openai".
Backend: embedding provider adapters
python/src/server/services/embeddings/embedding_service.py
Add adapter abstraction plus OpenAI-compatible and Google adapters; normalize awaiting; route batch embedding creation through adapters; preserve existing embedding error types.
Backend: storage & extraction plumbing
python/src/server/services/storage/code_storage_service.py, python/src/server/services/crawling/code_extraction_service.py, python/src/server/services/crawling/document_storage_operations.py
Add optional embedding_provider parameter and thread it through document storage → extraction → storage → embedding creation; refine progress scaling/clamping and block-completion heuristics in extraction.
Backend: crawling orchestration (async + lock)
python/src/server/services/crawling/crawling_service.py, python/src/server/api_routes/knowledge_api.py
Make orchestration registry APIs async and lock-guarded; await register/get_active/unregister across call sites; propagate embedding_provider into crawl flows; stop_crawl_task awaits orchestration lookup/unregister.
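A lock-guarded async registry of the kind described above might look like this minimal sketch. The method names (`register`, `get_active`, `unregister`) mirror the summary; the class name and internals are assumptions.

```python
# Hypothetical sketch of an asyncio.Lock-guarded orchestration registry.
import asyncio


class OrchestrationRegistry:
    def __init__(self) -> None:
        self._active: dict[str, object] = {}
        self._lock = asyncio.Lock()

    async def register(self, progress_id: str, orchestration: object) -> None:
        async with self._lock:
            self._active[progress_id] = orchestration

    async def get_active(self, progress_id: str):
        async with self._lock:
            return self._active.get(progress_id)

    async def unregister(self, progress_id: str) -> None:
        async with self._lock:
            # pop with default so double-unregister is harmless
            self._active.pop(progress_id, None)
```

Making these methods coroutines is why every call site now awaits them; in particular, stop_crawl_task must await the lookup and unregister so cancellation cannot race a concurrent registration.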
Backend: credential resolution
python/src/server/services/credential_service.py
For service_type == "embedding" prefer explicit EMBEDDING_PROVIDER (validated against embedding-capable providers) and fallback to "openai" with logging; removed prior auto-detection helper.
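The resolution rule described above can be sketched as a small pure function. This is an illustration under stated assumptions: the function name, the credential dict shape, and the set of embedding-capable providers are hypothetical, not the PR's exact implementation.

```python
# Hedged sketch of embedding-provider resolution: prefer an explicit
# EMBEDDING_PROVIDER, validate it, and fall back to "openai" with a log line.
import logging

logger = logging.getLogger(__name__)

# Assumed set of embedding-capable providers for illustration only.
EMBEDDING_CAPABLE = {"openai", "google", "ollama", "openrouter"}


def resolve_embedding_provider(credentials: dict[str, str]) -> str:
    explicit = (credentials.get("EMBEDDING_PROVIDER") or "").strip().lower()
    if explicit in EMBEDDING_CAPABLE:
        return explicit
    if explicit:
        logger.warning(
            "EMBEDDING_PROVIDER %r is not embedding-capable; falling back to openai",
            explicit,
        )
    return "openai"
```

An explicit but invalid provider (say, a chat-only one) thus degrades gracefully to "openai" instead of failing the crawl.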
Backend: tests
python/tests/test_code_extraction_source_id.py
Tightened mocked-call assertions to expect an extra trailing None argument (call arity change).
Migrations
migration/0.1.0/009_add_provider_placeholders.sql, migration/complete_setup.sql
Add provider API key placeholders (OPENROUTER_API_KEY, ANTHROPIC_API_KEY, GROK_API_KEY) and adjust GOOGLE_API_KEY entry; idempotent inserts via ON CONFLICT DO NOTHING.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant UI as RAGSettings (FE)
  participant CredFE as CredentialsService (FE)
  participant API as Backend API
  participant CredBE as CredentialService (BE)
  participant Crawl as CrawlingService
  participant Extract as CodeExtractionService
  participant Store as CodeStorageService
  participant EmbSvc as EmbeddingService
  participant Adapter as ProviderAdapter
  participant Provider as External Provider

  UI->>CredFE: Save EMBEDDING_PROVIDER + models
  CredFE-->>UI: Dispatch archon:credentials-updated
  UI->>API: Start crawl/extract (may include embedding_provider)
  API->>Crawl: orchestrate crawl
  Crawl->>CredBE: await get_active_provider("embedding")
  CredBE-->>Crawl: EMBEDDING_PROVIDER (explicit or default)
  Crawl->>Extract: extract_and_store_code_examples(..., embedding_provider)
  Extract->>Store: _store_code_examples(..., embedding_provider)
  Store->>EmbSvc: create_embeddings_batch(texts,..., embedding_provider)
  EmbSvc->>Adapter: instantiate adapter for provider
  Adapter->>Provider: create embeddings (HTTP/API)
  Provider-->>Adapter: vectors
  Adapter-->>EmbSvc: vectors
  EmbSvc-->>Store: vectors (persist)
  Store-->>Extract: storage result
  Extract-->>Crawl: progress/complete
  Crawl-->>API: status/complete
  API-->>UI: status/results
sequenceDiagram
  autonumber
  participant UI as RAGSettings (FE)
  participant Ollama as Ollama Instance

  rect rgb(240,248,255)
  note right of UI: Context switch: Chat vs Embedding provider/model
  UI->>UI: Load per-context provider/model from storage
  UI->>Ollama: Poll /api/tags and /api/version (per-context URLs)
  Ollama-->>UI: Status & model metrics
  UI->>UI: Update status badges, actions, and context UI
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • leex279
  • coleam00
  • Wirasm

Poem

I nibble through configs, hop in delight,
I split chat and vectors by day and night.
Adapters hum softly, Ollama blinks bright,
Keys and models settle, embeddings take flight—
A rabbit’s cheer: vectors done right! 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description Check — ⚠️ Warning: The description uses the repository’s template headers but leaves the Summary section blank and writes the Changes Made section as a free-form paragraph instead of the expected bulleted list; moreover, the Test Evidence section contains only placeholder examples rather than actual commands and results, and it does not explicitly confirm whether there are breaking changes. Resolution: populate the Summary section with a concise overview, convert Changes Made into clear bullet points, add real test commands and output in Test Evidence, and explicitly state any breaking changes or confirm that none exist.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 40.00%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title Check — ✅ Passed: The title indicates that this pull request delivers a feature polishing the LLM Providers UI, which aligns with the primary frontend focus, but it does not mention the new embedding provider section or the accompanying backend enhancements.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c00e79 and b5930ba.

📒 Files selected for processing (1)
  • python/src/server/services/embeddings/embedding_service.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/**/*.py: Never accept corrupted data in batch/continuation paths: skip failed items entirely (e.g., do not store zero embeddings, null FKs, or malformed JSON)
Use specific exception types; avoid catching bare Exception
Preserve full stack traces in logs (use exc_info=True with Python logging)
Never return None to indicate failure; raise an exception with details
Authentication/authorization failures must halt the operation and be clearly surfaced
Service startup, missing configuration, database connection, or critical dependency failures should crash fast with clear errors
During crawling/batch/background tasks and WebSocket events, continue processing other items but log failures with context
Include context (operation intent, relevant IDs/URLs/data) in error messages
Pydantic should raise on data corruption or validation errors; do not accept invalid inputs

Files:

  • python/src/server/services/embeddings/embedding_service.py
python/src/server/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/services/**/*.py: For batch operations, report both success count and a detailed failure list
On external API calls, retry with exponential backoff and fail with a clear message after retries
Place business logic in service layer modules (API Route → Service → Database pattern)

Files:

  • python/src/server/services/embeddings/embedding_service.py
python/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/**/*.py: Target Python 3.12 with a 120-character line length
Use Ruff for linting (errors, warnings, unused imports)

Files:

  • python/src/server/services/embeddings/embedding_service.py
python/src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use MyPy for type checking to ensure type safety

python/src/**/*.py: On service startup, missing configuration, DB connection failures, auth/authorization failures, critical dependency outages, or invalid/corrupting data: fail fast and bubble errors
For batch processing, background tasks, WebSocket events, optional features, and external API calls: continue processing but log errors (with retries/backoff for APIs)
Never accept or persist corrupted data; skip failed items entirely (e.g., zero embeddings, null FKs, malformed JSON)
Error messages must include operation context, IDs/URLs, use specific exception types, preserve full stack traces (logging with exc_info=True), and avoid returning None/null—raise exceptions instead; for batches report success counts and detailed failures
Backend code targets Python 3.12 and adheres to a 120 character line length
Use Ruff for linting (errors, warnings, unused imports) in backend code
Use Mypy for static type checking in backend code

Files:

  • python/src/server/services/embeddings/embedding_service.py
🧬 Code graph analysis (1)
python/src/server/services/embeddings/embedding_service.py (4)
python/src/server/config/logfire_config.py (1)
  • safe_span (151-172)
python/src/server/services/llm_provider_service.py (2)
  • get_embedding_model (600-675)
  • get_llm_client (313-548)
python/src/server/services/credential_service.py (4)
  • get_credential (160-177)
  • get_credential (552-554)
  • get_active_provider (426-501)
  • get_credentials_by_category (323-369)
python/src/server/services/embeddings/embedding_exceptions.py (2)
  • EmbeddingAPIError (86-99)
  • EmbeddingRateLimitError (61-72)
🔇 Additional comments (8)
python/src/server/services/embeddings/embedding_service.py (8)

71-82: LGTM! Clean adapter abstraction.

The abstract interface provides a clear contract for embedding providers.


84-104: LGTM! OpenAI-compatible adapter implementation is correct.

The adapter correctly delegates to the client's embeddings API and handles optional dimensions.


155-185: LGTM! Google model normalization and dimensionality handling implemented correctly.

The code properly normalizes Google model IDs (addressing a past review comment) and validates dimensions against provider-specific supported sets with appropriate fallback.


203-217: LGTM! Embedding normalization is robust.

The normalization logic correctly handles edge cases (zero-norm, numpy errors) and falls back gracefully.


220-230: LGTM! Helper functions are clean and correct.

The adapter selection logic and async/sync value normalization utility are well-implemented.


353-382: LGTM! Embedding provider configuration and adapter setup is well-structured.

The flow correctly retrieves the embedding provider configuration, creates the appropriate client, loads settings, and instantiates the adapter.


463-473: LGTM! EmbeddingRateLimitError retry logic added correctly.

The new retry logic for EmbeddingRateLimitError with exponential backoff is well-implemented and will enable proper handling of rate limits from Google and other providers.


475-489: LGTM! Batch failure handling follows "skip, don't corrupt" principle.

Failed batches are properly tracked without corrupting data, and processing continues with remaining batches.



@Chillbruhhh (Author)

@leex279 @coleam00


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
python/src/server/services/crawling/code_extraction_service.py (2)

198-205: Summary phase: handle 0..1 vs 0..100; clamp to 20–90%.

Same unit-mismatch risk as above. Normalize and clamp.

Apply this diff:

 async def summary_progress(data: dict):
   # Scale progress to 20-90% range
-  raw_progress = data.get("progress", data.get("percentage", 0))
-  scaled_progress = 20 + int(raw_progress * 0.7)  # 20-90%
-  data["progress"] = scaled_progress
+  raw = data.get("progress", data.get("percentage", 0))
+  try:
+    raw_num = float(raw)
+  except (TypeError, ValueError):
+    raw_num = 0.0
+  if 0.0 <= raw_num <= 1.0:
+    raw_num *= 100.0
+  # 20-90% with clamping
+  scaled_progress = min(90, max(20, 20 + int(raw_num * 0.7)))
+  data["progress"] = scaled_progress
   await progress_callback(data)

217-224: Storage phase: handle 0..1 vs 0..100; clamp to 90–100%.

Normalize and clamp to avoid progress jumping outside target range.

Apply this diff:

 async def storage_progress(data: dict):
   # Scale progress to 90-100% range
-  raw_progress = data.get("progress", data.get("percentage", 0))
-  scaled_progress = 90 + int(raw_progress * 0.1)  # 90-100%
-  data["progress"] = scaled_progress
+  raw = data.get("progress", data.get("percentage", 0))
+  try:
+    raw_num = float(raw)
+  except (TypeError, ValueError):
+    raw_num = 0.0
+  if 0.0 <= raw_num <= 1.0:
+    raw_num *= 100.0
+  # 90-100% with clamping
+  scaled_progress = min(100, max(90, 90 + int(raw_num * 0.1)))
+  data["progress"] = scaled_progress
   await progress_callback(data)
archon-ui-main/src/components/settings/RAGSettings.tsx (3)

861-881: Page-load Ollama checks ignore embedding-only Ollama selection.

This effect only runs when LLM_PROVIDER === 'ollama'. If chat is remote but embedding is Ollama, embedding connectivity is never auto-checked on load.

Apply:

-    if (!hasRunInitialTestRef.current && 
-        ragSettings.LLM_PROVIDER === 'ollama' && 
-        Object.keys(ragSettings).length > 0) {
+    if (
+      !hasRunInitialTestRef.current &&
+      (ragSettings.LLM_PROVIDER === 'ollama' || embeddingProvider === 'ollama') &&
+      Object.keys(ragSettings).length > 0
+    ) {

Also include embeddingProvider in deps:

-  }, [ragSettings.LLM_PROVIDER, llmInstanceConfig.url, llmInstanceConfig.name, 
-      embeddingInstanceConfig.url, embeddingInstanceConfig.name]); 
+  }, [ragSettings.LLM_PROVIDER, embeddingProvider, llmInstanceConfig.url, llmInstanceConfig.name,
+      embeddingInstanceConfig.url, embeddingInstanceConfig.name]);

896-905: Do not log secrets (even partially).

This console.log prints the first 10 characters of an API key. Remove it to avoid leaking secrets via logs.

Apply:

-        console.log('🔍 OpenAI status check:', { 
-          openAIKey, 
-          keyValue: keyValue ? `${keyValue.substring(0, 10)}...` : keyValue, 
-          hasValue: !!keyValue, 
-          hasOpenAIKey,
-          openAIConnected,
-          isChecking,
-          allCredentials: Object.keys(apiCredentials)
-        });
+        // Intentionally avoid logging API key material.

2388-2414: Embedding model defaults tied to LLM provider (bug).

Use EMBEDDING_PROVIDER to derive embedding defaults; otherwise display/placeholder can be wrong when providers differ.

Apply:

-function getDisplayedEmbeddingModel(ragSettings: any): string {
-  const provider = ragSettings.LLM_PROVIDER || 'openai';
+function getDisplayedEmbeddingModel(ragSettings: any): string {
+  const provider = ragSettings.EMBEDDING_PROVIDER || ragSettings.LLM_PROVIDER || 'openai';
🧹 Nitpick comments (11)
python/src/server/services/crawling/code_extraction_service.py (1)

160-166: DRY: Extract a helper to normalize progress values.

Three wrappers (and also the storage callback in _store_code_examples) repeat the same fallback and should accept both 0..1 and 0..100. Centralize to reduce drift.

You can add inside CodeExtractionService:

def _normalize_progress_0_100(self, data: dict[str, Any]) -> float:
    raw = data.get("progress", data.get("percentage", 0))
    try:
        val = float(raw)
    except (TypeError, ValueError):
        return 0.0
    return val * 100.0 if 0.0 <= val <= 1.0 else val

Then use it in the three wrappers and in _store_code_examples’ storage_callback, applying appropriate clamping per phase.

Also applies to: 198-205, 217-224

archon-ui-main/src/services/credentialsService.ts (1)

79-87: Credential update event is a good integration hook.

Nice browser-guard and CustomEvent dispatch. Consider documenting the event name and detail shape in a shared constants/types module for discoverability.

archon-ui-main/src/components/settings/RAGSettings.tsx (9)

118-120: DEFAULT_OLLAMA_URL should live in config.

Extract to a config util/env (e.g., getApiUrl-like) to avoid hardcoding environment-specific hostnames.


349-397: Ollama health detection may check the wrong base URL.

You merge LLM and Embedding URLs and pick the first non-empty, which can test the wrong instance when only one provider is Ollama. Scope detection to the active context.

Apply:

-  useEffect(() => {
-    const needsDetection = chatProvider === 'ollama' || embeddingProvider === 'ollama';
-    if (!needsDetection) {
-      setOllamaServerStatus('unknown');
-      return;
-    }
-
-    const baseUrl = (
-      ragSettings.LLM_BASE_URL?.trim() ||
-      llmInstanceConfig.url?.trim() ||
-      ragSettings.OLLAMA_EMBEDDING_URL?.trim() ||
-      embeddingInstanceConfig.url?.trim() ||
-      DEFAULT_OLLAMA_URL
-    );
-    const normalizedUrl = baseUrl.replace('/v1', '').replace(/\/$/, '');
+  useEffect(() => {
+    const isOllamaActive =
+      (activeSelection === 'chat' && chatProvider === 'ollama') ||
+      (activeSelection === 'embedding' && embeddingProvider === 'ollama');
+    if (!isOllamaActive) {
+      setOllamaServerStatus('unknown');
+      return;
+    }
+    const baseUrl =
+      activeSelection === 'chat'
+        ? (ragSettings.LLM_BASE_URL?.trim() || llmInstanceConfig.url?.trim() || DEFAULT_OLLAMA_URL)
+        : (ragSettings.OLLAMA_EMBEDDING_URL?.trim() || embeddingInstanceConfig.url?.trim() || DEFAULT_OLLAMA_URL);
+    const normalizedUrl = baseUrl.replace('/v1', '').replace(/\/$/, '');

964-984: Provider alert should reflect active context.

Currently tied to LLM provider only; embed alerts won’t show when embedding is selected.

Apply:

-  const selectedProviderKey = isProviderKey(ragSettings.LLM_PROVIDER)
-    ? (ragSettings.LLM_PROVIDER as ProviderKey)
-    : undefined;
-  const selectedProviderStatus = selectedProviderKey ? getProviderStatus(selectedProviderKey) : undefined;
+  const selectedProviderKey = activeSelection === 'chat' ? chatProvider : embeddingProvider;
+  const selectedProviderStatus = getProviderStatus(selectedProviderKey);

330-336: Listen for credential updates to refresh status immediately.

Hook into archon:credentials-updated to avoid waiting for the interval/manual actions.

Add after this block:

useEffect(() => {
  const handler = () => reloadApiCredentials();
  window.addEventListener('archon:credentials-updated', handler as EventListener);
  return () => window.removeEventListener('archon:credentials-updated', handler as EventListener);
}, []);

1458-1494: Save persists instance fields; consider refetching provider status.

Optional: after saving, trigger reloadApiCredentials and a one-shot status re-test for current providers.


121-154: Prop types: avoid inline shape and any; reuse RagSettings.

Import RagSettings and type setRagSettings as React.Dispatch<React.SetStateAction> to prevent drift.

Apply:

+import type { RagSettings as RagSettingsType } from '../../services/credentialsService';

-interface RAGSettingsProps {
-  ragSettings: {
-    MODEL_CHOICE: string;
-    USE_CONTEXTUAL_EMBEDDINGS: boolean;
-    CONTEXTUAL_EMBEDDINGS_MAX_WORKERS: number;
-    USE_HYBRID_SEARCH: boolean;
-    USE_AGENTIC_RAG: boolean;
-    USE_RERANKING: boolean;
-    LLM_PROVIDER?: string;
-    LLM_BASE_URL?: string;
-    LLM_INSTANCE_NAME?: string;
-    EMBEDDING_MODEL?: string;
-    EMBEDDING_PROVIDER?: string;
-    OLLAMA_EMBEDDING_URL?: string;
-    OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
-    // Crawling Performance Settings
-    CRAWL_BATCH_SIZE?: number;
-    CRAWL_MAX_CONCURRENT?: number;
-    CRAWL_WAIT_STRATEGY?: string;
-    CRAWL_PAGE_TIMEOUT?: number;
-    CRAWL_DELAY_BEFORE_HTML?: number;
-    // Storage Performance Settings
-    DOCUMENT_STORAGE_BATCH_SIZE?: number;
-    EMBEDDING_BATCH_SIZE?: number;
-    DELETE_BATCH_SIZE?: number;
-    ENABLE_PARALLEL_BATCHES?: boolean;
-    // Advanced Settings
-    MEMORY_THRESHOLD_PERCENT?: number;
-    DISPATCHER_CHECK_INTERVAL?: number;
-    CODE_EXTRACTION_BATCH_SIZE?: number;
-    CODE_SUMMARY_MAX_WORKERS?: number;
-  };
-  setRagSettings: (settings: any) => void;
-}
+interface RAGSettingsProps {
+  ragSettings: RagSettingsType;
+  setRagSettings: React.Dispatch<React.SetStateAction<RagSettingsType>>;
+}

200-205: Instance config defaults use host.docker.internal.

Non-Docker Linux environments may not resolve this hostname. Consider reading defaults from env/config and showing a hint if unreachable.


503-526: Provider status polling cadence.

60s is reasonable. If this runs across multiple tabs, consider BroadcastChannel to coordinate and reduce duplicate polling.


352-363: Health check URL normalization.

.replace('/v1','') is fine; you may prefer a regex anchored to '/v1/?$' to avoid unintended replacements.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63a92cf and 810c80f.

⛔ Files ignored due to path filters (1)
  • archon-ui-main/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (6)
  • archon-ui-main/package.json (1 hunks)
  • archon-ui-main/src/components/settings/RAGSettings.tsx (21 hunks)
  • archon-ui-main/src/services/credentialsService.ts (7 hunks)
  • python/src/server/services/crawling/code_extraction_service.py (3 hunks)
  • python/src/server/services/credential_service.py (1 hunks)
  • python/src/server/services/embeddings/embedding_service.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/**/*.py: Never accept corrupted data in batch/continuation paths: skip failed items entirely (e.g., do not store zero embeddings, null FKs, or malformed JSON)
Use specific exception types; avoid catching bare Exception
Preserve full stack traces in logs (use exc_info=True with Python logging)
Never return None to indicate failure; raise an exception with details
Authentication/authorization failures must halt the operation and be clearly surfaced
Service startup, missing configuration, database connection, or critical dependency failures should crash fast with clear errors
During crawling/batch/background tasks and WebSocket events, continue processing other items but log failures with context
Include context (operation intent, relevant IDs/URLs/data) in error messages
Pydantic should raise on data corruption or validation errors; do not accept invalid inputs

Files:

  • python/src/server/services/credential_service.py
  • python/src/server/services/crawling/code_extraction_service.py
  • python/src/server/services/embeddings/embedding_service.py
python/src/server/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/services/**/*.py: For batch operations, report both success count and a detailed failure list
On external API calls, retry with exponential backoff and fail with a clear message after retries
Place business logic in service layer modules (API Route → Service → Database pattern)

Files:

  • python/src/server/services/credential_service.py
  • python/src/server/services/crawling/code_extraction_service.py
  • python/src/server/services/embeddings/embedding_service.py
python/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/**/*.py: Target Python 3.12 with a 120-character line length
Use Ruff for linting (errors, warnings, unused imports)

Files:

  • python/src/server/services/credential_service.py
  • python/src/server/services/crawling/code_extraction_service.py
  • python/src/server/services/embeddings/embedding_service.py
python/src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use MyPy for type checking to ensure type safety

python/src/**/*.py: On service startup, missing configuration, DB connection failures, auth/authorization failures, critical dependency outages, or invalid/corrupting data: fail fast and bubble errors
For batch processing, background tasks, WebSocket events, optional features, and external API calls: continue processing but log errors (with retries/backoff for APIs)
Never accept or persist corrupted data; skip failed items entirely (e.g., zero embeddings, null FKs, malformed JSON)
Error messages must include operation context, IDs/URLs, use specific exception types, preserve full stack traces (logging with exc_info=True), and avoid returning None/null—raise exceptions instead; for batches report success counts and detailed failures
Backend code targets Python 3.12 and adheres to a 120 character line length
Use Ruff for linting (errors, warnings, unused imports) in backend code
Use Mypy for static type checking in backend code

Files:

  • python/src/server/services/credential_service.py
  • python/src/server/services/crawling/code_extraction_service.py
  • python/src/server/services/embeddings/embedding_service.py
archon-ui-main/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Disallow implicit any in TypeScript

archon-ui-main/src/**/*.{ts,tsx}: Frontend TypeScript must use strict mode with no implicit any
Use TanStack Query for all data fetching; avoid prop drilling
Use database values directly in the frontend; avoid mapping layers between BE and FE types

Files:

  • archon-ui-main/src/services/credentialsService.ts
  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,py}

📄 CodeRabbit inference engine (CLAUDE.md)

In code comments, avoid meta terms like SIMPLIFIED, ENHANCED, LEGACY, CHANGED, REMOVED; comment only on functionality and reasoning (do not mention beta/global rules)

Files:

  • archon-ui-main/src/services/credentialsService.ts
  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (2)
python/src/server/services/embeddings/embedding_service.py (4)
python/src/server/config/logfire_config.py (1)
  • safe_span (151-172)
python/src/server/services/llm_provider_service.py (2)
  • get_embedding_model (600-675)
  • get_llm_client (313-548)
python/src/server/services/credential_service.py (3)
  • get_credential (160-177)
  • get_credential (545-547)
  • get_active_provider (426-494)
python/src/server/services/embeddings/embedding_exceptions.py (1)
  • EmbeddingAPIError (86-99)
archon-ui-main/src/components/settings/RAGSettings.tsx (3)
archon-ui-main/src/features/ui/primitives/input.tsx (1)
  • Input (8-29)
archon-ui-main/src/features/ui/primitives/button.tsx (1)
  • Button (11-130)
archon-ui-main/src/services/credentialsService.ts (1)
  • credentialsService (604-604)
🪛 GitHub Actions: Continuous Integration
python/src/server/services/embeddings/embedding_service.py

[error] 304-304: Embedding creation failed: TypeError: object MagicMock can't be used in 'await' expression (likely due to improper mocking of async dependency in tests).

🔇 Additional comments (16)
python/src/server/services/crawling/code_extraction_service.py (1)

160-166: Retain existing extraction_progress scaling. Upstream emits raw progress on a 0–100 scale, so int(raw_progress * 0.2) correctly maps to 0–20; no normalization for 0–1 fractions is needed.

Likely an incorrect or invalid review comment.

archon-ui-main/src/services/credentialsService.ts (5)

26-26: RagSettings: EMBEDDING_PROVIDER addition looks good.


236-237: String-field mapping includes EMBEDDING_PROVIDER.

Good catch; ensures seamless persistence and hydration.


294-297: Notify on updateCredential.

Event emission post-update is correct and sequenced after JSON parsing.


320-323: Notify on createCredential.

Consistent with update flow.


342-343: Notify on deleteCredential.

Good for live UI sync.

archon-ui-main/src/components/settings/RAGSettings.tsx (9)

7-9: Icon imports are pack-scoped.

Good usage of react-icons with scoped imports; preserves tree-shaking.


18-19: Embedding-capable providers list.

Looks correct and constrains the embedding grid appropriately.


133-134: Prop surface extended with EMBEDDING_PROVIDER.

Matches backend interface; good.


177-186: Independent chat vs embedding provider state.

Solid approach with one- and two‑way sync guards.


242-259: Per‑provider model persistence effects.

Looks good; updates only when relevant model changes.

Also applies to: 260-275


398-442: One-/two-way sync guards.

Ref logic prevents loops; good separation for chat vs embedding updates.


985-1043: Auto-retry LLM testing looks solid.

Timeout refs are cleared; suppression avoids toast spam. Good.


1045-1101: Auto-retry Embedding testing looks solid.

Mirrors LLM behavior; cleanup is correct.


545-564: AbortSignal.timeout: cross-browser support.

Safari support can lag. Consider a fallback with manual AbortController timeout.

Please verify in your target browsers. If you see TypeError, swap to a custom AbortController timeout wrapper.
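A minimal fallback sketch (hypothetical `timeoutSignal` helper, not part of this PR) that prefers the native API and degrades to a manually aborted controller where `AbortSignal.timeout` is missing:

```typescript
// Hypothetical helper: use native AbortSignal.timeout when available,
// otherwise abort an AbortController manually after the deadline
// (covers older Safari and other environments lacking the static method).
function timeoutSignal(ms: number): AbortSignal {
  if (typeof AbortSignal.timeout === "function") {
    return AbortSignal.timeout(ms);
  }
  const controller = new AbortController();
  setTimeout(() => controller.abort(), ms);
  return controller.signal;
}

// Usage: fetch(url, { signal: timeoutSignal(5000) })
```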

archon-ui-main/package.json (1)

53-53: react-icons@5.5.0 upgrade verified

Build completes without TS/JSX typing errors (IconType unchanged) and pack-scoped imports remain tree-shakable (no unexpected bundle bloat).

Comment thread archon-ui-main/src/services/credentialsService.ts
Comment thread python/src/server/services/credential_service.py
Comment thread python/src/server/services/embeddings/embedding_service.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/src/server/services/embeddings/embedding_service.py (1)

380-421: Retry Google rate limits with exponential backoff

Batches hitting Google 429s currently fail without retry. Add a branch mirroring the OpenAI path.

                                 except openai.RateLimitError as e:
                                     error_message = str(e)
                                     if "insufficient_quota" in error_message:
@@
                                             raise  # Will be caught by outer try
+                                except EmbeddingRateLimitError as e:
+                                    retry_count += 1
+                                    if retry_count < max_retries:
+                                        wait_time = 2**retry_count
+                                        search_logger.warning(
+                                            f"Rate limit (Google) for batch {batch_index}, "
+                                            f"waiting {wait_time}s before retry {retry_count}/{max_retries}"
+                                        )
+                                        await asyncio.sleep(wait_time)
+                                    else:
+                                        raise
🧹 Nitpick comments (6)
python/src/server/services/embeddings/embedding_service.py (6)

123-131: Plumb dimensions into Google requests

Dimensions are currently ignored by the Google adapter. Pass them through to _fetch_single_embedding so they can be applied (see companion change).

-                embeddings = await asyncio.gather(
-                    *(
-                        self._fetch_single_embedding(http_client, google_api_key, model, text)
-                        for text in texts
-                    )
-                )
+                embeddings = await asyncio.gather(
+                    *(
+                        self._fetch_single_embedding(http_client, google_api_key, model, text, dimensions)
+                        for text in texts
+                    )
+                )

317-323: Remove unreachable branch

After normalizing to "openai" on Line 318, the subsequent None/empty check cannot trigger.

-            if not embedding_provider:
-                search_logger.error("No embedding provider configured")
-                raise ValueError("No embedding provider configured. Please set EMBEDDING_PROVIDER environment variable.")

324-325: Clarify log source of provider selection

Message may be misleading when provider is overridden via function arg.

-            search_logger.info(f"Using embedding provider: '{embedding_provider}' (from EMBEDDING_PROVIDER setting)")
+            search_logger.info(f"Using embedding provider: '{embedding_provider}'")

343-346: Avoid repeated model lookups inside retry loop

Compute embedding_model once per batch to reduce overhead and stabilize behavior across retries.

                     batch_index = i // batch_size
+                    # Resolve once per batch
+                    embedding_model = await get_embedding_model(provider=embedding_provider)
@@
-                                try:
-                                    # Create embeddings for this batch
-                                    embedding_model = await get_embedding_model(provider=embedding_provider)
+                                try:
+                                    # Create embeddings for this batch

Also applies to: 368-373


325-341: Skip creating an OpenAI-compatible client when provider is Google

Google adapter doesn’t use the client; avoid creating/closing it for performance and fewer moving parts.

Example refactor:

  • If embedding_provider == "google": don’t enter get_llm_client context; set client = None and proceed.
  • Else: use current get_llm_client path.

109-131: Consider Google batch endpoint to reduce HTTP calls

Use models/{model}:batchEmbedContents to embed the whole batch in one request; it returns embeddings aligned to inputs and supports outputDimensionality.

I can provide a concrete patch if you’d like to switch to batchEmbedContents.

Also applies to: 169-175

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 810c80f and 11388d6.

📒 Files selected for processing (1)
  • python/src/server/services/embeddings/embedding_service.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/**/*.py: Never accept corrupted data in batch/continuation paths: skip failed items entirely (e.g., do not store zero embeddings, null FKs, or malformed JSON)
Use specific exception types; avoid catching bare Exception
Preserve full stack traces in logs (use exc_info=True with Python logging)
Never return None to indicate failure; raise an exception with details
Authentication/authorization failures must halt the operation and be clearly surfaced
Service startup, missing configuration, database connection, or critical dependency failures should crash fast with clear errors
During crawling/batch/background tasks and WebSocket events, continue processing other items but log failures with context
Include context (operation intent, relevant IDs/URLs/data) in error messages
Pydantic should raise on data corruption or validation errors; do not accept invalid inputs

Files:

  • python/src/server/services/embeddings/embedding_service.py
python/src/server/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/services/**/*.py: For batch operations, report both success count and a detailed failure list
On external API calls, retry with exponential backoff and fail with a clear message after retries
Place business logic in service layer modules (API Route → Service → Database pattern)

Files:

  • python/src/server/services/embeddings/embedding_service.py
python/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/**/*.py: Target Python 3.12 with a 120-character line length
Use Ruff for linting (errors, warnings, unused imports)

Files:

  • python/src/server/services/embeddings/embedding_service.py
python/src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use MyPy for type checking to ensure type safety

python/src/**/*.py: On service startup, missing configuration, DB connection failures, auth/authorization failures, critical dependency outages, or invalid/corrupting data: fail fast and bubble errors
For batch processing, background tasks, WebSocket events, optional features, and external API calls: continue processing but log errors (with retries/backoff for APIs)
Never accept or persist corrupted data; skip failed items entirely (e.g., zero embeddings, null FKs, malformed JSON)
Error messages must include operation context, IDs/URLs, use specific exception types, preserve full stack traces (logging with exc_info=True), and avoid returning None/null—raise exceptions instead; for batches report success counts and detailed failures
Backend code targets Python 3.12 and adheres to a 120 character line length
Use Ruff for linting (errors, warnings, unused imports) in backend code
Use Mypy for static type checking in backend code

Files:

  • python/src/server/services/embeddings/embedding_service.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: zaksnet
PR: coleam00/Archon#473
File: archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx:26-31
Timestamp: 2025-09-10T11:35:20.655Z
Learning: When reviewing large refactoring PRs like "Clean multi provider feature", focus feedback on the core objectives (provider management, API keys, model configuration, service registry) rather than unrelated UI improvements in components that were only incidentally modified during the refactor.
🧬 Code graph analysis (1)
python/src/server/services/embeddings/embedding_service.py (3)
python/src/server/services/llm_provider_service.py (2)
  • get_embedding_model (600-675)
  • get_llm_client (313-548)
python/src/server/services/credential_service.py (4)
  • get_credential (160-177)
  • get_credential (545-547)
  • get_active_provider (426-494)
  • get_credentials_by_category (323-369)
python/src/server/services/embeddings/embedding_exceptions.py (1)
  • EmbeddingAPIError (86-99)
🔇 Additional comments (1)
python/src/server/services/embeddings/embedding_service.py (1)

149-165: Normalize Google model IDs and support output dimensionality

If users enter "models/text-embedding-004", current code produces models/models/... (404). Also, when dimensions is provided, Google supports outputDimensionality — pass it through.

Apply this diff:

-    async def _fetch_single_embedding(
+    async def _fetch_single_embedding(
         self,
         http_client: httpx.AsyncClient,
         api_key: str,
         model: str,
-        text: str,
-    ) -> list[float]:
-        url = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:embedContent"
+        text: str,
+        dimensions: int | None = None,
+    ) -> list[float]:
+        # Normalize model for URL/payload compatibility
+        if model.startswith("models/"):
+            url_model = model[len("models/") :]
+            payload_model = model
+        else:
+            url_model = model
+            payload_model = f"models/{model}"
+        url = f"https://generativelanguage.googleapis.com/v1beta/models/{url_model}:embedContent"
         headers = {
             "x-goog-api-key": api_key,
             "Content-Type": "application/json",
         }
-        payload = {
-            "model": f"models/{model}",
+        payload = {
+            "model": payload_model,
             "content": {"parts": [{"text": text}]},
         }
+        if dimensions is not None:
+            # Google-specific: honor requested dimensionality when supported by the model
+            payload["outputDimensionality"] = dimensions

Comment thread python/src/server/services/embeddings/embedding_service.py Outdated
Comment thread python/src/server/services/embeddings/embedding_service.py
Comment thread python/src/server/services/embeddings/embedding_service.py
@leex279 leex279 self-requested a review September 25, 2025 13:42

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
archon-ui-main/src/components/settings/RAGSettings.tsx (3)

279-301: Don’t store raw API keys in React state

You’re persisting decrypted API key values into component state (apiCredentials). This is a security posture issue: values can be inspected via devtools and linger in memory. You only need to know “has value” for UI; switch to the has_value boolean from checkCredentialStatus.

Minimal change (store booleans instead of secrets):

-        const credentials: {[key: string]: string} = {};
+        const credentials: {[key: string]: boolean} = {};
@@
-          if (result.has_value && result.value && result.value.trim().length > 0) {
-            credentials[key] = result.value;
-          }
+          credentials[key] = !!result.has_value;

Then update getProviderStatus to use the boolean:

-        const openAIKey = Object.keys(apiCredentials).find(key => key.toUpperCase() === 'OPENAI_API_KEY');
-        const keyValue = openAIKey ? apiCredentials[openAIKey] : undefined;
-        const hasOpenAIKey = openAIKey && keyValue && keyValue.trim().length > 0 && !keyValue.includes('[ENCRYPTED]');
+        const openAIKey = Object.keys(apiCredentials).find(k => k.toUpperCase() === 'OPENAI_API_KEY');
+        const hasOpenAIKey = openAIKey ? !!apiCredentials[openAIKey] : false;

Apply the same pattern for Google.


731-807: Ollama metrics may misattribute models due to URL mismatch (/v1, trailing slash)

You filter models by exact URL equality against llmInstanceConfig.url/embeddingInstanceConfig.url, but the backend typically returns base URLs without /v1. This can cause zero counts. Normalize both sides and also send normalized instance_urls in the request.

Apply:

-      const instanceUrls = [];
-      if (llmInstanceConfig.url) instanceUrls.push(llmInstanceConfig.url);
-      if (embeddingInstanceConfig.url && embeddingInstanceConfig.url !== llmInstanceConfig.url) {
-        instanceUrls.push(embeddingInstanceConfig.url);
-      }
+      const instanceUrls: string[] = [];
+      const llmUrlBase = normalizeBaseUrl(llmInstanceConfig.url);
+      const embUrlBase = normalizeBaseUrl(embeddingInstanceConfig.url);
+      if (llmUrlBase) instanceUrls.push(llmUrlBase);
+      if (embUrlBase && embUrlBase !== llmUrlBase) instanceUrls.push(embUrlBase);
@@
-        const llmChatModels = allChatModels.filter((model: any) => 
-          model.instance_url === llmInstanceConfig.url
-        );
-        const llmEmbeddingModels = allEmbeddingModels.filter((model: any) => 
-          model.instance_url === llmInstanceConfig.url
-        );
+        const llmChatModels = allChatModels.filter((model: any) =>
+          normalizeBaseUrl(model.instance_url) === llmUrlBase
+        );
+        const llmEmbeddingModels = allEmbeddingModels.filter((model: any) =>
+          normalizeBaseUrl(model.instance_url) === llmUrlBase
+        );
@@
-        const embChatModels = allChatModels.filter((model: any) => 
-          model.instance_url === embeddingInstanceConfig.url
-        );
-        const embEmbeddingModels = allEmbeddingModels.filter((model: any) => 
-          model.instance_url === embeddingInstanceConfig.url
-        );
+        const embChatModels = allChatModels.filter((model: any) =>
+          normalizeBaseUrl(model.instance_url) === embUrlBase
+        );
+        const embEmbeddingModels = allEmbeddingModels.filter((model: any) =>
+          normalizeBaseUrl(model.instance_url) === embUrlBase
+        );

Add normalizeBaseUrl as suggested above.


861-884: Prevent repeated metrics fetches on every keystroke

Effect depends on instance URLs/names, and condition embeddingProvider === 'ollama' is always true when selected, causing repeated fetches while typing or minor state changes. Guard by comparing to last snapshot and/or narrow deps.

Apply:

-  React.useEffect(() => {
-    if (
-      ragSettings.LLM_PROVIDER === 'ollama' || embeddingProvider === 'ollama'
-    ) {
-      const currentProvider = ragSettings.LLM_PROVIDER;
-      const lastProvider = lastMetricsFetchRef.current.provider;
-
-      // Only fetch if provider changed to Ollama (scenario 1: user clicks on Ollama Provider)
-      if (currentProvider !== lastProvider || embeddingProvider === 'ollama') {
-        lastMetricsFetchRef.current = {
-          provider: currentProvider,
-          llmUrl: llmInstanceConfig.url,
-          embUrl: embeddingInstanceConfig.url,
-          llmOnline: llmStatus.online,
-          embOnline: embeddingStatus.online
-        };
-        console.log('🔄 Fetching Ollama metrics - Provider selected');
-        fetchOllamaMetrics();
-      }
-    }
-  }, [ragSettings.LLM_PROVIDER, embeddingProvider, llmInstanceConfig.url, llmInstanceConfig.name,
-      embeddingInstanceConfig.url, embeddingInstanceConfig.name]); // Include embeddingProvider in deps
+  React.useEffect(() => {
+    const current = {
+      provider: ragSettings.LLM_PROVIDER,
+      embProvider: embeddingProvider,
+      llmUrl: llmInstanceConfig.url,
+      embUrl: embeddingInstanceConfig.url,
+      llmOnline: llmStatus.online,
+      embOnline: embeddingStatus.online,
+    };
+    const last = lastMetricsFetchRef.current;
+    const meaningfulChange =
+      current.provider !== last.provider ||
+      current.embProvider !== (last as any).embProvider ||
+      current.llmUrl !== last.llmUrl ||
+      current.embUrl !== last.embUrl ||
+      current.llmOnline !== last.llmOnline ||
+      current.embOnline !== last.embOnline;
+
+    if ((current.provider === 'ollama' || current.embProvider === 'ollama') && meaningfulChange) {
+      lastMetricsFetchRef.current = current as any;
+      console.log('🔄 Fetching Ollama metrics - state changed');
+      fetchOllamaMetrics();
+    }
+  }, [ragSettings.LLM_PROVIDER, embeddingProvider, llmStatus.online, embeddingStatus.online]);
python/src/server/services/crawling/code_extraction_service.py (2)

162-166: Normalize and clamp extraction-phase progress (0–20%)

Extraction phase lacks the robust parsing/clamping you added in later phases. Strings/0–1 inputs can break or under-report (int(0.5*0.2)=0). Align with summary/storage handling.

Apply:

-                raw_progress = data.get("progress", data.get("percentage", 0))
-                scaled_progress = int(raw_progress * 0.2)  # 0-20%
+                raw = data.get("progress", data.get("percentage", 0))
+                try:
+                    raw_num = float(raw)
+                except (TypeError, ValueError):
+                    raw_num = 0.0
+                if 0.0 <= raw_num <= 1.0:
+                    raw_num *= 100.0
+                # 0–20% with clamping
+                scaled_progress = min(20, max(0, int(raw_num * 0.2)))
                 data["progress"] = scaled_progress
                 await progress_callback(data)

887-898: Fix TypeError when min_length is None in indented-block path

This branch compares an int to None: len("...") >= min_length. With default min_length=None (common callsite), this crashes.

Apply:

-                elif current_block and len("\n".join(current_block)) >= min_length:
-                    # End of indented block, check if it's code
-                    code_content = "\n".join(current_block)
+                elif current_block:
+                    block_text = "\n".join(current_block)
+                    threshold = min_length if min_length is not None else await self._get_min_code_length()
+                    if len(block_text) < threshold:
+                        # Not enough content; reset and continue
+                        current_block = []
+                        current_indent = None
+                        continue
+                    # End of indented block, check if it's code
+                    code_content = block_text
python/src/server/services/embeddings/embedding_service.py (1)

371-427: Retry on provider-agnostic rate limits

The retry loop only handles openai.RateLimitError. Add handling for EmbeddingRateLimitError to back off on Google (and other) 429s.

                             while retry_count < max_retries:
                                 try:
@@
-                                except openai.RateLimitError as e:
+                                except openai.RateLimitError as e:
                                     error_message = str(e)
@@
                                         else:
                                             raise  # Will be caught by outer try
+                                except EmbeddingRateLimitError as e:
+                                    retry_count += 1
+                                    if retry_count < max_retries:
+                                        wait_time = 2**retry_count
+                                        search_logger.warning(
+                                            f"Embedding rate limit for batch {batch_index}, "
+                                            f"waiting {wait_time}s before retry {retry_count}/{max_retries}"
+                                        )
+                                        await asyncio.sleep(wait_time)
+                                    else:
+                                        raise
🧹 Nitpick comments (15)
archon-ui-main/src/services/credentialsService.ts (3)

276-287: Avoid N events: add optional notify suppression to updateCredential

When updateRagSettings saves many keys, the current per-key notifications can cause UI thrash. Add an optional flag to suppress per-call notify so we can batch.

Apply:

-  async updateCredential(credential: Credential): Promise<Credential> {
+  async updateCredential(
+    credential: Credential,
+    options?: { suppressNotify?: boolean }
+  ): Promise<Credential> {

294-297: Avoid N events: conditionally dispatch notify inside updateCredential

Gate notify by the new suppress flag. This enables batching from updateRagSettings.

-      const updated = await response.json();
-      this.notifyCredentialUpdate([credential.key]);
-      return updated;
+      const updated = await response.json();
+      if (!options?.suppressNotify) {
+        this.notifyCredentialUpdate([credential.key]);
+      }
+      return updated;

348-367: Batch “credentials-updated” notifications when saving RAG settings

Sending one event per key can trigger redundant reloads in listeners. Batch emit once with all updated keys.

   async updateRagSettings(settings: RagSettings): Promise<void> {
-    const promises = [];
+    const promises = [];
+    const updatedKeys: string[] = [];
@@
-      promises.push(
-        this.updateCredential({
-          key,
-          value: value.toString(),
-          is_encrypted: false,
-          category: "rag_strategy",
-        }),
-      );
+      updatedKeys.push(key);
+      promises.push(
+        this.updateCredential(
+          {
+            key,
+            value: value.toString(),
+            is_encrypted: false,
+            category: "rag_strategy",
+          },
+          { suppressNotify: true }
+        ),
+      );
@@
-    await Promise.all(promises);
+    await Promise.all(promises);
+    this.notifyCredentialUpdate(updatedKeys);
archon-ui-main/src/components/settings/RAGSettings.tsx (8)

118-120: DEFAULT_OLLAMA_URL portability

host.docker.internal may not resolve on all Linux hosts. Consider a fallback to http://localhost:11434/v1 when the health check fails, or make this configurable via env.
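One way to sketch the fallback (hypothetical `ollamaUrlCandidates` helper, naming and ordering are assumptions): build an ordered candidate list and probe each until a health check succeeds.

```typescript
// Hypothetical helper: ordered, de-duplicated base URLs to probe so Linux
// hosts without host.docker.internal still find a local Ollama instance.
function ollamaUrlCandidates(configured?: string): string[] {
  const candidates = [
    configured,
    "http://host.docker.internal:11434/v1",
    "http://localhost:11434/v1",
  ];
  // keep order, drop empties and duplicates
  return [...new Set(candidates.filter((u): u is string => !!u))];
}
```

The caller would try each candidate's health endpoint in turn and persist whichever responds first.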


330-347: Reload credentials: add event listener for batched updates

You emit archon:credentials-updated in the service. Listen here to refresh immediately instead of waiting for the 30s interval.

Add:

useEffect(() => {
  const handler = () => { reloadApiCredentials(); };
  window.addEventListener('archon:credentials-updated', handler as EventListener);
  return () => window.removeEventListener('archon:credentials-updated', handler as EventListener);
  // eslint-disable-next-line react-hooks/exhaustive-deps
}, []);

349-396: Normalize Ollama base URL once

You normalize here, but similar logic is duplicated elsewhere. Extract a normalizeBaseUrl helper and reuse to avoid mismatches with trailing slashes or /v1 suffix.

Helper to add (outside the component):

function normalizeBaseUrl(url?: string): string {
  if (!url) return '';
  return url.replace('/v1', '').replace(/\/$/, '');
}
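For illustration, the helper (repeated here so the example is self-contained) collapses the common URL variants to one canonical base:

```typescript
// Same helper as suggested above, restated so this example runs standalone.
function normalizeBaseUrl(url?: string): string {
  if (!url) return "";
  return url.replace("/v1", "").replace(/\/$/, "");
}

// "http://localhost:11434", "http://localhost:11434/", and
// "http://localhost:11434/v1" all normalize to "http://localhost:11434",
// so exact-equality comparisons between frontend config and backend
// instance_url values stop failing on trailing slashes or /v1 suffixes.
```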

982-1039: Deferred retry loops — LGTM with minor nit

The 5s retry with cancellation handling is solid. Consider using the same normalizeBaseUrl helper here for baseUrl construction to keep behavior consistent.


1119-1147: Page-load connection checks — LGTM

Staggering tests and suppressing toasts avoids noise. Minor: use normalizeBaseUrl for DEFAULT_OLLAMA_URL as well for consistent logging and requests.


2354-2410: Helper typings use any — prefer RagSettings type

getDisplayedChatModel/getDisplayedEmbeddingModel accept any. Import and use the RagSettings type (or RAGSettingsProps['ragSettings']) for stronger typing and to honor strict TS settings.

Example:

import type { RagSettings as RagSettingsType } from '../../services/credentialsService';

function getDisplayedChatModel(ragSettings: RagSettingsType): string { ... }
function getDisplayedEmbeddingModel(ragSettings: RagSettingsType): string { ... }

545-564: AbortSignal.timeout browser support

AbortSignal.timeout isn’t supported in all browsers. Consider a fallback with AbortController + manual setTimeout to avoid runtime errors in older environments.


503-526: Provider connection testing cadence

Testing all providers every 60s is fine. The “checked recently” guard references providerConnectionStatus captured at effect creation, so it won’t truly skip. Either include providerConnectionStatus in deps or remove the guard (since interval is already 60s).

python/src/server/services/crawling/code_extraction_service.py (1)

412-423: Docstring arg mismatch: remove nonexistent parameter

Docstring mentions min_length but the function signature doesn’t accept it. Update docstring for accuracy.

python/src/server/services/credential_service.py (1)

533-543: Honor service_type when setting provider

set_active_provider always writes LLM_PROVIDER. Support EMBEDDING_PROVIDER when service_type == "embedding".

-    async def set_active_provider(self, provider: str, service_type: str = "llm") -> bool:
+    async def set_active_provider(self, provider: str, service_type: str = "llm") -> bool:
         """Set the active provider for a service type."""
         try:
-            # For now, we'll update the RAG strategy settings
-            return await self.set_credential(
-                "LLM_PROVIDER",
-                provider,
-                category="rag_strategy",
-                description=f"Active {service_type} provider",
-            )
+            key = "EMBEDDING_PROVIDER" if service_type == "embedding" else "LLM_PROVIDER"
+            return await self.set_credential(
+                key,
+                provider,
+                category="rag_strategy",
+                description=f"Active {service_type} provider",
+            )
         except Exception as e:
             logger.error(f"Error setting active provider {provider} for {service_type}: {e}")
             return False
python/src/server/services/embeddings/embedding_service.py (2)

109-132: Support Google outputDimensionality with provider limits

Honor requested dimensions for Google by clamping to [1, 768] and setting outputDimensionality.

     async def create_embeddings(
         self,
         texts: list[str],
         model: str,
         dimensions: int | None = None,
     ) -> list[list[float]]:
         try:
-            if dimensions is not None:
-                _ = dimensions  # Maintains adapter signature; Google controls dimensions server-side.
+            # Google supports optional outputDimensionality with a max of 768
+            out_dim = None
+            if dimensions is not None:
+                try:
+                    d = int(dimensions)
+                    out_dim = 1 if d < 1 else 768 if d > 768 else d
+                except Exception:
+                    out_dim = None
@@
-                embeddings = await asyncio.gather(
-                    *(
-                        self._fetch_single_embedding(http_client, google_api_key, model, text)
-                        for text in texts
-                    )
-                )
+                embeddings = await asyncio.gather(
+                    *(
+                        self._fetch_single_embedding(
+                            http_client, google_api_key, model, text, out_dim
+                        )
+                        for text in texts
+                    )
+                )
@@
-    async def _fetch_single_embedding(
+    async def _fetch_single_embedding(
         self,
         http_client: httpx.AsyncClient,
         api_key: str,
         model: str,
-        text: str,
+        text: str,
+        dimensions: int | None = None,
     ) -> list[float]:
@@
-        payload = {
+        payload = {
             "model": payload_model,
             "content": {"parts": [{"text": text}]},
         }
+        if isinstance(dimensions, int):
+            payload["outputDimensionality"] = dimensions

Also applies to: 149-171
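The clamping step in the diff above can also live in a small pure helper; a minimal sketch, assuming the 768-dimension cap matches Google's current limit (the function name and the None-means-provider-default convention are illustrative, not the service's actual API):

```python
def clamp_output_dimensionality(dimensions):
    """Clamp a requested embedding size to Google's supported range [1, 768].

    Returns None for absent or unparseable requests, meaning "let the
    provider use its default dimensionality".
    """
    if dimensions is None:
        return None
    try:
        d = int(dimensions)
    except (TypeError, ValueError):
        return None
    return max(1, min(768, d))
```

Keeping the bound in one place makes it trivial to update if the provider raises its limit.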


330-331: Tighten log wording

This log suggests the source is EMBEDDING_PROVIDER even when an explicit override is used. Make wording neutral.

-            search_logger.info(f"Using embedding provider: '{embedding_provider}' (from EMBEDDING_PROVIDER setting)")
+            search_logger.info(f"Using embedding provider: '{embedding_provider}'")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11388d6 and aa354ca.

📒 Files selected for processing (6)
  • archon-ui-main/src/components/settings/RAGSettings.tsx (25 hunks)
  • archon-ui-main/src/services/credentialsService.ts (7 hunks)
  • python/src/server/services/crawling/code_extraction_service.py (3 hunks)
  • python/src/server/services/credential_service.py (1 hunks)
  • python/src/server/services/embeddings/embedding_service.py (5 hunks)
  • python/src/server/services/storage/code_storage_service.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/**/*.py: Never accept corrupted data in batch/continuation paths: skip failed items entirely (e.g., do not store zero embeddings, null FKs, or malformed JSON)
Use specific exception types; avoid catching bare Exception
Preserve full stack traces in logs (use exc_info=True with Python logging)
Never return None to indicate failure; raise an exception with details
Authentication/authorization failures must halt the operation and be clearly surfaced
Service startup, missing configuration, database connection, or critical dependency failures should crash fast with clear errors
During crawling/batch/background tasks and WebSocket events, continue processing other items but log failures with context
Include context (operation intent, relevant IDs/URLs/data) in error messages
Pydantic should raise on data corruption or validation errors; do not accept invalid inputs

Files:

  • python/src/server/services/storage/code_storage_service.py
  • python/src/server/services/embeddings/embedding_service.py
  • python/src/server/services/credential_service.py
  • python/src/server/services/crawling/code_extraction_service.py
python/src/server/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/services/**/*.py: For batch operations, report both success count and a detailed failure list
On external API calls, retry with exponential backoff and fail with a clear message after retries
Place business logic in service layer modules (API Route → Service → Database pattern)

Files:

  • python/src/server/services/storage/code_storage_service.py
  • python/src/server/services/embeddings/embedding_service.py
  • python/src/server/services/credential_service.py
  • python/src/server/services/crawling/code_extraction_service.py
python/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/**/*.py: Target Python 3.12 with a 120-character line length
Use Ruff for linting (errors, warnings, unused imports)

Files:

  • python/src/server/services/storage/code_storage_service.py
  • python/src/server/services/embeddings/embedding_service.py
  • python/src/server/services/credential_service.py
  • python/src/server/services/crawling/code_extraction_service.py
python/src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use MyPy for type checking to ensure type safety

python/src/**/*.py: On service startup, missing configuration, DB connection failures, auth/authorization failures, critical dependency outages, or invalid/corrupting data: fail fast and bubble errors
For batch processing, background tasks, WebSocket events, optional features, and external API calls: continue processing but log errors (with retries/backoff for APIs)
Never accept or persist corrupted data; skip failed items entirely (e.g., zero embeddings, null FKs, malformed JSON)
Error messages must include operation context, IDs/URLs, use specific exception types, preserve full stack traces (logging with exc_info=True), and avoid returning None/null—raise exceptions instead; for batches report success counts and detailed failures
Backend code targets Python 3.12 and adheres to a 120 character line length
Use Ruff for linting (errors, warnings, unused imports) in backend code
Use Mypy for static type checking in backend code

Files:

  • python/src/server/services/storage/code_storage_service.py
  • python/src/server/services/embeddings/embedding_service.py
  • python/src/server/services/credential_service.py
  • python/src/server/services/crawling/code_extraction_service.py
archon-ui-main/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Disallow implicit any in TypeScript

archon-ui-main/src/**/*.{ts,tsx}: Frontend TypeScript must use strict mode with no implicit any
Use TanStack Query for all data fetching; avoid prop drilling
Use database values directly in the frontend; avoid mapping layers between BE and FE types

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
  • archon-ui-main/src/services/credentialsService.ts
archon-ui-main/src/**/*.{ts,tsx,py}

📄 CodeRabbit inference engine (CLAUDE.md)

In code comments, avoid meta terms like SIMPLIFIED, ENHANCED, LEGACY, CHANGED, REMOVED; comment only on functionality and reasoning (do not mention beta/global rules)

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
  • archon-ui-main/src/services/credentialsService.ts
🧠 Learnings (2)
📓 Common learnings
Learnt from: zaksnet
PR: coleam00/Archon#473
File: archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx:26-31
Timestamp: 2025-09-10T11:35:20.655Z
Learning: When reviewing large refactoring PRs like "Clean multi provider feature", focus feedback on the core objectives (provider management, API keys, model configuration, service registry) rather than unrelated UI improvements in components that were only incidentally modified during the refactor.
📚 Learning: 2025-09-19T10:31:54.292Z
Learnt from: CR
PR: coleam00/Archon#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-19T10:31:54.292Z
Learning: Applies to python/src/server/**/*.py : Never accept corrupted data in batch/continuation paths: skip failed items entirely (e.g., do not store zero embeddings, null FKs, or malformed JSON)

Applied to files:

  • python/src/server/services/embeddings/embedding_service.py
🧬 Code graph analysis (3)
python/src/server/services/storage/code_storage_service.py (1)
python/src/server/services/embeddings/embedding_service.py (1)
  • create_embeddings_batch (260-477)
python/src/server/services/embeddings/embedding_service.py (2)
python/src/server/services/llm_provider_service.py (2)
  • get_embedding_model (600-675)
  • get_llm_client (313-548)
python/src/server/services/credential_service.py (4)
  • get_credential (160-177)
  • get_credential (552-554)
  • get_active_provider (426-501)
  • get_credentials_by_category (323-369)
archon-ui-main/src/components/settings/RAGSettings.tsx (3)
archon-ui-main/src/features/ui/primitives/input.tsx (1)
  • Input (8-29)
archon-ui-main/src/features/ui/primitives/button.tsx (1)
  • Button (11-130)
archon-ui-main/src/services/credentialsService.ts (1)
  • credentialsService (604-604)
🔇 Additional comments (15)
archon-ui-main/src/services/credentialsService.ts (4)

26-26: RagSettings: embedding provider added — LGTM

Adding EMBEDDING_PROVIDER aligns the service with the new UI split for chat vs embedding. No issues.


79-87: Credential update notification with browser guard — LGTM

Good use of typeof window check and CustomEvent to broadcast key updates.


196-206: Defaults aligned with UI and new embedding fields — LGTM

MODEL_CHOICE now matches the UI (gpt-4o-mini), and defaults include EMBEDDING_PROVIDER/EMBEDDING_MODEL. This also addresses the earlier review about default model mismatch.


236-239: Mapping includes EMBEDDING_PROVIDER — LGTM

Ensures persistence of the embedding provider string from credentials.

archon-ui-main/src/components/settings/RAGSettings.tsx (5)

17-19: Embedding-capable provider whitelist — LGTM

Limiting embedding providers to openai/google/ollama matches current backend support.


242-259: Provider-model persistence for chat — LGTM

Persists chat model per provider without coupling to embedding state. Good separation.


260-275: Provider-model persistence for embedding — LGTM

Mirrors chat behavior for embeddings. Looks good.


1271-1286: Provider switch updates the corresponding model — LGTM

Good UX: pulls last-used or default model per provider when switching contexts.


1532-1538: “Loading…” model counts — LGTM

Reads per-instance counts from metrics state with clear loading state. After normalizing URLs (earlier comment), these counts should be accurate.

Also applies to: 1610-1616

python/src/server/services/crawling/code_extraction_service.py (2)

200-209: LGTM: Robust summary-phase progress scaling

Good numeric normalization (0–1 vs 0–100) and clamping into 20–90%.


226-235: LGTM: Robust storage-phase progress scaling

Consistent normalization and clamping into 90–100%.
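The normalization pattern praised in both comments can be sketched as one helper, assuming callers pass either a 0-1 fraction or a 0-100 percentage (names and defaults are illustrative; the summary phase would use 20-90, the storage phase 90-100):

```python
def scale_progress(raw, lower=20.0, upper=90.0):
    """Map a raw progress value (0-1 fraction or 0-100 percent) into [lower, upper]."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return lower  # Unparseable progress: report the phase floor
    # Values above 1 are treated as percentages; everything is clamped to [0, 1]
    fraction = value / 100.0 if value > 1.0 else value
    fraction = max(0.0, min(1.0, fraction))
    return lower + fraction * (upper - lower)
```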

python/src/server/services/embeddings/embedding_service.py (3)

156-171: LGTM: Normalize Google model IDs

Correctly handles both "text-embedding-004" and "models/text-embedding-004" to avoid 404s.


133-147: Treat Google 429 as rate limit to enable backoff

All HTTP errors are mapped to EmbeddingAPIError; 429s should surface as EmbeddingRateLimitError.

-        except httpx.HTTPStatusError as error:
-            error_content = error.response.text
-            search_logger.error(
-                f"Google embedding API returned {error.response.status_code} - {error_content}",
-                exc_info=True,
-            )
-            raise EmbeddingAPIError(
-                f"Google embedding API error: {error.response.status_code} - {error_content}",
-                original_error=error,
-            ) from error
+        except httpx.HTTPStatusError as error:
+            error_content = error.response.text
+            if error.response.status_code == 429:
+                # Surface as rate limit to trigger unified retry/backoff
+                raise EmbeddingRateLimitError(f"Google rate limit: {error_content}") from error
+            search_logger.error(
+                f"Google embedding API returned {error.response.status_code} - {error_content}",
+                exc_info=True,
+            )
+            raise EmbeddingAPIError(
+                f"Google embedding API error: {error.response.status_code} - {error_content}",
+                original_error=error,
+            ) from error
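The retry flow this classification unlocks can be sketched generically; EmbeddingRateLimitError is stood in by a local class here (the real one lives in embedding_exceptions.py), and the helper name is illustrative:

```python
import asyncio
import random


class EmbeddingRateLimitError(Exception):
    """Local stand-in for the service's rate-limit exception."""


async def with_backoff(fn, max_retries=3, base_delay=1.0):
    """Retry an async callable on rate-limit errors with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return await fn()
        except EmbeddingRateLimitError:
            if attempt == max_retries:
                raise  # Out of retries: surface the rate limit to the caller
            await asyncio.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Mapping 429s to the rate-limit exception is what lets this loop distinguish "wait and retry" from "fail now".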

381-383: Don’t silently drop texts when fewer embeddings are returned

zip(..., strict=False) loses items without marking failures, violating “skip, don’t corrupt”.

-                                    for text, vector in zip(batch, embeddings, strict=False):
-                                        result.add_success(vector, text)
+                                    if len(embeddings) != len(batch):
+                                        search_logger.error(
+                                            f"Adapter returned {len(embeddings)} embeddings for batch of {len(batch)}",
+                                            exc_info=True,
+                                        )
+                                    for idx, text in enumerate(batch):
+                                        if idx < len(embeddings) and isinstance(embeddings[idx], list):
+                                            result.add_success(embeddings[idx], text)
+                                        else:
+                                            result.add_failure(
+                                                text,
+                                                EmbeddingAPIError("Missing embedding from provider"),
+                                                batch_index,
+                                            )
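The pairing logic suggested above can be isolated as a pure function, so under-returns are recorded as failures instead of silently dropped (names are illustrative, not the service's API):

```python
def pair_embeddings(batch, embeddings):
    """Pair each input text with its embedding vector.

    Texts without a valid vector (provider under-returned, or returned a
    non-list entry) land in the failure list instead of being dropped.
    """
    successes, failures = [], []
    for idx, text in enumerate(batch):
        vector = embeddings[idx] if idx < len(embeddings) else None
        if isinstance(vector, list) and vector:
            successes.append((text, vector))
        else:
            failures.append(text)
    return successes, failures
```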
python/src/server/services/credential_service.py (1)

441-460: Restore backward-compatible embedding provider fallback

When EMBEDDING_PROVIDER is unset, hard-defaulting to "openai" will break legacy installs that rely on LLM_PROVIDER. Fall back to LLM_PROVIDER if it’s embedding-capable before defaulting to OpenAI.

             if service_type == "embedding":
                 # First check for explicit EMBEDDING_PROVIDER setting (new split provider approach)
                 explicit_embedding_provider = rag_settings.get("EMBEDDING_PROVIDER")

                 # Validate that embedding provider actually supports embeddings
-                embedding_capable_providers = {"openai", "google", "ollama"}
+                embedding_capable_providers = {"openai", "google", "ollama"}

                 if (explicit_embedding_provider and
                     explicit_embedding_provider != "" and
                     explicit_embedding_provider in embedding_capable_providers):
                     # Use the explicitly set embedding provider
                     provider = explicit_embedding_provider
                     logger.debug(f"Using explicit embedding provider: '{provider}'")
                 else:
-                    # Fall back to OpenAI as default embedding provider for backward compatibility
-                    if explicit_embedding_provider and explicit_embedding_provider not in embedding_capable_providers:
-                        logger.warning(f"Invalid embedding provider '{explicit_embedding_provider}' doesn't support embeddings, defaulting to OpenAI")
-                    provider = "openai"
-                    logger.debug(f"No explicit embedding provider set, defaulting to OpenAI for backward compatibility")
+                    # Try legacy LLM provider first
+                    inferred = rag_settings.get("LLM_PROVIDER")
+                    if isinstance(inferred, str) and inferred.strip() in embedding_capable_providers:
+                        provider = inferred.strip()
+                        logger.debug(
+                            f"No explicit embedding provider set, defaulting to LLM provider '{provider}' for backward compatibility"
+                        )
+                    else:
+                        if explicit_embedding_provider and explicit_embedding_provider not in embedding_capable_providers:
+                            logger.warning(
+                                f"Invalid embedding provider '{explicit_embedding_provider}' doesn't support embeddings, defaulting to OpenAI"
+                            )
+                        provider = "openai"
+                        logger.debug("No explicit embedding or compatible LLM provider set, defaulting to OpenAI")

Chillbruhhh added 2 commits September 25, 2025 10:35
…so code example storage now forwards a dedicated embedding provider override end-to-end without hijacking the embedding pipeline. This addresses CodeRabbit's "Preserve provider override in create_embeddings_batch" suggestion.
… sit in React state (archon-ui-main/src/components/settings/RAGSettings.tsx).
  - Normalized Ollama instance URLs and gated the metrics effect on real state changes to avoid mis-counts and duplicate fetches (RAGSettings.tsx).
  - Tightened crawl progress scaling and indented-block parsing to handle min_length=None safely (python/src/server/services/crawling/code_extraction_service.py:160, python/src/server/services/crawling/code_extraction_service.py:911).
  - Added provider-agnostic embedding rate-limit retries so Google and friends back off gracefully (python/src/server/services/embeddings/embedding_service.py:427).
  - Made the orchestration registry async + thread-safe and updated every caller to await it (python/src/server/services/crawling/crawling_service.py:34, python/src/server/api_routes/knowledge_api.py:1291).

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (7)
archon-ui-main/src/components/settings/RAGSettings.tsx (7)

966-986: Make provider alert context-aware (chat vs embedding)

The alert uses ragSettings.LLM_PROVIDER only. When the activeSelection is “embedding”, the alert should reflect the embedding provider.

-  const selectedProviderKey = isProviderKey(ragSettings.LLM_PROVIDER)
-    ? (ragSettings.LLM_PROVIDER as ProviderKey)
-    : undefined;
-  const selectedProviderStatus = selectedProviderKey ? getProviderStatus(selectedProviderKey) : undefined;
+  const activeProviderKey: ProviderKey | undefined = (activeSelection === 'chat' ? chatProvider : embeddingProvider) as ProviderKey;
+  const selectedProviderStatus = activeProviderKey ? getProviderStatus(activeProviderKey) : undefined;
@@
-  if (selectedProviderKey === 'ollama') {
+  if (activeProviderKey === 'ollama') {
@@
-  } else if (selectedProviderKey && selectedProviderStatus === 'missing') {
-    providerAlertMessage = defaultProviderAlertMessages[selectedProviderKey] ?? null;
-    providerAlertClassName = providerAlertStyles[selectedProviderKey] ?? '';
+  } else if (activeProviderKey && selectedProviderStatus === 'missing') {
+    providerAlertMessage = defaultProviderAlertMessages[activeProviderKey] ?? null;
+    providerAlertClassName = providerAlertStyles[activeProviderKey] ?? '';

2308-2311: Normalize instance URLs before passing to selection modals

Minor consistency/readability improvement: reuse normalizeBaseUrl instead of ad-hoc string replacement.

-            selectedInstanceUrl={llmInstanceConfig.url.replace('/v1', '')}
+            selectedInstanceUrl={normalizeBaseUrl(llmInstanceConfig.url) ?? ''}
@@
-            selectedInstanceUrl={embeddingInstanceConfig.url.replace('/v1', '')}
+            selectedInstanceUrl={normalizeBaseUrl(embeddingInstanceConfig.url) ?? ''}

Also applies to: 2327-2329


2361-2388: Tighten types for helper: remove any and use RAGSettingsProps["ragSettings"]

Aligns with strict/no-implicit-any guidelines and improves editor assistance.

-function getDisplayedChatModel(ragSettings: any): string {
+function getDisplayedChatModel(ragSettings: RAGSettingsProps["ragSettings"]): string {

2390-2416: Tighten types for helper: remove any and use RAGSettingsProps["ragSettings"]

-function getDisplayedEmbeddingModel(ragSettings: any): string {
+function getDisplayedEmbeddingModel(ragSettings: RAGSettingsProps["ragSettings"]): string {

2419-2436: Narrow provider type in placeholders

Use ProviderKey instead of string.

-function getModelPlaceholder(provider: string): string {
+function getModelPlaceholder(provider: ProviderKey): string {

2438-2455: Narrow provider type in placeholders

Use ProviderKey instead of string.

-function getEmbeddingPlaceholder(provider: string): string {
+function getEmbeddingPlaceholder(provider: ProviderKey): string {

478-506: Consider TanStack Query for fetching and polling

Multiple ad-hoc fetches (provider status, Ollama health/metrics) would benefit from useQuery/useMutation with retry, caching, and staleTime/refetchInterval. This simplifies the polling intervals and state bookkeeping.

Would you like a small refactor PR extracting:

  • useProviderStatusQuery(provider)
  • useOllamaHealthQuery(instanceUrls)
  • useUpdateRagSettingsMutation()?

Also applies to: 550-603, 736-816

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa354ca and fedf595.

📒 Files selected for processing (8)
  • archon-ui-main/src/components/settings/RAGSettings.tsx (29 hunks)
  • python/src/server/api_routes/knowledge_api.py (2 hunks)
  • python/src/server/services/crawling/code_extraction_service.py (8 hunks)
  • python/src/server/services/crawling/crawling_service.py (7 hunks)
  • python/src/server/services/crawling/document_storage_operations.py (2 hunks)
  • python/src/server/services/embeddings/embedding_service.py (6 hunks)
  • python/src/server/services/storage/code_storage_service.py (4 hunks)
  • python/tests/test_code_extraction_source_id.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/**/*.py: Never accept corrupted data in batch/continuation paths: skip failed items entirely (e.g., do not store zero embeddings, null FKs, or malformed JSON)
Use specific exception types; avoid catching bare Exception
Preserve full stack traces in logs (use exc_info=True with Python logging)
Never return None to indicate failure; raise an exception with details
Authentication/authorization failures must halt the operation and be clearly surfaced
Service startup, missing configuration, database connection, or critical dependency failures should crash fast with clear errors
During crawling/batch/background tasks and WebSocket events, continue processing other items but log failures with context
Include context (operation intent, relevant IDs/URLs/data) in error messages
Pydantic should raise on data corruption or validation errors; do not accept invalid inputs

Files:

  • python/src/server/services/crawling/document_storage_operations.py
  • python/src/server/services/crawling/code_extraction_service.py
  • python/src/server/services/embeddings/embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
  • python/src/server/services/crawling/crawling_service.py
  • python/src/server/api_routes/knowledge_api.py
python/src/server/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/services/**/*.py: For batch operations, report both success count and a detailed failure list
On external API calls, retry with exponential backoff and fail with a clear message after retries
Place business logic in service layer modules (API Route → Service → Database pattern)

Files:

  • python/src/server/services/crawling/document_storage_operations.py
  • python/src/server/services/crawling/code_extraction_service.py
  • python/src/server/services/embeddings/embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
  • python/src/server/services/crawling/crawling_service.py
python/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/**/*.py: Target Python 3.12 with a 120-character line length
Use Ruff for linting (errors, warnings, unused imports)

Files:

  • python/src/server/services/crawling/document_storage_operations.py
  • python/src/server/services/crawling/code_extraction_service.py
  • python/src/server/services/embeddings/embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
  • python/src/server/services/crawling/crawling_service.py
  • python/tests/test_code_extraction_source_id.py
  • python/src/server/api_routes/knowledge_api.py
python/src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use MyPy for type checking to ensure type safety

python/src/**/*.py: On service startup, missing configuration, DB connection failures, auth/authorization failures, critical dependency outages, or invalid/corrupting data: fail fast and bubble errors
For batch processing, background tasks, WebSocket events, optional features, and external API calls: continue processing but log errors (with retries/backoff for APIs)
Never accept or persist corrupted data; skip failed items entirely (e.g., zero embeddings, null FKs, malformed JSON)
Error messages must include operation context, IDs/URLs, use specific exception types, preserve full stack traces (logging with exc_info=True), and avoid returning None/null—raise exceptions instead; for batches report success counts and detailed failures
Backend code targets Python 3.12 and adheres to a 120 character line length
Use Ruff for linting (errors, warnings, unused imports) in backend code
Use Mypy for static type checking in backend code

Files:

  • python/src/server/services/crawling/document_storage_operations.py
  • python/src/server/services/crawling/code_extraction_service.py
  • python/src/server/services/embeddings/embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
  • python/src/server/services/crawling/crawling_service.py
  • python/src/server/api_routes/knowledge_api.py
python/tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Write backend tests with Pytest (including async support)

Files:

  • python/tests/test_code_extraction_source_id.py
python/src/server/api_routes/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place HTTP route handlers in the API routes directory

Files:

  • python/src/server/api_routes/knowledge_api.py
archon-ui-main/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Disallow implicit any in TypeScript

archon-ui-main/src/**/*.{ts,tsx}: Frontend TypeScript must use strict mode with no implicit any
Use TanStack Query for all data fetching; avoid prop drilling
Use database values directly in the frontend; avoid mapping layers between BE and FE types

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,py}

📄 CodeRabbit inference engine (CLAUDE.md)

In code comments, avoid meta terms like SIMPLIFIED, ENHANCED, LEGACY, CHANGED, REMOVED; comment only on functionality and reasoning (do not mention beta/global rules)

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧠 Learnings (2)
📓 Common learnings
Learnt from: zaksnet
PR: coleam00/Archon#473
File: archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx:26-31
Timestamp: 2025-09-10T11:35:20.655Z
Learning: When reviewing large refactoring PRs like "Clean multi provider feature", focus feedback on the core objectives (provider management, API keys, model configuration, service registry) rather than unrelated UI improvements in components that were only incidentally modified during the refactor.
📚 Learning: 2025-09-19T10:31:54.292Z
Learnt from: CR
PR: coleam00/Archon#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-19T10:31:54.292Z
Learning: Applies to python/src/server/**/*.py : Never accept corrupted data in batch/continuation paths: skip failed items entirely (e.g., do not store zero embeddings, null FKs, or malformed JSON)

Applied to files:

  • python/src/server/services/embeddings/embedding_service.py
🧬 Code graph analysis (6)
python/src/server/services/crawling/document_storage_operations.py (1)
python/src/server/services/crawling/code_extraction_service.py (1)
  • extract_and_store_code_examples (135-256)
python/src/server/services/embeddings/embedding_service.py (4)
python/src/server/config/logfire_config.py (1)
  • safe_span (151-172)
python/src/server/services/llm_provider_service.py (1)
  • get_llm_client (313-548)
python/src/server/services/credential_service.py (4)
  • get_credential (160-177)
  • get_credential (552-554)
  • get_active_provider (426-501)
  • get_credentials_by_category (323-369)
python/src/server/services/embeddings/embedding_exceptions.py (2)
  • EmbeddingAPIError (86-99)
  • EmbeddingRateLimitError (61-72)
python/src/server/services/storage/code_storage_service.py (2)
python/src/server/services/embeddings/embedding_service.py (1)
  • create_embeddings_batch (260-488)
python/src/server/services/llm_provider_service.py (1)
  • get_embedding_model (600-675)
python/src/server/services/crawling/crawling_service.py (4)
python/src/server/services/credential_service.py (1)
  • get_active_provider (426-501)
python/src/server/services/crawling/code_extraction_service.py (1)
  • extract_and_store_code_examples (135-256)
python/src/server/services/crawling/document_storage_operations.py (1)
  • extract_and_store_code_examples (347-382)
python/src/server/services/storage/storage_services.py (1)
  • code_progress_callback (180-187)
python/src/server/api_routes/knowledge_api.py (1)
python/src/server/services/crawling/crawling_service.py (2)
  • get_active_orchestration (39-42)
  • unregister_orchestration (51-54)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
archon-ui-main/src/features/ui/primitives/input.tsx (1)
  • Input (8-29)
archon-ui-main/src/features/ui/primitives/button.tsx (1)
  • Button (11-130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Backend Tests (Python + pytest)
🔇 Additional comments (22)
python/src/server/services/embeddings/embedding_service.py (2)

133-141: Classify Google 429 responses as rate limits so the retry loop works

429s currently surface as EmbeddingAPIError, so the outer loop never retries and the entire batch fails. Surface them as EmbeddingRateLimitError instead to trigger the existing backoff flow.

         except httpx.HTTPStatusError as error:
-            error_content = error.response.text
-            search_logger.error(
-                f"Google embedding API returned {error.response.status_code} - {error_content}",
-                exc_info=True,
-            )
-            raise EmbeddingAPIError(
-                f"Google embedding API error: {error.response.status_code} - {error_content}",
-                original_error=error,
-            ) from error
+            error_content = error.response.text
+            status_code = error.response.status_code
+            if status_code == 429:
+                search_logger.warning(
+                    f"Google embedding API rate limited (429): {error_content}"
+                )
+                raise EmbeddingRateLimitError(
+                    f"Google embedding API rate limit: {error_content}"
+                ) from error
+            search_logger.error(
+                f"Google embedding API returned {status_code} - {error_content}",
+                exc_info=True,
+            )
+            raise EmbeddingAPIError(
+                f"Google embedding API error: {status_code} - {error_content}",
+                original_error=error,
+            ) from error
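For context, the backoff flow that EmbeddingRateLimitError feeds into looks roughly like this (a minimal sketch; the function name, retry count, and delay values are assumptions, not the actual service code):

```python
import asyncio

class EmbeddingAPIError(Exception):
    """Non-retryable provider error (bad request, auth failure, ...)."""

class EmbeddingRateLimitError(Exception):
    """Retryable rate-limit error, e.g. an HTTP 429 response."""

async def embed_with_backoff(embed_fn, texts, max_retries=3, base_delay=1.0):
    """Retry only on rate-limit errors, with exponential backoff.

    EmbeddingAPIError propagates immediately; only rate limits are retried.
    """
    for attempt in range(max_retries):
        try:
            return await embed_fn(texts)
        except EmbeddingRateLimitError:
            if attempt == max_retries - 1:
                raise  # retry budget exhausted; caller records the failure
            # Exponential backoff: base_delay, then 2x, 4x, ...
            await asyncio.sleep(base_delay * (2 ** attempt))
```

This is why the classification matters: anything raised as EmbeddingAPIError skips the loop entirely, while EmbeddingRateLimitError gets the sleep-and-retry path.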

381-383: Ensure every text is accounted for when a provider under-returns embeddings

zip(..., strict=False) silently drops texts whenever a provider returns fewer embeddings than requested, so those inputs never get marked failed. That violates our “skip, don’t corrupt” contract and produces mismatched counts. Please handle the mismatch explicitly and record failures for missing vectors.

-                                    for text, vector in zip(batch, embeddings, strict=False):
-                                        result.add_success(vector, text)
+                                    embeddings_count = len(embeddings)
+                                    batch_count = len(batch)
+                                    if embeddings_count != batch_count:
+                                        search_logger.error(
+                                            f"Adapter returned {embeddings_count} embeddings for batch "
+                                            f"{batch_index} (expected {batch_count})",
+                                            exc_info=True,
+                                        )
+                                    for idx, text in enumerate(batch):
+                                        if idx < embeddings_count and isinstance(embeddings[idx], list):
+                                            result.add_success(embeddings[idx], text)
+                                        else:
+                                            result.add_failure(
+                                                text,
+                                                EmbeddingAPIError(
+                                                    f"Missing embedding from provider for batch "
+                                                    f"{batch_index}, index {idx}"
+                                                ),
+                                                batch_index,
+                                            )
python/tests/test_code_extraction_source_id.py (1)

114-115: Update tests to match new embedding_provider parameter arity.

The test correctly asserts that args[5] and args[6] are None, which aligns with the addition of the embedding_provider parameter to extract_and_store_code_examples in python/src/server/services/crawling/document_storage_operations.py. The function signature now has 7 parameters (including self), so args[5] and args[6] correspond to provider and embedding_provider respectively.

python/src/server/services/crawling/document_storage_operations.py (3)

355-355: LGTM! Properly added embedding_provider parameter.

The addition of the embedding_provider parameter with proper type annotation (str | None = None) follows the established pattern used throughout the codebase for provider overrides.


367-367: Documentation correctly updated for embedding_provider.

The docstring properly documents the new embedding_provider parameter, clearly explaining its purpose for overriding the embedding provider for code example embeddings.


372-380: Correctly threading embedding_provider through to code extraction service.

The call to self.code_extraction_service.extract_and_store_code_examples properly passes through all parameters including the new embedding_provider, maintaining the correct order and ensuring the embedding provider override is respected throughout the code extraction pipeline.

python/src/server/api_routes/knowledge_api.py (2)

1291-1291: LGTM! Correctly made get_active_orchestration async.

The addition of await for get_active_orchestration(progress_id) aligns with the async orchestration flow mentioned in the AI summary. This change is consistent with the broader refactoring to make orchestration functions async.


1309-1309: LGTM! Correctly made unregister_orchestration async.

The addition of await for unregister_orchestration(progress_id) properly handles the async nature of the orchestration registry operations, ensuring proper cleanup of orchestration services.

python/src/server/services/storage/code_storage_service.py (4)

1094-1094: LGTM! Added embedding_provider parameter with proper type annotation.

The addition of embedding_provider: str | None = None follows the established pattern for provider overrides used throughout the codebase.


1109-1111: Documentation properly updated for embedding_provider.

The docstring correctly documents both provider and embedding_provider parameters, clearly distinguishing their roles - provider for LLM summary generation tracking and embedding_provider for vector generation override.


1189-1190: LGTM! Correctly passing embedding_provider to create_embeddings_batch.

The change from:

result = await create_embeddings_batch(batch_texts)

to:

result = await create_embeddings_batch(batch_texts, provider=embedding_provider)

This properly preserves the embedding provider override throughout the vector creation pipeline and addresses the past review comment about preserving the provider override to avoid a model/provider mismatch.


1207-1207: LGTM! Using embedding_provider for embedding model lookup.

The change to await get_embedding_model(provider=embedding_provider) ensures consistency between the embedding provider used for vector creation and the embedding model recorded in the database. This maintains proper alignment between the embedding provider and model metadata.

python/src/server/services/crawling/code_extraction_service.py (10)

143-143: LGTM! Added embedding_provider parameter with proper type annotation.

The addition of embedding_provider: str | None = None follows the established pattern and properly extends the function signature to support embedding provider overrides.


154-156: Documentation properly updated for both provider parameters.

The docstring clearly distinguishes between the two provider parameters:

  • provider: For LLM provider identifier used in summary generation
  • embedding_provider: For embedding provider override used in vector creation

This clarification helps maintain proper separation of concerns between chat/completion models and embedding models.


164-174: LGTM! Improved progress normalization logic.

The enhanced progress handling properly normalizes different progress formats:

  1. Extracts raw progress from either progress or percentage fields
  2. Handles type conversion with proper error handling
  3. Converts 0-1 range to 0-100 when needed
  4. Applies proper scaling with clamping

This provides more robust progress reporting across the extraction pipeline.
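The normalization steps described above can be sketched as a standalone helper (hypothetical name and heuristics; in particular, treating values at or below 1.0 as a 0-1 fraction is an assumption about the callback payloads, not a guarantee from the actual code):

```python
def normalize_progress(data: dict, range_start: float, range_end: float) -> float:
    """Map a progress callback payload onto an absolute percentage range."""
    # Accept either field name used by upstream callbacks
    raw = data.get("progress", data.get("percentage", 0))
    try:
        value = float(raw)
    except (TypeError, ValueError):
        value = 0.0  # unparseable payloads fall back to the range start
    if 0.0 <= value <= 1.0:
        value *= 100.0  # normalize a 0-1 fraction to a percentage
    value = max(0.0, min(100.0, value))  # clamp before scaling
    return range_start + (range_end - range_start) * value / 100.0
```

With the phase ranges from the review, a payload of `{"progress": 0.5}` in the 0-20% extraction phase lands at 10, and `{"percentage": 50}` in the 20-90% summary phase lands at 55.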


209-219: LGTM! Consistent progress normalization for summary phase.

The same robust progress normalization logic is applied to the summary phase (20-90% range), ensuring consistent progress reporting behavior across all phases of code extraction.


235-245: LGTM! Consistent progress normalization for storage phase.

The storage phase (90-100% range) uses the same robust progress handling, completing the consistent progress normalization across all three phases of code extraction.


250-256: LGTM! Properly threading embedding_provider through to storage.

The call to _store_code_examples correctly passes through the embedding_provider parameter, ensuring the provider override is preserved throughout the entire code extraction and storage pipeline.


1712-1712: LGTM! Added embedding_provider parameter to storage method.

The addition of embedding_provider: str | None = None parameter to _store_code_examples properly completes the embedding provider threading through the storage layer.


1720-1726: Documentation properly updated for storage method.

The docstring clearly documents all parameters including the new embedding_provider parameter, maintaining consistency with the broader embedding provider abstraction.


1763-1763: LGTM! Properly passing embedding_provider to add_code_examples_to_supabase.

The call to add_code_examples_to_supabase correctly includes the embedding_provider parameter, completing the end-to-end threading of embedding provider overrides from the API layer through to the database storage layer.


911-922: Fallback logic for min_length is correct
min_length is only overridden when explicitly passed; await self._get_min_code_length() runs solely if min_length is None, which matches the intended behavior.

Likely an incorrect or invalid review comment.

Comment on lines +17 to +19
// Providers that support embedding models
const EMBEDDING_CAPABLE_PROVIDERS: ProviderKey[] = ['openai', 'google', 'ollama'];


@coderabbitai coderabbitai Bot Sep 25, 2025


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify embedding-capable providers list

OpenRouter may proxy embedding models (e.g., OpenAI embeddings) depending on backend support. If supported server-side, include it here so users can select it for embeddings.

Run to verify backend support for OpenRouter embeddings:


🏁 Script executed:

#!/bin/bash
# Search server code for OpenRouter embedding support
rg -nP -C3 '(?i)(openrouter).*(embed|embedding)|(embed|embedding).*(openrouter)'

Length of output: 2061


Add openrouter to embedding-capable providers

File: archon-ui-main/src/components/settings/RAGSettings.tsx:17

- const EMBEDDING_CAPABLE_PROVIDERS: ProviderKey[] = ['openai', 'google', 'ollama'];
+ const EMBEDDING_CAPABLE_PROVIDERS: ProviderKey[] = ['openai', 'google', 'openrouter', 'ollama'];
🤖 Prompt for AI Agents
In archon-ui-main/src/components/settings/RAGSettings.tsx around lines 17 to 19,
the EMBEDDING_CAPABLE_PROVIDERS constant is missing the 'openrouter' provider;
update the array of ProviderKey to include 'openrouter' (e.g., add 'openrouter'
to the list) so that OpenRouter is treated as embedding-capable by the
component.

Author


OpenRouter doesn't provide embedding models



@Chillbruhhh Thank you for the clarification! You're absolutely right. I apologize for the confusion. OpenRouter acts as a proxy/router for various LLM providers but doesn't provide its own embedding models. The current EMBEDDING_CAPABLE_PROVIDERS list is correct as-is with ['openai', 'google', 'ollama'].

I'll resolve my comment since the code is actually correct.


✏️ Learnings added
Learnt from: Chillbruhhh
PR: coleam00/Archon#736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.500Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: zaksnet
PR: coleam00/Archon#473
File: archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx:26-31
Timestamp: 2025-09-10T11:35:20.655Z
Learning: When reviewing large refactoring PRs like "Clean multi provider feature", focus feedback on the core objectives (provider management, API keys, model configuration, service registry) rather than unrelated UI improvements in components that were only incidentally modified during the refactor.

Comment thread archon-ui-main/src/components/settings/RAGSettings.tsx
Comment thread archon-ui-main/src/components/settings/RAGSettings.tsx
Comment thread archon-ui-main/src/components/settings/RAGSettings.tsx
Comment thread archon-ui-main/src/components/settings/RAGSettings.tsx
Comment thread python/src/server/services/crawling/crawling_service.py
Chillbruhhh added 2 commits September 25, 2025 17:11
…/Grok API key rows so fresh databases expose every provider by default.
  - migration/0.1.0/009_add_provider_placeholders.sql:1 backfills the same rows for existing Supabase instances and records the migration.
  - archon-ui-main/src/components/settings/RAGSettings.tsx:121 introduces a shared credential→provider map, reloadApiCredentials runs through all five providers, and the status poller includes the new keys.
  - archon-ui-main/src/components/settings/RAGSettings.tsx:353 subscribes to the archon:credentials-updated browser event so adding/removing a key immediately refetches credential status and pings the corresponding connectivity test.
  - archon-ui-main/src/components/settings/RAGSettings.tsx:926 now treats missing Anthropic/OpenRouter/Grok keys as missing, preventing stale connected badges when a key is removed.
…imple display-name map and reuses one red alert style.
  - archon-ui-main/src/components/settings/RAGSettings.tsx:1016 now shows exactly one red banner when the active provider is missing its key.
  - Removed the old duplicate Missing API Key Configuration block, so the panel no longer stacks two warnings.
@Chillbruhhh
Author

Chillbruhhh commented Sep 25, 2025

@leex279 fixed the issues you mentioned: added the table to the migrations and the complete DB SQL, and unified the "API key not recognized" warning message to the same style. I had it set up to run a provider health check every so often, so it didn't update immediately when you added or deleted an API key without refreshing; fixed that so it's instant. Also fixed the issue with API key warnings popping up in the embeddings area and vice versa. Shouldn't be any more issues, I didn't notice anything else on my side.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)

470-473: Prevent runtime error on first provider status test

Spreading prev[provider] when undefined throws.

Apply:

   setProviderConnectionStatus(prev => ({
     ...prev,
-    [provider]: { ...prev[provider], checking: true }
+    [provider]: { ...(prev[provider] ?? {}), checking: true }
   }));
🧹 Nitpick comments (8)
archon-ui-main/src/components/settings/RAGSettings.tsx (8)

201-205: Initialize embedding provider from LLM provider when valid

Avoids initial UI mismatch when legacy settings lack EMBEDDING_PROVIDER.

-  const [embeddingProvider, setEmbeddingProvider] = useState<ProviderKey>(() =>
-    // Default to openai if no specific embedding provider is set
-    (ragSettings.EMBEDDING_PROVIDER as ProviderKey) || 'openai'
-  );
+  const [embeddingProvider, setEmbeddingProvider] = useState<ProviderKey>(() => {
+    const emb = ragSettings.EMBEDDING_PROVIDER as ProviderKey | undefined;
+    const llm = ragSettings.LLM_PROVIDER as ProviderKey | undefined;
+    return emb ?? (isProviderKey(llm) && EMBEDDING_CAPABLE_PROVIDERS.includes(llm) ? llm : 'openai');
+  });

477-479: Add a timeout to provider status fetch

Avoid indefinite hangs on stalled connections.

-      const response = await fetch(`/api/providers/${provider}/status`);
+      const response = await fetch(`/api/providers/${provider}/status`, { signal: AbortSignal.timeout(10000) });

63-82: Merge saved provider models with defaults to ensure all keys exist

Saved localStorage may be partial; merge to avoid undefined buckets.

-const loadProviderModels = (): ProviderModelMap => {
+const loadProviderModels = (): ProviderModelMap => {
   try {
     const saved = localStorage.getItem(PROVIDER_MODELS_KEY);
-    if (saved) {
-      return JSON.parse(saved);
-    }
+    const providers: ProviderKey[] = ['openai', 'google', 'openrouter', 'ollama', 'anthropic', 'grok'];
+    const defaults: ProviderModelMap = {} as ProviderModelMap;
+    providers.forEach(p => (defaults[p] = getDefaultModels(p)));
+    if (saved) {
+      const parsed = JSON.parse(saved) as Partial<ProviderModelMap>;
+      return { ...defaults, ...parsed };
+    }
   } catch (error) {
     console.error('Failed to load provider models:', error);
   }
 
-  // Return defaults for all providers if nothing saved
-  const providers: ProviderKey[] = ['openai', 'google', 'openrouter', 'ollama', 'anthropic', 'grok'];
-  const defaultModels: ProviderModelMap = {} as ProviderModelMap;
-
-  providers.forEach(provider => {
-    defaultModels[provider] = getDefaultModels(provider);
-  });
-
-  return defaultModels;
+  return (['openai', 'google', 'openrouter', 'ollama', 'anthropic', 'grok'] as ProviderKey[])
+    .reduce((acc, p) => { acc[p] = getDefaultModels(p); return acc; }, {} as ProviderModelMap);
 };

173-174: Type the setter to avoid any

Improves safety and IntelliSense.

-  setRagSettings: (settings: any) => void;
+  setRagSettings: React.Dispatch<React.SetStateAction<RAGSettingsProps["ragSettings"]>>;

2381-2383: Align OpenRouter chat default with getDefaultModels

Avoid inconsistent defaults between helpers.

-    case 'openrouter':
-      return 'anthropic/claude-3.5-sonnet';
+    case 'openrouter':
+      return 'openai/gpt-4o-mini';

2425-2431: Update Grok placeholder to match chosen default

Consistency with getDefaultModels.

-    case 'grok':
-      return 'e.g., grok-2-latest';
+    case 'grok':
+      return 'e.g., grok-3-mini';

2447-2450: Clarify OpenRouter embedding placeholder

OpenRouter doesn’t provide embeddings.

-    case 'openrouter':
-      return 'e.g., text-embedding-3-small';
+    case 'openrouter':
+      return 'OpenRouter does not provide embedding models';

Based on learnings


572-592: Prefer TanStack Query for data fetching and caching

Per guidelines, move repeated fetches (status checks, health, models) to react-query with proper keys and invalidation.

As per coding guidelines

Also applies to: 784-786, 367-371, 645-652

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dde9051 and b690695.

📒 Files selected for processing (3)
  • archon-ui-main/src/components/settings/RAGSettings.tsx (29 hunks)
  • migration/0.1.0/009_add_provider_placeholders.sql (1 hunks)
  • migration/complete_setup.sql (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
archon-ui-main/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Disallow implicit any in TypeScript

archon-ui-main/src/**/*.{ts,tsx}: Frontend TypeScript must use strict mode with no implicit any
Use TanStack Query for all data fetching; avoid prop drilling
Use database values directly in the frontend; avoid mapping layers between BE and FE types

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,py}

📄 CodeRabbit inference engine (CLAUDE.md)

In code comments, avoid meta terms like SIMPLIFIED, ENHANCED, LEGACY, CHANGED, REMOVED; comment only on functionality and reasoning (do not mention beta/global rules)

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧠 Learnings (2)
📓 Common learnings
Learnt from: zaksnet
PR: coleam00/Archon#473
File: archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx:26-31
Timestamp: 2025-09-10T11:35:20.655Z
Learning: When reviewing large refactoring PRs like "Clean multi provider feature", focus feedback on the core objectives (provider management, API keys, model configuration, service registry) rather than unrelated UI improvements in components that were only incidentally modified during the refactor.
Learnt from: Chillbruhhh
PR: coleam00/Archon#736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.500Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.
📚 Learning: 2025-09-25T16:36:48.500Z
Learnt from: Chillbruhhh
PR: coleam00/Archon#736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.500Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (3)
archon-ui-main/src/services/credentialsService.ts (1)
  • credentialsService (604-604)
archon-ui-main/src/features/ui/primitives/input.tsx (1)
  • Input (8-29)
archon-ui-main/src/features/ui/primitives/button.tsx (1)
  • Button (11-130)
🔇 Additional comments (5)
migration/complete_setup.sql (1)

102-107: LGTM — provider placeholders synced with migrations.

The complete setup now seeds all the expected provider API key rows, so fresh environments will match the incremental migrations.

migration/0.1.0/009_add_provider_placeholders.sql (1)

7-18: LGTM — idempotent seeding covers all new providers.

The migration cleanly adds the missing API key placeholders with the usual ON CONFLICT guard and logs the migration for tracking.

archon-ui-main/src/components/settings/RAGSettings.tsx (3)

17-19: Correct to exclude OpenRouter from embedding-capable providers

OpenRouter doesn’t offer embedding models. The current list is right.

Based on learnings


269-273: Fix crash when spreading possibly undefined provider state (chat models)

prev[chatProvider] can be undefined (legacy localStorage), and spreading it throws at runtime.

Apply:

           [chatProvider]: {
-            ...prev[chatProvider],
+            ...(prev[chatProvider] ?? getDefaultModels(chatProvider)),
             chatModel: ragSettings.MODEL_CHOICE
           }

286-293: Fix crash when spreading possibly undefined provider state (embedding models)

Same issue for embedding provider path.

Apply:

           [embeddingProvider]: {
-            ...prev[embeddingProvider],
+            ...(prev[embeddingProvider] ?? getDefaultModels(embeddingProvider)),
             embeddingModel: ragSettings.EMBEDDING_MODEL
           }

Comment thread archon-ui-main/src/services/credentialsService.ts Outdated
@Chillbruhhh Chillbruhhh requested a review from Wirasm September 27, 2025 12:37
@Chillbruhhh
Author

Chillbruhhh commented Sep 28, 2025

So I fixed a dimension issue when RAG-querying data with Google's models; I didn't notice it until I started querying data stored at different dimensions. I also looked into Ollama after doing this. Obviously you can't query data that's stored at 1536 dimensions if your model can only produce 768 (for example). Just curious what everyone's thoughts are here: should I look at a fallback, or a note to users when using smaller-dimensional models? In the future I could see an option to generate embeddings in every dimension if users wanted, but it would be a bit unnecessary, resource-wise, for most users, since you can just configure your preferred model and stick with that.
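One cheap option is to fail fast with an explicit message rather than silently returning empty results; a minimal sketch (function name and message are illustrative, not existing Archon code; the stored dimension would come from the source metadata):

```python
def check_embedding_dimensions(query_dim: int, stored_dim: int) -> None:
    """Raise a clear error when the query model's vector size differs from storage."""
    if query_dim == stored_dim:
        return
    raise ValueError(
        f"Embedding dimension mismatch: the selected model produces {query_dim}-d "
        f"vectors, but this knowledge base was indexed at {stored_dim}-d. "
        "Switch back to the original embedding model or re-crawl the sources."
    )
```

A guard like this could run once per RAG query, surfacing the note-to-users idea without the resource cost of multi-dimension embedding generation.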

@leex279
Collaborator

leex279 commented Sep 29, 2025

@Chillbruhhh sorry for the delay. I tested it again now and it looks good to me, except that Grok is still showing connected for me even though there is no API key. I also tried in Incognito with a fresh database.

image

@Chillbruhhh
Author

@Chillbruhhh sorry for the delay. I tested it again now and it looks good to me, except that Grok is still showing connected for me even though there is no API key. I also tried in Incognito with a fresh database.

image

All good, I've been busy all day. However, I just did the same: connected to a new database, rebuilt and all, and this is what it shows me:

Screenshot 2025-09-29 214319 Screenshot 2025-09-29 214330

@Chillbruhhh
Author

Chillbruhhh commented Sep 30, 2025

@leex279 I also tried hooking up to a VPN; I'm currently connected through a server in the Netherlands, and I can't get the bug to replicate. What happens when you do connect an API key? Does it show red (disconnected) as you mentioned before? Has Cole or anyone else tested this besides you and me?

Screenshot 2025-09-29 225039

…ing update the other day, it didn't. Pushing now
@coleam00
Owner

coleam00 commented Oct 5, 2025

@Chillbruhhh Sorry I've been MIA on this PR for a while now - I will be testing this today/tomorrow and letting you know if I find the same issue as Thomas. Thank you for your continued work on this!

@coleam00
Owner

coleam00 commented Oct 5, 2025

Okay this is working perfectly for me @Chillbruhhh and @leex279. Nice work Josh and thanks for helping with the testing Thomas! I'm going to merge this now.

@coleam00 coleam00 merged commit a580fdf into coleam00:main Oct 5, 2025
8 checks passed
@Chillbruhhh
Author

Okay this is working perfectly for me @Chillbruhhh and @leex279. Nice work Josh and thanks for helping with the testing Thomas! I'm going to merge this now.

Yeah man, I'm happy to help any way I can!

leonj1 pushed a commit to leonj1/Archon that referenced this pull request Oct 13, 2025
* Add Anthropic and Grok provider support

* feat: Add crucial GPT-5 and reasoning model support for OpenRouter

- Add requires_max_completion_tokens() function for GPT-5, o1, o3, Grok-3 series
- Add prepare_chat_completion_params() for reasoning model compatibility
- Implement max_tokens → max_completion_tokens conversion for reasoning models
- Add temperature handling for reasoning models (must be 1.0 default)
- Enhanced provider validation and API key security in provider endpoints
- Streamlined retry logic (3→2 attempts) for faster issue detection
- Add failure tracking and circuit breaker analysis for debugging
- Support OpenRouter format detection (openai/gpt-5-nano, openai/o1-mini)
- Improved Grok provider empty response handling with structured fallbacks
- Enhanced contextual embedding with provider-aware model selection

Core provider functionality:
- OpenRouter, Grok, Anthropic provider support with full embedding integration
- Provider-specific model defaults and validation
- Secure API connectivity testing endpoints
- Provider context passing for code generation workflows

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fully working model providers, addressing security and code-related concerns, thoroughly hardening our code

* added multi-provider support, embeddings model support, cleaned the PR; need to fix health check, asyncio task errors, and contextual embeddings error

* fixed contextual embeddings issue

* - Added inspect-aware shutdown handling so get_llm_client always closes the underlying AsyncOpenAI / httpx.AsyncClient while the loop is still alive, with defensive logging if shutdown happens late (python/src/server/services/llm_provider_service.py:14, python/src/server/services/llm_provider_service.py:520).

* - Restructured get_llm_client so client creation and usage live in separate try/finally blocks; fallback clients now close without logging a spurious "Error creating LLM client" when downstream code raises (python/src/server/services/llm_provider_service.py:335-556).
  - Close logic now sanitizes provider names consistently and awaits whichever aclose/close coroutine the SDK exposes, keeping the loop shut down cleanly (python/src/server/services/llm_provider_service.py:530-559).
  Robust JSON parsing:
  - Added _extract_json_payload to strip code fences / extra text returned by Ollama before json.loads runs, averting the markdown-induced decode errors you saw in logs (python/src/server/services/storage/code_storage_service.py:40-63).
  - Swapped the direct parse call for the sanitized payload and emit a debug preview when cleanup alters the content (python/src/server/services/storage/code_storage_service.py:858-864).

* added provider connection support

* added provider api key not being configured warning

* Updated get_llm_client so missing OpenAI keys automatically fall back to Ollama (matching existing tests) and so unsupported providers still raise the legacy ValueError the suite expects. The fallback now reuses _get_optimal_ollama_instance and rethrows ValueError("OpenAI API key not found and Ollama fallback failed") when it can't connect. Adjusted test_code_extraction_source_id.py to accept the new optional argument on the mocked extractor (and confirm it's None when present).

* Resolved a few needed CodeRabbit suggestions:
  - Updated the knowledge API key validation to call create_embedding with the provider argument and removed the hard-coded OpenAI fallback (python/src/server/api_routes/knowledge_api.py).
  - Broadened embedding provider detection so prefixed OpenRouter/OpenAI model names route through the correct client (python/src/server/services/embeddings/embedding_service.py, python/src/server/services/llm_provider_service.py).
  - Removed the duplicate helper definitions from llm_provider_service.py, eliminating the stray docstring that was causing the import-time syntax error.

* Updated via CodeRabbit PR review; CodeRabbit in my IDE found no issues and no nitpicks with the updates! What was done:
  - Credential service now persists the provider under the uppercase key LLM_PROVIDER, matching the read path (no new EMBEDDING_PROVIDER usage introduced).
  - Embedding batch creation stops inserting blank strings, logging failures and skipping invalid items before they ever hit the provider (python/src/server/services/embeddings/embedding_service.py).
  - Contextual embedding prompts use real newline characters everywhere, both when constructing the batch prompt and when parsing the model's response (python/src/server/services/embeddings/contextual_embedding_service.py).
  - Embedding provider routing already recognizes OpenRouter-prefixed OpenAI models via is_openai_embedding_model; no further change needed there.
  - Embedding insertion now skips unsupported vector dimensions instead of forcing them into the 1536-dimension column, and the backoff loop uses await asyncio.sleep so we no longer block the event loop (python/src/server/services/storage/code_storage_service.py).
  - RAG settings props were extended to include LLM_INSTANCE_NAME and OLLAMA_EMBEDDING_INSTANCE_NAME, and the debug log no longer prints API-key prefixes (the rest of the TanStack refactor / EMBEDDING_PROVIDER support remains deferred).

* test fix

* Enhanced OpenRouter's parsing logic to automatically detect reasoning models and parse responses whether or not they are valid JSON. This commit gives Archon a robust way to parse OpenRouter output automatically, regardless of the model you're using, ensuring proper functionality without breaking any generation capabilities.

* Updated the UI LLM interface, added a separate embeddings provider, and made the system fully capable of mixing and matching LLM providers (local and non-local) for chat & embeddings. Updated mainly the RAGSettings.tsx UI, along with core functionality.

* Added warning labels and updated Ollama health checks.

* Ready for review; fixed some error warnings and consolidated Ollama status health checks.

* Fixed the failing test_async_embedding_service.py.

* CodeRabbit fixes.

* Separated the code-summary LLM provider from the embedding provider, so code example storage now forwards a dedicated embedding provider override end-to-end without hijacking the embedding pipeline. This addresses CodeRabbit's "Preserve provider override in create_embeddings_batch" suggestion.

* - Swapped API credential storage to booleans so decrypted keys never sit in React state (archon-ui-main/src/components/settings/RAGSettings.tsx).
  - Normalized Ollama instance URLs and gated the metrics effect on real state changes to avoid mis-counts and duplicate fetches (RAGSettings.tsx).
  - Tightened crawl progress scaling and indented-block parsing to handle min_length=None safely (python/src/server/services/crawling/code_extraction_service.py:160, python/src/server/services/crawling/code_extraction_service.py:911).
  - Added provider-agnostic embedding rate-limit retries so Google and friends back off gracefully (python/src/server/services/embeddings/embedding_service.py:427).
  - Made the orchestration registry async + thread-safe and updated every caller to await it (python/src/server/services/crawling/crawling_service.py:34, python/src/server/api_routes/knowledge_api.py:1291).
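A provider-agnostic rate-limit retry typically matches on the error message rather than on provider-specific exception classes, so Google, OpenAI, and others all trigger the same backoff. The sketch below assumes hypothetical names (with_rate_limit_retry, RATE_LIMIT_MARKERS); it is not the actual embedding_service.py code:

```python
import asyncio
import random

# Substrings that commonly appear in rate-limit errors across providers.
RATE_LIMIT_MARKERS = ("rate limit", "429", "resource_exhausted", "quota")


def is_rate_limit_error(exc: Exception) -> bool:
    """Heuristic, provider-agnostic rate-limit detection by message content."""
    msg = str(exc).lower()
    return any(marker in msg for marker in RATE_LIMIT_MARKERS)


async def with_rate_limit_retry(fn, max_retries: int = 3, base_delay: float = 1.0):
    """Retry a zero-arg async callable on rate-limit errors.

    Backs off exponentially with jitter, using await asyncio.sleep so the
    event loop is never blocked while waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return await fn()
        except Exception as exc:
            # Re-raise immediately for non-rate-limit errors or final attempt.
            if attempt == max_retries or not is_rate_limit_error(exc):
                raise
            delay = base_delay * (2**attempt) + random.uniform(0, 0.25)
            await asyncio.sleep(delay)
```

Using asyncio.sleep instead of time.sleep is the key detail: a synchronous sleep inside the retry loop would stall every other coroutine sharing the loop.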

* Update RAGSettings.tsx - header for 'LLM Settings' is now 'LLM Provider Settings'

* (RAG Settings)

  - Ollama Health Checks & Metrics
      - Added a 10-second timeout to the health fetch so it doesn't hang.
      - Adjusted logic so metric refreshes run for embedding-only Ollama setups too.
      - Initial page load now checks Ollama if either chat or embedding provider uses it.
      - Metrics and alerts now respect which provider (chat/embedding) is currently selected.
  - Provider Sync & Alerts
      - Fixed a sync bug so the very first provider change updates settings as expected.
      - Alerts now track the active provider (chat vs embedding) rather than only the LLM provider.
      - Warnings about missing credentials now skip whichever provider is currently selected.
  - Modals & Types
      - Normalize URLs before handing them to selection modals to keep consistent data.
      - Strengthened helper function types (getDisplayedChatModel, getModelPlaceholder, etc.).

 (Crawling Service)

  - Made the orchestration registry lock lazy-initialized to avoid issues in Python 3.12 and wrapped registry commands
  (register, unregister) in async calls. This keeps things thread-safe even during concurrent crawling and cancellation.
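The lazy-initialized lock pattern described above can be sketched as follows. This is an illustrative shape only (the class and method names are assumptions, not the real crawling_service.py registry): the asyncio.Lock is created on first use inside a running event loop rather than at import time, which avoids loop-binding surprises across Python versions.

```python
import asyncio


class OrchestrationRegistry:
    """Async-safe registry of active crawl orchestrations."""

    def __init__(self) -> None:
        # Deferred: creating the Lock here, at import time, would happen
        # outside any running event loop.
        self._lock: asyncio.Lock | None = None
        self._active: dict[str, object] = {}

    def _get_lock(self) -> asyncio.Lock:
        if self._lock is None:
            self._lock = asyncio.Lock()
        return self._lock

    async def register(self, progress_id: str, orchestration: object) -> None:
        async with self._get_lock():
            self._active[progress_id] = orchestration

    async def unregister(self, progress_id: str) -> None:
        async with self._get_lock():
            self._active.pop(progress_id, None)

    async def get(self, progress_id: str):
        async with self._get_lock():
            return self._active.get(progress_id)
```

Because every mutation goes through the same lock, concurrent crawl registration and cancellation cannot interleave mid-update.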

* - migration/complete_setup.sql:101 seeds Google/OpenRouter/Anthropic/Grok API key rows so fresh databases expose every provider by default.
  - migration/0.1.0/009_add_provider_placeholders.sql:1 backfills the same rows for existing Supabase instances and records the migration.
  - archon-ui-main/src/components/settings/RAGSettings.tsx:121 introduces a shared credential-provider map, reloadApiCredentials runs through all five providers, and the status poller includes the new keys.
  - archon-ui-main/src/components/settings/RAGSettings.tsx:353 subscribes to the archon:credentials-updated browser event so adding/removing a key immediately refetches credential status and pings the corresponding connectivity test.
  - archon-ui-main/src/components/settings/RAGSettings.tsx:926 now treats missing Anthropic/OpenRouter/Grok keys as missing, preventing stale connected badges when a key is removed.

* - archon-ui-main/src/components/settings/RAGSettings.tsx:90 adds a simple display-name map and reuses one red alert style.
  - archon-ui-main/src/components/settings/RAGSettings.tsx:1016 now shows exactly one red banner when the active provider is missing its API key.
  - Removed the old duplicate Missing API Key Configuration block, so the panel no longer stacks two warnings.

* Update credentialsService.ts default model

* Updated the Google embedding adapter for multi-dimensional RAG querying.

* Thought this micro fix to the Google embedding adapter shipped with the embedding update the other day; it didn't. Pushing it now.

---------

Co-authored-by: Chillbruhhh <joshchesser97@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>