
Fix Issue #362: Provider-agnostic error handling for all LLM providers#650

Merged
Wirasm merged 8 commits into main from feature/provider-agnostic-error-handling
Sep 17, 2025

Conversation

@leex279
Collaborator

@leex279 leex279 commented Sep 13, 2025

Summary

Implements generic error handling that works for all LLM providers (OpenAI, Google AI, Anthropic, etc.) to solve Issue #362 - preventing silent failures and 90-minute debugging sessions.

Problem Solved

  • Before: Silent failures with empty results for any provider with invalid API keys
  • After: Immediate clear error messages: "Please verify your [Provider] API key in Settings"

Provider Support

  • OpenAI: sk-* key validation and error handling
  • Google AI: AIza* key validation and error handling
  • Anthropic: sk-ant-* key validation and error handling
  • Future providers: Extensible adapter pattern
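The prefix checks above can be sketched as a cheap format gate that runs before any billable API call. The prefixes come from the list above; the length bounds, character classes, and function name are illustrative assumptions, not the PR's actual implementation:

```python
import re

# Hypothetical per-provider key-format table; prefixes follow the PR
# description, length bounds are assumptions for illustration only.
KEY_PATTERNS = {
    "openai": re.compile(r"^sk-[A-Za-z0-9_-]{16,}$"),
    "google": re.compile(r"^AIza[A-Za-z0-9_-]{16,}$"),
    "anthropic": re.compile(r"^sk-ant-[A-Za-z0-9_-]{8,}$"),
}

def looks_like_valid_key(provider: str, api_key: str) -> bool:
    """Cheap format check; an unknown provider simply fails the gate."""
    pattern = KEY_PATTERNS.get(provider.lower())
    return bool(pattern and pattern.match(api_key))
```

A format gate like this cannot prove a key is live, but it catches pasted-in garbage or a key from the wrong provider before the first network round trip.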

Essential Files Only (5 files changed)

Backend (2 files)

  1. provider_error_adapters.py (new) - Provider-agnostic error handling framework
  2. knowledge_api.py - API key validation before crawl operations + authentication exception

Frontend (2 files)

  1. providerErrorHandler.ts (new) - Generic error parsing for any provider
  2. useKnowledgeQueries.ts - Enhanced error messages in TanStack Query hooks

Infrastructure (1 file)

  1. embedding_exceptions.py - Added EmbeddingAuthenticationError class

Key Features

🛡️ Universal validation: Works for any LLM provider
💬 Provider-specific messages: "OpenAI/Google/Anthropic API key invalid"
🔒 Security: Provider-specific data sanitization
🏗️ Architecture: Uses existing toast system and TanStack Query

Error Examples

  • OpenAI: "Please verify your OpenAI API key in Settings"
  • Google AI: "Please verify your Google API key in Settings"
  • Anthropic: "Please verify your Anthropic API key in Settings"

No Unnecessary Changes

❌ No linting cleanup
❌ No whitespace changes
❌ No unrelated file modifications
✅ Only essential error handling logic

Test Plan

  • OpenAI invalid key → Clear error message
  • Google AI invalid key → Clear error message
  • Anthropic invalid key → Clear error message
  • Generic toast notifications work for all providers
  • Existing functionality preserved

Closes

Fixes #362

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Preflight API key validation before crawling, refreshing, or uploading knowledge items.
    • New provider-aware error handling that surfaces clearer, consistent messages for LLM providers.
  • Bug Fixes

    • Improved crawl failure messaging with a sensible fallback.
    • Clearer explanations for authentication, quota, and rate-limit issues.
    • Reduced exposure of sensitive information in error outputs (masked API key details).

@coderabbitai

coderabbitai Bot commented Sep 13, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds provider-agnostic error parsing and user-facing messages on the frontend, introduces provider-specific error sanitizers and an authentication embedding error on the backend, and performs preflight provider API key validation before knowledge crawl/refresh/upload operations.

Changes

  • Frontend: Hook error messaging (archon-ui-main/src/features/knowledge/hooks/useKnowledgeQueries.ts): Replaces direct error.message usage with getProviderErrorMessage(error) in the onError path and adds the import.
  • Frontend: Provider error utilities, new (archon-ui-main/src/features/knowledge/utils/providerErrorHandler.ts, archon-ui-main/src/features/knowledge/utils/index.ts): New provider error normalization: ProviderError interface, parseProviderError, getProviderErrorMessage; re-export added to the utils index.
  • Backend: Knowledge API preflight validation (python/src/server/api_routes/knowledge_api.py): Adds _validate_provider_api_key(provider) and invokes it before refresh_knowledge_item, crawl_knowledge_item, and upload_document; integrates credential lookup and provider error sanitization with logging.
  • Backend: Embedding exceptions (python/src/server/services/embeddings/embedding_exceptions.py): Adds EmbeddingAuthenticationError (masks api_key_prefix) and extends EmbeddingValidationError to accept/store an optional embedding sample.
  • Backend: Provider error adapters, new (python/src/server/services/embeddings/provider_error_adapters.py): Adds ProviderErrorAdapter base, concrete adapters for OpenAI/Google/Anthropic, and ProviderErrorFactory for detection and sanitization of provider errors.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant UI as UI
  participant Hook as useKnowledgeQueries (onError)
  participant Util as providerErrorHandler.ts

  UI->>Hook: Operation fails with error
  Hook->>Util: getProviderErrorMessage(error)
  Util->>Util: parseProviderError(error)
  Util-->>Hook: User-facing message
  Hook-->>UI: Show toast/snackbar with message
  note over Util: Normalizes provider, status code, and type
```
```mermaid
sequenceDiagram
  autonumber
  participant Client as Client (UI)
  participant API as knowledge_api.py
  participant Cred as credential_service
  participant Prov as Embedding Provider
  participant Err as ProviderErrorFactory

  Client->>API: POST /knowledge/{crawl|refresh|upload}
  API->>Cred: get_active_provider("embedding")
  API->>API: _validate_provider_api_key(provider)
  API->>Prov: Minimal embedding request
  alt 401/invalid key or provider error
    Prov-->>API: Error
    API->>Err: sanitize_provider_error(message, provider)
    Err-->>API: Sanitized message
    API-->>Client: 401 Unauthorized with sanitized details
  else Success
    Prov-->>API: OK
    API->>API: Proceed with crawl/refresh/upload
    API-->>Client: 200/202 Success
  end
  note over API,Err: Provider-agnostic sanitization and logs
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • coleam00

Poem

I thump my paw—no credits? Oh dear!
Now errors sing and secrets don't leak, clear.
Keys get checked before we hop and crawl,
Sanitized messages keep panic small.
I nibble logs and dance with delight—bug fixes in sight. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 40.00%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (4 passed)
  • Title Check ✅ Passed: The title "Fix Issue #362: Provider-agnostic error handling for all LLM providers" is a concise, single-sentence summary that accurately reflects the primary intent of the changeset. It directly relates to the main changes described (adding provider-agnostic error handling across LLM providers) and includes the issue reference for traceability. It avoids extraneous details and will be meaningful to teammates scanning commit history.
  • Linked Issues Check ✅ Passed: The changes implement provider-agnostic error adapters, server-side API key preflight validation in knowledge_api for crawl/upload flows, front-end provider error parsing and improved TanStack Query error messages, and a new EmbeddingAuthenticationError class, which directly address the linked issue's goals to surface authentication and provider-side usage errors and prevent silent empty results in Knowledge Base/RAG flows. The PR adds logging and sanitized, provider-specific messages so invalid or exhausted API keys should no longer fail silently. Based on the provided summaries, the linked issue objectives appear to be satisfied by these changes.
  • Out of Scope Changes Check ✅ Passed: All added and modified files are focused on provider-agnostic error handling and related exception types (backend provider adapters and preflight checks, frontend error parsing and query error handling, and embedding exception enrichment), which align with the PR objectives and linked issue; no unrelated linting, formatting, or extraneous refactors are evident. The only notable API-surface additions are new exception classes and enriched metadata for validation errors, which are relevant to error handling.
  • Description Check ✅ Passed: The PR description is detailed and includes a clear summary, problem statement, file-level change list, provider support matrix, and a test plan with completed checks, and it references the linked issue and key files changed. It does not strictly match the repository template because it omits the explicit "Type of Change" and "Affected Services" checkbox sections and lacks concrete test evidence commands/logs under the Testing section. Overall the description is mostly complete for a reviewer but should be aligned to the template for consistency and automation.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 006a36d and 6960204.

📒 Files selected for processing (2)
  • archon-ui-main/src/features/knowledge/utils/index.ts (1 hunks)
  • archon-ui-main/src/features/knowledge/utils/providerErrorHandler.ts (1 hunks)


@leex279
Collaborator Author

leex279 commented Sep 13, 2025

@CodeRabbit review

@coderabbitai

coderabbitai Bot commented Sep 13, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (11)
python/src/server/services/embeddings/embedding_exceptions.py (1)

102-116: Gate API key prefix storage behind an explicit opt-in to avoid leaking secrets.

Even with a 3-char mask, persisting any portion of a key can be sensitive. Make storing the masked prefix an opt-in for debugging only.

Apply:

-class EmbeddingAuthenticationError(EmbeddingError):
+class EmbeddingAuthenticationError(EmbeddingError):
@@
-    def __init__(self, message: str, api_key_prefix: str | None = None, **kwargs):
-        super().__init__(message, **kwargs)
-        # Store masked API key prefix for debugging
-        self.api_key_prefix = api_key_prefix[:3] + "…" if api_key_prefix and len(api_key_prefix) >= 3 else None
-        if self.api_key_prefix:
-            self.metadata["api_key_prefix"] = self.api_key_prefix
+    def __init__(
+        self,
+        message: str,
+        api_key_prefix: str | None = None,
+        *,
+        include_prefix: bool = False,
+        **kwargs,
+    ):
+        super().__init__(message, **kwargs)
+        # Only store masked API key prefix when explicitly enabled (e.g., in DEBUG)
+        self.api_key_prefix = (
+            (api_key_prefix[:3] + "…") if include_prefix and api_key_prefix and len(api_key_prefix) >= 3 else None
+        )
+        if self.api_key_prefix:
+            self.metadata["api_key_prefix"] = self.api_key_prefix
archon-ui-main/src/features/knowledge/utils/providerErrorHandler.ts (3)

16-24: Capture status from common HTTP client shapes (response.status).

Improve normalization so Axios/Fetch-style errors are recognized.

 export function parseProviderError(error: any): ProviderError {
   const providerError = error as ProviderError;
   
   // Check if this is a structured provider error from backend
   if (error && typeof error === 'object') {
-    if (error.statusCode || error.status) {
-      providerError.statusCode = error.statusCode || error.status;
+    const respStatus = (error.response && error.response.status) || undefined;
+    if (error.statusCode || error.status || respStatus !== undefined) {
+      providerError.statusCode = error.statusCode ?? error.status ?? respStatus;
     }

25-38: Also handle structured errors when error.detail is an object (not a JSON string).

Backends (FastAPI) often return detail as an object. Parse both forms.

-    // Parse backend error structure
-    if (error.message && error.message.includes('detail')) {
+    // Parse backend error structure
+    if (error?.detail && typeof error.detail === 'object') {
+      const d = error.detail;
+      if (d.error_type) {
+        providerError.isProviderError = true;
+        providerError.provider = d.provider || 'LLM';
+        providerError.errorType = d.error_type;
+        providerError.message = d.message || error.message;
+      }
+    } else if (error.message && error.message.includes('detail')) {
       try {
         const parsed = JSON.parse(error.message);
         if (parsed.detail && parsed.detail.error_type) {
           providerError.isProviderError = true;
           providerError.provider = parsed.detail.provider || 'LLM';
           providerError.errorType = parsed.detail.error_type;
           providerError.message = parsed.detail.message || error.message;
         }
       } catch {
         // If parsing fails, use message as-is
       }
     }

65-71: Map generic 429 and 403 errors to friendly messages.

Covers rate-limit and forbidden cases when backend isn’t structured.

   // Handle status codes for non-structured errors
   if (parsed.statusCode === 401) {
     return "Please verify your API key in Settings.";
   }
+  if (parsed.statusCode === 429) {
+    return "Rate limit exceeded. Please wait and try again.";
+  }
+  if (parsed.statusCode === 403) {
+    return "Access forbidden. Please check your project/org permissions.";
+  }
archon-ui-main/src/features/knowledge/hooks/useKnowledgeQueries.ts (3)

277-279: Remove redundant fallback.

getProviderErrorMessage always returns a string, so the || "Failed to start crawl" fallback never executes.

-      const errorMessage = getProviderErrorMessage(error) || "Failed to start crawl";
+      const errorMessage = getProviderErrorMessage(error);

443-456: Reuse provider-aware messages for uploads.

Keeps UX consistent across crawl and upload errors.

-      // Display the actual error message from backend
-      const message = error instanceof Error ? error.message : "Failed to upload document";
-      showToast(message, "error");
+      // Display provider-aware message
+      const message = getProviderErrorMessage(error);
+      showToast(message, "error");

606-610: Apply provider-aware message on refresh failures too.

Aligns all flows to the same error normalization.

-      const errorMessage = error instanceof Error ? error.message : "Failed to refresh item";
+      const errorMessage = getProviderErrorMessage(error);
       showToast(errorMessage, "error");
python/src/server/services/embeddings/provider_error_adapters.py (2)

103-120: Harden adapter lookup and add types. Also fix docstring claim about Ollama.

-"""
-Provider-agnostic error handling for LLM embedding services.
-
-Supports OpenAI, Google AI, Anthropic, Ollama, and future providers
-with unified error handling and sanitization patterns.
-"""
+"""
+Provider-agnostic error handling for LLM embedding services.
+
+Supports OpenAI, Google AI, Anthropic (extensible to other providers)
+with unified error handling and sanitization patterns.
+"""
@@
-class ProviderErrorFactory:
+class ProviderErrorFactory:
@@
-    _adapters = {
+    _adapters: dict[str, ProviderErrorAdapter] = {
         "openai": OpenAIErrorAdapter(),
         "google": GoogleAIErrorAdapter(),
         "anthropic": AnthropicErrorAdapter(),
     }
@@
-    def get_adapter(cls, provider: str) -> ProviderErrorAdapter:
-        return cls._adapters.get(provider.lower(), cls._adapters["openai"])
+    def get_adapter(cls, provider: str | None) -> ProviderErrorAdapter:
+        if not provider:
+            return cls._adapters["openai"]
+        return cls._adapters.get(provider.lower(), cls._adapters["openai"])

11-16: Remove unused exception imports to satisfy linters.

-from .embedding_exceptions import (
-    EmbeddingAPIError,
-    EmbeddingAuthenticationError,
-    EmbeddingQuotaExhaustedError,
-    EmbeddingRateLimitError,
-)
+# (No direct use of exception classes here; sanitization only)
python/src/server/api_routes/knowledge_api.py (2)

651-653: Avoid per-request preflight cost; cache recent success.

Preflight on every crawl adds latency and cost. Cache a recent successful check in-memory (e.g., 5–10 minutes) and skip until expiry.

Example (outside this function):

# module-level cache
_last_api_key_ok: dict[str, float] = {}  # provider -> epoch seconds
_VALIDATION_TTL_SEC = 600

Then:

-    await _validate_provider_api_key()
+    from time import time
+    provider = None  # or read from config
+    now = time()
+    last_ok = _last_api_key_ok.get(provider or "openai", 0)
+    if now - last_ok > _VALIDATION_TTL_SEC:
+        await _validate_provider_api_key(provider)
+        _last_api_key_ok[provider or "openai"] = time()

66-106: Preflight: pass provider through; broaden auth detection; optionally handle rate-limits

File: python/src/server/api_routes/knowledge_api.py Lines: 66-106

  • Verified: create_embedding signature is async def create_embedding(text: str, provider: str | None = None) -> list[float]; call await create_embedding(text="test", provider=provider).
  • Resolve adapter first: adapter = ProviderErrorFactory.get_adapter(provider); provider_name = adapter.get_provider_name() — use provider_name for messages, provider for the call.
  • Broaden auth detection to include "unauthorized" (case-insensitive) and detect 429 / "rate limit" to raise HTTPException(status_code=429, error_type="rate_limit").
  • No TypeError fallback needed (provider param supported).
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce2f871 and b5ee99a.

📒 Files selected for processing (5)
  • archon-ui-main/src/features/knowledge/hooks/useKnowledgeQueries.ts (2 hunks)
  • archon-ui-main/src/features/knowledge/utils/providerErrorHandler.ts (1 hunks)
  • python/src/server/api_routes/knowledge_api.py (2 hunks)
  • python/src/server/services/embeddings/embedding_exceptions.py (1 hunks)
  • python/src/server/services/embeddings/provider_error_adapters.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-28T13:07:24.810Z
Learnt from: Wirasm
PR: coleam00/Archon#514
File: archon-ui-main/src/services/crawlProgressService.ts:35-39
Timestamp: 2025-08-28T13:07:24.810Z
Learning: The crawlProgressService.ts in the Archon codebase should be deprecated in favor of the existing useCrawlProgressPolling hook from usePolling.ts, which already includes ETag support, 304 handling, tab visibility detection, and proper React lifecycle integration. This consolidation reduces code duplication and improves performance.

Applied to files:

  • archon-ui-main/src/features/knowledge/hooks/useKnowledgeQueries.ts
🔇 Additional comments (2)
python/src/server/services/embeddings/embedding_exceptions.py (1)

126-131: LGTM: bounded diagnostic payload.

Capturing only the first 10 embedding values is a safe, low-noise hint for triage without bloating logs/telemetry.

archon-ui-main/src/features/knowledge/hooks/useKnowledgeQueries.ts (1)

14-14: LGTM: centralizing provider-aware error messaging.

Comment on lines +56 to +58
def _sanitize_provider_error(error_message: str, provider: str = None) -> str:
    """Sanitize provider-specific error messages to prevent information disclosure."""
    from ..services.embeddings.provider_error_adapters import ProviderErrorFactory

    if not provider:
        provider = ProviderErrorFactory.detect_provider_from_error(error_message)

    return ProviderErrorFactory.sanitize_provider_error(error_message, provider)



🛠️ Refactor suggestion

Sanitize provider messages at every outward boundary (logs, progress, HTTP).

You added a sanitizer but it isn’t applied where errors are surfaced. Without it, secrets (e.g., API keys) can leak via tracker.error and logs.

Apply targeted sanitization in the crawl and upload exception paths:

@@ async def _perform_crawl_with_progress(
-        except Exception as e:
-            error_message = f"Crawling failed: {str(e)}"
+        except Exception as e:
+            sanitized = _sanitize_provider_error(str(e))
+            error_message = f"Crawling failed: {sanitized}"
             safe_logfire_error(
-                f"Crawl failed | progress_id={progress_id} | error={error_message} | exception_type={type(e).__name__}"
+                f"Crawl failed | progress_id={progress_id} | error={error_message} | exception_type={type(e).__name__}"
             )
@@
-            logger.error(f"Error: {error_message}")
+            logger.error(f"Error: {error_message}")
@@
-                await tracker.error(error_message)
+                await tracker.error(error_message)
@@ async def _perform_upload_with_progress(
-    except Exception as e:
-        error_msg = f"Upload failed: {str(e)}"
+    except Exception as e:
+        error_msg = f"Upload failed: {_sanitize_provider_error(str(e))}"
         await tracker.error(error_msg)
         logger.error(f"Document upload failed: {e}", exc_info=True)
         safe_logfire_error(
             f"Document upload failed | progress_id={progress_id} | filename={file_metadata.get('filename', 'unknown')} | error={str(e)}"
         )

And when returning immediate error to caller on crawl start:

@@ async def crawl_knowledge_item(request: KnowledgeItemRequest):
-    except Exception as e:
-        safe_logfire_error(f"Failed to start crawl | error={str(e)} | url={str(request.url)}")
-        raise HTTPException(status_code=500, detail=str(e))
+    except Exception as e:
+        sanitized = _sanitize_provider_error(str(e))
+        safe_logfire_error(f"Failed to start crawl | error={sanitized} | url={str(request.url)}")
+        raise HTTPException(status_code=500, detail={"error": sanitized})

Committable suggestion skipped: line range outside the PR's diff.

@leex279
Collaborator Author

leex279 commented Sep 13, 2025

@CodeRabbit review

@coderabbitai

coderabbitai Bot commented Sep 13, 2025

✅ Actions performed

Review triggered.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/src/server/api_routes/knowledge_api.py (1)

910-915: Sanitize and normalize upload start failure.

Return a dict detail and redact secrets in logs.

Apply this diff:

-    except Exception as e:
-        safe_logfire_error(
-            f"Failed to start document upload | error={str(e)} | filename={file.filename} | error_type={type(e).__name__}"
-        )
-        raise HTTPException(status_code=500, detail={"error": str(e)})
+    except Exception as e:
+        sanitized = _sanitize_provider_error(str(e))
+        safe_logfire_error(
+            f"Failed to start document upload | error={sanitized} | filename={file.filename} | error_type={type(e).__name__}"
+        )
+        raise HTTPException(status_code=500, detail={"error": sanitized})
♻️ Duplicate comments (3)
python/src/server/api_routes/knowledge_api.py (3)

747-750: Sanitize error and return consistent detail shape.

This leaks raw provider messages and returns a string detail, unlike other endpoints.

Apply this diff:

-    except Exception as e:
-        safe_logfire_error(f"Failed to start crawl | error={str(e)} | url={str(request.url)}")
-        raise HTTPException(status_code=500, detail=str(e))
+    except Exception as e:
+        sanitized = _sanitize_provider_error(str(e))
+        safe_logfire_error(f"Failed to start crawl | error={sanitized} | url={str(request.url)}")
+        raise HTTPException(status_code=500, detail={"error": sanitized})

810-829: Sanitize before logging and tracker.error.

Raw errors may contain keys/URLs; ensure redaction at this outward boundary.

Apply this diff:

-        except Exception as e:
-            error_message = f"Crawling failed: {str(e)}"
+        except Exception as e:
+            sanitized = _sanitize_provider_error(str(e))
+            error_message = f"Crawling failed: {sanitized}"
             safe_logfire_error(
                 f"Crawl failed | progress_id={progress_id} | error={error_message} | exception_type={type(e).__name__}"
             )
             import traceback
 
             tb = traceback.format_exc()
             # Ensure the error is visible in logs
             logger.error(f"=== CRAWL ERROR FOR {progress_id} ===")
             logger.error(f"Error: {error_message}")
             logger.error(f"Exception Type: {type(e).__name__}")
             logger.error(f"Traceback:\n{tb}")
             logger.error("=== END CRAWL ERROR ===")
             safe_logfire_error(f"Crawl exception traceback | traceback={tb}")
             # Ensure clients see the failure
             try:
                 await tracker.error(error_message)
             except Exception:
                 pass

1021-1026: Sanitize upload errors surfaced to users/logs.

Mask provider errors before logging/tracker updates.

Apply this diff:

-    except Exception as e:
-        error_msg = f"Upload failed: {str(e)}"
+    except Exception as e:
+        error_msg = f"Upload failed: {_sanitize_provider_error(str(e))}"
         await tracker.error(error_msg)
-        logger.error(f"Document upload failed: {e}", exc_info=True)
+        logger.error(f"Document upload failed: {error_msg}", exc_info=True)
         safe_logfire_error(
-            f"Document upload failed | progress_id={progress_id} | filename={file_metadata.get('filename', 'unknown')} | error={str(e)}"
+            f"Document upload failed | progress_id={progress_id} | filename={file_metadata.get('filename', 'unknown')} | error={error_msg}"
         )
🧹 Nitpick comments (3)
python/src/server/api_routes/knowledge_api.py (3)

565-569: Run validation behind TTL and, if available, pass actual provider.

You call validation on every refresh. Use the TTL cache in _validate_provider_api_key and pass the active provider (if discoverable from config) to ensure accurate messages.


688-692: Same preflight note: rely on TTL and pass provider when known.

Avoid validating on every crawl start; let the function’s TTL short-circuit and pass the active provider for correct UX.


846-851: Same preflight note for upload: use TTL and provider if known.

Preflight on every upload can be noisy/costly; the TTL cache will mitigate this.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5ee99a and 4cf687b.

📒 Files selected for processing (1)
  • python/src/server/api_routes/knowledge_api.py (4 hunks)

@leex279 leex279 marked this pull request as ready for review September 13, 2025 07:49

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (7)
python/src/server/services/embeddings/provider_error_adapters.py (4)

41-49: OpenAI key redaction uses fixed length (48) — make regex robust and cover sk-proj-

Fixed-length and narrow character-classes risk leaks; also prefer redacting Bearer with \S+.

-        patterns = [
-            (r'sk-[a-zA-Z0-9]{48}', '[REDACTED_KEY]'),                 # OpenAI API keys
-            (r'https?://[^\s]*openai\.com[^\s]*', '[REDACTED_URL]'),   # OpenAI URLs
-            (r'org-[a-zA-Z0-9]{20,}', '[REDACTED_ORG]'),              # Organization IDs
-            (r'proj_[a-zA-Z0-9]{10,}', '[REDACTED_PROJECT]'),         # Project IDs
-            (r'req_[a-zA-Z0-9]{10,}', '[REDACTED_REQUEST]'),          # Request IDs
-            (r'Bearer\s+[a-zA-Z0-9._-]+', 'Bearer [REDACTED_TOKEN]'), # Bearer tokens
-        ]
+        patterns = [
+            # OpenAI API keys: sk-..., incl. sk-proj-..., variable length
+            (r'\bsk-(?:proj-[A-Za-z0-9_-]{8,}|[A-Za-z0-9_-]{16,})\b', '[REDACTED_KEY]'),
+            (r'\borg-[A-Za-z0-9]{12,}\b', '[REDACTED_ORG]'),
+            (r'\bproj_[A-Za-z0-9]{6,}\b', '[REDACTED_PROJECT]'),
+            (r'\breq_[A-Za-z0-9]{6,}\b', '[REDACTED_REQUEST]'),
+            (r'Bearer\s+\S+', 'Bearer [REDACTED_TOKEN]'),
+            (r'https?://\S*openai\.com\S*', '[REDACTED_URL]'),
+        ]

72-80: Google API key redaction uses fixed length (35) — make regex robust

Keys vary; redact with a variable-length pattern and tighten other replacements.

-        patterns = [
-            (r'AIza[a-zA-Z0-9_-]{35}', '[REDACTED_KEY]'),                     # Google AI API keys
-            (r'https?://[^\s]*googleapis\.com[^\s]*', '[REDACTED_URL]'),      # Google API URLs
-            (r'https?://[^\s]*googleusercontent\.com[^\s]*', '[REDACTED_URL]'), # Google content URLs
-            (r'projects/[a-zA-Z0-9_-]+', 'projects/[REDACTED_PROJECT]'),      # GCP project paths
-            (r'ya29\.[a-zA-Z0-9_-]+', '[REDACTED_TOKEN]'),                   # OAuth tokens
-            (r'Bearer\s+[a-zA-Z0-9._-]+', 'Bearer [REDACTED_TOKEN]'),        # Bearer tokens
-        ]
+        patterns = [
+            (r'\bAIza[0-9A-Za-z_-]{16,}\b', '[REDACTED_KEY]'),
+            (r'\bya29\.\S+', '[REDACTED_TOKEN]'),
+            (r'Bearer\s+\S+', 'Bearer [REDACTED_TOKEN]'),
+            (r'https?://\S*googleapis\.com\S*', '[REDACTED_URL]'),
+            (r'https?://\S*googleusercontent\.com\S*', '[REDACTED_URL]'),
+            (r'\bprojects/[A-Za-z0-9_-]+', 'projects/[REDACTED_PROJECT]'),
+        ]

103-108: Anthropic pattern: allow shorter keys and use word boundaries

Align with case-insensitive, variable-length approach across providers.

-        patterns = [
-            (r'sk-ant-[a-zA-Z0-9_-]{10,}', '[REDACTED_KEY]'),                 # Anthropic API keys
-            (r'https?://[^\s]*anthropic\.com[^\s]*', '[REDACTED_URL]'),        # Anthropic URLs
-            (r'Bearer\s+[a-zA-Z0-9._-]+', 'Bearer [REDACTED_TOKEN]'),         # Bearer tokens
-        ]
+        patterns = [
+            (r'\bsk-ant-[A-Za-z0-9_-]{8,}\b', '[REDACTED_KEY]'),
+            (r'Bearer\s+\S+', 'Bearer [REDACTED_TOKEN]'),
+            (r'https?://\S*anthropic\.com\S*', '[REDACTED_URL]'),
+        ]

140-161: Provider detection uses fixed-length OpenAI key match — broaden and add Gemini cue

Use variable-length key pattern and consider “gemini” as a Google cue.

-        elif ("google" in error_lower or 
-              re.search(r'AIza[a-zA-Z0-9_-]+', error_str, re.IGNORECASE) or
-              "googleapis" in error_lower or 
-              "vertex" in error_lower):
+        elif ("google" in error_lower or 
+              re.search(r'\bAIza[0-9A-Za-z_-]{16,}\b', error_str, re.IGNORECASE) or
+              "googleapis" in error_lower or 
+              "vertex" in error_lower or
+              "gemini" in error_lower):
             return "google"
-        elif ("openai" in error_lower or 
-              re.search(r'sk-[a-zA-Z0-9]{48}', error_str, re.IGNORECASE) or
-              "gpt" in error_lower):
+        elif ("openai" in error_lower or 
+              re.search(r'\bsk-(?:proj-[A-Za-z0-9_-]{8,}|[A-Za-z0-9_-]{16,})\b', error_str, re.IGNORECASE) or
+              "gpt" in error_lower):
             return "openai"
python/src/server/api_routes/knowledge_api.py (3)

762-766: Sanitize errors in HTTP response when crawl fails to start

Return sanitized JSON; don’t leak raw provider messages.

-    except Exception as e:
-        safe_logfire_error(f"Failed to start crawl | error={str(e)} | url={str(request.url)}")
-        raise HTTPException(status_code=500, detail=str(e))
+    except Exception as e:
+        sanitized = _sanitize_provider_error(str(e))
+        safe_logfire_error(f"Failed to start crawl | error={sanitized} | url={str(request.url)}")
+        raise HTTPException(status_code=500, detail={"error": sanitized})

827-845: Sanitize crawl errors before logging and pushing to progress tracker

Avoid leaking tokens in logs/UI toasts.

-        except Exception as e:
-            error_message = f"Crawling failed: {str(e)}"
+        except Exception as e:
+            sanitized = _sanitize_provider_error(str(e))
+            error_message = f"Crawling failed: {sanitized}"
             safe_logfire_error(
                 f"Crawl failed | progress_id={progress_id} | error={error_message} | exception_type={type(e).__name__}"
             )
@@
-            logger.error(f"Error: {error_message}")
+            logger.error(f"Error: {error_message}")
@@
-                await tracker.error(error_message)
+                await tracker.error(error_message)

1036-1042: Sanitize upload errors before logging and tracker updates

Same rationale as crawl path.

-    except Exception as e:
-        error_msg = f"Upload failed: {str(e)}"
-        await tracker.error(error_msg)
-        logger.error(f"Document upload failed: {e}", exc_info=True)
-        safe_logfire_error(
-            f"Document upload failed | progress_id={progress_id} | filename={file_metadata.get('filename', 'unknown')} | error={str(e)}"
-        )
+    except Exception as e:
+        sanitized = _sanitize_provider_error(str(e))
+        error_msg = f"Upload failed: {sanitized}"
+        await tracker.error(error_msg)
+        logger.error(f"Document upload failed: {sanitized}", exc_info=True)
+        safe_logfire_error(
+            f"Document upload failed | progress_id={progress_id} | filename={file_metadata.get('filename', 'unknown')} | error={sanitized}"
+        )
🧹 Nitpick comments (3)
python/src/server/services/embeddings/provider_error_adapters.py (3)

4-6: Docstring claims Ollama support but no adapter present

Either add an Ollama adapter or update the docstring to avoid misleading readers.

- Supports OpenAI, Google AI, Anthropic, Ollama, and future providers
+ Supports OpenAI, Google AI, Anthropic, and future providers

54-58: Over-broad “sensitive_words” fallback may mask useful errors

Returning a generic message on words like “server/endpoint” post-sanitization can reduce debuggability. Consider narrowing to known phrases (e.g., “internal server error”) or keep sanitized text.

-        sensitive_words = ['internal', 'server', 'endpoint']
-        if any(word in sanitized.lower() for word in sensitive_words):
+        if re.search(r'\binternal server error\b', sanitized, flags=re.IGNORECASE):
             return "OpenAI API encountered an error. Please verify your API key and quota."

124-128: Adapter registry lacks Ollama despite docstring mention

If Ollama is intended, add an adapter (even a pass-through) or remove the mention to avoid confusion. Also add a type hint for MyPy.

-    _adapters = {
+    _adapters: dict[str, ProviderErrorAdapter] = {
         "openai": OpenAIErrorAdapter(),
         "google": GoogleAIErrorAdapter(),
         "anthropic": AnthropicErrorAdapter(),
+        # "ollama": OllamaErrorAdapter(),  # TODO: implement if required
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4cf687b and bc0277e.

📒 Files selected for processing (2)
  • python/src/server/api_routes/knowledge_api.py (4 hunks)
  • python/src/server/services/embeddings/provider_error_adapters.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
python/src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/**/*.py: Fail fast on service startup failures, missing configuration, database connection failures, authentication/authorization failures, critical dependencies unavailable, and invalid/corrupting data
Never accept or store corrupted data; on operation failure skip the item entirely rather than writing placeholders (e.g., zero embeddings)
For batch/background operations, continue processing but log each failure with details; track both successes and failures
Use specific exception types (avoid bare Exception), include context/IDs/URLs in messages, preserve full stack traces with logging (exc_info=True), and never return None/null to indicate failure—raise with details
Use database task status values directly: todo, doing, review, done
Target Python 3.12 style with 120-character line length; use Ruff for linting and Mypy for type checking

python/src/**/*.py: Fail fast on critical conditions: service startup failures, missing configuration/env vars, database connection/auth failures, critical dependencies unavailable
Never accept or store corrupted data (e.g., zero embeddings, null foreign keys, malformed JSON); skip failed items entirely and continue processing
For batch/background operations, continue processing but log detailed per-item failures; for external APIs use retries with exponential backoff and then fail clearly
Error messages must include context, use specific exception types, preserve full stack traces (logging with exc_info=True), include relevant IDs/URLs, and never return None to indicate failure—raise instead; for batch ops report success counts and detailed failures
Backend uses Python 3.12 with a 120-character line length
Avoid introducing WebSocket support in the backend; updates are handled via HTTP polling
Adhere to Ruff lint rules (e.g., no unused imports) and provide type hints to satisfy MyPy

Files:

  • python/src/server/api_routes/knowledge_api.py
  • python/src/server/services/embeddings/provider_error_adapters.py
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI exception handlers to return rich error responses with appropriate HTTP status codes and typed error payloads

Use specific exception classes and FastAPI exception handlers to produce rich JSON error responses

Files:

  • python/src/server/api_routes/knowledge_api.py
  • python/src/server/services/embeddings/provider_error_adapters.py
python/src/server/{api_routes,services}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Follow Service Layer pattern: API routes delegate to services, which handle business logic and call the database layer

Files:

  • python/src/server/api_routes/knowledge_api.py
  • python/src/server/services/embeddings/provider_error_adapters.py
python/src/server/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Backend service method naming mirrors CRUD patterns: get/create/update/delete with clear resource scoping

Files:

  • python/src/server/services/embeddings/provider_error_adapters.py
🧬 Code graph analysis (2)
python/src/server/api_routes/knowledge_api.py (1)
python/src/server/services/embeddings/embedding_service.py (1)
  • create_embedding (71-128)
python/src/server/services/embeddings/provider_error_adapters.py (1)
python/src/server/services/embeddings/embedding_exceptions.py (4)
  • EmbeddingAPIError (86-99)
  • EmbeddingAuthenticationError (102-115)
  • EmbeddingQuotaExhaustedError (46-58)
  • EmbeddingRateLimitError (61-72)


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (4)
python/src/server/api_routes/knowledge_api.py (4)

60-109: Map provider errors (401 vs 429), add timeout + TTL cache; guard against hangs.

Right now all failures become 401, there's no timeout on create_embedding, and each call pays validation cost. Add per-provider TTL, asyncio timeout, and map EmbeddingAuthenticationError/EmbeddingRateLimitError distinctly.

Apply:

-async def _validate_provider_api_key(provider: str = None) -> None:
+async def _validate_provider_api_key(provider: str | None = None) -> None:
@@
-    try:
-        if not provider:
-            provider = "openai"
+    try:
+        provider = provider or "openai"
@@
-        # Test API key with minimal embedding request - this will fail if key is invalid
-        from ..services.embeddings.embedding_service import create_embedding
-        test_result = await create_embedding(text="test")
+        # Test API key with minimal embedding request - this will fail if key is invalid
+        from ..services.embeddings.embedding_service import create_embedding
+        from ..services.embeddings.embedding_exceptions import (
+            EmbeddingAuthenticationError,
+            EmbeddingRateLimitError,
+        )
+        # TTL cache to avoid paying the validation cost on every request
+        # (requires the module-level constants shown below)
+        from time import monotonic
+        now = monotonic()
+        last_ok = _LAST_KEY_VALIDATION.get(provider)
+        if last_ok is not None and (now - last_ok) < _API_KEY_VALIDATION_TTL_SECONDS:
+            logger.info(f"✅ Reusing recent {provider.title()} key validation (TTL hit)")
+            return
+
+        # Ensure the check doesn't hang
+        await asyncio.wait_for(create_embedding(text="test"), timeout=5.0)
@@
-        logger.info(f"✅ {provider.title()} API key validation successful")
+        _LAST_KEY_VALIDATION[provider] = now
+        logger.info(f"✅ {provider.title()} API key validation successful")
@@
-    except Exception as e:
-        # Sanitize error before logging to prevent sensitive data exposure
-        error_str = str(e)
-        sanitized_error = ProviderErrorFactory.sanitize_provider_error(error_str, provider or "openai")
-        logger.error(f"❌ Caught exception during API key validation: {sanitized_error}")
-        
-        # Always fail for any exception during validation - better safe than sorry
-        logger.error("🚨 API key validation failed - blocking crawl operation")
-        raise HTTPException(
-            status_code=401,
-            detail={
-                "error": "Invalid API key",
-                "message": f"Please verify your {(provider or 'openai').title()} API key in Settings before starting a crawl.",
-                "error_type": "authentication_failed",
-                "provider": provider or "openai"
-            }
-        ) from None
+    except EmbeddingAuthenticationError as e:
+        sanitized = ProviderErrorFactory.sanitize_provider_error(str(e), provider)
+        logger.error(f"❌ Auth failed during API key validation: {sanitized}")
+        raise HTTPException(
+            status_code=401,
+            detail={
+                "error": f"Invalid {provider.title()} API key",
+                "message": f"Please verify your {provider.title()} API key in Settings.",
+                "error_type": "authentication_failed",
+                "provider": provider,
+            },
+        ) from None
+    except EmbeddingRateLimitError as e:
+        sanitized = ProviderErrorFactory.sanitize_provider_error(str(e), provider)
+        logger.error(f"⏳ Rate limit during API key validation: {sanitized}")
+        raise HTTPException(
+            status_code=429,
+            detail={
+                "error": "Rate limit reached",
+                "message": "Your provider rate limit was reached. Please wait or upgrade your plan.",
+                "error_type": "rate_limited",
+                "provider": provider,
+            },
+        ) from None
+    except asyncio.TimeoutError:
+        logger.error("⏱️ API key validation timed out")
+        raise HTTPException(
+            status_code=504,
+            detail={
+                "error": "Provider timeout",
+                "message": "Validation timed out. Please retry.",
+                "error_type": "upstream_timeout",
+                "provider": provider,
+            },
+        ) from None
+    except Exception as e:
+        sanitized = ProviderErrorFactory.sanitize_provider_error(str(e), provider)
+        logger.error(f"❌ Exception during API key validation: {sanitized}")
+        raise HTTPException(
+            status_code=401,
+            detail={
+                "error": "Invalid API key",
+                "message": f"Please verify your {provider.title()} API key in Settings before starting a crawl.",
+                "error_type": "authentication_failed",
+                "provider": provider,
+            },
+        ) from None

Add near file top:

import asyncio  # needed for asyncio.wait_for / asyncio.TimeoutError above

# Simple per-provider validation TTL to avoid validating on every request
_API_KEY_VALIDATION_TTL_SECONDS = 60
_LAST_KEY_VALIDATION: dict[str, float] = {}

539-544: Guard against None from credential_service.get_active_provider().

provider_config may be None, causing AttributeError on .get(...). Use a safe default and avoid crashes before validation.

Apply:

-    provider_config = await credential_service.get_active_provider("embedding")
-    provider = provider_config.get("provider", "openai")
-    await _validate_provider_api_key(provider)
+    provider_config = await credential_service.get_active_provider("embedding")
+    provider = (provider_config or {}).get("provider") or "openai"
+    await _validate_provider_api_key(provider=provider)

Optional: extract this into a small helper to DRY across routes.


663-669: Same None-guard needed here to prevent crash in crawl path.

-    provider_config = await credential_service.get_active_provider("embedding")
-    provider = provider_config.get("provider", "openai")
-    await _validate_provider_api_key(provider)
+    provider_config = await credential_service.get_active_provider("embedding")
+    provider = (provider_config or {}).get("provider") or "openai"
+    await _validate_provider_api_key(provider=provider)

825-830: Same None-guard needed for upload path.

-    provider_config = await credential_service.get_active_provider("embedding")
-    provider = provider_config.get("provider", "openai")
-    await _validate_provider_api_key(provider)
+    provider_config = await credential_service.get_active_provider("embedding")
+    provider = (provider_config or {}).get("provider") or "openai"
+    await _validate_provider_api_key(provider=provider)
🧹 Nitpick comments (6)
PRPs/decouple-task-priority-from-order.md (6)

170-175: Doc inconsistency with backfill section

You note ORDER_INCREMENT=1000 here, but the backfill thresholds earlier assume small integers. Update the backfill approach to align with this invariant (see prior comment).


202-210: Use a constrained type for priority in Pydantic

Prefer Enum or Literal instead of raw str to shift validation left and auto-generate schema constraints.

-from pydantic import BaseModel
+from pydantic import BaseModel
+from enum import Enum
+
+class TaskPriority(str, Enum):
+    low = "low"
+    medium = "medium"
+    high = "high"
+    urgent = "urgent"

 class UpdateTaskRequest(BaseModel):
@@
-    priority: str | None = None    # NEW: Direct priority field
+    priority: TaskPriority | None = None

294-303: Avoid hardcoded lists; normalize input and centralize allowed values

Normalize case and keep a single source of truth for the allowed values (ideally derived from the enum) to prevent subtle mismatches.

-async def validate_priority(self, priority: str) -> tuple[bool, str]:
-    """Validate task priority against allowed enum values"""
-    VALID_PRIORITIES = ["low", "medium", "high", "urgent"]
-    if priority not in VALID_PRIORITIES:
+async def validate_priority(self, priority: str | None) -> tuple[bool, str]:
+    """Validate task priority against allowed enum values"""
+    if priority is None:
+        return True, ""
+    p = str(priority).strip().lower()
+    VALID = {"low", "medium", "high", "urgent"}
+    if p not in VALID:
         return (
             False,
-            f"Invalid priority '{priority}'. Must be one of: {', '.join(VALID_PRIORITIES)}",
+            f"Invalid priority '{priority}'. Must be one of: {', '.join(sorted(VALID))}",
         )
-    return True, ""
+    return True, ""

357-360: Indexing: consider common filters

If queries usually filter by project_id/status plus priority, a composite index like (project_id, status, priority) may outperform a single-column index. Validate with EXPLAIN on real workloads.


53-60: Backfill success criterion needs concrete definition

“Appropriate priority values based on current task_order” is ambiguous. Specify whether using quantiles per project/status, legacy mapping, or defaulting to medium when unknown. This drives deterministic test cases.
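One deterministic interpretation, offered only as an example and not as the PRP's decision: rank tasks by `task_order` within a project and split the ranking into quartiles (lower order = more urgent; ties resolve to the first rank):

```python
def backfill_priorities(task_orders: list[int]) -> list[str]:
    """Map each task_order to a priority by quartile rank within the given list."""
    if not task_orders:
        return []
    ranked = sorted(task_orders)
    n = len(ranked)
    labels = ["urgent", "high", "medium", "low"]
    priorities = []
    for order in task_orders:
        # index() takes the first occurrence, so duplicate orders map deterministically
        quartile = min(ranked.index(order) * 4 // n, 3)
        priorities.append(labels[quartile])
    return priorities

print(backfill_priorities([1, 2, 3, 4]))  # ['urgent', 'high', 'medium', 'low']
```

Whatever rule is chosen, pinning it down like this makes the backfill testable and repeatable across environments.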


492-509: Add rollback notes and enum evolution strategy

Document how to roll back (drop NOT NULL, drop column, drop index; enum drop only if unused) and how to add future enum values (ALTER TYPE ... ADD VALUE). Prevents deployment stalls.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc0277e and 7a46394.

📒 Files selected for processing (2)
  • PRPs/decouple-task-priority-from-order.md (1 hunks)
  • python/src/server/api_routes/knowledge_api.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
python/src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/**/*.py: Fail fast on service startup failures, missing configuration, database connection failures, authentication/authorization failures, critical dependencies unavailable, and invalid/corrupting data
Never accept or store corrupted data; on operation failure skip the item entirely rather than writing placeholders (e.g., zero embeddings)
For batch/background operations, continue processing but log each failure with details; track both successes and failures
Use specific exception types (avoid bare Exception), include context/IDs/URLs in messages, preserve full stack traces with logging (exc_info=True), and never return None/null to indicate failure—raise with details
Use database task status values directly: todo, doing, review, done
Target Python 3.12 style with 120-character line length; use Ruff for linting and Mypy for type checking

python/src/**/*.py: Fail fast on critical conditions: service startup failures, missing configuration/env vars, database connection/auth failures, critical dependencies unavailable
Never accept or store corrupted data (e.g., zero embeddings, null foreign keys, malformed JSON); skip failed items entirely and continue processing
For batch/background operations, continue processing but log detailed per-item failures; for external APIs use retries with exponential backoff and then fail clearly
Error messages must include context, use specific exception types, preserve full stack traces (logging with exc_info=True), include relevant IDs/URLs, and never return None to indicate failure—raise instead; for batch ops report success counts and detailed failures
Backend uses Python 3.12 with a 120-character line length
Avoid introducing WebSocket support in the backend; updates are handled via HTTP polling
Adhere to Ruff lint rules (e.g., no unused imports) and provide type hints to satisfy MyPy

Files:

  • python/src/server/api_routes/knowledge_api.py
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI exception handlers to return rich error responses with appropriate HTTP status codes and typed error payloads

Use specific exception classes and FastAPI exception handlers to produce rich JSON error responses

Files:

  • python/src/server/api_routes/knowledge_api.py
python/src/server/{api_routes,services}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Follow Service Layer pattern: API routes delegate to services, which handle business logic and call the database layer

Files:

  • python/src/server/api_routes/knowledge_api.py
🧠 Learnings (1)
📚 Learning: 2025-09-06T20:04:08.138Z
Learnt from: CR
PR: coleam00/Archon#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T20:04:08.138Z
Learning: Applies to archon-ui-main/src/features/**/tasks/**/*.{ts,tsx} : Use task status values directly from the database: todo, doing, review, done (no UI mapping)

Applied to files:

  • PRPs/decouple-task-priority-from-order.md
🔇 Additional comments (4)
PRPs/decouple-task-priority-from-order.md (3)

1-6: Unrelated file in this PR — please move to a separate PR

This PR is about provider-agnostic LLM error handling (#650), but this file proposes a task-priority DB/UI change. Mixing concerns complicates review and rollback. Recommend removing this file from this PR and opening a dedicated PR.


69-112: Replace brittle line-number references with code anchors

  • Replace “lines 766–772” style refs with function/class names or grep-able snippets.
  • Verified: class UpdateTaskRequest exists in python/src/server/api_routes/projects_api.py; async def update_task exists in python/src/server/services/projects/task_service.py.
  • Not found: getTaskPriorityFromTaskOrder under archon-ui-main/src/features/projects/tasks — confirm exact name/path or update the doc to reference the frontend priority symbol (e.g., TASK_PRIORITY_OPTIONS or the actual conversion function).
  • Suggested verification script to run locally:
#!/bin/bash
rg -nP 'class\s+UpdateTaskRequest\b' python/src/server/api_routes/projects_api.py -C2 || true
rg -nP '\basync\s+def\s+update_task\s*\(' python/src/server/services/projects/task_service.py -C3 || true
rg -nP 'getTaskPriorityFromTaskOrder|getTaskPriority|TASK_PRIORITY_OPTIONS|priorityFromTaskOrder' archon-ui-main/src/features/projects/tasks -g '!**/dist/**' -C2 || true

324-351: Frontend: ensure task queries return priority and optimistic updates match server response

  • Verify task fetches/mutations return the priority field (avoids UI flicker when updating).
  • Ensure optimistic update uses the exact server response shape and rolls back on error (onMutate/rollback or equivalent).

Relevant locations: archon-ui-main/src/features/projects/tasks/{types/task.ts, types/priority.ts, schemas/index.ts, components/TaskCard.tsx, components/TaskPriority.tsx, components/TaskEditModal.tsx}.

python/src/server/api_routes/knowledge_api.py (1)

25-27: Good import additions for provider resolution and sanitization.

Centralizing on credential_service and ProviderErrorFactory is the right direction.

@Wirasm
Collaborator

Wirasm commented Sep 16, 2025

This looks ready to merge!

leex279 and others added 7 commits September 17, 2025 11:52
Implements generic error handling that works for OpenAI, Google AI,
Anthropic, and other LLM providers to prevent silent failures.

Essential files only:
1. Provider error adapters (new) - handles any LLM provider
2. Backend API key validation - detects invalid keys before operations
3. Frontend error handler - provider-aware error messages
4. Updated hooks - uses generic error handling

Core functionality:
✅ Validates API keys before expensive operations (crawl, upload, refresh)
✅ Shows clear provider-specific error messages
✅ Works with OpenAI: 'Please verify your OpenAI API key in Settings'
✅ Works with Google: 'Please verify your Google API key in Settings'
✅ Prevents 90-minute debugging sessions from Issue #362

No unnecessary changes - only essential error handling logic.

Fixes #362
- Add comprehensive logging to trace validation flow
- Ensure validation actually blocks operations on authentication failures
- Improve error detection to catch wrapped OpenAI errors
- Fail fast on any validation errors to prevent wasted operations

This should ensure invalid API keys are caught before crawl starts,
not during embedding processing after documents are crawled.
- Remove complex provider adapter imports that cause module issues
- Simplified validation that fails fast on any embedding creation error
- Enhanced logging to trace exactly what's happening
- Always block operations when API key validation fails

This ensures invalid API keys are caught immediately before
crawl operations start, preventing silent failures.
The validation was only added to new crawl endpoint but missing from:
- Knowledge item refresh endpoint (/knowledge-items/{source_id}/refresh)
- Document upload endpoint (/documents/upload)

Now all three endpoints that create embeddings will validate API keys
before starting operations, preventing silent failures on refresh/upload.
Enhanced sanitization and provider detection based on CodeRabbit feedback:

✅ Comprehensive regex patterns for all provider API keys
  - OpenAI: sk-[a-zA-Z0-9]{48} with case-insensitive matching
  - Google AI: AIza[a-zA-Z0-9_-]{35} with flexible matching
  - Anthropic: sk-ant-[a-zA-Z0-9_-]{10,} with variable length

✅ Enhanced provider detection with multiple patterns
  - Case-insensitive keyword matching (openai, google, anthropic)
  - Regex-based API key detection for reliable identification
  - Additional keywords (gpt, claude, vertex, googleapis)

✅ Improved sanitization patterns
  - Provider-specific URL sanitization (openai.com, googleapis.com, anthropic.com)
  - Organization and project ID redaction
  - OAuth token and bearer token sanitization
  - Sensitive keyword detection and generic fallback

✅ Sanitized error logging
  - All error messages sanitized before logging
  - Prevents sensitive data exposure in backend logs
  - Maintains debugging capability with redacted information

Core security improvements while maintaining simplicity for beta deployment.
…Factory

- Remove local _sanitize_provider_error implementation with inline regex patterns
- Add ProviderErrorFactory import from embeddings.provider_error_adapters
- Update _validate_provider_api_key calls to pass correct active embedding provider
- Replace sanitization call with ProviderErrorFactory.sanitize_provider_error()
- Eliminate duplicate logic and fixed-length key assumptions
- Ensure provider-specific, configurable sanitization patterns are used consistently

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@Wirasm Wirasm force-pushed the feature/provider-agnostic-error-handling branch from 180cdd7 to 006a36d on September 17, 2025 08:52
Collaborator

@Wirasm Wirasm left a comment


What looks good:

  • Excellent provider-agnostic approach that works for OpenAI, Google AI, Anthropic, and future providers
  • Proper integration with existing TanStack Query patterns in /features
  • Good security practices with API key sanitization
  • Smart preflight validation before expensive operations (crawl, upload, refresh)
  • Clean separation of concerns with ProviderErrorFactory

I'd still like to request two minor changes before merge to follow the new patterns.

Two minor fixes needed before merge:

  1. Add barrel export for consistency:
    In archon-ui-main/src/features/knowledge/utils/index.ts, add:
    export * from "./providerErrorHandler";

  2. Fix TypeScript typing to match our strict patterns:
    In providerErrorHandler.ts, change line 16:
    // Change from:
    export function parseProviderError(error: any): ProviderError {
    // To:
    export function parseProviderError(error: unknown): ProviderError {

The PR is already rebased on latest main and tests are passing. Once these two minor issues are fixed, this is ready for squash and merge.

@Wirasm Wirasm mentioned this pull request Sep 17, 2025
- Add barrel export for providerErrorHandler in utils/index.ts
- Change TypeScript typing from 'any' to 'unknown' for strict type safety
@Wirasm Wirasm self-requested a review September 17, 2025 10:12
Collaborator

@Wirasm Wirasm left a comment


Approved, ready to merge

@Wirasm Wirasm merged commit 9f2d70a into main Sep 17, 2025
7 of 8 checks passed
@Wirasm Wirasm deleted the feature/provider-agnostic-error-handling branch September 17, 2025 10:14
leonj1 pushed a commit to leonj1/Archon that referenced this pull request Oct 13, 2025
…providers (coleam00#650)

* feat: Provider-agnostic error handling for Issue coleam00#362

Implements generic error handling that works for OpenAI, Google AI,
Anthropic, and other LLM providers to prevent silent failures.

Essential files only:
1. Provider error adapters (new) - handles any LLM provider
2. Backend API key validation - detects invalid keys before operations
3. Frontend error handler - provider-aware error messages
4. Updated hooks - uses generic error handling

Core functionality:
✅ Validates API keys before expensive operations (crawl, upload, refresh)
✅ Shows clear provider-specific error messages
✅ Works with OpenAI: 'Please verify your OpenAI API key in Settings'
✅ Works with Google: 'Please verify your Google API key in Settings'
✅ Prevents 90-minute debugging sessions from Issue coleam00#362

No unnecessary changes - only essential error handling logic.

Fixes coleam00#362

* fix: Enhance API key validation with detailed logging and error handling

- Add comprehensive logging to trace validation flow
- Ensure validation actually blocks operations on authentication failures
- Improve error detection to catch wrapped OpenAI errors
- Fail fast on any validation errors to prevent wasted operations

This should ensure invalid API keys are caught before crawl starts,
not during embedding processing after documents are crawled.

* fix: Simplify API key validation to always fail on exceptions

- Remove complex provider adapter imports that cause module issues
- Simplified validation that fails fast on any embedding creation error
- Enhanced logging to trace exactly what's happening
- Always block operations when API key validation fails

This ensures invalid API keys are caught immediately before
crawl operations start, preventing silent failures.

* fix: Add API key validation to refresh and upload endpoints

The validation was only added to new crawl endpoint but missing from:
- Knowledge item refresh endpoint (/knowledge-items/{source_id}/refresh)
- Document upload endpoint (/documents/upload)

Now all three endpoints that create embeddings will validate API keys
before starting operations, preventing silent failures on refresh/upload.
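One way to share the same check across all three endpoints is a small decorator; this is a sketch under the assumption that a plain wrapper suffices, not a reproduction of how the real routes are wired:

```python
import functools


def requires_valid_api_key(validate):
    """Decorator: run `validate()` before the wrapped endpoint body runs.

    `validate` raises on a bad key, so the handler is never entered and
    no embeddings are attempted.
    """
    def wrap(handler):
        @functools.wraps(handler)
        def inner(*args, **kwargs):
            validate()
            return handler(*args, **kwargs)
        return inner
    return wrap
```

Decorating the crawl, refresh, and upload handlers with the same wrapper keeps the validation logic in one place instead of three.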

* security: Implement core security fixes from CodeRabbit review

Enhanced sanitization and provider detection based on CodeRabbit feedback:

✅ Comprehensive regex patterns for all provider API keys
  - OpenAI: sk-[a-zA-Z0-9]{48} with case-insensitive matching
  - Google AI: AIza[a-zA-Z0-9_-]{35} with flexible matching
  - Anthropic: sk-ant-[a-zA-Z0-9_-]{10,} with variable length

✅ Enhanced provider detection with multiple patterns
  - Case-insensitive keyword matching (openai, google, anthropic)
  - Regex-based API key detection for reliable identification
  - Additional keywords (gpt, claude, vertex, googleapis)

✅ Improved sanitization patterns
  - Provider-specific URL sanitization (openai.com, googleapis.com, anthropic.com)
  - Organization and project ID redaction
  - OAuth token and bearer token sanitization
  - Sensitive keyword detection and generic fallback

✅ Sanitized error logging
  - All error messages sanitized before logging
  - Prevents sensitive data exposure in backend logs
  - Maintains debugging capability with redacted information

Core security improvements while maintaining simplicity for beta deployment.

* fix: Replace ad-hoc error sanitization with centralized ProviderErrorFactory

- Remove local _sanitize_provider_error implementation with inline regex patterns
- Add ProviderErrorFactory import from embeddings.provider_error_adapters
- Update _validate_provider_api_key calls to pass correct active embedding provider
- Replace sanitization call with ProviderErrorFactory.sanitize_provider_error()
- Eliminate duplicate logic and fixed-length key assumptions
- Ensure provider-specific, configurable sanitization patterns are used consistently

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
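A centralized factory of the kind this commit describes could dispatch per-provider adapters roughly as below. The class layout is an assumption based on the commit text (the real `ProviderErrorFactory` also exposes `sanitize_provider_error()`, which this sketch omits):

```python
class ProviderErrorAdapter:
    """Fallback adapter for providers without a specific adapter."""
    provider = "generic"

    def user_message(self) -> str:
        return f"Please verify your {self.provider} API key in Settings"


class OpenAIErrorAdapter(ProviderErrorAdapter):
    provider = "OpenAI"


class GoogleErrorAdapter(ProviderErrorAdapter):
    provider = "Google"


class AnthropicErrorAdapter(ProviderErrorAdapter):
    provider = "Anthropic"


class ProviderErrorFactory:
    _adapters = {
        "openai": OpenAIErrorAdapter,
        "google": GoogleErrorAdapter,
        "anthropic": AnthropicErrorAdapter,
    }

    @classmethod
    def for_provider(cls, name: str) -> ProviderErrorAdapter:
        # Unknown providers fall back to the generic adapter, which is
        # what makes the pattern extensible to future providers.
        return cls._adapters.get(name.lower(), ProviderErrorAdapter)()
```

Registering a new provider is then one dictionary entry plus a subclass, with no changes at the call sites.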

* chore: Remove accidentally committed PRP file

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: address code review feedback

- Add barrel export for providerErrorHandler in utils/index.ts
- Change TypeScript typing from 'any' to 'unknown' for strict type safety

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
coderabbitai bot mentioned this pull request Nov 8, 2025
coleam00 added a commit that referenced this pull request Apr 7, 2026
The @archon/core mock in workflow.test.ts was missing generateAndSetTitle,
causing it to be undefined in tests and potentially throwing TypeError before
executeWorkflow was reached. Also adds mocks for @archon/core/db/messages and
@archon/core/db/workflows, and a new test asserting generateAndSetTitle is
called with the correct arguments during workflowRunCommand.

Changes:
- Add generateAndSetTitle mock to @archon/core mock block
- Add @archon/core/db/messages mock (addMessage)
- Add @archon/core/db/workflows mock (getActiveWorkflowRun, failWorkflowRun, findLastFailedRun, resumeWorkflowRun)
- Add test: should call generateAndSetTitle with workflow name and user message

Fixes #650
coleam00 added a commit that referenced this pull request Apr 7, 2026
…i-title

fix: add generateAndSetTitle to CLI workflow test mock (#650)
Tyone88 pushed a commit to Tyone88/Archon that referenced this pull request Apr 16, 2026

fix: add generateAndSetTitle to CLI workflow test mock (coleam00#650)
joaobmonteiro pushed a commit to joaobmonteiro/Archon that referenced this pull request Apr 26, 2026

fix: add generateAndSetTitle to CLI workflow test mock (coleam00#650)


Development

Successfully merging this pull request may close these issues.

🐛 [Bug]: Missing Error handling on Open AI API key Error 429

2 participants