RAG: 401 on invalid OpenAI key; prevent silent empty results (#371/#362)#521
RAG: 401 on invalid OpenAI key; prevent silent empty results (#371/#362)#521TimothiousAI wants to merge 2 commits intomainfrom
Conversation
WalkthroughAdds EmbeddingAuthenticationError and propagates it through embedding_service, rag_service, and knowledge_api to return HTTP 401 on authentication failures. Adjusts embedding calls to use a single-vector API parameter ( Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant API as knowledge_api
participant RAG as rag_service
participant EMB as embedding_service
participant PROVIDER as Embedding Provider
Client->>API: perform_rag_query / search_code_examples
API->>RAG: search(query)
RAG->>EMB: create_embedding(text=query)
EMB->>PROVIDER: embed request
alt Authentication failure
PROVIDER-->>EMB: AuthenticationError
EMB-->>RAG: EmbeddingAuthenticationError
RAG-->>API: EmbeddingAuthenticationError
API-->>Client: 401 {"error":"Invalid API key"}
else Success
PROVIDER-->>EMB: embedding vector
EMB-->>RAG: vector
RAG->>RAG: optionally rerank if reranking_strategy is not None
RAG-->>API: results (includes reranking_applied)
API-->>Client: 200 results
end
sequenceDiagram
autonumber
participant RAG as rag_service
participant Strat as reranking_strategy (optional)
RAG->>RAG: Check if reranking_strategy is not None
alt strategy is None
RAG-->>RAG: Skip reranking (reranking_applied=false)
else strategy provided
RAG->>Strat: Rerank results
Strat-->>RAG: Reranked results
RAG-->>RAG: Set reranking_applied=true
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Assessment against linked issues
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
|
Fixes #362 |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/src/server/services/search/rag_service.py (1)
342-349: Fix: ‘reranking_applied’ may be reported True even when reranking fails.Track success with a flag, not just strategy presence, and expose it in response and span.
- # Apply reranking if we have a strategy - if self.reranking_strategy is not None and results: - try: - results = await self.reranking_strategy.rerank_results( - query, results, content_key="content" - ) - except Exception as e: - logger.warning(f"Code reranking failed: {e}") + # Apply reranking if we have a strategy + reranking_applied = False + if self.reranking_strategy is not None and results: + try: + results = await self.reranking_strategy.rerank_results( + query, results, content_key="content" + ) + reranking_applied = True + except Exception as e: + logger.warning(f"Code reranking failed: {e}") @@ response_data = { "query": query, "source_filter": source_id, "search_mode": "hybrid" if use_hybrid_search else "vector", - "reranking_applied": self.reranking_strategy is not None, + "reranking_applied": reranking_applied, "results": formatted_results, "count": len(formatted_results), } @@ span.set_attribute("results_found", len(formatted_results)) span.set_attribute("hybrid_used", use_hybrid_search) - span.set_attribute("reranking_used", self.reranking_strategy is not None) + span.set_attribute("reranking_used", self.reranking_strategy is not None) + span.set_attribute("reranking_applied", reranking_applied)Also applies to: 367-378
🧹 Nitpick comments (3)
python/src/server/api_routes/knowledge_api.py (1)
34-37: Remove unused import.
EmbeddingQuotaExhaustedErrorisn’t used in this module. Clean it up to satisfy Ruff (F401) and guidelines.-from ..services.embeddings.embedding_exceptions import ( - EmbeddingAuthenticationError, - EmbeddingQuotaExhaustedError, -) +from ..services.embeddings.embedding_exceptions import EmbeddingAuthenticationErrorpython/src/server/services/embeddings/embedding_service.py (2)
233-237: Map upstream auth failure to EmbeddingAuthenticationError and include masked key context.Great mapping. For diagnosability, pass a masked prefix (never the full key).
- except openai.AuthenticationError as e: + except openai.AuthenticationError as e: # Invalid API key - critical error, stop everything search_logger.error("Authentication failed: Invalid API key", exc_info=True) - raise EmbeddingAuthenticationError("Invalid API key") from e + from os import getenv + masked = (getenv("OPENAI_API_KEY", "")[:4] + "…") if getenv("OPENAI_API_KEY") else None + raise EmbeddingAuthenticationError("Invalid API key", api_key_prefix=masked) from e
72-87: Docstring: include EmbeddingAuthenticationError in Raises.Reflect the new behavior for callers and API layer.
async def create_embedding(text: str, provider: str | None = None) -> list[float]: @@ - Raises: + Raises: + EmbeddingAuthenticationError: When upstream API key is invalid/unauthorized EmbeddingQuotaExhaustedError: When OpenAI quota is exhausted EmbeddingRateLimitError: When rate limited EmbeddingAPIError: For other API errors
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
python/src/server/api_routes/knowledge_api.py(3 hunks)python/src/server/services/embeddings/embedding_exceptions.py(1 hunks)python/src/server/services/embeddings/embedding_service.py(5 hunks)python/src/server/services/search/rag_service.py(7 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
python/src/{server,mcp,agents}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/{server,mcp,agents}/**/*.py: Fail fast on service startup failures, missing configuration, database connection issues, auth failures, critical dependency outages, and invalid data that would corrupt state
External API calls should use retry with exponential backoff and ultimately fail with a clear, contextual error message
Error messages must include context (operation being attempted) and relevant IDs/URLs/data for debugging
Preserve full stack traces in logs (e.g., Python logging with exc_info=True)
Use specific exception types; avoid catching broad Exception unless re-raising with context
Never signal failure by returning None/null; raise a descriptive exception instead
Files:
python/src/server/api_routes/knowledge_api.pypython/src/server/services/embeddings/embedding_exceptions.pypython/src/server/services/embeddings/embedding_service.pypython/src/server/services/search/rag_service.py
python/src/server/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/server/**/*.py: For batch processing and background tasks, continue processing but log detailed per-item failures and return both successes and failures
Do not crash the server on a single WebSocket event failure; log the error and continue serving other clients
Files:
python/src/server/api_routes/knowledge_api.pypython/src/server/services/embeddings/embedding_exceptions.pypython/src/server/services/embeddings/embedding_service.pypython/src/server/services/search/rag_service.py
python/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/**/*.py: Target Python 3.12 with a 120-character line length
Use Ruff for linting and Mypy for type checking before commit
Files:
python/src/server/api_routes/knowledge_api.pypython/src/server/services/embeddings/embedding_exceptions.pypython/src/server/services/embeddings/embedding_service.pypython/src/server/services/search/rag_service.py
{python/**/*.py,archon-ui-main/src/**/*.{ts,tsx,js,jsx}}
📄 CodeRabbit inference engine (CLAUDE.md)
{python/**/*.py,archon-ui-main/src/**/*.{ts,tsx,js,jsx}}: Remove dead code immediately; do not keep legacy/unused functions
Avoid comments that reference change history (e.g., LEGACY, CHANGED, REMOVED); keep comments focused on current functionality
Files:
python/src/server/api_routes/knowledge_api.pypython/src/server/services/embeddings/embedding_exceptions.pypython/src/server/services/embeddings/embedding_service.pypython/src/server/services/search/rag_service.py
python/src/server/**
📄 CodeRabbit inference engine (CLAUDE.md)
Keep FastAPI application code under python/src/server/ (routes in api_routes/, services in services/, main in main.py)
Files:
python/src/server/api_routes/knowledge_api.pypython/src/server/services/embeddings/embedding_exceptions.pypython/src/server/services/embeddings/embedding_service.pypython/src/server/services/search/rag_service.py
python/src/{server/services,agents}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Never accept or store corrupted data (e.g., zero embeddings, null foreign keys, malformed JSON); skip failed items entirely instead of persisting bad data
Files:
python/src/server/services/embeddings/embedding_exceptions.pypython/src/server/services/embeddings/embedding_service.pypython/src/server/services/search/rag_service.py
🧬 Code graph analysis (3)
python/src/server/api_routes/knowledge_api.py (2)
python/src/server/services/embeddings/embedding_exceptions.py (2)
EmbeddingAuthenticationError(86-98)EmbeddingQuotaExhaustedError(46-58)python/src/server/config/logfire_config.py (1)
safe_logfire_error(238-250)
python/src/server/services/embeddings/embedding_service.py (1)
python/src/server/services/embeddings/embedding_exceptions.py (1)
EmbeddingAuthenticationError(86-98)
python/src/server/services/search/rag_service.py (2)
python/src/server/services/embeddings/embedding_exceptions.py (1)
EmbeddingAuthenticationError(86-98)python/src/server/services/embeddings/embedding_service.py (1)
create_embedding(72-132)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Backend Tests (Python + pytest)
🔇 Additional comments (7)
python/src/server/services/embeddings/embedding_service.py (3)
111-113: Good: let auth errors bubble to HTTP 401.
280-283: Good: immediate bubble-up of auth errors during batch loop.
333-336: Good: top-level re-raise preserves 401 semantics.python/src/server/services/search/rag_service.py (4)
116-118: Good: use single-vector create_embedding(text=...).
144-146: Good: propagate auth failures to avoid silent empty results.
236-243: Reranking gating looks good.Proceeding only when a strategy exists and results are present is correct.
381-383: Good: bubble auth failures to API layer from code examples path.
| except EmbeddingAuthenticationError as e: | ||
| safe_logfire_error(f"Authentication error in code examples search: {str(e)}") | ||
| raise HTTPException(status_code=401, detail="Invalid API key") | ||
| except Exception as e: |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Same here: standardize the 401 response body.
- raise HTTPException(status_code=401, detail="Invalid API key")
+ raise HTTPException(status_code=401, detail={"error": "Invalid API key"})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except EmbeddingAuthenticationError as e: | |
| safe_logfire_error(f"Authentication error in code examples search: {str(e)}") | |
| raise HTTPException(status_code=401, detail="Invalid API key") | |
| except Exception as e: | |
| except EmbeddingAuthenticationError as e: | |
| safe_logfire_error(f"Authentication error in code examples search: {str(e)}") | |
| raise HTTPException(status_code=401, detail={"error": "Invalid API key"}) | |
| except Exception as e: |
🤖 Prompt for AI Agents
In python/src/server/api_routes/knowledge_api.py around lines 794 to 797, the
except block raises HTTPException(status_code=401, detail="Invalid API key")
which returns an ad-hoc string; replace this with the project's standardized 401
response body (either by returning/importing the shared unauthorized response
object/function used elsewhere, e.g. API_RESPONSES.UNAUTHORIZED or a
get_401_response() helper) so the 401 payload matches the rest of the API;
update the raise to use that shared constant/function and import it at the top
of the file.
… prefix; accurate reranking_applied; remove unused import
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
python/src/server/services/embeddings/embedding_exceptions.py (1)
94-99: Prefer neutral naming for logged key contextRename the metadata key to avoid implying an unmasked prefix. Keeps semantics clear in logs.
Apply:
- if masked: - self.metadata["api_key_prefix"] = masked + if masked: + self.metadata["api_key_hint"] = masked
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
python/src/server/api_routes/knowledge_api.py(3 hunks)python/src/server/services/embeddings/embedding_exceptions.py(1 hunks)python/src/server/services/search/rag_service.py(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- python/src/server/api_routes/knowledge_api.py
- python/src/server/services/search/rag_service.py
🧰 Additional context used
📓 Path-based instructions (6)
python/src/{server,mcp,agents}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/{server,mcp,agents}/**/*.py: Fail fast on service startup failures, missing configuration, database connection issues, auth failures, critical dependency outages, and invalid data that would corrupt state
External API calls should use retry with exponential backoff and ultimately fail with a clear, contextual error message
Error messages must include context (operation being attempted) and relevant IDs/URLs/data for debugging
Preserve full stack traces in logs (e.g., Python logging with exc_info=True)
Use specific exception types; avoid catching broad Exception unless re-raising with context
Never signal failure by returning None/null; raise a descriptive exception instead
Files:
python/src/server/services/embeddings/embedding_exceptions.py
python/src/{server/services,agents}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Never accept or store corrupted data (e.g., zero embeddings, null foreign keys, malformed JSON); skip failed items entirely instead of persisting bad data
Files:
python/src/server/services/embeddings/embedding_exceptions.py
python/src/server/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/server/**/*.py: For batch processing and background tasks, continue processing but log detailed per-item failures and return both successes and failures
Do not crash the server on a single WebSocket event failure; log the error and continue serving other clients
Files:
python/src/server/services/embeddings/embedding_exceptions.py
python/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/**/*.py: Target Python 3.12 with a 120-character line length
Use Ruff for linting and Mypy for type checking before commit
Files:
python/src/server/services/embeddings/embedding_exceptions.py
{python/**/*.py,archon-ui-main/src/**/*.{ts,tsx,js,jsx}}
📄 CodeRabbit inference engine (CLAUDE.md)
{python/**/*.py,archon-ui-main/src/**/*.{ts,tsx,js,jsx}}: Remove dead code immediately; do not keep legacy/unused functions
Avoid comments that reference change history (e.g., LEGACY, CHANGED, REMOVED); keep comments focused on current functionality
Files:
python/src/server/services/embeddings/embedding_exceptions.py
python/src/server/**
📄 CodeRabbit inference engine (CLAUDE.md)
Keep FastAPI application code under python/src/server/ (routes in api_routes/, services in services/, main in main.py)
Files:
python/src/server/services/embeddings/embedding_exceptions.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Backend Tests (Python + pytest)
🔇 Additional comments (2)
python/src/server/services/embeddings/embedding_exceptions.py (2)
86-101: Auth exception + masked key: LGTMSpecific exception type and masked key handling look correct and align with the PR goals.
86-101: End-to-end 401 handling and embedding call verification complete: EmbeddingAuthenticationError is re-thrown in embedding_service, mapped to HTTP 401 in both RAG query and code examples routes, single-vector create_embedding(text=…) is in place, and reranking is gated by reranking_strategy.
Integrates critical fixes from PR #521 into Thomas's comprehensive error handling infrastructure: - RAG Service: Add EmbeddingAuthenticationError import and propagation - Embedding Service: Fix single embedding function to properly convert openai.AuthenticationError → EmbeddingAuthenticationError - Embedding Service: Improve error type detection in batch failure handling - RAG Service: Let embedding errors bubble up instead of converting to generic errors Now OpenAI authentication errors properly return HTTP 401 instead of HTTP 500: - Before: {"error": "RAG query failed: ..."} [HTTP 500] - After: {"error": "OpenAI API authentication failed", "error_type": "authentication_failed"} [HTTP 401] Tested with both invalid and valid API keys. All endpoints working correctly. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
|
Included in #371 |
Summary
Surface OpenAI auth failures as HTTP 401 instead of silent empty results in RAG. Fix single-vector embedding call and guard reranking to avoid undefined variables.
Fixes: #362
Related: #371
What & Why
Users with invalid/expired OpenAI keys saw [] results and no guidance.
This change bubbles EmbeddingAuthenticationError → API returns 401 with "Invalid API key", so users get clear, actionable feedback.
Changes
Server
embedding_exceptions.py: add EmbeddingAuthenticationError.
embedding_service.py:
map openai.AuthenticationError → EmbeddingAuthenticationError.
in create_embedding(), re-raise EmbeddingAuthenticationError so it isn’t wrapped.
rag_service.py:
use single-vector create_embedding(text=...).
guard reranking with self.reranking_strategy is not None.
re-raise EmbeddingAuthenticationError to API.
api_routes/knowledge_api.py:
translate EmbeddingAuthenticationError to HTTP 401 for /api/rag/query (and keep code-examples behavior unchanged).
No schema/config changes.
How I Tested
Unit/functional (inside Docker archon-server):
Targeted RAG tests now pass after fixes:
tests/test_rag_simple.py::TestRAGIntegrationSimple::test_code_search_with_agentic_rag ✅
tests/test_rag_strategies.py::{test_full_rag_pipeline,test_error_handling_in_rag_pipeline} ✅
Quota behavior suite (selected) ✅
Note: tests/mcp_server/... are for the MCP image and are not run in the server container (fail to import mcp in this image). Unchanged by this PR.
Manual API checks (with bogus key):
POST /api/rag/query → 401 {"detail":"Invalid API key"} ✅
POST /api/rag/code-examples → unchanged behavior (not embedding-gated) ✅
UI sanity: with invalid key, crawls proceed until embedding phase; RAG search returns 401 instead of [].
Screenshots / Logs
curl to /api/rag/query → 401 (Invalid API key)
create_embedding("hello") raises EmbeddingAuthenticationError (expected).
Breaking Changes
None. Only error-mapping & guard logic.
Risks & Rollback
Risk: If any upstream code expected silent empty results, it will now see 401.
Rollback: Revert this PR; behavior returns to prior state.
Checklist
Branch on coleam00/Archon (fix/371-auth-bubble-rag)
Clear title & linked issues (#362, #371)
Server tests pass for changed areas (see notes on MCP tests)
No secrets committed; logs sanitize API keys
Matches CLAUDE.md coding rules and existing patterns
Docs/README not required (server-side error semantics only)
Verified in Docker (docker compose up --build -d)
Reviewer Notes
Key touchpoints:
python/src/server/services/embeddings/embedding_exceptions.py
python/src/server/services/embeddings/embedding_service.py
python/src/server/services/search/rag_service.py
python/src/server/api_routes/knowledge_api.py
Focus on the re-raise path for EmbeddingAuthenticationError and the single-vector embedding call.
Summary by CodeRabbit
Bug Fixes
Refactor
Other