Fix critical token consumption issue in list endpoints (#488) #502
- Add `include_content` parameter to `ProjectService.list_projects()`
- Add `exclude_large_fields` parameter to `TaskService.list_tasks()`
- Add `include_content` parameter to `DocumentService.list_documents()`
- Update all MCP tools to use lightweight responses by default
- Fix critical N+1 query problem in ProjectService (was making a separate query per project)
- Add response size monitoring and logging for validation
- Add comprehensive unit and integration tests

Results:
- Projects endpoint: 99.3% token reduction (27,055 -> 194 tokens)
- Tasks endpoint: 98.2% token reduction (12,750 -> 226 tokens)
- Documents endpoint: returns metadata with `content_size` instead of full content
- Maintains full backward compatibility with default parameters
- Single-query optimization eliminates the N+1 performance issue
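The flag-gated pattern the PR describes can be sketched roughly as follows. This is an illustrative toy, not the actual `ProjectService` code: it operates on plain dicts rather than a Supabase query, but shows the same shape — a backward-compatible default that returns the full payload, and a lightweight branch that replaces large fields with counts.

```python
def list_projects(projects: list[dict], include_content: bool = True) -> list[dict]:
    """Illustrative sketch of the include_content toggle (not the real service)."""
    if include_content:
        # Backward-compatible default: full payload, unchanged shape
        return projects
    # Lightweight view: drop large fields, expose counts in a stats object
    return [
        {
            "id": p["id"],
            "title": p["title"],
            "stats": {
                "docs_count": len(p.get("docs", [])),
                "features_count": len(p.get("features", [])),
            },
        }
        for p in projects
    ]
```

Callers opting into the lightweight view (as the MCP tools do by default in this PR) receive only metadata plus counts, which is where the token reduction comes from.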
Walkthrough

Adds optional lightweight modes and size logging across projects, tasks, and documents. Client tools call endpoints with query flags to request metadata-only responses; server services and API routes accept `include_content`/`exclude_large_fields` flags, conditionally omit large JSON fields, and log response sizes. New unit and integration tests verify the behavior.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client as Client (MCP)
    participant API as Projects API
    participant ProjectService
    participant DocumentService
    participant TaskService
    rect rgb(240,245,255)
        note over Client,API: Projects listing with content toggle
        Client->>API: GET /api/projects?include_content=false
        API->>ProjectService: list_projects(include_content=False)
        alt include_content=False
            ProjectService-->>API: metadata + stats (no large fields)
        else include_content=True
            ProjectService-->>API: full project content
        end
        API-->>Client: JSON + size logged
    end
    rect rgb(245,255,240)
        note over Client,API: Project documents with content toggle
        Client->>API: GET /api/projects/{id}/docs?include_content=false
        API->>DocumentService: list_documents(include_content=False)
        DocumentService-->>API: metadata (+stats.content_size)
        API-->>Client: JSON + size logged
    end
    rect rgb(255,248,240)
        note over Client,API: Tasks with large-field exclusion
        Client->>API: GET /api/tasks?exclude_large_fields=true
        API->>TaskService: list_tasks(exclude_large_fields=True)
        TaskService-->>API: tasks without large fields (+stats counts)
        API-->>Client: JSON + size logged
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Assessment against linked issues: Out-of-scope changes
resolves: #488
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/src/server/services/projects/task_service.py (1)
287-312: Use DB-provided counts when excluding large fields

When `exclude_large_fields` is True, you no longer select sources/code_examples. Switch to using the derived counts.

```diff
                 task_data = {
                     "id": task["id"],
                     "project_id": task["project_id"],
                     "title": task["title"],
                     "description": task["description"],
                     "status": task["status"],
                     "assignee": task.get("assignee", "User"),
                     "task_order": task.get("task_order", 0),
                     "feature": task.get("feature"),
                     "created_at": task["created_at"],
                     "updated_at": task["updated_at"],
                 }

                 if not exclude_large_fields:
                     # Include full JSONB fields
                     task_data["sources"] = task.get("sources", [])
                     task_data["code_examples"] = task.get("code_examples", [])
                 else:
                     # Add counts instead of full content
-                    task_data["stats"] = {
-                        "sources_count": len(task.get("sources", [])),
-                        "code_examples_count": len(task.get("code_examples", []))
-                    }
+                    task_data["stats"] = {
+                        "sources_count": task.get("sources_count", 0),
+                        "code_examples_count": task.get("code_examples_count", 0),
+                    }

                 tasks.append(task_data)
```
🧹 Nitpick comments (16)
python/src/server/services/projects/document_service.py (1)
153-155: Preserve stack traces in logs

Log with `exc_info=True` per guidelines to keep full tracebacks.

```diff
-            logger.error(f"Error listing documents: {e}")
+            logger.error(f"Error listing documents: {e}", exc_info=True)
```

python/src/server/services/projects/project_service.py (2)
88-111: Optional: Narrow the SELECT even for the full-content path to reduce DB egress

Selecting "*" pulls unused columns if any are added later. Being explicit avoids accidental payload bloat.

```diff
-            response = (
-                self.supabase_client.table("archon_projects")
-                .select("*")
-                .order("created_at", desc=True)
-                .execute()
-            )
+            response = (
+                self.supabase_client.table("archon_projects")
+                .select("id, title, github_repo, created_at, updated_at, pinned, description, docs, features, data")
+                .order("created_at", desc=True)
+                .execute()
+            )
```
146-148: Preserve stack traces in logs

```diff
-            logger.error(f"Error listing projects: {e}")
+            logger.error(f"Error listing projects: {e}", exc_info=True)
```

python/src/server/services/projects/task_service.py (1)
328-330: Preserve stack traces in logs

```diff
-            logger.error(f"Error listing tasks: {e}")
+            logger.error(f"Error listing tasks: {e}", exc_info=True)
```

python/src/mcp_server/features/projects/project_tools.py (1)
178-182: Passing include_content=False here is exactly what we need

This ensures lightweight project listings by default from the MCP side. Nice.
As a follow-up, consider also passing include_content=False in the create_project polling GET to reduce payload during polling. Example (outside this hunk):
```python
list_response = await poll_client.get(
    urljoin(api_url, "/api/projects"),
    params={"include_content": False},
)
```
6-10: Remove unused import and modernize typing imports
`json` is unused and Ruff will flag it (F401). Also prefer the builtin `dict[...]`/`tuple[...]` over `typing.Dict`/`typing.Tuple` on Python 3.12.

Apply:

```diff
 import httpx
-import json
 import asyncio
-from typing import Dict, Any, Tuple
+from typing import Any
```
131-133: Turn unexpected document endpoint failures into skips, not silent passes

When the server is absent or the API contract differs, mark the test as skipped to surface it in CI.
Apply:
```diff
-    except Exception as e:
-        print(f"\n⚠️ Could not test documents endpoint: {e}")
+    except Exception as e:
+        pytest.skip(f"Could not test documents endpoint: {e}")
```
149-152: Mark the optional MCP health check as skipped when unavailable

This keeps CI output explicit without failing the suite.
Apply:
```diff
-    except httpx.ConnectError:
-        print("ℹ️ MCP server not running (optional for tests)")
+    except httpx.ConnectError:
+        pytest.skip("MCP server not running (optional for tests)")
```
155-189: Consider moving the script-style runner to tools/ or guarding it behind an env flag

Having both pytest-style tests and a script entry point in the same file invites confusion. If you want to keep the CLI runner, gate it so it's not used by default.
For example:
```diff
-if __name__ == "__main__":
-    asyncio.run(main())
+if __name__ == "__main__":
+    import os
+    if os.getenv("RUN_INTEGRATION_CLI", "0") == "1":
+        asyncio.run(main())
+    else:
+        print("Set RUN_INTEGRATION_CLI=1 to run this module as a script.")
```

python/src/server/api_routes/projects_api.py (5)
12-15: Remove unused import
`sys` is imported but never used; Ruff will flag it (F401).

```diff
-import sys
```
533-537: Add response size metrics to project task listing for consistency

Other endpoints now log `size_bytes` and warn above 10KB. Mirror that here to detect regressions.

```diff
         logfire.info(
             f"Project tasks retrieved | project_id={project_id} | task_count={len(filtered_tasks)}"
         )
-        return filtered_tasks
+        # Monitor response size
+        response_json = json.dumps(filtered_tasks)
+        response_size = len(response_json)
+        logfire.info(
+            f"Project tasks listed | project_id={project_id} | count={len(filtered_tasks)} | size_bytes={response_size} | exclude_large_fields={exclude_large_fields}"
+        )
+        if response_size > 10_000:
+            logfire.warning(
+                f"Large project task response | project_id={project_id} | size_bytes={response_size} | count={len(filtered_tasks)}"
+            )
+
+        return filtered_tasks
```
593-596: Log exclude_large_fields in the request-summary line

It's present in the success metrics but not in the initial request log.

```diff
-        logfire.info(
-            f"Listing tasks | status={status} | project_id={project_id} | include_closed={include_closed} | page={page} | per_page={per_page}"
-        )
+        logfire.info(
+            f"Listing tasks | status={status} | project_id={project_id} | include_closed={include_closed} | page={page} | per_page={per_page} | exclude_large_fields={exclude_large_fields}"
+        )
```
221-267: Health check should avoid heavy queries
`projects_health` currently calls `list_projects()` with the default `include_content=True`, which may be large/slow and defeats the "fail fast" guideline. Use lightweight modes for both projects and tasks to reduce load during health checks.

```diff
-        project_service = ProjectService(supabase_client)
-        # Try to list projects with limit 1 to test table access
-        success, _ = project_service.list_projects()
+        project_service = ProjectService(supabase_client)
+        # Use lightweight listing for health check
+        success, _ = project_service.list_projects(include_content=False)
@@
-        task_service = TaskService(supabase_client)
-        # Try to list tasks with limit 1 to test table access
-        success, _ = task_service.list_tasks(include_closed=True)
+        task_service = TaskService(supabase_client)
+        # Use lightweight listing for health check
+        success, _ = task_service.list_tasks(include_closed=True, exclude_large_fields=True)
```

Also applies to: 232-246
847-875: Add size metrics and large-response warning on documents list

To align with the other endpoints and the PR objectives (monitor and alert on large responses), add `size_bytes` logging and a 10KB warning for documents.

```diff
         logfire.info(
             f"Documents listed successfully | project_id={project_id} | count={result.get('total_count', 0)} | lightweight={not include_content}"
         )
-        return result
+        # Monitor response size for optimization validation
+        response_json = json.dumps(result)
+        response_size = len(response_json)
+        logfire.info(
+            f"Documents response size | project_id={project_id} | size_bytes={response_size} | include_content={include_content}"
+        )
+        if response_size > 10_000:
+            logfire.warning(
+                f"Large documents response | project_id={project_id} | size_bytes={response_size}"
+            )
+
+        return result
```

python/tests/test_token_optimization.py (2)
163-201: Mock chain is brittle; prefer explicit method-return mocks

The `mock_select.neq().or_.return_value = mock_or` pattern works but is easy to break if call order changes. Consider assigning each intermediate return explicitly to keep intent clear.

Example:

```python
# More explicit chaining
mock_after_neq = Mock()
mock_select.neq.return_value = mock_after_neq
mock_after_neq.or_.return_value = mock_or
```

This keeps failures localized if the query pipeline evolves.
122-158: Token reduction test is solid but can assert the minimum target explicitly

You already assert >95%, which matches the PR goals. Consider parametrizing the threshold via an env var so CI can relax/tighten it without a code change.
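A parametrized threshold could look like the sketch below. The env var name `TOKEN_REDUCTION_THRESHOLD` is a suggestion, not an existing setting in the repo:

```python
import os

def reduction_threshold(default: float = 95.0) -> float:
    """Minimum token-reduction target (percent), overridable via env var."""
    raw = os.getenv("TOKEN_REDUCTION_THRESHOLD")
    return float(raw) if raw else default

def reduction_percent(before: int, after: int) -> float:
    """Percent reduction going from `before` to `after` token counts."""
    return (1 - after / before) * 100

# In the test body, something like:
# assert reduction_percent(full_tokens, light_tokens) >= reduction_threshold()
```

CI can then tighten the bar (`TOKEN_REDUCTION_THRESHOLD=98`) without touching the test file.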
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
- python/src/mcp_server/features/documents/document_tools.py (1 hunks)
- python/src/mcp_server/features/projects/project_tools.py (1 hunks)
- python/src/server/api_routes/projects_api.py (8 hunks)
- python/src/server/services/projects/document_service.py (2 hunks)
- python/src/server/services/projects/project_service.py (1 hunks)
- python/src/server/services/projects/task_service.py (3 hunks)
- python/tests/test_token_optimization.py (1 hunks)
- python/tests/test_token_optimization_integration.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
python/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/**/*.py: Target Python 3.12 with a 120-character line length
Use Ruff for linting and Mypy for type checking before commit
Files:
- python/src/mcp_server/features/projects/project_tools.py
- python/src/mcp_server/features/documents/document_tools.py
- python/src/server/services/projects/task_service.py
- python/src/server/services/projects/document_service.py
- python/tests/test_token_optimization_integration.py
- python/tests/test_token_optimization.py
- python/src/server/services/projects/project_service.py
- python/src/server/api_routes/projects_api.py
{python/**/*.py,archon-ui-main/src/**/*.{ts,tsx,js,jsx}}
📄 CodeRabbit inference engine (CLAUDE.md)
{python/**/*.py,archon-ui-main/src/**/*.{ts,tsx,js,jsx}}: Remove dead code immediately; do not keep legacy/unused functions
Avoid comments that reference change history (e.g., LEGACY, CHANGED, REMOVED); keep comments focused on current functionality
Files:
- python/src/mcp_server/features/projects/project_tools.py
- python/src/mcp_server/features/documents/document_tools.py
- python/src/server/services/projects/task_service.py
- python/src/server/services/projects/document_service.py
- python/tests/test_token_optimization_integration.py
- python/tests/test_token_optimization.py
- python/src/server/services/projects/project_service.py
- python/src/server/api_routes/projects_api.py
python/src/{server,mcp,agents}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/{server,mcp,agents}/**/*.py: Fail fast on service startup failures, missing configuration, database connection issues, auth failures, critical dependency outages, and invalid data that would corrupt state
External API calls should use retry with exponential backoff and ultimately fail with a clear, contextual error message
Error messages must include context (operation being attempted) and relevant IDs/URLs/data for debugging
Preserve full stack traces in logs (e.g., Python logging with exc_info=True)
Use specific exception types; avoid catching broad Exception unless re-raising with context
Never signal failure by returning None/null; raise a descriptive exception instead
Files:
- python/src/server/services/projects/task_service.py
- python/src/server/services/projects/document_service.py
- python/src/server/services/projects/project_service.py
- python/src/server/api_routes/projects_api.py
python/src/{server/services,agents}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Never accept or store corrupted data (e.g., zero embeddings, null foreign keys, malformed JSON); skip failed items entirely instead of persisting bad data
Files:
- python/src/server/services/projects/task_service.py
- python/src/server/services/projects/document_service.py
- python/src/server/services/projects/project_service.py
python/src/server/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/server/**/*.py: For batch processing and background tasks, continue processing but log detailed per-item failures and return both successes and failures
Do not crash the server on a single WebSocket event failure; log the error and continue serving other clients
Files:
- python/src/server/services/projects/task_service.py
- python/src/server/services/projects/document_service.py
- python/src/server/services/projects/project_service.py
- python/src/server/api_routes/projects_api.py
python/src/server/**
📄 CodeRabbit inference engine (CLAUDE.md)
Keep FastAPI application code under python/src/server/ (routes in api_routes/, services in services/, main in main.py)
Files:
- python/src/server/services/projects/task_service.py
- python/src/server/services/projects/document_service.py
- python/src/server/services/projects/project_service.py
- python/src/server/api_routes/projects_api.py
python/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Place backend tests under python/tests/
Files:
- python/tests/test_token_optimization_integration.py
- python/tests/test_token_optimization.py
🧬 Code graph analysis (4)
python/src/server/services/projects/document_service.py (1)
- python/src/mcp_server/features/documents/document_tools.py (1)
  - list_documents (129-167)

python/tests/test_token_optimization.py (6)
- python/src/server/services/projects/project_service.py (2)
  - ProjectService (20-365)
  - list_projects (76-148)
- python/src/server/services/projects/task_service.py (2)
  - TaskService (57-492)
  - list_tasks (188-330)
- python/src/server/services/projects/document_service.py (2)
  - DocumentService (21-356)
  - list_documents (99-155)
- python/src/mcp_server/features/projects/project_tools.py (1)
  - list_projects (163-198)
- python/src/server/api_routes/projects_api.py (2)
  - list_projects (79-129)
  - list_tasks (583-658)
- python/src/mcp_server/features/documents/document_tools.py (1)
  - list_documents (129-167)

python/src/server/services/projects/project_service.py (2)
- python/src/mcp_server/features/projects/project_tools.py (1)
  - list_projects (163-198)
- python/src/server/api_routes/projects_api.py (1)
  - list_projects (79-129)

python/src/server/api_routes/projects_api.py (5)
- python/src/mcp_server/features/projects/project_tools.py (1)
  - list_projects (163-198)
- python/src/server/services/projects/project_service.py (1)
  - list_projects (76-148)
- python/src/server/services/projects/task_service.py (2)
  - TaskService (57-492)
  - list_tasks (188-330)
- python/src/server/services/projects/document_service.py (1)
  - list_documents (99-155)
- python/src/mcp_server/features/documents/document_tools.py (1)
  - list_documents (129-167)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Backend Tests (Python + pytest)
🔇 Additional comments (7)
python/src/server/services/projects/document_service.py (1)
99-107: Good addition: include_content toggle (default False) aligns with token-optimization goals

The new parameter and docstring look correct and keep the default lightweight. No functional concerns here.
python/src/server/services/projects/project_service.py (2)
76-83: include_content flag and docs are clear; default maintains backward compatibility

Signature and docstring match the PR intent. Good separation of full-content vs lightweight behavior.
112-143: Implement tasks_count via a single aggregated PostgREST query

Please extend the summary stats to include `tasks_count` without introducing N+1 queries. For example, in python/src/server/services/projects/project_service.py (lines 112-143):

Add a grouped count request on `archon_tasks`:

```python
# Fetch per-project task counts in one round-trip
tasks_response = (
    self.supabase_client
    .table("archon_tasks")
    .select("project_id, tasks_count:count(id)", count="exact")
    .group("project_id")
    .execute()
)
# Build a lookup of project_id -> tasks_count
tasks_count_map = {
    row["project_id"]: row["tasks_count"]
    for row in tasks_response.data
}
```

Merge into your existing loop:

```python
for project in response.data:
    docs_count = len(project.get("docs", []))
    features_count = len(project.get("features", []))
    has_data = bool(project.get("data", []))
    tasks_count = tasks_count_map.get(project["id"], 0)
    projects.append({
        "id": project["id"],
        "title": project["title"],
        ...
        "stats": {
            "docs_count": docs_count,
            "tasks_count": tasks_count,
            "features_count": features_count,
            "has_data": has_data,
        },
    })
```

Let me know if you'd like a draft SQL-RPC fallback or if you need help wiring the Python client call.
python/src/server/services/projects/task_service.py (1)
188-194: New exclude_large_fields parameter is appropriate and default-safe

Signature and docstring are clear; default False preserves backward compatibility.
python/src/mcp_server/features/documents/document_tools.py (1)
147-151: Good: documents list now requests lightweight payloads by default

Aligns the tool with the server-side toggle and reduces token use significantly.
python/tests/test_token_optimization_integration.py (1)
6-10: A verification script is running to:

- Confirm whether `async def` tests exist in python/tests/test_token_optimization_integration.py
- Locate any pytest config (pytest.ini, setup.cfg, or pyproject.toml)
- Inspect those configs for `asyncio_mode=auto` or pytest-asyncio settings

I'll integrate the findings into the review once available.
python/tests/test_token_optimization.py (1)
335-353: Good backward-compat coverage of defaults

Asserting default values on the service APIs is helpful to prevent accidental breaking changes to API behavior. Nice touch.
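One way to assert defaults without invoking the services at all is to read them off the signature. This is a sketch, not the PR's actual test code; the service method names match the PR but `default_of` is a hypothetical helper:

```python
import inspect

def default_of(func, name):
    """Return the default value of parameter `name` on `func`."""
    return inspect.signature(func).parameters[name].default

# e.g. in the test module:
# assert default_of(ProjectService.list_projects, "include_content") is True
# assert default_of(TaskService.list_tasks, "exclude_large_fields") is False
```

Because no query is executed, this kind of check stays fast and needs no mocking.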
```diff
 async def list_project_tasks(project_id: str, include_archived: bool = False, exclude_large_fields: bool = False):
     """List all tasks for a specific project. By default, filters out archived tasks."""
     try:
         logfire.info(
-            f"Listing project tasks | project_id={project_id} | include_archived={include_archived}"
+            f"Listing project tasks | project_id={project_id} | include_archived={include_archived} | exclude_large_fields={exclude_large_fields}"
         )

         # Use TaskService to list tasks
         task_service = TaskService()
         success, result = task_service.list_tasks(
             project_id=project_id,
             include_closed=True,  # Get all tasks, we'll filter archived separately
+            exclude_large_fields=exclude_large_fields,
         )
```
include_archived flag is ineffective; archived tasks are always excluded and 'archived' isn’t in task payload
TaskService currently applies or_("archived.is.null,archived.is.false") unconditionally and the task DTO does not include an archived field. As a result, include_archived here has no effect and your Python-side filter never triggers. This is a functional bug for consumers expecting archived tasks when include_archived=True.
Recommended changes:
- Thread `include_archived` down to TaskService and honor it in the query.
- Include `archived` in the task DTO so API-side filters (and clients) can reason about it.
Change in this file (pass the flag):
```diff
 success, result = task_service.list_tasks(
     project_id=project_id,
     include_closed=True,  # Get all tasks, we'll filter archived separately
-    exclude_large_fields=exclude_large_fields,
+    exclude_large_fields=exclude_large_fields,
+    include_archived=include_archived,
 )
```
)And update the filter block to rely on the presence of archived:
```diff
-        if not include_archived and task.get("archived", False):
+        if not include_archived and task.get("archived") is True:
             continue
```

Additionally, apply the following changes outside this file (TaskService) to complete the fix:
```python
# python/src/server/services/projects/task_service.py (signature + query + DTO)
def list_tasks(
    self,
    project_id: str | None = None,
    status: str | None = None,
    include_closed: bool = False,
    exclude_large_fields: bool = False,
    include_archived: bool = False,  # NEW
) -> tuple[bool, dict[str, Any]]:
    ...
    # Only exclude archived when requested
    if not include_archived:
        query = query.or_("archived.is.null,archived.is.false")
        filters_applied.append("exclude archived tasks (null or false)")
    else:
        filters_applied.append("include archived tasks")
    ...
    task_data = {
        "id": task["id"],
        "project_id": task["project_id"],
        "title": task["title"],
        "description": task["description"],
        "status": task["status"],
        "assignee": task.get("assignee", "User"),
        "task_order": task.get("task_order", 0),
        "feature": task.get("feature"),
        "created_at": task["created_at"],
        "updated_at": task["updated_at"],
        "archived": task.get("archived", False),  # NEW
    }
```
Also applies to: 528-537
🤖 Prompt for AI Agents
In python/src/server/api_routes/projects_api.py around lines 504-517 (and also
apply the same change at lines 528-537), the include_archived parameter is not
passed into TaskService so archived tasks are always excluded and the task DTO
lacks an archived field; thread include_archived into the TaskService.list_tasks
call here and in the other block, e.g. task_service.list_tasks(...,
include_archived=include_archived), and then update TaskService
(python/src/server/services/projects/task_service.py) to accept
include_archived, conditionally apply the archived filter only when
include_archived is False, and include archived in the returned task DTO
(archived: task.get("archived", False)) so API-side filtering and clients can
see archived status.
```diff
 # Format documents for response
 documents = []
 for doc in docs:
-    documents.append({
-        "id": doc.get("id"),
-        "document_type": doc.get("document_type"),
-        "title": doc.get("title"),
-        "status": doc.get("status"),
-        "version": doc.get("version"),
-        "tags": doc.get("tags", []),
-        "author": doc.get("author"),
-        "created_at": doc.get("created_at"),
-        "updated_at": doc.get("updated_at"),
-    })
+    if include_content:
+        # Return full document
+        documents.append(doc)
+    else:
+        # Return metadata only
+        documents.append({
+            "id": doc.get("id"),
+            "document_type": doc.get("document_type"),
+            "title": doc.get("title"),
+            "status": doc.get("status"),
+            "version": doc.get("version"),
+            "tags": doc.get("tags", []),
+            "author": doc.get("author"),
+            "created_at": doc.get("created_at"),
+            "updated_at": doc.get("updated_at"),
+            "stats": {
+                "content_size": len(str(doc.get("content", {})))
+            }
+        })
```
🛠️ Refactor suggestion
Report size in bytes and align stat naming; avoid len(str(...)) for accuracy and consistency
- Using len(str(content)) is misleading (Python repr, not JSON bytes) and inconsistent with the PR’s “size_bytes” metric elsewhere.
- Prefer JSON-encoded byte length and standardize the field to stats.size_bytes.
- Minor: created_at/updated_at can be None for many docs (add_document doesn’t set created_at); metadata will show nulls.
Apply this focused change to compute accurate size and rename the stat:
```diff
 documents.append({
     "id": doc.get("id"),
     "document_type": doc.get("document_type"),
     "title": doc.get("title"),
     "status": doc.get("status"),
     "version": doc.get("version"),
     "tags": doc.get("tags", []),
     "author": doc.get("author"),
     "created_at": doc.get("created_at"),
     "updated_at": doc.get("updated_at"),
     "stats": {
-        "content_size": len(str(doc.get("content", {})))
+        "size_bytes": len(json.dumps(doc.get("content", {}), separators=(',', ':'), ensure_ascii=False).encode('utf-8'))
     }
 })
```

Outside this hunk, add the missing import at top-level:

```python
import json
```

Optionally (outside this hunk), consider setting timestamps at creation to make metadata complete:
```python
# in add_document() when building new_doc
"created_at": datetime.now().isoformat(),
"updated_at": datetime.now().isoformat(),
```

🤖 Prompt for AI Agents
In python/src/server/services/projects/document_service.py around lines 124 to
146, the stats currently use len(str(doc.get("content", {}))) and the key is
"content_size"; change this to compute the JSON-encoded byte length and rename
the stat to "size_bytes" for accuracy and consistency (e.g. size_bytes =
len(json.dumps(doc.get("content", "")).encode("utf-8")) and set stats:
{"size_bytes": size_bytes}). Also add import json at the module top-level
outside this hunk. Optionally consider populating created_at/updated_at in
add_document() if you want non-null timestamps.
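The gap between the two measurements the review describes is easy to demonstrate: `len(str(...))` counts characters of Python's repr (single quotes, spaces after commas, one unit per non-ASCII character), while the wire size is the UTF-8 byte length of the compact JSON encoding. A minimal illustration:

```python
import json

content = {"note": "café", "items": [1, 2, 3]}

# repr-based "size": characters of Python's dict repr, not what is sent
repr_len = len(str(content))

# JSON byte size: compact encoding, UTF-8 bytes (é occupies two bytes)
json_bytes = len(
    json.dumps(content, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
)
```

Here `repr_len` and `json_bytes` differ, which is why the review recommends the JSON-encoded byte length for a `size_bytes` stat that matches the logged response sizes.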
```python
            # Lightweight response for MCP - fetch all data but only return metadata + stats
            # FIXED: N+1 query problem - now using single query
            response = (
                self.supabase_client.table("archon_projects")
                .select("*")  # Fetch all fields in single query
                .order("created_at", desc=True)
                .execute()
            )

            projects = []
            for project in response.data:
                # Calculate counts from fetched data (no additional queries)
                docs_count = len(project.get("docs", []))
                features_count = len(project.get("features", []))
                has_data = bool(project.get("data", []))

                # Return only metadata + stats, excluding large JSONB fields
                projects.append({
                    "id": project["id"],
                    "title": project["title"],
                    "github_repo": project.get("github_repo"),
                    "created_at": project["created_at"],
                    "updated_at": project["updated_at"],
                    "pinned": project.get("pinned", False),
                    "description": project.get("description", ""),
                    "stats": {
                        "docs_count": docs_count,
                        "features_count": features_count,
                        "has_data": has_data
                    }
                })
```
🛠️ Refactor suggestion
Lightweight path still fetches heavy JSON; push counts to the DB and omit large fields from SELECT
Currently .select("*") fetches docs/features/data in full, then discards them. That saves client tokens but not DB egress/CPU. Compute counts server-side and select only needed columns.
```diff
-        response = (
-            self.supabase_client.table("archon_projects")
-            .select("*")  # Fetch all fields in single query
-            .order("created_at", desc=True)
-            .execute()
-        )
+        response = (
+            self.supabase_client.table("archon_projects")
+            .select(
+                "id, title, github_repo, created_at, updated_at, pinned, description, "
+                "docs_count:jsonb_array_length(docs), "
+                "features_count:jsonb_array_length(features), "
+                "data_count:jsonb_array_length(data)"
+            )
+            .order("created_at", desc=True)
+            .execute()
+        )
@@
-        for project in response.data:
-            # Calculate counts from fetched data (no additional queries)
-            docs_count = len(project.get("docs", []))
-            features_count = len(project.get("features", []))
-            has_data = bool(project.get("data", []))
+        for project in response.data:
+            docs_count = project.get("docs_count", 0)
+            features_count = project.get("features_count", 0)
+            has_data = (project.get("data_count", 0) > 0)
```

Note: PR objectives require tasks_count in the summary stats. It's currently missing. Consider a single aggregated query (no N+1) to fetch per-project task counts (e.g., group by project_id or an RPC), then merge into the loop.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-        # Lightweight response for MCP - fetch all data but only return metadata + stats
-        # FIXED: N+1 query problem - now using single query
-        response = (
-            self.supabase_client.table("archon_projects")
-            .select("*")  # Fetch all fields in single query
-            .order("created_at", desc=True)
-            .execute()
-        )
-        projects = []
-        for project in response.data:
-            # Calculate counts from fetched data (no additional queries)
-            docs_count = len(project.get("docs", []))
-            features_count = len(project.get("features", []))
-            has_data = bool(project.get("data", []))
-            # Return only metadata + stats, excluding large JSONB fields
-            projects.append({
-                "id": project["id"],
-                "title": project["title"],
-                "github_repo": project.get("github_repo"),
-                "created_at": project["created_at"],
-                "updated_at": project["updated_at"],
-                "pinned": project.get("pinned", False),
-                "description": project.get("description", ""),
-                "stats": {
-                    "docs_count": docs_count,
-                    "features_count": features_count,
-                    "has_data": has_data
-                }
-            })
+        # Lightweight response for MCP - fetch only metadata + stats via DB-side counts
+        # FIXED: N+1 query problem - now using single query
+        response = (
+            self.supabase_client.table("archon_projects")
+            .select(
+                "id, title, github_repo, created_at, updated_at, pinned, description, "
+                "docs_count:jsonb_array_length(docs), "
+                "features_count:jsonb_array_length(features), "
+                "data_count:jsonb_array_length(data)"
+            )
+            .order("created_at", desc=True)
+            .execute()
+        )
+        projects = []
+        for project in response.data:
+            docs_count = project.get("docs_count", 0)
+            features_count = project.get("features_count", 0)
+            has_data = (project.get("data_count", 0) > 0)
+            # Return only metadata + stats, excluding large JSONB fields
+            projects.append({
+                "id": project["id"],
+                "title": project["title"],
+                "github_repo": project.get("github_repo"),
+                "created_at": project["created_at"],
+                "updated_at": project["updated_at"],
+                "pinned": project.get("pinned", False),
+                "description": project.get("description", ""),
+                "stats": {
+                    "docs_count": docs_count,
+                    "features_count": features_count,
+                    "has_data": has_data
+                }
+            })
```
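One way to validate a change like this is to compare serialized payload sizes before and after. A rough back-of-the-envelope sketch, using the common ~4 characters-per-token heuristic (an approximation only; the MCP client's real tokenizer will differ, and the payloads below are hypothetical):

```python
import json

def estimate_tokens(payload) -> int:
    # Rough heuristic: ~4 characters per token for JSON-ish English text.
    return max(1, len(json.dumps(payload)) // 4)

def reduction_pct(full_payload, light_payload) -> float:
    full = estimate_tokens(full_payload)
    light = estimate_tokens(light_payload)
    return 100.0 * (full - light) / full

# Hypothetical payloads: one project with a large docs array vs. metadata + stats.
full = [{"id": "p1", "title": "Demo", "docs": [{"content": "x" * 4000}]}]
light = [{"id": "p1", "title": "Demo", "stats": {"docs_count": 1, "has_data": False}}]
```

Numbers produced this way are only indicative, but they make regressions in the lightweight path easy to spot in tests.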
🤖 Prompt for AI Agents
python/src/server/services/projects/project_service.py lines 112-143: the
current lightweight path uses .select("*") which pulls large JSONB fields (docs,
features, data) and only uses their counts—change the implementation to SELECT
only required scalar columns (id, title, github_repo, created_at, updated_at,
pinned, description) and remove large fields from the response, and push the
counts to the DB by issuing a single aggregated query (GROUP BY project_id) that
returns docs_count, features_count and tasks_count for each project (or use an
RPC/aggregation) and then merge those aggregated counts into the projects loop
to build the final metadata+stats objects so there are no large JSONB fields
fetched and tasks_count is included.
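The aggregation step described above can also be done with one extra round trip plus a client-side merge. A minimal sketch under stated assumptions: the Supabase query in the docstring is illustrative, and only the merge helper itself is concrete.

```python
from collections import Counter

def merge_task_counts(projects: list[dict], task_rows: list[dict]) -> list[dict]:
    """Merge per-project task counts into each project's stats dict.

    task_rows would come from a single bulk query, e.g. (hypothetical):
        supabase.table("archon_tasks").select("project_id").eq("archived", False).execute().data
    One round trip covers all projects, so there is no per-project (N+1) querying.
    """
    counts = Counter(row["project_id"] for row in task_rows)
    for project in projects:
        project.setdefault("stats", {})["tasks_count"] = counts.get(project["id"], 0)
    return projects
```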
```python
        if exclude_large_fields:
            # Select all fields except large JSONB ones
            query = self.supabase_client.table("archon_tasks").select(
                "id, project_id, parent_task_id, title, description, "
                "status, assignee, task_order, feature, archived, "
                "archived_at, archived_by, created_at, updated_at, "
                "sources, code_examples"  # Still fetch for counting, but will process differently
            )
        else:
            query = self.supabase_client.table("archon_tasks").select("*")
```
🛠️ Refactor suggestion
Don’t fetch large JSON when exclude_large_fields=True; compute counts in the DB
Right now you still select sources and code_examples “for counting,” which defeats the goal of minimizing payload between DB and server.
Refactor to select only lightweight columns plus derived counts:
```diff
-        if exclude_large_fields:
-            # Select all fields except large JSONB ones
-            query = self.supabase_client.table("archon_tasks").select(
-                "id, project_id, parent_task_id, title, description, "
-                "status, assignee, task_order, feature, archived, "
-                "archived_at, archived_by, created_at, updated_at, "
-                "sources, code_examples"  # Still fetch for counting, but will process differently
-            )
+        if exclude_large_fields:
+            # Select only lightweight columns and compute counts server-side
+            query = self.supabase_client.table("archon_tasks").select(
+                "id, project_id, parent_task_id, title, description, "
+                "status, assignee, task_order, feature, archived, "
+                "archived_at, archived_by, created_at, updated_at, "
+                "sources_count:jsonb_array_length(sources), "
+                "code_examples_count:jsonb_array_length(code_examples)"
+            )
         else:
             query = self.supabase_client.table("archon_tasks").select("*")
```

📝 Committable suggestion
```diff
-        if exclude_large_fields:
-            # Select all fields except large JSONB ones
-            query = self.supabase_client.table("archon_tasks").select(
-                "id, project_id, parent_task_id, title, description, "
-                "status, assignee, task_order, feature, archived, "
-                "archived_at, archived_by, created_at, updated_at, "
-                "sources, code_examples"  # Still fetch for counting, but will process differently
-            )
-        else:
-            query = self.supabase_client.table("archon_tasks").select("*")
+        if exclude_large_fields:
+            # Select only lightweight columns and compute counts server-side
+            query = self.supabase_client.table("archon_tasks").select(
+                "id, project_id, parent_task_id, title, description, "
+                "status, assignee, task_order, feature, archived, "
+                "archived_at, archived_by, created_at, updated_at, "
+                "sources_count:jsonb_array_length(sources), "
+                "code_examples_count:jsonb_array_length(code_examples)"
+            )
+        else:
+            query = self.supabase_client.table("archon_tasks").select("*")
```
- Add include_archived parameter to TaskService.list_tasks()
- Service now conditionally applies archived filter based on parameter
- Add 'archived' field to task DTO for client visibility
- Update API endpoints to pass include_archived down to service
- Remove redundant client-side filtering in API layer
- Fix type hints in integration tests (dict[str, Any] | None)
- Use pytest.skip() instead of return for proper test reporting

These fixes address the functional bug identified by CodeRabbit where archived tasks couldn't be retrieved even when explicitly requested.
Actionable comments posted: 1
♻️ Duplicate comments (2)
python/src/server/services/projects/task_service.py (2)
244-250: Archived filtering honored correctly. The conditional archived filter fixes the earlier bug. Good.

211-220: Don't fetch large JSON when exclude_large_fields=True; compute counts in the DB. Selecting sources and code_examples defeats the goal of minimizing payload. Ask PostgREST to compute the counts and omit the arrays.

Apply:

```diff
-        if exclude_large_fields:
-            # Select all fields except large JSONB ones
-            query = self.supabase_client.table("archon_tasks").select(
-                "id, project_id, parent_task_id, title, description, "
-                "status, assignee, task_order, feature, archived, "
-                "archived_at, archived_by, created_at, updated_at, "
-                "sources, code_examples"  # Still fetch for counting, but will process differently
-            )
+        if exclude_large_fields:
+            # Select only lightweight columns and compute counts server-side
+            query = self.supabase_client.table("archon_tasks").select(
+                "id, project_id, parent_task_id, title, description, "
+                "status, assignee, task_order, feature, archived, "
+                "archived_at, archived_by, created_at, updated_at, "
+                "sources_count:jsonb_array_length(coalesce(sources, '[]'::jsonb)), "
+                "code_examples_count:jsonb_array_length(coalesce(code_examples, '[]'::jsonb))"
+            )
```
🧹 Nitpick comments (4)
python/src/server/api_routes/projects_api.py (4)
12-15: Remove unused import. sys is not used.

```diff
 import asyncio
-import json
+import json
 import secrets
-import sys
 from typing import Any
```

127-129: Preserve stack traces in error logs. Include exc_info to comply with logging guidelines.

```diff
-        logfire.error(f"Failed to list projects | error={str(e)}")
+        logfire.error(f"Failed to list projects | error={str(e)}", exc_info=True)
```

504-529: Add response size metrics for list_project_tasks (consistency + visibility). Mirror the size logging/warning used elsewhere to catch regressions on this hot path.

```diff
-    logfire.info(
-        f"Project tasks retrieved | project_id={project_id} | task_count={len(tasks)}"
-    )
-
-    return tasks
+    # Measure payload size for monitoring
+    response_json = json.dumps(tasks)
+    response_size = len(response_json)
+    logfire.info(
+        f"Project tasks listed | project_id={project_id} | task_count={len(tasks)} | "
+        f"size_bytes={response_size} | include_archived={include_archived} | "
+        f"exclude_large_fields={exclude_large_fields}"
+    )
+    if response_size > 10000:
+        logfire.warning(
+            f"Large project-task response | project_id={project_id} | "
+            f"size_bytes={response_size} | task_count={len(tasks)} | "
+            f"exclude_large_fields={exclude_large_fields}"
+        )
+
+    return tasks
```

839-865: Add response size metrics for documents listing. Track bytes and warn on large results, similar to projects/tasks.

```diff
         logfire.info(
             f"Documents listed successfully | project_id={project_id} | count={result.get('total_count', 0)} | lightweight={not include_content}"
         )
-        return result
+        # Monitor response size
+        response_json = json.dumps(result)
+        response_size = len(response_json)
+        logfire.info(
+            f"Document list payload | project_id={project_id} | size_bytes={response_size} | "
+            f"include_content={include_content}"
+        )
+        if response_size > 10000:
+            logfire.warning(
+                f"Large document response | project_id={project_id} | size_bytes={response_size} | "
+                f"include_content={include_content} | count={result.get('total_count', 0)}"
+            )
+
+        return result
```
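Since the same serialize-log-warn pattern recurs across the projects, tasks, and documents endpoints, it could be factored into one helper so every list route logs consistently. A sketch under stated assumptions: the helper name, logger name, and the 10,000-byte threshold (taken from the suggestions in this review) are illustrative, not part of the codebase.

```python
import json
import logging

logger = logging.getLogger("archon.api")
LARGE_RESPONSE_BYTES = 10_000  # warning threshold used throughout this review

def log_response_size(payload, *, endpoint: str, threshold: int = LARGE_RESPONSE_BYTES) -> int:
    """Serialize the payload once, log its size, and warn when it exceeds the threshold."""
    size = len(json.dumps(payload, default=str))
    logger.info("%s | size_bytes=%d", endpoint, size)
    if size > threshold:
        logger.warning("Large response | endpoint=%s | size_bytes=%d", endpoint, size)
    return size
```

Each route would then end with `log_response_size(result, endpoint="/api/projects")` instead of repeating the json.dumps/info/warning triple.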
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- python/src/server/api_routes/projects_api.py (8 hunks)
- python/src/server/services/projects/task_service.py (4 hunks)
- python/tests/test_token_optimization_integration.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- python/tests/test_token_optimization_integration.py
🧰 Additional context used
📓 Path-based instructions (6)
python/src/{server,mcp,agents}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/{server,mcp,agents}/**/*.py: Fail fast on service startup failures, missing configuration, database connection issues, auth failures, critical dependency outages, and invalid data that would corrupt state
External API calls should use retry with exponential backoff and ultimately fail with a clear, contextual error message
Error messages must include context (operation being attempted) and relevant IDs/URLs/data for debugging
Preserve full stack traces in logs (e.g., Python logging with exc_info=True)
Use specific exception types; avoid catching broad Exception unless re-raising with context
Never signal failure by returning None/null; raise a descriptive exception instead
Files:
- python/src/server/services/projects/task_service.py
- python/src/server/api_routes/projects_api.py
python/src/{server/services,agents}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Never accept or store corrupted data (e.g., zero embeddings, null foreign keys, malformed JSON); skip failed items entirely instead of persisting bad data
Files:
python/src/server/services/projects/task_service.py
python/src/server/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/server/**/*.py: For batch processing and background tasks, continue processing but log detailed per-item failures and return both successes and failures
Do not crash the server on a single WebSocket event failure; log the error and continue serving other clients
Files:
- python/src/server/services/projects/task_service.py
- python/src/server/api_routes/projects_api.py
python/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/**/*.py: Target Python 3.12 with a 120-character line length
Use Ruff for linting and Mypy for type checking before commit
Files:
- python/src/server/services/projects/task_service.py
- python/src/server/api_routes/projects_api.py
{python/**/*.py,archon-ui-main/src/**/*.{ts,tsx,js,jsx}}
📄 CodeRabbit inference engine (CLAUDE.md)
{python/**/*.py,archon-ui-main/src/**/*.{ts,tsx,js,jsx}}: Remove dead code immediately; do not keep legacy/unused functions
Avoid comments that reference change history (e.g., LEGACY, CHANGED, REMOVED); keep comments focused on current functionality
Files:
- python/src/server/services/projects/task_service.py
- python/src/server/api_routes/projects_api.py
python/src/server/**
📄 CodeRabbit inference engine (CLAUDE.md)
Keep FastAPI application code under python/src/server/ (routes in api_routes/, services in services/, main in main.py)
Files:
- python/src/server/services/projects/task_service.py
- python/src/server/api_routes/projects_api.py
🧬 Code graph analysis (1)
python/src/server/api_routes/projects_api.py (4)
python/src/mcp_server/features/projects/project_tools.py (1)
- list_projects (163-198)

python/src/server/services/projects/project_service.py (1)
- list_projects (76-148)

python/src/server/services/projects/task_service.py (1)
- list_tasks (188-336)

python/src/server/services/projects/document_service.py (1)
- list_documents (99-155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Backend Tests (Python + pytest)
🔇 Additional comments (3)
python/src/server/services/projects/task_service.py (1)
189-205: Signature + docstrings for lightweight/archived flags look good. Parameters are well-scoped and documented. Backward compatibility maintained.
python/src/server/api_routes/projects_api.py (2)
78-122: Projects endpoint: include_content toggle + size logging LGTM. Clear behavior and useful metrics. Backward compatibility maintained by default True.

616-645: Tasks endpoint: pagination + size logging LGTM. Nice size metrics with warning threshold; aligns with PR goals.
```python
                if not exclude_large_fields:
                    # Include full JSONB fields
                    task_data["sources"] = task.get("sources", [])
                    task_data["code_examples"] = task.get("code_examples", [])
                else:
                    # Add counts instead of full content
                    task_data["stats"] = {
                        "sources_count": len(task.get("sources", [])),
                        "code_examples_count": len(task.get("code_examples", []))
                    }

                tasks.append(task_data)
```
🛠️ Refactor suggestion
Use the DB-provided counts; avoid len() on arrays
After the select above, rely on sources_count/code_examples_count so arrays never cross the wire.
Apply:
```diff
-                else:
-                    # Add counts instead of full content
-                    task_data["stats"] = {
-                        "sources_count": len(task.get("sources", [])),
-                        "code_examples_count": len(task.get("code_examples", []))
-                    }
+                else:
+                    # Add counts instead of full content (computed in SQL)
+                    task_data["stats"] = {
+                        "sources_count": task.get("sources_count", 0),
+                        "code_examples_count": task.get("code_examples_count", 0),
+                    }
```

📝 Committable suggestion
```diff
-                if not exclude_large_fields:
-                    # Include full JSONB fields
-                    task_data["sources"] = task.get("sources", [])
-                    task_data["code_examples"] = task.get("code_examples", [])
-                else:
-                    # Add counts instead of full content
-                    task_data["stats"] = {
-                        "sources_count": len(task.get("sources", [])),
-                        "code_examples_count": len(task.get("code_examples", []))
-                    }
-                tasks.append(task_data)
+                if not exclude_large_fields:
+                    # Include full JSONB fields
+                    task_data["sources"] = task.get("sources", [])
+                    task_data["code_examples"] = task.get("code_examples", [])
+                else:
+                    # Add counts instead of full content (computed in SQL)
+                    task_data["stats"] = {
+                        "sources_count": task.get("sources_count", 0),
+                        "code_examples_count": task.get("code_examples_count", 0),
+                    }
+                tasks.append(task_data)
```
🤖 Prompt for AI Agents
In python/src/server/services/projects/task_service.py around lines 306 to 317,
the code currently computes counts with len(task.get("sources", [])) and
len(task.get("code_examples", [])) which defeats the goal of using DB-provided
counts; change the branch that adds stats to use the counts already selected by
the query (task.get("sources_count") and task.get("code_examples_count")) and
fall back to 0 if those keys are missing, leaving the include-full-fields branch
unchanged so full JSONB arrays are only sent when exclude_large_fields is False.
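Putting both halves of this suggestion together, the response-shaping branch reduces to a small pure function. A hedged sketch only: the field list is trimmed for brevity (the real service copies many more columns), and `shape_task` is a hypothetical helper name.

```python
from typing import Any

def shape_task(task: dict[str, Any], exclude_large_fields: bool = False) -> dict[str, Any]:
    """Build a task DTO, swapping large JSONB arrays for DB-computed counts."""
    data = {k: task.get(k) for k in ("id", "project_id", "title", "status")}
    if exclude_large_fields:
        # Counts were computed in SQL; fall back to 0 if the columns are absent.
        data["stats"] = {
            "sources_count": task.get("sources_count", 0),
            "code_examples_count": task.get("code_examples_count", 0),
        }
    else:
        data["sources"] = task.get("sources", [])
        data["code_examples"] = task.get("code_examples", [])
    return data
```

Keeping the shaping logic pure also makes the lightweight/full branches trivially unit-testable without a database.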
…leam00#502) Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.21.0 to 0.21.1.

- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](anchore/sbom-action@v0.21.0...v0.21.1)

---
updated-dependencies:
- dependency-name: anchore/sbom-action
  dependency-version: 0.21.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
GitHub comment posting retries were based only on message substring checks, so structured Octokit HTTP errors (like status 502) could bypass retries and fail prematurely.

Changes:
- Added typed status extraction in GitHub adapter retry classification.
- Retry transient HTTP statuses 429, 502, 503, and 504 before string fallback.
- Added regression tests for structured 502 retry and structured 401 non-retry behavior.

Fixes #502
…coleam00#560) GitHub comment posting retries were based only on message substring checks, so structured Octokit HTTP errors (like status 502) could bypass retries and fail prematurely.

Changes:
- Added typed status extraction in GitHub adapter retry classification.
- Retry transient HTTP statuses 429, 502, 503, and 504 before string fallback.
- Added regression tests for structured 502 retry and structured 401 non-retry behavior.

Fixes coleam00#502
Pull Request
Summary
This PR fixes the critical token consumption issue reported in #488 where the MCP server's list_projects endpoint was returning 180k+ tokens instead of <1k tokens, causing a 360x cost increase and making MCP unusable.
Changes Made
- `include_content` parameter to `ProjectService.list_projects()` (default: True for backward compatibility)
- `exclude_large_fields` parameter to `TaskService.list_tasks()` (default: False)
- `include_content` parameter to `DocumentService.list_documents()` (default: False)
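Because each flag defaults to the legacy behavior, existing callers see identical output while MCP tools opt into the lightweight mode. A toy sketch of the pattern (illustrative data and field names, not the actual service code):

```python
def list_projects(projects: list[dict], include_content: bool = True) -> list[dict]:
    if include_content:
        return projects  # legacy full payload, unchanged for existing callers
    # Lightweight mode: metadata only, opted into by the MCP tools.
    return [{k: p[k] for k in ("id", "title") if k in p} for p in projects]
```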
Affected Services
Testing
Test Evidence
Checklist
Breaking Changes
None - all changes are backward compatible through default parameter values.
Additional Notes
Performance Impact
Monitoring Added
Fixes #488
Summary by CodeRabbit
New Features
Performance
Tests