
Fix critical token consumption issue in list endpoints (#488) #502

Merged
Wirasm merged 2 commits into main from fix/token-optimization-list-endpoints
Aug 27, 2025

Conversation

Collaborator

@Wirasm Wirasm commented Aug 26, 2025

Pull Request

Summary

This PR fixes the critical token consumption issue reported in #488, where the MCP server's list_projects endpoint returned 180k+ tokens instead of under 1k, causing a 360x cost increase and making MCP unusable.

Changes Made

  • Added include_content parameter to ProjectService.list_projects() (default: True for backward compatibility)
  • Added exclude_large_fields parameter to TaskService.list_tasks() (default: False)
  • Added include_content parameter to DocumentService.list_documents() (default: False)
  • Updated all MCP tools to request lightweight responses by default
  • Fixed critical N+1 query problem in ProjectService (was making separate query per project)
  • Added response size monitoring and logging for validation
  • Added comprehensive unit and integration tests
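The flag pattern described above can be sketched as follows. This is a hypothetical, simplified function shape for illustration only, not the actual Archon service code; the real implementation lives on ProjectService and queries Supabase.

```python
def list_projects(rows: list[dict], include_content: bool = True) -> list[dict]:
    """Return project rows as-is, or metadata plus summary stats when
    include_content is False. Defaulting to True keeps existing callers working."""
    if include_content:
        return rows

    # Lightweight mode: drop large JSONB fields, return counts instead.
    return [
        {
            "id": row["id"],
            "title": row["title"],
            "stats": {
                "docs_count": len(row.get("docs", [])),
                "features_count": len(row.get("features", [])),
                "has_data": bool(row.get("data")),
            },
        }
        for row in rows
    ]
```

The key design point is that the lightweight branch never serializes the large fields at all, so response size scales with project count rather than content size.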

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • Performance improvement

Affected Services

  • Server (FastAPI backend)
  • MCP Server (Model Context Protocol)
  • Frontend (React UI)
  • Agents (PydanticAI service)
  • Database (migrations/schema)
  • Docker/Infrastructure
  • Documentation site

Testing

  • All existing tests pass
  • Added new tests for new functionality
  • Manually tested affected user flows
  • Docker builds succeed for all services

Test Evidence

# Unit tests
uv run pytest tests/test_token_optimization.py -v
# Result: 8 passed

# Integration tests  
uv run python tests/test_token_optimization_integration.py
# Result: 99.3% token reduction for projects, 98.2% for tasks

# All API tests
uv run pytest tests/test_api_essentials.py -v
# Result: 10 passed

# MCP tools tested directly
# list_projects: Returns lightweight with stats
# list_tasks: Returns lightweight with counts  
# get_project/get_task: Still return full content

Checklist

  • My code follows the service architecture patterns
  • If using an AI coding assistant, I used the CLAUDE.md rules
  • I have added tests that prove my fix/feature works
  • All new and existing tests pass locally
  • My changes generate no new warnings
  • I have updated relevant documentation
  • I have verified no regressions in existing features

Breaking Changes

None - all changes are backward compatible through default parameter values.

Additional Notes

Performance Impact

  • Before: N+1 query problem (10 projects = 11 queries, 100 projects = 101 queries)
  • After: Single query regardless of project count
  • Token reduction: 99.3% for projects (27k → 194 tokens), 98.2% for tasks (12k → 226 tokens)
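The before/after query behavior can be illustrated with a toy query counter. This is illustrative only; the real code goes through the Supabase client, and the class and function names here are made up.

```python
class FakeDB:
    """Counts queries so the N+1 pattern is visible."""

    def __init__(self, projects: list[dict]):
        self.projects = projects
        self.query_count = 0

    def fetch_projects(self) -> list[dict]:
        self.query_count += 1
        return [dict(p) for p in self.projects]

    def fetch_docs_for(self, project_id: str) -> list[dict]:
        self.query_count += 1
        return []


def list_projects_n_plus_1(db: FakeDB) -> list[dict]:
    # Before: one query for the list, plus one extra query per project.
    projects = db.fetch_projects()
    for p in projects:
        p["docs"] = db.fetch_docs_for(p["id"])
    return projects


def list_projects_single_query(db: FakeDB) -> list[dict]:
    # After: one query; stats are derived from columns already returned.
    return db.fetch_projects()
```

With 10 projects the first variant issues 11 queries; the second issues exactly one regardless of project count.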

Monitoring Added

  • Response size logging for all list endpoints
  • Warnings logged for responses >10KB
  • Metrics tracked: size_bytes, include_content/exclude_large_fields, item count
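A minimal version of such a monitoring hook might look like the sketch below. The names are assumptions for illustration; the PR itself logs through logfire rather than the stdlib logger shown here.

```python
import json
import logging

logger = logging.getLogger("response_size")

LARGE_RESPONSE_BYTES = 10_000  # the >10KB warning threshold mentioned above


def log_response_size(endpoint: str, payload, **flags) -> int:
    """Log the serialized response size and warn when it exceeds the threshold."""
    size_bytes = len(json.dumps(payload))
    item_count = len(payload) if isinstance(payload, list) else 1
    logger.info(
        "%s | size_bytes=%d | count=%d | flags=%s",
        endpoint, size_bytes, item_count, flags,
    )
    if size_bytes > LARGE_RESPONSE_BYTES:
        logger.warning("Large response | endpoint=%s | size_bytes=%d", endpoint, size_bytes)
    return size_bytes
```

Returning the computed size makes the hook easy to assert against in tests, which is how the token-reduction percentages above can be validated.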

Fixes #488

Summary by CodeRabbit

  • New Features

    • Added optional lightweight responses for projects, tasks, and documents via request flags; defaults preserve full-content behavior.
    • Tasks and projects can return stats/counts instead of large fields; documents can return metadata with content-size stats.
    • New flags to include archived items and exclude large task fields.
  • Performance

    • Reduced payload sizes when using lightweight modes.
    • Endpoints now log response sizes and warn on unusually large responses.
  • Tests

    • Added unit and integration tests validating lightweight modes, size reductions, and backward compatibility.

- Add include_content parameter to ProjectService.list_projects()
- Add exclude_large_fields parameter to TaskService.list_tasks()
- Add include_content parameter to DocumentService.list_documents()
- Update all MCP tools to use lightweight responses by default
- Fix critical N+1 query problem in ProjectService (was making separate query per project)
- Add response size monitoring and logging for validation
- Add comprehensive unit and integration tests

Results:
- Projects endpoint: 99.3% token reduction (27,055 -> 194 tokens)
- Tasks endpoint: 98.2% token reduction (12,750 -> 226 tokens)
- Documents endpoint: Returns metadata with content_size instead of full content
- Maintains full backward compatibility with default parameters
- Single query optimization eliminates N+1 performance issue

coderabbitai Bot commented Aug 26, 2025

Walkthrough

Adds optional lightweight modes and size-logging across projects, tasks, and documents. Client tools call endpoints with query flags to request metadata-only responses; server services and API routes accept include_content/exclude_large_fields flags, conditionally omit large JSON fields, and log response sizes. New unit and integration tests verify behavior.

Changes

Cohort / File(s) / Summary

  • MCP client tools: lightweight requests
    python/src/mcp_server/features/projects/project_tools.py, python/src/mcp_server/features/documents/document_tools.py
    Client GETs now include the include_content=False query parameter to request lightweight responses; call semantics are otherwise unchanged.
  • Server API routes: flags, size-logging, and error handling
    python/src/server/api_routes/projects_api.py
    Endpoints accept new flags: include_content (projects, project documents) and exclude_large_fields (tasks). Response sizes are computed and logged; warnings are emitted for large payloads; additional exception logging added. Function signatures updated accordingly.
  • Project service: optional lightweight listing
    python/src/server/services/projects/project_service.py
    list_projects(include_content: bool = True) added. When False, returns metadata-only project objects with stats (docs_count, features_count, has_data) instead of full content.
  • Document service: metadata vs full content
    python/src/server/services/projects/document_service.py
    list_documents(project_id, include_content: bool = False) added. When False, returns metadata entries with stats.content_size; when True, returns full documents.
  • Task service: large-field exclusion and stats
    python/src/server/services/projects/task_service.py
    list_tasks(..., exclude_large_fields: bool = False, include_archived: bool = False) added. When exclude_large_fields=True, omits large JSONB fields per task and provides counts in a stats block; archived filtering is controlled by include_archived.
  • Unit tests: token optimization
    python/tests/test_token_optimization.py
    New unit tests covering full vs lightweight responses for projects, tasks, and documents; token-reduction assertions and default-argument compatibility checks.
  • Integration tests: endpoint size checks
    python/tests/test_token_optimization_integration.py
    New async integration tests that measure response sizes and estimated tokens for /api/projects, /api/tasks, and /api/projects/{id}/docs with various flags; includes server health check and reporting.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client as Client (MCP)
  participant API as Projects API
  participant PS as ProjectService
  participant DS as DocumentService
  participant TS as TaskService

  rect rgb(240,245,255)
    note over Client,API: Projects listing with content toggle
    Client->>API: GET /api/projects?include_content=false
    API->>PS: list_projects(include_content=False)
    alt include_content=False
      PS-->>API: metadata + stats (no large fields)
    else include_content=True
      PS-->>API: full project content
    end
    API-->>Client: JSON + size logged
  end

  rect rgb(245,255,240)
    note over Client,API: Project documents with content toggle
    Client->>API: GET /api/projects/{id}/docs?include_content=false
    API->>DS: list_documents(include_content=False)
    DS-->>API: metadata (+stats.content_size)
    API-->>Client: JSON + size logged
  end

  rect rgb(255,248,240)
    note over Client,API: Tasks with large-field exclusion
    Client->>API: GET /api/tasks?exclude_large_fields=true
    API->>TS: list_tasks(exclude_large_fields=True)
    TS-->>API: tasks without large fields (+stats counts)
    API-->>Client: JSON + size logged
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Assessment against linked issues

  • Objective: Make MCP function mcp__archon__list_projects return lightweight metadata by default [#488]. Assessment: Default remains full-content (server include_content defaults to True).
  • Objective: API default behavior for /api/projects should be lightweight (include_content defaults to False) [#488]. Assessment: The list_projects signature default is include_content: bool = True.
  • Objective: Include summary stats in lightweight projects: docs_count, tasks_count, features_count [#488]. Assessment: docs_count and features_count (and has_data) are provided; tasks_count is not present.
  • Objective: Keep total response size ~500–1000 tokens for basic listing [#488]. Assessment: Response-size logging and warnings were added, but no code-enforced caps or automatic switching to smaller payloads are implemented.

Assessment against linked issues: Out-of-scope changes

  • Change: Add exclude_large_fields to tasks and per-task stats (python/src/server/services/projects/task_service.py; python/src/server/api_routes/projects_api.py). Explanation: Task payload optimization is not part of the #488 objective, which targets project listing behavior.
  • Change: Add include_content flag to documents and stats.content_size (python/src/server/services/projects/document_service.py; python/src/server/api_routes/projects_api.py). Explanation: Document listing optimization is unrelated to the specific goal of making project listing lightweight by default.
  • Change: New integration tests for tasks/documents token optimization (python/tests/test_token_optimization_integration.py). Explanation: Tests for endpoints beyond project listing extend scope beyond the single-issue remediation requested in #488.

I nibbled bytes and trimmed the trees,
Turned heavy oaks to feathered leaves.
Projects now stroll light and spry,
Docs and tasks breathe easier, shy.
With saved-up tokens, I hop and cheer—🐇✨


Collaborator Author

Wirasm commented Aug 26, 2025

resolves: #488


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/src/server/services/projects/task_service.py (1)

287-312: Use DB-provided counts when excluding large fields

When exclude_large_fields is True, you no longer select sources/code_examples. Switch to using the derived counts.

                 task_data = {
                     "id": task["id"],
                     "project_id": task["project_id"],
                     "title": task["title"],
                     "description": task["description"],
                     "status": task["status"],
                     "assignee": task.get("assignee", "User"),
                     "task_order": task.get("task_order", 0),
                     "feature": task.get("feature"),
                     "created_at": task["created_at"],
                     "updated_at": task["updated_at"],
                 }
                 
                 if not exclude_large_fields:
                     # Include full JSONB fields
                     task_data["sources"] = task.get("sources", [])
                     task_data["code_examples"] = task.get("code_examples", [])
                 else:
                     # Add counts instead of full content
-                    task_data["stats"] = {
-                        "sources_count": len(task.get("sources", [])),
-                        "code_examples_count": len(task.get("code_examples", []))
-                    }
+                    task_data["stats"] = {
+                        "sources_count": task.get("sources_count", 0),
+                        "code_examples_count": task.get("code_examples_count", 0),
+                    }
                 
                 tasks.append(task_data)
🧹 Nitpick comments (16)
python/src/server/services/projects/document_service.py (1)

153-155: Preserve stack traces in logs

Log with exc_info=True per guidelines to keep full tracebacks.

-            logger.error(f"Error listing documents: {e}")
+            logger.error(f"Error listing documents: {e}", exc_info=True)
python/src/server/services/projects/project_service.py (2)

88-111: Optional: Narrow the SELECT even for full-content path to reduce DB egress

Selecting "*" pulls unused columns if any are added later. Being explicit avoids accidental payload bloat.

-                response = (
-                    self.supabase_client.table("archon_projects")
-                    .select("*")
-                    .order("created_at", desc=True)
-                    .execute()
-                )
+                response = (
+                    self.supabase_client.table("archon_projects")
+                    .select("id, title, github_repo, created_at, updated_at, pinned, description, docs, features, data")
+                    .order("created_at", desc=True)
+                    .execute()
+                )

146-148: Preserve stack traces in logs

-            logger.error(f"Error listing projects: {e}")
+            logger.error(f"Error listing projects: {e}", exc_info=True)
python/src/server/services/projects/task_service.py (1)

328-330: Preserve stack traces in logs

-            logger.error(f"Error listing tasks: {e}")
+            logger.error(f"Error listing tasks: {e}", exc_info=True)
python/src/mcp_server/features/projects/project_tools.py (1)

178-182: Passing include_content=False here is exactly what we need

This ensures lightweight project listings by default from the MCP side. Nice.

As a follow-up, consider also passing include_content=False in the create_project polling GET to reduce payload during polling. Example (outside this hunk):

list_response = await poll_client.get(urljoin(api_url, "/api/projects"), params={"include_content": False})
python/tests/test_token_optimization_integration.py (4)

6-10: Remove unused import and modernize typing imports

json is unused and Ruff will flag it (F401). Also prefer builtin dict[...]/tuple[...] over typing.Dict/Tuple on Python 3.12.

Apply:

 import httpx
-import json
 import asyncio
-from typing import Dict, Any, Tuple
+from typing import Any

131-133: Turn unexpected document endpoint failures into skips, not silent passes

When the server is absent or the API contract differs, mark the test as skipped to surface it in CI.

Apply:

-        except Exception as e:
-            print(f"\n⚠️  Could not test documents endpoint: {e}")
+        except Exception as e:
+            pytest.skip(f"Could not test documents endpoint: {e}")

149-152: Mark optional MCP health check as skipped when unavailable

This keeps CI output explicit without failing the suite.

Apply:

-        except httpx.ConnectError:
-            print("ℹ️  MCP server not running (optional for tests)")
+        except httpx.ConnectError:
+            pytest.skip("MCP server not running (optional for tests)")

155-189: Consider moving the script-style runner to tools/ or guard it behind an env flag

Having both pytest-style tests and a script entry point in the same file invites confusion. If you want to keep the CLI runner, gate it so it’s not used by default.

For example:

-if __name__ == "__main__":
-    asyncio.run(main())
+if __name__ == "__main__":
+    import os
+    if os.getenv("RUN_INTEGRATION_CLI", "0") == "1":
+        asyncio.run(main())
+    else:
+        print("Set RUN_INTEGRATION_CLI=1 to run this module as a script.")
python/src/server/api_routes/projects_api.py (5)

12-15: Remove unused import

sys is imported but never used; Ruff will flag it (F401).

-import sys

533-537: Add response size metrics to project task listing for consistency

Other endpoints now log size_bytes and warn >10KB. Mirror that here to detect regressions.

         logfire.info(
             f"Project tasks retrieved | project_id={project_id} | task_count={len(filtered_tasks)}"
         )
 
-        return filtered_tasks
+        # Monitor response size
+        response_json = json.dumps(filtered_tasks)
+        response_size = len(response_json)
+        logfire.info(
+            f"Project tasks listed | project_id={project_id} | count={len(filtered_tasks)} | size_bytes={response_size} | exclude_large_fields={exclude_large_fields}"
+        )
+        if response_size > 10_000:
+            logfire.warning(
+                f"Large project task response | project_id={project_id} | size_bytes={response_size} | count={len(filtered_tasks)}"
+            )
+
+        return filtered_tasks

593-596: Log exclude_large_fields in the request-summary line

It’s present in the success metrics but not in the initial request log.

-        logfire.info(
-            f"Listing tasks | status={status} | project_id={project_id} | include_closed={include_closed} | page={page} | per_page={per_page}"
-        )
+        logfire.info(
+            f"Listing tasks | status={status} | project_id={project_id} | include_closed={include_closed} | page={page} | per_page={per_page} | exclude_large_fields={exclude_large_fields}"
+        )

221-267: Health check should avoid heavy queries

projects_health currently calls list_projects() with default include_content=True, which may be large/slow and defeats the “fail fast” guideline. Use lightweight modes for both projects and tasks to reduce load during health checks.

-            project_service = ProjectService(supabase_client)
-            # Try to list projects with limit 1 to test table access
-            success, _ = project_service.list_projects()
+            project_service = ProjectService(supabase_client)
+            # Use lightweight listing for health check
+            success, _ = project_service.list_projects(include_content=False)
@@
-            task_service = TaskService(supabase_client)
-            # Try to list tasks with limit 1 to test table access
-            success, _ = task_service.list_tasks(include_closed=True)
+            task_service = TaskService(supabase_client)
+            # Use lightweight listing for health check
+            success, _ = task_service.list_tasks(include_closed=True, exclude_large_fields=True)

Also applies to: 232-246


847-875: Add size metrics and large-response warning on documents list

To align with the other endpoints and the PR objectives (monitor and alert on large responses), add size_bytes logging and a 10KB warning for documents.

         logfire.info(
             f"Documents listed successfully | project_id={project_id} | count={result.get('total_count', 0)} | lightweight={not include_content}"
         )
 
-        return result
+        # Monitor response size for optimization validation
+        response_json = json.dumps(result)
+        response_size = len(response_json)
+        logfire.info(
+            f"Documents response size | project_id={project_id} | size_bytes={response_size} | include_content={include_content}"
+        )
+        if response_size > 10_000:
+            logfire.warning(
+                f"Large documents response | project_id={project_id} | size_bytes={response_size}"
+            )
+
+        return result
python/tests/test_token_optimization.py (2)

163-201: Mock chain is brittle; prefer explicit method-return mocks

The mock_select.neq().or_.return_value = mock_or pattern works but is easy to break if call order changes. Consider assigning each intermediate return explicitly to keep intent clear.

Example:

# More explicit chaining
mock_after_neq = Mock()
mock_select.neq.return_value = mock_after_neq
mock_after_neq.or_.return_value = mock_or

This keeps failures localized if the query pipeline evolves.


122-158: Token reduction test is solid but can assert minimum target explicitly

You already assert >95% which matches PR goals. Consider parametrizing the threshold via an env var so CI can relax/tighten without code change.

📜 Review details

Configuration used: CodeRabbit UI | Review profile: CHILL | Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a1b030 and f9d245b.

📒 Files selected for processing (8)
  • python/src/mcp_server/features/documents/document_tools.py (1 hunks)
  • python/src/mcp_server/features/projects/project_tools.py (1 hunks)
  • python/src/server/api_routes/projects_api.py (8 hunks)
  • python/src/server/services/projects/document_service.py (2 hunks)
  • python/src/server/services/projects/project_service.py (1 hunks)
  • python/src/server/services/projects/task_service.py (3 hunks)
  • python/tests/test_token_optimization.py (1 hunks)
  • python/tests/test_token_optimization_integration.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
python/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/**/*.py: Target Python 3.12 with a 120-character line length
Use Ruff for linting and Mypy for type checking before commit

Files:

  • python/src/mcp_server/features/projects/project_tools.py
  • python/src/mcp_server/features/documents/document_tools.py
  • python/src/server/services/projects/task_service.py
  • python/src/server/services/projects/document_service.py
  • python/tests/test_token_optimization_integration.py
  • python/tests/test_token_optimization.py
  • python/src/server/services/projects/project_service.py
  • python/src/server/api_routes/projects_api.py
{python/**/*.py,archon-ui-main/src/**/*.{ts,tsx,js,jsx}}

📄 CodeRabbit inference engine (CLAUDE.md)

{python/**/*.py,archon-ui-main/src/**/*.{ts,tsx,js,jsx}}: Remove dead code immediately; do not keep legacy/unused functions
Avoid comments that reference change history (e.g., LEGACY, CHANGED, REMOVED); keep comments focused on current functionality

Files:

  • python/src/mcp_server/features/projects/project_tools.py
  • python/src/mcp_server/features/documents/document_tools.py
  • python/src/server/services/projects/task_service.py
  • python/src/server/services/projects/document_service.py
  • python/tests/test_token_optimization_integration.py
  • python/tests/test_token_optimization.py
  • python/src/server/services/projects/project_service.py
  • python/src/server/api_routes/projects_api.py
python/src/{server,mcp,agents}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/{server,mcp,agents}/**/*.py: Fail fast on service startup failures, missing configuration, database connection issues, auth failures, critical dependency outages, and invalid data that would corrupt state
External API calls should use retry with exponential backoff and ultimately fail with a clear, contextual error message
Error messages must include context (operation being attempted) and relevant IDs/URLs/data for debugging
Preserve full stack traces in logs (e.g., Python logging with exc_info=True)
Use specific exception types; avoid catching broad Exception unless re-raising with context
Never signal failure by returning None/null; raise a descriptive exception instead

Files:

  • python/src/server/services/projects/task_service.py
  • python/src/server/services/projects/document_service.py
  • python/src/server/services/projects/project_service.py
  • python/src/server/api_routes/projects_api.py
python/src/{server/services,agents}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Never accept or store corrupted data (e.g., zero embeddings, null foreign keys, malformed JSON); skip failed items entirely instead of persisting bad data

Files:

  • python/src/server/services/projects/task_service.py
  • python/src/server/services/projects/document_service.py
  • python/src/server/services/projects/project_service.py
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/**/*.py: For batch processing and background tasks, continue processing but log detailed per-item failures and return both successes and failures
Do not crash the server on a single WebSocket event failure; log the error and continue serving other clients

Files:

  • python/src/server/services/projects/task_service.py
  • python/src/server/services/projects/document_service.py
  • python/src/server/services/projects/project_service.py
  • python/src/server/api_routes/projects_api.py
python/src/server/**

📄 CodeRabbit inference engine (CLAUDE.md)

Keep FastAPI application code under python/src/server/ (routes in api_routes/, services in services/, main in main.py)

Files:

  • python/src/server/services/projects/task_service.py
  • python/src/server/services/projects/document_service.py
  • python/src/server/services/projects/project_service.py
  • python/src/server/api_routes/projects_api.py
python/tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place backend tests under python/tests/

Files:

  • python/tests/test_token_optimization_integration.py
  • python/tests/test_token_optimization.py
🧬 Code graph analysis (4)
python/src/server/services/projects/document_service.py (1)
python/src/mcp_server/features/documents/document_tools.py (1)
  • list_documents (129-167)
python/tests/test_token_optimization.py (6)
python/src/server/services/projects/project_service.py (2)
  • ProjectService (20-365)
  • list_projects (76-148)
python/src/server/services/projects/task_service.py (2)
  • TaskService (57-492)
  • list_tasks (188-330)
python/src/server/services/projects/document_service.py (2)
  • DocumentService (21-356)
  • list_documents (99-155)
python/src/mcp_server/features/projects/project_tools.py (1)
  • list_projects (163-198)
python/src/server/api_routes/projects_api.py (2)
  • list_projects (79-129)
  • list_tasks (583-658)
python/src/mcp_server/features/documents/document_tools.py (1)
  • list_documents (129-167)
python/src/server/services/projects/project_service.py (2)
python/src/mcp_server/features/projects/project_tools.py (1)
  • list_projects (163-198)
python/src/server/api_routes/projects_api.py (1)
  • list_projects (79-129)
python/src/server/api_routes/projects_api.py (5)
python/src/mcp_server/features/projects/project_tools.py (1)
  • list_projects (163-198)
python/src/server/services/projects/project_service.py (1)
  • list_projects (76-148)
python/src/server/services/projects/task_service.py (2)
  • TaskService (57-492)
  • list_tasks (188-330)
python/src/server/services/projects/document_service.py (1)
  • list_documents (99-155)
python/src/mcp_server/features/documents/document_tools.py (1)
  • list_documents (129-167)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Backend Tests (Python + pytest)
🔇 Additional comments (7)
python/src/server/services/projects/document_service.py (1)

99-107: Good addition: include_content toggle (default False) aligns with token-optimization goals

The new parameter and docstring look correct and keep the default lightweight. No functional concerns here.

python/src/server/services/projects/project_service.py (2)

76-83: include_content flag and docs are clear; default maintains backward compatibility

Signature and docstring match the PR intent. Good separation of full-content vs lightweight behavior.


112-143: Implement tasks_count via a single aggregated PostgREST query

Please extend the summary‐stats to include tasks_count without introducing N+1 queries. For example, in python/src/server/services/projects/project_service.py (lines 112–143):

  • Add a grouped count request on archon_tasks:

    # Fetch per‐project task counts in one round-trip
    tasks_response = (
        self.supabase_client
            .table("archon_tasks")
            .select("project_id, tasks_count:count(id)", count="exact")
            .group("project_id")
            .execute()
    )
    
    # Build a lookup of project_id → tasks_count
    tasks_count_map = {
        row["project_id"]: row["tasks_count"] for row in tasks_response.data
    }
  • Merge into your existing loop:

     for project in response.data:
         docs_count = len(project.get("docs", []))
         features_count = len(project.get("features", []))
         has_data = bool(project.get("data", []))
         tasks_count = tasks_count_map.get(project["id"], 0)

         projects.append({
             "id": project["id"],
             "title": project["title"],
             …
             "stats": {
                 "docs_count": docs_count,
                 "tasks_count": tasks_count,
                 "features_count": features_count,
                 "has_data": has_data
             }
         })

Let me know if you’d like a draft SQL-RPC fallback or if you need help wiring the Python client call.
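
As a client-agnostic sketch of the merge step (the sample rows and IDs below are made up; `grouped_rows` stands in for tasks_response.data from the grouped query):

```python
# Client-agnostic sketch of the count-map merge; `grouped_rows` stands in
# for tasks_response.data returned by the grouped PostgREST query.
grouped_rows = [
    {"project_id": "p1", "tasks_count": 3},
    {"project_id": "p2", "tasks_count": 0},
]
projects_data = [{"id": "p1"}, {"id": "p2"}, {"id": "p3"}]  # hypothetical projects

# One pass to build the project_id -> tasks_count lookup
tasks_count_map = {row["project_id"]: row["tasks_count"] for row in grouped_rows}

# Projects with no task rows fall back to 0 instead of triggering extra queries
stats = [
    {"id": p["id"], "tasks_count": tasks_count_map.get(p["id"], 0)}
    for p in projects_data
]
print(stats)
```

The point of the pattern is that "p3", which has no row in the grouped result, still gets a tasks_count of 0 without a per-project query.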

python/src/server/services/projects/task_service.py (1)

188-194: New exclude_large_fields parameter is appropriate and default-safe

Signature and docstring are clear; default False preserves backward compatibility.

python/src/mcp_server/features/documents/document_tools.py (1)

147-151: Good: documents list now requests lightweight payloads by default

Aligns the tool with the server-side toggle and reduces token use significantly.

python/tests/test_token_optimization_integration.py (1)

6-10: A verification script is running to:

  • Confirm whether async def tests exist in python/tests/test_token_optimization_integration.py
  • Locate any pytest config (pytest.ini, setup.cfg, or pyproject.toml)
  • Inspect those configs for asyncio_mode=auto or pytest-asyncio settings

I’ll integrate the findings into the review once available.

python/tests/test_token_optimization.py (1)

335-353: Good backward-compat coverage of defaults

Asserting default values on the service APIs is helpful to prevent accidental breaking changes to API behavior. Nice touch.

Comment on lines +504 to 517
 async def list_project_tasks(project_id: str, include_archived: bool = False, exclude_large_fields: bool = False):
     """List all tasks for a specific project. By default, filters out archived tasks."""
     try:
         logfire.info(
-            f"Listing project tasks | project_id={project_id} | include_archived={include_archived}"
+            f"Listing project tasks | project_id={project_id} | include_archived={include_archived} | exclude_large_fields={exclude_large_fields}"
         )

         # Use TaskService to list tasks
         task_service = TaskService()
         success, result = task_service.list_tasks(
             project_id=project_id,
             include_closed=True,  # Get all tasks, we'll filter archived separately
             exclude_large_fields=exclude_large_fields,
         )

⚠️ Potential issue

include_archived flag is ineffective; archived tasks are always excluded and 'archived' isn’t in task payload

TaskService currently applies or_("archived.is.null,archived.is.false") unconditionally and the task DTO does not include an archived field. As a result, include_archived here has no effect and your Python-side filter never triggers. This is a functional bug for consumers expecting archived tasks when include_archived=True.

Recommended changes:

  1. Thread include_archived down to TaskService and honor it in the query.
  2. Include archived in the task DTO so API-side filters (and clients) can reason about it.

Change in this file (pass the flag):

 success, result = task_service.list_tasks(
     project_id=project_id,
     include_closed=True,  # Get all tasks, we'll filter archived separately
-    exclude_large_fields=exclude_large_fields,
+    exclude_large_fields=exclude_large_fields,
+    include_archived=include_archived,
 )

And update the filter block to rely on the presence of archived:

-            if not include_archived and task.get("archived", False):
+            if not include_archived and task.get("archived") is True:
                 continue

Additionally, apply the following changes outside this file (TaskService) to complete the fix:

# python/src/server/services/projects/task_service.py (signature + query + DTO)
def list_tasks(
    self,
    project_id: str | None = None,
    status: str | None = None,
    include_closed: bool = False,
    exclude_large_fields: bool = False,
    include_archived: bool = False,   # NEW
) -> tuple[bool, dict[str, Any]]:
    ...
    # Only exclude archived when requested
    if not include_archived:
        query = query.or_("archived.is.null,archived.is.false")
        filters_applied.append("exclude archived tasks (null or false)")
    else:
        filters_applied.append("include archived tasks")

    ...
    task_data = {
        "id": task["id"],
        "project_id": task["project_id"],
        "title": task["title"],
        "description": task["description"],
        "status": task["status"],
        "assignee": task.get("assignee", "User"),
        "task_order": task.get("task_order", 0),
        "feature": task.get("feature"),
        "created_at": task["created_at"],
        "updated_at": task["updated_at"],
        "archived": task.get("archived", False),   # NEW
    }

Please confirm if you want me to push a follow-up patch touching TaskService as above.

Also applies to: 528-537

🤖 Prompt for AI Agents
In python/src/server/api_routes/projects_api.py around lines 504-517 (and also
apply the same change at lines 528-537), the include_archived parameter is not
passed into TaskService so archived tasks are always excluded and the task DTO
lacks an archived field; thread include_archived into the TaskService.list_tasks
call here and in the other block, e.g. task_service.list_tasks(...,
include_archived=include_archived), and then update TaskService
(python/src/server/services/projects/task_service.py) to accept
include_archived, conditionally apply the archived filter only when
include_archived is False, and include archived in the returned task DTO
(archived: task.get("archived", False)) so API-side filtering and clients can
see archived status.
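
A plain-Python sketch of the intended filter semantics (operating on already-fetched dicts; the helper name is hypothetical) — null/missing archived counts as not archived, matching or_("archived.is.null,archived.is.false"):

```python
def filter_tasks(tasks, include_archived=False):
    """Mimic the service-side archived filter on fetched rows.

    A task is excluded only when archived is literally True; a null or
    missing archived column counts as not archived, matching the
    PostgREST filter or_("archived.is.null,archived.is.false").
    """
    if include_archived:
        return list(tasks)
    return [t for t in tasks if t.get("archived") is not True]

tasks = [
    {"id": 1, "archived": True},
    {"id": 2, "archived": False},
    {"id": 3},  # archived column is null
]
print([t["id"] for t in filter_tasks(tasks)])                          # default: archived excluded
print([t["id"] for t in filter_tasks(tasks, include_archived=True)])   # flag honored
```

With the flag threaded through as recommended, include_archived=True returns all three tasks instead of silently dropping the archived one.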

Comment on lines +124 to 146
 # Format documents for response
 documents = []
 for doc in docs:
-    documents.append({
-        "id": doc.get("id"),
-        "document_type": doc.get("document_type"),
-        "title": doc.get("title"),
-        "status": doc.get("status"),
-        "version": doc.get("version"),
-        "tags": doc.get("tags", []),
-        "author": doc.get("author"),
-        "created_at": doc.get("created_at"),
-        "updated_at": doc.get("updated_at"),
-    })
+    if include_content:
+        # Return full document
+        documents.append(doc)
+    else:
+        # Return metadata only
+        documents.append({
+            "id": doc.get("id"),
+            "document_type": doc.get("document_type"),
+            "title": doc.get("title"),
+            "status": doc.get("status"),
+            "version": doc.get("version"),
+            "tags": doc.get("tags", []),
+            "author": doc.get("author"),
+            "created_at": doc.get("created_at"),
+            "updated_at": doc.get("updated_at"),
+            "stats": {
+                "content_size": len(str(doc.get("content", {})))
+            }
+        })


🛠️ Refactor suggestion

Report size in bytes and align stat naming; avoid len(str(...)) for accuracy and consistency

  • Using len(str(content)) is misleading (Python repr, not JSON bytes) and inconsistent with the PR’s “size_bytes” metric elsewhere.
  • Prefer JSON-encoded byte length and standardize the field to stats.size_bytes.
  • Minor: created_at/updated_at can be None for many docs (add_document doesn’t set created_at); metadata will show nulls.

Apply this focused change to compute accurate size and rename the stat:

-                    documents.append({
+                    documents.append({
                         "id": doc.get("id"),
                         "document_type": doc.get("document_type"),
                         "title": doc.get("title"),
                         "status": doc.get("status"),
                         "version": doc.get("version"),
                         "tags": doc.get("tags", []),
                         "author": doc.get("author"),
                         "created_at": doc.get("created_at"),
                         "updated_at": doc.get("updated_at"),
                         "stats": {
-                            "content_size": len(str(doc.get("content", {})))
+                            "size_bytes": len(json.dumps(doc.get("content", {}), separators=(',', ':'), ensure_ascii=False).encode('utf-8'))
                         }
                     })

Outside this hunk, add the missing import at top-level:

import json

Optionally (outside this hunk), consider setting timestamps at creation to make metadata complete:

# in add_document() when building new_doc
"created_at": datetime.now().isoformat(),
"updated_at": datetime.now().isoformat(),
🤖 Prompt for AI Agents
In python/src/server/services/projects/document_service.py around lines 124 to
146, the stats currently use len(str(doc.get("content", {}))) and the key is
"content_size"; change this to compute the JSON-encoded byte length and rename
the stat to "size_bytes" for accuracy and consistency (e.g. size_bytes =
len(json.dumps(doc.get("content", "")).encode("utf-8")) and set stats:
{"size_bytes": size_bytes}). Also add import json at the module top-level
outside this hunk. Optionally consider populating created_at/updated_at in
add_document() if you want non-null timestamps.
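
A minimal comparison showing why the two measurements diverge (the sample content is made up for illustration):

```python
import json

# Hypothetical document content; illustrates why len(str(...)) is not a byte size.
content = {"a": "é", "b": "y"}

repr_size = len(str(content))  # length of the Python repr, what content_size measured
size_bytes = len(
    json.dumps(content, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
)

# repr adds quotes and spaces that compact JSON omits, while "é" is one repr
# character but two UTF-8 bytes, so the two numbers disagree in both directions.
print(repr_size, size_bytes)
```

Neither number tracks the other, which is why reporting a real stats.size_bytes is the safer metric.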

Comment on lines +112 to 143
                # Lightweight response for MCP - fetch all data but only return metadata + stats
                # FIXED: N+1 query problem - now using single query
                response = (
                    self.supabase_client.table("archon_projects")
                    .select("*")  # Fetch all fields in single query
                    .order("created_at", desc=True)
                    .execute()
                )

                projects = []
                for project in response.data:
                    # Calculate counts from fetched data (no additional queries)
                    docs_count = len(project.get("docs", []))
                    features_count = len(project.get("features", []))
                    has_data = bool(project.get("data", []))

                    # Return only metadata + stats, excluding large JSONB fields
                    projects.append({
                        "id": project["id"],
                        "title": project["title"],
                        "github_repo": project.get("github_repo"),
                        "created_at": project["created_at"],
                        "updated_at": project["updated_at"],
                        "pinned": project.get("pinned", False),
                        "description": project.get("description", ""),
                        "stats": {
                            "docs_count": docs_count,
                            "features_count": features_count,
                            "has_data": has_data
                        }
                    })


🛠️ Refactor suggestion

Lightweight path still fetches heavy JSON; push counts to the DB and omit large fields from SELECT

Currently .select("*") fetches docs/features/data in full, then discards them. That saves client tokens but not DB egress/CPU. Compute counts server-side and select only needed columns.

-                response = (
-                    self.supabase_client.table("archon_projects")
-                    .select("*")  # Fetch all fields in single query
-                    .order("created_at", desc=True)
-                    .execute()
-                )
+                response = (
+                    self.supabase_client.table("archon_projects")
+                    .select(
+                        "id, title, github_repo, created_at, updated_at, pinned, description, "
+                        "docs_count:jsonb_array_length(docs), "
+                        "features_count:jsonb_array_length(features), "
+                        "data_count:jsonb_array_length(data)"
+                    )
+                    .order("created_at", desc=True)
+                    .execute()
+                )
@@
-                for project in response.data:
-                    # Calculate counts from fetched data (no additional queries)
-                    docs_count = len(project.get("docs", []))
-                    features_count = len(project.get("features", []))
-                    has_data = bool(project.get("data", []))
+                for project in response.data:
+                    docs_count = project.get("docs_count", 0)
+                    features_count = project.get("features_count", 0)
+                    has_data = (project.get("data_count", 0) > 0)

Note: PR objectives require tasks_count in the summary stats. It’s currently missing. Consider a single aggregated query (no N+1) to fetch per-project task counts (e.g., group by project_id or an RPC), then merge into the loop.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
                # Lightweight response for MCP - fetch all data but only return metadata + stats
                # FIXED: N+1 query problem - now using single query
                response = (
                    self.supabase_client.table("archon_projects")
                    .select("*")  # Fetch all fields in single query
                    .order("created_at", desc=True)
                    .execute()
                )
                projects = []
                for project in response.data:
                    # Calculate counts from fetched data (no additional queries)
                    docs_count = len(project.get("docs", []))
                    features_count = len(project.get("features", []))
                    has_data = bool(project.get("data", []))
                    # Return only metadata + stats, excluding large JSONB fields
                    projects.append({
                        "id": project["id"],
                        "title": project["title"],
                        "github_repo": project.get("github_repo"),
                        "created_at": project["created_at"],
                        "updated_at": project["updated_at"],
                        "pinned": project.get("pinned", False),
                        "description": project.get("description", ""),
                        "stats": {
                            "docs_count": docs_count,
                            "features_count": features_count,
                            "has_data": has_data
                        }
                    })

                # Lightweight response for MCP - fetch only metadata + stats via DB-side counts
                # FIXED: N+1 query problem - now using single query
                response = (
                    self.supabase_client.table("archon_projects")
                    .select(
                        "id, title, github_repo, created_at, updated_at, pinned, description, "
                        "docs_count:jsonb_array_length(docs), "
                        "features_count:jsonb_array_length(features), "
                        "data_count:jsonb_array_length(data)"
                    )
                    .order("created_at", desc=True)
                    .execute()
                )
                projects = []
                for project in response.data:
                    docs_count = project.get("docs_count", 0)
                    features_count = project.get("features_count", 0)
                    has_data = (project.get("data_count", 0) > 0)
                    # Return only metadata + stats, excluding large JSONB fields
                    projects.append({
                        "id": project["id"],
                        "title": project["title"],
                        "github_repo": project.get("github_repo"),
                        "created_at": project["created_at"],
                        "updated_at": project["updated_at"],
                        "pinned": project.get("pinned", False),
                        "description": project.get("description", ""),
                        "stats": {
                            "docs_count": docs_count,
                            "features_count": features_count,
                            "has_data": has_data
                        }
                    })
🤖 Prompt for AI Agents
python/src/server/services/projects/project_service.py lines 112-143: the
current lightweight path uses .select("*") which pulls large JSONB fields (docs,
features, data) and only uses their counts—change the implementation to SELECT
only required scalar columns (id, title, github_repo, created_at, updated_at,
pinned, description) and remove large fields from the response, and push the
counts to the DB by issuing a single aggregated query (GROUP BY project_id) that
returns docs_count, features_count and tasks_count for each project (or use an
RPC/aggregation) and then merge those aggregated counts into the projects loop
to build the final metadata+stats objects so there are no large JSONB fields
fetched and tasks_count is included.

Comment on lines +209 to 219
            if exclude_large_fields:
                # Select all fields except large JSONB ones
                query = self.supabase_client.table("archon_tasks").select(
                    "id, project_id, parent_task_id, title, description, "
                    "status, assignee, task_order, feature, archived, "
                    "archived_at, archived_by, created_at, updated_at, "
                    "sources, code_examples"  # Still fetch for counting, but will process differently
                )
            else:
                query = self.supabase_client.table("archon_tasks").select("*")


🛠️ Refactor suggestion

Don’t fetch large JSON when exclude_large_fields=True; compute counts in the DB

Right now you still select sources and code_examples “for counting,” which defeats the goal of minimizing payload between DB and server.

Refactor to select only lightweight columns plus derived counts:

-            if exclude_large_fields:
-                # Select all fields except large JSONB ones
-                query = self.supabase_client.table("archon_tasks").select(
-                    "id, project_id, parent_task_id, title, description, "
-                    "status, assignee, task_order, feature, archived, "
-                    "archived_at, archived_by, created_at, updated_at, "
-                    "sources, code_examples"  # Still fetch for counting, but will process differently
-                )
+            if exclude_large_fields:
+                # Select only lightweight columns and compute counts server-side
+                query = self.supabase_client.table("archon_tasks").select(
+                    "id, project_id, parent_task_id, title, description, "
+                    "status, assignee, task_order, feature, archived, "
+                    "archived_at, archived_by, created_at, updated_at, "
+                    "sources_count:jsonb_array_length(sources), "
+                    "code_examples_count:jsonb_array_length(code_examples)"
+                )
             else:
                 query = self.supabase_client.table("archon_tasks").select("*")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
            if exclude_large_fields:
                # Select all fields except large JSONB ones
                query = self.supabase_client.table("archon_tasks").select(
                    "id, project_id, parent_task_id, title, description, "
                    "status, assignee, task_order, feature, archived, "
                    "archived_at, archived_by, created_at, updated_at, "
                    "sources, code_examples"  # Still fetch for counting, but will process differently
                )
            else:
                query = self.supabase_client.table("archon_tasks").select("*")

            if exclude_large_fields:
                # Select only lightweight columns and compute counts server-side
                query = self.supabase_client.table("archon_tasks").select(
                    "id, project_id, parent_task_id, title, description, "
                    "status, assignee, task_order, feature, archived, "
                    "archived_at, archived_by, created_at, updated_at, "
                    "sources_count:jsonb_array_length(sources), "
                    "code_examples_count:jsonb_array_length(code_examples)"
                )
            else:
                query = self.supabase_client.table("archon_tasks").select("*")

Comment thread python/tests/test_token_optimization_integration.py Outdated
- Add include_archived parameter to TaskService.list_tasks()
- Service now conditionally applies archived filter based on parameter
- Add 'archived' field to task DTO for client visibility
- Update API endpoints to pass include_archived down to service
- Remove redundant client-side filtering in API layer
- Fix type hints in integration tests (dict[str, Any] | None)
- Use pytest.skip() instead of return for proper test reporting

These fixes address the functional bug identified by CodeRabbit where
archived tasks couldn't be retrieved even when explicitly requested.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
python/src/server/services/projects/task_service.py (2)

244-250: Archived filtering honored correctly

Conditional archived filter fixes earlier bug. Good.


211-220: Don’t fetch large JSON when exclude_large_fields=True; compute counts in the DB

Selecting sources and code_examples defeats the goal of minimizing payload. Ask PostgREST to compute counts and omit the arrays.

Apply:

-            if exclude_large_fields:
-                # Select all fields except large JSONB ones
-                query = self.supabase_client.table("archon_tasks").select(
-                    "id, project_id, parent_task_id, title, description, "
-                    "status, assignee, task_order, feature, archived, "
-                    "archived_at, archived_by, created_at, updated_at, "
-                    "sources, code_examples"  # Still fetch for counting, but will process differently
-                )
+            if exclude_large_fields:
+                # Select only lightweight columns and compute counts server-side
+                query = self.supabase_client.table("archon_tasks").select(
+                    "id, project_id, parent_task_id, title, description, "
+                    "status, assignee, task_order, feature, archived, "
+                    "archived_at, archived_by, created_at, updated_at, "
+                    "sources_count:jsonb_array_length(coalesce(sources, '[]'::jsonb)), "
+                    "code_examples_count:jsonb_array_length(coalesce(code_examples, '[]'::jsonb))"
+                )
🧹 Nitpick comments (4)
python/src/server/api_routes/projects_api.py (4)

12-15: Remove unused import

sys is not used.

 import asyncio
 import json
 import secrets
-import sys
 from typing import Any

127-129: Preserve stack traces in error logs

Include exc_info to comply with logging guidelines.

-        logfire.error(f"Failed to list projects | error={str(e)}")
+        logfire.error(f"Failed to list projects | error={str(e)}", exc_info=True)

504-529: Add response size metrics for list_project_tasks (consistency + visibility)

Mirror the size logging/warning used elsewhere to catch regressions on this hot path.

-        logfire.info(
-            f"Project tasks retrieved | project_id={project_id} | task_count={len(tasks)}"
-        )
-
-        return tasks
+        # Measure payload size for monitoring
+        response_json = json.dumps(tasks)
+        response_size = len(response_json)
+        logfire.info(
+            f"Project tasks listed | project_id={project_id} | task_count={len(tasks)} | "
+            f"size_bytes={response_size} | include_archived={include_archived} | "
+            f"exclude_large_fields={exclude_large_fields}"
+        )
+        if response_size > 10000:
+            logfire.warning(
+                f"Large project-task response | project_id={project_id} | "
+                f"size_bytes={response_size} | task_count={len(tasks)} | "
+                f"exclude_large_fields={exclude_large_fields}"
+            )
+
+        return tasks

839-865: Add response size metrics for documents listing

Track bytes and warn on large results, similar to projects/tasks.

         logfire.info(
             f"Documents listed successfully | project_id={project_id} | count={result.get('total_count', 0)} | lightweight={not include_content}"
         )
 
-        return result
+        # Monitor response size
+        response_json = json.dumps(result)
+        response_size = len(response_json)
+        logfire.info(
+            f"Document list payload | project_id={project_id} | size_bytes={response_size} | "
+            f"include_content={include_content}"
+        )
+        if response_size > 10000:
+            logfire.warning(
+                f"Large document response | project_id={project_id} | size_bytes={response_size} | "
+                f"include_content={include_content} | count={result.get('total_count', 0)}"
+            )
+
+        return result
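
The size-logging pattern proposed in the diffs above can be condensed into a small helper, sketched here with the standard `logging` module standing in for logfire (function and logger names are hypothetical):

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("payload")  # stand-in for logfire

LARGE_RESPONSE_BYTES = 10_000  # warning threshold used in the suggestions above

def measure_payload(payload) -> int:
    """Serialize once, log the size, and warn when it crosses the threshold."""
    size_bytes = len(json.dumps(payload))
    logger.info("payload size_bytes=%d", size_bytes)
    if size_bytes > LARGE_RESPONSE_BYTES:
        logger.warning("large payload size_bytes=%d", size_bytes)
    return size_bytes

small = measure_payload([{"id": i} for i in range(3)])
large = measure_payload([{"id": i, "blob": "x" * 500} for i in range(30)])
```

Serializing once and reusing the string for both the size metric and the response keeps the monitoring overhead to a single json.dumps per request.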
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f9d245b and c19e85f.

📒 Files selected for processing (3)
  • python/src/server/api_routes/projects_api.py (8 hunks)
  • python/src/server/services/projects/task_service.py (4 hunks)
  • python/tests/test_token_optimization_integration.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • python/tests/test_token_optimization_integration.py
🧰 Additional context used
📓 Path-based instructions (6)
python/src/{server,mcp,agents}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/{server,mcp,agents}/**/*.py: Fail fast on service startup failures, missing configuration, database connection issues, auth failures, critical dependency outages, and invalid data that would corrupt state
External API calls should use retry with exponential backoff and ultimately fail with a clear, contextual error message
Error messages must include context (operation being attempted) and relevant IDs/URLs/data for debugging
Preserve full stack traces in logs (e.g., Python logging with exc_info=True)
Use specific exception types; avoid catching broad Exception unless re-raising with context
Never signal failure by returning None/null; raise a descriptive exception instead

Files:

  • python/src/server/services/projects/task_service.py
  • python/src/server/api_routes/projects_api.py
python/src/{server/services,agents}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Never accept or store corrupted data (e.g., zero embeddings, null foreign keys, malformed JSON); skip failed items entirely instead of persisting bad data

Files:

  • python/src/server/services/projects/task_service.py
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/**/*.py: For batch processing and background tasks, continue processing but log detailed per-item failures and return both successes and failures
Do not crash the server on a single WebSocket event failure; log the error and continue serving other clients

Files:

  • python/src/server/services/projects/task_service.py
  • python/src/server/api_routes/projects_api.py
python/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/**/*.py: Target Python 3.12 with a 120-character line length
Use Ruff for linting and Mypy for type checking before commit

Files:

  • python/src/server/services/projects/task_service.py
  • python/src/server/api_routes/projects_api.py
{python/**/*.py,archon-ui-main/src/**/*.{ts,tsx,js,jsx}}

📄 CodeRabbit inference engine (CLAUDE.md)

{python/**/*.py,archon-ui-main/src/**/*.{ts,tsx,js,jsx}}: Remove dead code immediately; do not keep legacy/unused functions
Avoid comments that reference change history (e.g., LEGACY, CHANGED, REMOVED); keep comments focused on current functionality

Files:

  • python/src/server/services/projects/task_service.py
  • python/src/server/api_routes/projects_api.py
python/src/server/**

📄 CodeRabbit inference engine (CLAUDE.md)

Keep FastAPI application code under python/src/server/ (routes in api_routes/, services in services/, main in main.py)

Files:

  • python/src/server/services/projects/task_service.py
  • python/src/server/api_routes/projects_api.py
🧬 Code graph analysis (1)
python/src/server/api_routes/projects_api.py (4)
python/src/mcp_server/features/projects/project_tools.py (1)
  • list_projects (163-198)
python/src/server/services/projects/project_service.py (1)
  • list_projects (76-148)
python/src/server/services/projects/task_service.py (1)
  • list_tasks (188-336)
python/src/server/services/projects/document_service.py (1)
  • list_documents (99-155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Backend Tests (Python + pytest)
🔇 Additional comments (3)
python/src/server/services/projects/task_service.py (1)

189-205: Signature + docstrings for lightweight/archived flags look good

Parameters are well-scoped and documented. Backward compatibility maintained.

python/src/server/api_routes/projects_api.py (2)

78-122: Projects endpoint: include_content toggle + size logging LGTM

Clear behavior and useful metrics. Backward compatibility maintained by default True.


616-645: Tasks endpoint: pagination + size logging LGTM

Nice size metrics with warning threshold; aligns with PR goals.

Comment on lines +306 to +317
                if not exclude_large_fields:
                    # Include full JSONB fields
                    task_data["sources"] = task.get("sources", [])
                    task_data["code_examples"] = task.get("code_examples", [])
                else:
                    # Add counts instead of full content
                    task_data["stats"] = {
                        "sources_count": len(task.get("sources", [])),
                        "code_examples_count": len(task.get("code_examples", []))
                    }

                tasks.append(task_data)

🛠️ Refactor suggestion

Use the DB-provided counts; avoid len() on arrays

After the select above, rely on sources_count/code_examples_count so arrays never cross the wire.

Apply:

-                else:
-                    # Add counts instead of full content
-                    task_data["stats"] = {
-                        "sources_count": len(task.get("sources", [])),
-                        "code_examples_count": len(task.get("code_examples", []))
-                    }
+                else:
+                    # Add counts instead of full content (computed in SQL)
+                    task_data["stats"] = {
+                        "sources_count": task.get("sources_count", 0),
+                        "code_examples_count": task.get("code_examples_count", 0),
+                    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
                if not exclude_large_fields:
                    # Include full JSONB fields
                    task_data["sources"] = task.get("sources", [])
                    task_data["code_examples"] = task.get("code_examples", [])
                else:
                    # Add counts instead of full content
                    task_data["stats"] = {
                        "sources_count": len(task.get("sources", [])),
                        "code_examples_count": len(task.get("code_examples", []))
                    }
                tasks.append(task_data)

                if not exclude_large_fields:
                    # Include full JSONB fields
                    task_data["sources"] = task.get("sources", [])
                    task_data["code_examples"] = task.get("code_examples", [])
                else:
                    # Add counts instead of full content (computed in SQL)
                    task_data["stats"] = {
                        "sources_count": task.get("sources_count", 0),
                        "code_examples_count": task.get("code_examples_count", 0),
                    }
                tasks.append(task_data)
🤖 Prompt for AI Agents
In python/src/server/services/projects/task_service.py around lines 306 to 317,
the code currently computes counts with len(task.get("sources", [])) and
len(task.get("code_examples", [])) which defeats the goal of using DB-provided
counts; change the branch that adds stats to use the counts already selected by
the query (task.get("sources_count") and task.get("code_examples_count")) and
fall back to 0 if those keys are missing, leaving the include-full-fields branch
unchanged so full JSONB arrays are only sent when exclude_large_fields is False.

@Wirasm Wirasm merged commit ccdd1ec into main Aug 27, 2025
12 checks passed
POWERFULMOVES pushed a commit to POWERFULMOVES/PMOVES-Archon that referenced this pull request Feb 12, 2026
…leam00#502)

Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.21.0 to 0.21.1.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](anchore/sbom-action@v0.21.0...v0.21.1)

---
updated-dependencies:
- dependency-name: anchore/sbom-action
  dependency-version: 0.21.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@Wirasm Wirasm deleted the fix/token-optimization-list-endpoints branch April 6, 2026 07:38
coleam00 pushed a commit that referenced this pull request Apr 7, 2026
GitHub comment posting retries were based only on message substring checks, so structured Octokit HTTP errors (like status 502) could bypass retries and fail prematurely.

Changes:
- Added typed status extraction in GitHub adapter retry classification.
- Retry transient HTTP statuses 429, 502, 503, and 504 before string fallback.
- Added regression tests for structured 502 retry and structured 401 non-retry behavior.

Fixes #502
Tyone88 pushed a commit to Tyone88/Archon that referenced this pull request Apr 16, 2026
joaobmonteiro pushed a commit to joaobmonteiro/Archon that referenced this pull request Apr 26, 2026
Development

Successfully merging this pull request may close these issues.

🐛 Critical Bug: list_projects Returns Full Project Content (180k+ Tokens)
