feat: agent work orders - user-selectable 6-command workflow architecture #807
Conversation
Completes the implementation of test/review workflows with automatic resolution and integrates them into the orchestrator.

**Phase 3: Test Workflow with Resolution**
- Created test_workflow.py with automatic test failure resolution
- Implements retry loop with max 4 attempts (configurable via MAX_TEST_RETRY_ATTEMPTS)
- Parses JSON test results and resolves failures one by one
- Uses existing test.md and resolve_failed_test.md commands
- Added run_tests() and resolve_test_failure() to workflow_operations.py

**Phase 4: Review Workflow with Resolution**
- Created review_workflow.py with automatic blocker issue resolution
- Implements retry loop with max 3 attempts (configurable via MAX_REVIEW_RETRY_ATTEMPTS)
- Categorizes issues by severity (blocker/tech_debt/skippable)
- Only blocks on blocker issues - tech_debt and skippable allowed to pass
- Created review_runner.md and resolve_failed_review.md commands
- Added run_review() and resolve_review_issue() to workflow_operations.py
- Supports screenshot capture for UI review (configurable via ENABLE_SCREENSHOT_CAPTURE)

**Phase 5: Compositional Integration**
- Updated workflow_orchestrator.py to integrate test and review phases
- Test phase runs between commit and PR creation (if ENABLE_TEST_PHASE=true)
- Review phase runs after tests (if ENABLE_REVIEW_PHASE=true)
- Both phases are optional and controlled by config flags
- Step history tracks test and review execution results
- Proper error handling and logging for all phases

**Supporting Changes**
- Updated agent_names.py to add REVIEWER constant
- Added configuration flags to config.py for test/review phases
- All new code follows structured logging patterns
- Maintains compatibility with existing workflow steps

**Files Changed**: 19 files, 3035+ lines
- New: test_workflow.py, review_workflow.py, review commands
- Modified: orchestrator, workflow_operations, agent_names, config
- Phases 1-2 files (worktree, state, port allocation) also staged

The implementation is complete and ready for testing. All phases now support parallel execution via worktree isolation with deterministic port allocation.
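The retry-with-resolution pattern described above can be pictured with a short sketch. The function names mirror those mentioned in the message, but the signatures are assumed, not taken from the source:

```python
MAX_TEST_RETRY_ATTEMPTS = 4  # configurable, per the description above

# Hypothetical sketch of the test retry loop; the actual run_tests /
# resolve_test_failure signatures in workflow_operations.py may differ.
async def run_tests_with_resolution(run_tests, resolve_test_failure) -> bool:
    for _attempt in range(MAX_TEST_RETRY_ATTEMPTS):
        results = await run_tests()                      # parsed JSON test results
        failures = [t for t in results if not t.get("passed")]
        if not failures:
            return True                                  # all tests green
        for failure in failures:                         # resolve failures one by one
            await resolve_test_failure(failure)
    return False                                         # gave up after max attempts
```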
Simplifies the workflow orchestrator from hardcoded 11-step atomic operations to a user-selectable 6-command workflow with context passing.

Core changes:
- WorkflowStep enum: 11 steps → 6 commands (create-branch, planning, execute, commit, create-pr, prp-review)
- workflow_orchestrator.py: 367 lines → 200 lines with command stitching loop
- Remove workflow_type field, add selected_commands parameter
- Simplify agent names from 11 → 6 constants
- Remove test/review phase config flags (now optional commands)

Deletions:
- Remove test_workflow.py, review_workflow.py, workflow_phase_tracker.py
- Remove 32 old command files from .claude/commands
- Remove PRPs/specs and PRD files from version control
- Update .gitignore to exclude specs, features, and validation markdown files

Breaking changes:
- AgentWorkOrder no longer has workflow_type field
- CreateAgentWorkOrderRequest now uses selected_commands instead of workflow_type
- WorkflowStep enum values incompatible with old step history

56 files changed, 625 insertions(+), 15,007 deletions(-)
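For orientation, a minimal sketch of the shape this describes — inferred from the description and from review comments later in this thread, not copied from the source:

```python
from enum import Enum

class WorkflowStep(str, Enum):
    CREATE_BRANCH = "create-branch"
    PLANNING = "planning"
    EXECUTE = "execute"
    COMMIT = "commit"
    CREATE_PR = "create-pr"
    REVIEW = "prp-review"

# Requests now carry the commands to run, in order, instead of a workflow_type:
selected_commands = ["create-branch", "planning", "execute", "commit", "create-pr"]
```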
Walkthrough

Adds a new Agent Work Orders microservice and frontend: FastAPI service, CLI executor, sandbox managers, workflow orchestrator, GitHub integration, state repositories (file/supabase/in-memory), SSE log streaming, command loader, utilities, Docker/dev integration, docs, and extensive backend/frontend tests and UI components for managing work orders.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant UI as Frontend (User)
participant API as FastAPI Proxy / Server
participant Repo as State Repository
participant Orch as WorkflowOrchestrator (background)
participant Sandbox as Agent Sandbox
participant Exec as AgentCLIExecutor
participant GH as GitHubClient
UI->>API: POST /api/agent-work-orders (create)
API->>Repo: persist initial metadata (PENDING)
API->>Orch: spawn execute_workflow(async)
API-->>UI: 201 Created (work_order_id)
Orch->>Repo: update status RUNNING
Orch->>Sandbox: setup(repository_url, sandbox_id)
Sandbox-->>Orch: ready
loop each selected command
Orch->>Exec: build_command(command_name, args)
Exec->>Sandbox: execute command (async)
Sandbox-->>Exec: stdout/stderr, exit_code
Exec-->>Orch: CommandExecutionResult (session_id, output)
Orch->>Repo: save_step_history(result)
Orch->>Orch: store output in context for next step
end
alt create-pr step produced PR
Orch->>GH: create_pull_request(...)
GH-->>Orch: PR URL
Orch->>Repo: update status COMPLETED + pr_url + git stats
else no PR created
Orch->>Repo: update status COMPLETED + git stats
end
Orch->>Sandbox: cleanup()
```
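Taken together, the loop in the diagram reduces to a simple stitching pattern. A minimal sketch with hypothetical signatures (build_command, execute_command, and save_step_history here are simplified stand-ins for the real interfaces):

```python
# Hypothetical sketch of the command-stitching loop shown above.
async def run_selected_commands(selected_commands, executor, sandbox, repo):
    context: dict[str, str] = {}
    for command_name in selected_commands:
        command, prompt = executor.build_command(command_name, args=[context.get("create-branch", "")])
        result = await sandbox.execute_command(command)   # stdout/stderr, exit_code
        await repo.save_step_history(result)              # persist per-step result
        context[command_name] = result.output             # pass output to the next step
    return context
```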
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Actionable comments posted: 25
🧹 Nitpick comments (27)
python/src/agent_work_orders/command_loader/claude_command_loader.py (1)
22-34: Align docstring with actual behavior
The summary line says “Load command file content,” but the implementation returns the filesystem path. Either return the file contents or update the docstring to describe that it resolves the path. Keeping documentation accurate avoids confusion for future callers.
Apply this diff to clarify the docstring:
```diff
-    def load_command(self, command_name: str) -> str:
-        """Load command file content
+    def load_command(self, command_name: str) -> str:
+        """Resolve the command file path
```

python/src/agent_work_orders/sandbox_manager/sandbox_protocol.py (1)
22-29: Consider removing implementation-specific detail from protocol docstring.
The note "Does NOT create a branch - agent creates branch during execution" is an implementation detail specific to git-based sandboxes. Protocol docstrings should describe the contract (what the method does) rather than implementation specifics (what it doesn't do).
Apply this diff to make the docstring more protocol-appropriate:
```diff
 async def setup(self) -> None:
     """Set up the sandbox environment

     This should prepare the sandbox for agent execution.
-    For git-based sandboxes, this typically clones the repository.
-    Does NOT create a branch - agent creates branch during execution.
+    Implementations may perform tasks such as cloning repositories,
+    creating directories, or initializing resources.
     """
     ...
```

python/src/agent_work_orders/main.py (1)
31-42: Consider extracting version constant.
The version string "0.1.0" is duplicated on lines 19 and 41. This could lead to inconsistencies if one is updated but not the other.
Apply this diff to extract the version:
```diff
 from .utils.structured_logger import configure_structured_logging

+VERSION = "0.1.0"
+
 app = FastAPI(
     title="Agent Work Orders API",
     description="PRD-compliant agent work order system for workflow-based agent execution",
-    version="0.1.0",
+    version=VERSION,
 )

 # CORS middleware
 app.add_middleware(
     CORSMiddleware,
     allow_origins=["*"],
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
 )

 # Include routes
 app.include_router(router)


 @app.get("/health")
 async def health() -> dict:
     """Health check endpoint"""
     return {
         "status": "healthy",
         "service": "agent-work-orders",
-        "version": "0.1.0",
+        "version": VERSION,
     }
```

python/src/agent_work_orders/state_manager/repository_factory.py (1)
15-20: Return type could be more specific.
The return type `WorkOrderRepository | FileStateRepository` is accurate but makes it harder for callers to know what interface to expect. Both classes should implement the same interface.
Consider introducing a Protocol or ABC that both repository classes implement, then return that type:
```python
from typing import Protocol

class WorkOrderRepositoryProtocol(Protocol):
    async def create(self, work_order: AgentWorkOrderState, metadata: dict) -> None: ...
    async def get(self, agent_work_order_id: str) -> tuple[AgentWorkOrderState, dict] | None: ...
    # ... other methods

def create_repository() -> WorkOrderRepositoryProtocol:
    """Create a work order repository based on configuration

    Returns:
        Repository instance implementing the standard interface
    """
```

This would make the interface contract explicit and enable better type checking for callers.
python/E2E_TEST_RESULTS.md (2)
36-36: Add language identifier to code block.
The code block at line 36 should specify the language for proper syntax highlighting.
Apply this diff:
````diff
-```
+```text
 2025-10-08 12:38:57 [info] command_load_started command_name=agent_workflow_plan
````
81-81: Use proper heading syntax instead of emphasis.
Multiple sections use bold emphasis (`**Text**`) instead of proper Markdown headings. For better document structure and accessibility, use heading syntax (`###`).
Example for line 81:
```diff
-**Option 1: Use Claude Code CLI (Current System)**
+### Option 1: Use Claude Code CLI (Current System)
```

Apply similar changes to lines 87, 103, 109, 121, 137, 148, 154, 160, and 166.
Also applies to: 87-87, 103-103, 109-109, 121-121, 137-137, 148-148, 154-154, 160-160, 166-166
python/src/agent_work_orders/utils/structured_logger.py (1)
27-27: Consider JSON logging for production deployments.
The ConsoleRenderer is appropriate for MVP/development but production deployments typically benefit from JSON-formatted logs for integration with log aggregation systems (e.g., ELK, Datadog).
When moving to production, consider using:
```python
structlog.processors.JSONRenderer()
```
```python
# In config.py
LOG_FORMAT = os.getenv("LOG_FORMAT", "console")  # or "json"

# In structured_logger.py
renderer = (
    structlog.processors.JSONRenderer()
    if config.LOG_FORMAT == "json"
    else structlog.dev.ConsoleRenderer()
)
```
261-262: Remove duplicate imports.
The `tempfile` and `os` modules are imported locally in these test functions but are already available at module level (line 5 for `tempfile`). The duplicate imports are unnecessary.
Remove the local import statements:
```diff
 def test_build_command_replaces_arguments_placeholder():
     """Test that $ARGUMENTS placeholder is replaced with actual arguments"""
     executor = AgentCLIExecutor()

-    # Create temp command file with $ARGUMENTS
-    import tempfile
-    import os
-
     with tempfile.NamedTemporaryFile(mode='w', suffix='.md', delete=False) as f:
```

Apply similar changes to lines 284-285.
Also applies to: 284-285
17-46: Consider using context manager for temporary file cleanup.
The pattern of `NamedTemporaryFile(delete=False)` followed by manual `Path(command_file_path).unlink()` in a try-finally block works but is more verbose than using `TemporaryDirectory` (as in test_command_loader.py) or letting the context manager handle cleanup.
Consider this pattern for consistency with test_command_loader.py:
```python
def test_build_command():
    """Test building Claude CLI command with all flags"""
    executor = AgentCLIExecutor(cli_path="claude")

    with tempfile.TemporaryDirectory() as tmpdir:
        command_file = Path(tmpdir) / "test_command.md"
        command_file.write_text("Test command content with args: $1 and $2")

        command, prompt_text = executor.build_command(
            command_file_path=str(command_file),
            args=["issue-42", "wo-test123"],
            model="sonnet",
        )

        # Verify command includes required flags
        assert "claude" in command
        # ... rest of assertions
```

This eliminates the try-finally and manual cleanup.
python/.claude/commands/agent-work-orders/create-branch.md (1)
13-23: Base off repo default branch and validate ref format
- Derive the default base from origin HEAD (not current branch) to avoid starting from a stale/feature branch.
- Validate generated names with git’s ref checker and enforce the 50‑char cap on the full ref (prefix + slug).
Suggested snippet to embed:
```diff
-CURRENT_BRANCH=$(git branch --show-current)
-if [[ "$CURRENT_BRANCH" != "main" && "$CURRENT_BRANCH" != "master" ]]; then
-  echo "Warning: Currently on branch '$CURRENT_BRANCH', not main/master"
-  echo "Proceeding with branch creation from current branch"
-fi
+DEFAULT_BASE=$(git symbolic-ref --quiet refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@' || echo "main")
+echo "Using base branch: $DEFAULT_BASE"

-# Truncate if needed to keep under 50 chars
+# Truncate if needed to keep under 50 chars (including prefix)

-if git show-ref --verify --quiet refs/heads/<branch-name>; then
+if git show-ref --verify --quiet "refs/heads/<branch-name>"; then
   echo "Branch <branch-name> already exists"
   # Append version suffix
   COUNTER=2
-  while git show-ref --verify --quiet refs/heads/<branch-name>-v$COUNTER; do
+  while git show-ref --verify --quiet "refs/heads/<branch-name>-v$COUNTER"; do
     COUNTER=$((COUNTER + 1))
   done
   BRANCH_NAME="<branch-name>-v$COUNTER"
 fi
```

And before creation:
```diff
+# Validate name is a valid branch ref
+git check-ref-format --branch "$BRANCH_NAME" || { echo "Invalid branch name"; exit 1; }
+
-git checkout -b <branch-name>
+git fetch origin --quiet
+git checkout -b "$BRANCH_NAME" "origin/$DEFAULT_BASE"
```

Also applies to: 37-42, 57-71
python/tests/agent_work_orders/test_models.py (1)
93-107: Add negative validation test for invalid selected_commands
Ensure CreateAgentWorkOrderRequest rejects unknown commands.
Example:
```diff
+def test_create_agent_work_order_request_rejects_invalid_command():
+    with pytest.raises(ValueError):
+        CreateAgentWorkOrderRequest(
+            repository_url="https://github.com/owner/repo",
+            sandbox_type=SandboxType.GIT_BRANCH,
+            user_request="x",
+            selected_commands=["create-branch", "nope"],
+        )
```

Also applies to: 109-120, 122-134, 136-148
python/tests/agent_work_orders/test_sandbox_manager.py (1)
98-127: Timeout handling must catch asyncio.TimeoutError
The implementation should catch asyncio.TimeoutError from asyncio.wait_for to produce the “timed out” result reliably.
Please ensure both GitBranchSandbox and GitWorktreeSandbox use:
```python
except asyncio.TimeoutError:
    ...
```
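For reference, a minimal sketch of the wait_for pattern this comment asks for, with hypothetical names (note that on Python 3.11+, asyncio.TimeoutError is an alias of the built-in TimeoutError):

```python
import asyncio
import time

# Hypothetical sketch of the timeout-handling pattern suggested above.
async def execute_with_timeout(coro, timeout_seconds: float):
    start_time = time.time()
    try:
        return await asyncio.wait_for(coro, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        duration = time.time() - start_time
        # Produce the "timed out" result instead of propagating the error.
        return {"timed_out": True, "duration_seconds": duration}
```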
35-37: Return a str path from load_command in tests for clarity/typing fidelity.
Production load_command returns a str; using MagicMock objects as paths obscures intent. Prefer simple string paths.
Example change (apply similarly across tests):
```diff
-mock_command_loader.load_command = MagicMock(return_value=MagicMock(file_path="create-branch.md"))
+mock_command_loader.load_command = MagicMock(return_value="create-branch.md")
```

Also applies to: 100-102, 136-138, 171-173, 224-226, 256-258, 312-314, 364-366
python/src/agent_work_orders/sandbox_manager/git_branch_sandbox.py (2)
95-113: Catch asyncio.TimeoutError explicitly in execute_command.
wait_for raises asyncio.TimeoutError; catching the explicit type is clearer and future-safe.
```diff
-        except TimeoutError:
+        except asyncio.TimeoutError:
             process.kill()
             await process.wait()
             duration = time.time() - start_time
```
155-165: Log stack trace and consider non-blocking git branch lookup.
Add exc_info=True; also get_current_branch uses blocking subprocess in an async function, which can block the loop. Consider moving to asyncio.subprocess or asyncio.to_thread (a sketch follows below). See utils.git_operations.get_current_branch.
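A minimal sketch of the asyncio.to_thread option, assuming the blocking helper looks roughly like the stand-in below (hypothetical names; the real helper lives in utils.git_operations):

```python
import asyncio
import subprocess

# Hypothetical blocking helper, standing in for utils.git_operations.get_current_branch.
def get_current_branch_blocking(working_dir: str) -> str:
    result = subprocess.run(
        ["git", "branch", "--show-current"],
        cwd=working_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

async def get_git_branch_name(working_dir: str) -> str | None:
    # to_thread runs the blocking call in a worker thread,
    # keeping the event loop responsive.
    return await asyncio.to_thread(get_current_branch_blocking, working_dir)
```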
```diff
-        except Exception as e:
-            self._logger.error("git_branch_query_failed", error=str(e))
+        except Exception as e:
+            self._logger.error("git_branch_query_failed", error=str(e), exc_info=True)
             return None
```

python/src/agent_work_orders/utils/port_allocation.py (3)
7-9: Use stable hash fallback for deterministic ports.
Python’s hash() is randomized per process. Use a stable hash (e.g., SHA‑1) when base‑36 conversion fails.
```diff
@@
-import os
+import os
+import hashlib
@@
     try:
         # Take first 8 alphanumeric chars and convert from base 36
         id_chars = ''.join(c for c in work_order_id[:8] if c.isalnum())
         index = int(id_chars, 36) % 15
     except ValueError:
-        # Fallback to simple hash if conversion fails
-        index = hash(work_order_id) % 15
+        # Fallback to stable hash if conversion fails
+        h = hashlib.sha1(work_order_id.encode("utf-8")).hexdigest()[:8]
+        index = int(h, 16) % 15
```

Also applies to: 21-33
46-49: Optional: set SO_REUSEADDR to reduce TIME_WAIT bind issues.
Not required, but improves availability checks on busy dev hosts.
```diff
 with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
     s.settimeout(1)
+    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
     s.bind(('localhost', port))
     return True
```
70-78: Note: TOCTOU race in port selection is acceptable but inherent.
Between availability check and actual use, another process may take the port. Consider rechecking on use site or allowing a small retry loop where consumed.
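A minimal sketch of such a retry loop at the consuming site (hypothetical helper; the real availability check lives in utils/port_allocation.py):

```python
import socket

# Bind immediately instead of check-then-use, retrying across candidates,
# so the availability check and the use become one atomic step.
def bind_first_available(candidate_ports: list[int]) -> socket.socket:
    for port in candidate_ports:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("localhost", port))
            return s  # caller keeps the bound socket, closing the race window
        except OSError:
            s.close()  # taken in the meantime; try the next candidate
    raise RuntimeError("No available port in candidate range")
```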
python/src/agent_work_orders/github_integration/github_client.py (2)
297-305: Support SSH Git URLs (git@github.com:owner/repo.git).
The parser currently handles http(s) and owner/repo only. Add an SSH pattern for broader compatibility.
```diff
-        if repository_url.startswith("http"):
+        if repository_url.startswith("http") or repository_url.startswith("git@"):
             # Extract from URL
             match = re.search(r"github\.com[/:]([^/]+)/([^/\.]+)", repository_url)
             if not match:
                 raise ValueError("Invalid GitHub URL format")
             return match.group(1), match.group(2)
```

This regex already matches both https://github.com/owner/repo(.git) and git@github.com:owner/repo(.git).
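A quick, self-contained check of that claim:

```python
import re

pattern = r"github\.com[/:]([^/]+)/([^/\.]+)"
for url in (
    "https://github.com/owner/repo.git",
    "git@github.com:owner/repo.git",
):
    match = re.search(pattern, url)
    assert match and match.groups() == ("owner", "repo")  # both forms parse
```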
24-70: Minor: consider returning error details on verification failure for API consumers.
verify_repository_access returns bool only; exposing the stderr message (when safe) could help UX in the verify endpoint.
python/src/agent_work_orders/agent_executor/agent_cli_executor.py (2)
287-289: Preserve stack traces on failures with exc_info=True
Add exc_info=True to warnings per backend logging guidelines to aid diagnosis.
[As per coding guidelines]
```diff
-            self._logger.warning("prompt_save_failed", error=str(e))
+            self._logger.warning("prompt_save_failed", error=str(e), exc_info=True)
@@
-        except Exception as e:
-            self._logger.warning("jsonl_to_json_conversion_failed", error=str(e))
+        except Exception as e:
+            self._logger.warning("jsonl_to_json_conversion_failed", error=str(e), exc_info=True)
@@
-        except Exception as e:
-            self._logger.warning("output_artifacts_save_failed", error=str(e))
+        except Exception as e:
+            self._logger.warning("output_artifacts_save_failed", error=str(e), exc_info=True)
@@
-        except Exception as e:
-            self._logger.warning("session_id_extraction_failed", error=str(e))
+        except Exception as e:
+            self._logger.warning("session_id_extraction_failed", error=str(e), exc_info=True)
@@
-        except Exception as e:
-            self._logger.warning("result_message_extraction_failed", error=str(e))
+        except Exception as e:
+            self._logger.warning("result_message_extraction_failed", error=str(e), exc_info=True)
```

Also applies to: 323-325, 329-330, 356-357, 384-385
55-57: Specify UTF‑8 encoding for file IO
Be explicit for portability and to avoid locale‑dependent bugs.
```diff
-        with open(command_file_path) as f:
+        with open(command_file_path, encoding="utf-8") as f:
             prompt_text = f.read()
```

Apply similarly to other file writes:
- Line 282: with open(prompt_file, "w", encoding="utf-8")
- Line 314: with open(jsonl_file, "w", encoding="utf-8")
- Line 321: with open(json_file, "w", encoding="utf-8")
python/src/agent_work_orders/models.py (3)
237-244: Include REVIEW in step progression (and consider selected commands awareness)
Sequence omits REVIEW (“prp-review”). Add it; longer‑term, derive from selected_commands for flexibility.
```diff
 step_sequence = [
     WorkflowStep.CREATE_BRANCH,
     WorkflowStep.PLANNING,
     WorkflowStep.EXECUTE,
     WorkflowStep.COMMIT,
     WorkflowStep.CREATE_PR,
+    WorkflowStep.REVIEW,
 ]
```

If desired, I can refactor get_current_step(selected_commands: list[WorkflowStep]) to compute the next step from the user‑selected sequence.
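A minimal sketch of what that refactor might look like, assuming the set of completed steps is available to the caller:

```python
# Hypothetical: derive the next step from the user-selected sequence
# rather than a hardcoded progression.
def get_current_step(
    selected_commands: list[WorkflowStep],
    completed: set[WorkflowStep],
) -> WorkflowStep | None:
    for step in selected_commands:
        if step not in completed:
            return step
    return None  # every selected command has run
```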
107-121: Type selected_commands as Enum list; drop manual validator
Let Pydantic coerce/validate strings to WorkflowStep and simplify code.
```diff
-    selected_commands: list[str] = Field(
-        default=["create-branch", "planning", "execute", "commit", "create-pr"],
-        description="Commands to run in sequence"
-    )
+    selected_commands: list[WorkflowStep] = Field(
+        default=[
+            WorkflowStep.CREATE_BRANCH,
+            WorkflowStep.PLANNING,
+            WorkflowStep.EXECUTE,
+            WorkflowStep.COMMIT,
+            WorkflowStep.CREATE_PR,
+        ],
+        description="Commands to run in sequence"
+    )
@@
-    @field_validator('selected_commands')
-    @classmethod
-    def validate_commands(cls, v: list[str]) -> list[str]:
-        """Validate that all commands are valid WorkflowStep values"""
-        valid = {step.value for step in WorkflowStep}
-        for cmd in v:
-            if cmd not in valid:
-                raise ValueError(f"Invalid command: {cmd}. Must be one of {valid}")
-        return v
+    # Pydantic will coerce string inputs to WorkflowStep and validate automatically.
```

This remains wire‑compatible; clients can still send strings.
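For illustration, a self-contained sketch of the coercion Pydantic v2 performs for a str-based enum field (simplified model, not the actual request class):

```python
from enum import Enum
from pydantic import BaseModel

class Step(str, Enum):
    CREATE_BRANCH = "create-branch"
    PLANNING = "planning"

class Request(BaseModel):
    selected_commands: list[Step]

# Strings are coerced to enum members; unknown values raise ValidationError.
req = Request(selected_commands=["create-branch", "planning"])
assert req.selected_commands[0] is Step.CREATE_BRANCH
```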
21-25: Remove leftover AgentWorkflowType (workflow_type was removed in this PR)
Enum appears unused and contradicts the new selected_commands design; delete to avoid confusion.
```diff
-class AgentWorkflowType(str, Enum):
-    """Workflow types for agent execution"""
-
-    PLAN = "agent_workflow_plan"
-
```

python/src/agent_work_orders/state_manager/file_state_repository.py (2)
183-275: Consider raising exceptions for missing work orders.
Lines 203, 235, and 264 return silently when the work order is not found, only logging a warning. This diverges from the coding guideline: "Never return None to indicate failure in backend Python; raise exceptions with details instead."
Additionally, this behavior is inconsistent with `save_step_history` (lines 289-294), which creates minimal state if the work order doesn't exist.
Consider one of these approaches:
Option 1 (recommended): Raise exceptions for missing work orders
```diff
 async with self._lock:
     data = await self._read_state_file(agent_work_order_id)

     if not data:
-        self._logger.warning(
+        self._logger.error(
             "work_order_not_found_for_update",
             agent_work_order_id=agent_work_order_id
         )
-        return
+        raise ValueError(f"Work order not found: {agent_work_order_id}")
```

Option 2: Document the silent-return behavior
If silent returns are intentional (e.g., for idempotency), add a note to each method's docstring:
"""Update work order status and other fields Args: agent_work_order_id: Work order ID status: New status **kwargs: Additional fields to update Note: Returns silently if work order does not exist. """
337-343: Unclear type ignore comment.
Line 337 has `# type: ignore[valid-type]` for a straightforward `list[str]` return type. This comment may be unnecessary or indicates a mypy configuration issue.
Try removing the type ignore comment:
```diff
-    def list_state_ids(self) -> "list[str]":  # type: ignore[valid-type]
+    def list_state_ids(self) -> list[str]:
         """List all work order IDs with state files

         Returns:
             List of work order IDs
         """
         return [f.stem for f in self.state_directory.glob("*.json")]
```

If type checking still fails, investigate the root cause rather than silencing the warning.
.claude/commands/agent-work-orders/commit.md

```text
@@ -0,0 +1,55 @@
# Create Git Commit

Create an atomic git commit with a properly formatted commit message following best practices for the uncommited changes or these specific files if specified.
```
Correct the spelling in the opening sentence.
“uncommited” → “uncommitted” to keep the instructions polished.
```diff
-Create an atomic git commit with a properly formatted commit message following best practices for the uncommited changes or these specific files if specified.
+Create an atomic git commit with a properly formatted commit message following best practices for the uncommitted changes or these specific files if specified.
```
.claude/commands/agent-work-orders/prime.md

```text
- `CLAUDE.md` - Core project instructions, principles, logging rules, testing requirements
- `python/src/agent_work_orders` - Project overview and setup (if exists)

- Identify core files in the agent work orders directory to understand what we are woerking on and its intent
```
Fix typo in instructions.
“woerking” → “working” to keep onboarding guidance clear.
```diff
-- Identify core files in the agent work orders directory to understand what we are woerking on and its intent
+- Identify core files in the agent work orders directory to understand what we are working on and its intent
```
.claude/commands/agent-work-orders/start-server.md

```text
### Backend Server (FastAPI)

- Navigate to backend: `cd app/backend`
- Start server in background: `uv sync && uv run python run_api.py`
- Wait 2-3 seconds for startup
- Test health endpoint: `curl http://localhost:8000/health`
- Test products endpoint: `curl http://localhost:8000/api/products`

### Frontend Server (Bun + React)

- Navigate to frontend: `cd ../app/frontend`
- Start server in background: `bun install && bun dev`
- Wait 2-3 seconds for startup
- Frontend should be accessible at `http://localhost:3000`
```
🧩 Analysis chain
Verify documented paths match repository structure.
The documented paths (app/backend, app/frontend, run_api.py) do not appear to match the actual repository structure based on the PR context, where the FastAPI app is at python/src/agent_work_orders/main.py and mounted via python/server/main.py.
Verify these paths exist in the repository structure:
🏁 Script executed:
```bash
#!/bin/bash
# Check if documented paths exist
echo "Checking backend path:"
fd -t d "^backend$" app/ 2>/dev/null || echo "app/backend not found"
echo ""
echo "Checking frontend path:"
fd -t d "^frontend$" app/ 2>/dev/null || echo "app/frontend not found"
echo ""
echo "Checking run_api.py:"
fd "^run_api\.py$" app/ 2>/dev/null || echo "run_api.py not found in app/"
echo ""
echo "Actual Python structure:"
fd -t f "main\.py" python/src/agent_work_orders/
```

Length of output: 206
Update start-server.md with correct backend/frontend paths
The instructions reference non-existent app/backend, app/frontend, and run_api.py. Update to:
- Backend: `cd python/server` (or `cd python/src/agent_work_orders`), then `uv sync && uv run python main.py`
- Remove or correct the Frontend section to point at the actual frontend directory (or omit if none)
python/.claude/commands/agent-work-orders/commit.md

```text
# Create Git Commit

Create an atomic git commit with a properly formatted commit message following best practices for the uncommited changes or these specific files if specified.

Specific files (skip if not specified):

- File 1: $1
- File 2: $2
- File 3: $3
- File 4: $4
- File 5: $5
```
Fix typo in line 3.
Line 3 contains a typo: "uncommited" should be "uncommitted".
Apply this diff:
```diff
-Create an atomic git commit with a properly formatted commit message following best practices for the uncommited changes or these specific files if specified.
+Create an atomic git commit with a properly formatted commit message following best practices for the uncommitted changes or these specific files if specified.
```
python/.claude/commands/agent-work-orders/prime.md

```text
## Read

- `CLAUDE.md` - Core project instructions, principles, logging rules, testing requirements
- `python/src/agent_work_orders` - Project overview and setup (if exists)

- Identify core files in the agent work orders directory to understand what we are woerking on and its intent
```
Fix typo in line 15.
Line 15 contains a typo: "woerking" should be "working".
Apply this diff:
```diff
-- Identify core files in the agent work orders directory to understand what we are woerking on and its intent
+- Identify core files in the agent work orders directory to understand what we are working on and its intent
```
python/tests/agent_work_orders/test_models.py

```python
def test_step_history_creation():
    """Test creating StepHistory"""
    history = StepHistory(agent_work_order_id="wo-test123", steps=[])

    assert history.agent_work_order_id == "wo-test123"
    assert len(history.steps) == 0
```
Avoid mutable default list in StepHistory; add coverage
Pydantic field steps uses a mutable default in models (see models.py lines 221‑251). Use default_factory to prevent shared state, and add a test to guard it.
Model fix (outside this file):
```diff
 class StepHistory(BaseModel):
     """History of all step executions for a work order"""

     agent_work_order_id: str
-    steps: list[StepExecutionResult] = []
+    steps: list[StepExecutionResult] = Field(default_factory=list)
```

Add a test here to catch regressions:
```diff
+def test_step_history_steps_default_isolated_instances():
+    h1 = StepHistory(agent_work_order_id="wo1")
+    h1.steps.append(
+        StepExecutionResult(
+            step=WorkflowStep.CREATE_BRANCH,
+            agent_name="x",
+            success=True,
+            duration_seconds=0.1,
+        )
+    )
+    h2 = StepHistory(agent_work_order_id="wo2")
+    assert h2.steps == []
```
+ assert h2.steps == []Also applies to: 208-233, 234-239
python/tests/agent_work_orders/test_sandbox_manager.py

```python
def test_sandbox_factory_not_implemented():
    """Test creating unsupported sandbox types"""
    factory = SandboxFactory()

    with pytest.raises(NotImplementedError):
        factory.create_sandbox(
            sandbox_type=SandboxType.GIT_WORKTREE,
            repository_url="https://github.com/owner/repo",
            sandbox_identifier="sandbox-test",
        )

    with pytest.raises(NotImplementedError):
        factory.create_sandbox(
            sandbox_type=SandboxType.E2B,
            repository_url="https://github.com/owner/repo",
            sandbox_identifier="sandbox-test",
        )

    with pytest.raises(NotImplementedError):
        factory.create_sandbox(
            sandbox_type=SandboxType.DAGGER,
            repository_url="https://github.com/owner/repo",
            sandbox_identifier="sandbox-test",
        )
```
Update factory test: Git worktree is implemented
SandboxFactory now returns GitWorktreeSandbox for GIT_WORKTREE (see sandbox_factory.py). Adjust expectations to assert instance creation rather than NotImplementedError; keep NotImplemented for E2B and DAGGER.
Suggested change:
```diff
-    with pytest.raises(NotImplementedError):
-        factory.create_sandbox(
-            sandbox_type=SandboxType.GIT_WORKTREE,
-            repository_url="https://github.com/owner/repo",
-            sandbox_identifier="sandbox-test",
-        )
+    wt = factory.create_sandbox(
+        sandbox_type=SandboxType.GIT_WORKTREE,
+        repository_url="https://github.com/owner/repo",
+        sandbox_identifier="sandbox-test",
+    )
+    from src.agent_work_orders.sandbox_manager.git_worktree_sandbox import GitWorktreeSandbox
+    assert isinstance(wt, GitWorktreeSandbox)
```
Actionable comments posted: 3
♻️ Duplicate comments (1)
python/tests/agent_work_orders/test_sandbox_manager.py (1)
182-198: Test GIT_WORKTREE sandbox creation.
`SandboxFactory` now returns `GitWorktreeSandbox` for `GIT_WORKTREE` (per sandbox_factory.py). The test should verify this implementation rather than only testing the unimplemented sandbox types (E2B, DAGGER).
Add a test case for GIT_WORKTREE before testing the unimplemented types:
```python
def test_sandbox_factory_git_worktree():
    """Test creating git worktree sandbox via factory"""
    factory = SandboxFactory()

    sandbox = factory.create_sandbox(
        sandbox_type=SandboxType.GIT_WORKTREE,
        repository_url="https://github.com/owner/repo",
        sandbox_identifier="sandbox-test",
    )

    from src.agent_work_orders.sandbox_manager.git_worktree_sandbox import GitWorktreeSandbox
    assert isinstance(sandbox, GitWorktreeSandbox)
    assert sandbox.repository_url == "https://github.com/owner/repo"
    assert sandbox.sandbox_identifier == "sandbox-test"


def test_sandbox_factory_not_implemented():
    """Test creating unsupported sandbox types"""
    factory = SandboxFactory()

    with pytest.raises(NotImplementedError):
        factory.create_sandbox(
            sandbox_type=SandboxType.E2B,
            repository_url="https://github.com/owner/repo",
            sandbox_identifier="sandbox-test",
        )

    with pytest.raises(NotImplementedError):
        factory.create_sandbox(
            sandbox_type=SandboxType.DAGGER,
            repository_url="https://github.com/owner/repo",
            sandbox_identifier="sandbox-test",
        )
```
🧹 Nitpick comments (2)
python/tests/agent_work_orders/test_sandbox_manager.py (2)
1-11: Add `asyncio` to module-level imports.
The `asyncio` module is currently imported inside `test_git_branch_sandbox_execute_command_timeout` (line 101) but should be imported at the module level alongside other standard library imports for consistency and clarity.
Apply this diff to add the import:
```diff
 import pytest
 from pathlib import Path
 from unittest.mock import AsyncMock, MagicMock, patch
 from tempfile import TemporaryDirectory
+import asyncio
```
101-101: Remove redundant inline import.
Once `asyncio` is added to the module-level imports (as suggested above), this inline import becomes unnecessary and should be removed.
Apply this diff:
```diff
-    import asyncio
-
     with TemporaryDirectory() as tmpdir:
```
📒 Files selected for processing (4)
- python/tests/agent_work_orders/test_api.py (1 hunks)
- python/tests/agent_work_orders/test_sandbox_manager.py (1 hunks)
- python/tests/agent_work_orders/test_state_manager.py (1 hunks)
- python/tests/agent_work_orders/test_workflow_orchestrator.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- python/tests/agent_work_orders/test_state_manager.py
🧰 Additional context used
🧬 Code graph analysis (3)
python/tests/agent_work_orders/test_api.py (3)
python/src/agent_work_orders/models.py (8)
- AgentWorkOrderStatus (12-18)
- AgentWorkflowType (21-24)
- SandboxType (27-33)
- AgentWorkOrderState (54-65)
- GitHubRepository (166-172)
- StepExecutionResult (209-219)
- StepHistory (222-252)
- WorkflowStep (43-51)

python/src/agent_work_orders/state_manager/work_order_repository.py (2)

- get (45-60)
- get_step_history (164-174)

python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py (1)

- execute_workflow (43-199)
python/tests/agent_work_orders/test_workflow_orchestrator.py (3)
python/src/agent_work_orders/models.py (6)
- AgentWorkOrderStatus (12-18)
- SandboxType (27-33)
- StepExecutionResult (209-219)
- StepHistory (222-252)
- WorkflowExecutionError (261-264)
- WorkflowStep (43-51)

python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py (2)

- WorkflowOrchestrator (25-199)
- execute_workflow (43-199)

python/src/agent_work_orders/sandbox_manager/sandbox_factory.py (1)

- create_sandbox (15-43)
python/tests/agent_work_orders/test_sandbox_manager.py (3)
python/src/agent_work_orders/models.py (2)
- SandboxSetupError (267-270)
- SandboxType (27-33)

python/src/agent_work_orders/sandbox_manager/git_branch_sandbox.py (5)

- GitBranchSandbox (20-179)
- setup (38-70)
- execute_command (72-153)
- get_git_branch_name (155-165)
- cleanup (167-179)

python/src/agent_work_orders/sandbox_manager/sandbox_factory.py (2)

- SandboxFactory (12-43)
- create_sandbox (15-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Backend Tests (Python + pytest)
python/tests/agent_work_orders/test_api.py

```python
request_data = {
    "repository_url": "https://github.com/owner/repo",
    "sandbox_type": "git_branch",
    "workflow_type": "agent_workflow_plan",
    "user_request": "Add user authentication feature",
    "github_issue_number": "42",
}
```
Update API tests to the selected_commands contract.
These payloads/metadata still use workflow_type, but the API now validates selected_commands. As written, the tests will 422 (or assert the wrong response shape). Replace every workflow_type field with the appropriate selected_commands list and drop the AgentWorkflowType import.
Example fix:
```diff
-from src.agent_work_orders.models import (
-    AgentWorkOrderStatus,
-    AgentWorkflowType,
-    SandboxType,
-)
+from src.agent_work_orders.models import (
+    AgentWorkOrderStatus,
+    SandboxType,
+)
@@
-    request_data = {
-        "repository_url": "https://github.com/owner/repo",
-        "sandbox_type": "git_branch",
-        "workflow_type": "agent_workflow_plan",
-        "user_request": "Add user authentication feature",
-        "github_issue_number": "42",
-    }
+    request_data = {
+        "repository_url": "https://github.com/owner/repo",
+        "sandbox_type": "git_branch",
+        "selected_commands": ["create-branch", "planning", "execute", "commit", "create-pr"],
+        "user_request": "Add user authentication feature",
+        "github_issue_number": "42",
+    }
@@
-    metadata = {
-        "workflow_type": AgentWorkflowType.PLAN,
+    metadata = {
+        "selected_commands": ["create-branch", "planning", "execute", "commit", "create-pr"],
         "sandbox_type": SandboxType.GIT_BRANCH,
         ...
```

Apply the same substitution to every request payload and metadata dictionary that still mentions workflow_type.
Also applies to: 55-60, 107-115, 152-164, 202-210
python/tests/agent_work_orders/test_workflow_orchestrator.py

```python
await orchestrator.execute_workflow(
    agent_work_order_id="wo-test",
    repository_url="https://github.com/owner/repo",
    sandbox_type=SandboxType.GIT_BRANCH,
    user_request="Test feature",
    selected_commands=["create-branch", "planning", "execute"],
)
```
Assert failure path raises WorkflowExecutionError.
Line 195 calls execute_workflow after a failed step, but never asserts that WorkflowExecutionError propagates. Without pytest.raises, the test still passes even if the orchestrator silently swallows failures—which is the bug we’re trying to prevent. Wrap the await in with pytest.raises(WorkflowExecutionError) so the test fails if the exception isn’t re-raised.
```diff
-    await orchestrator.execute_workflow(
+    with pytest.raises(WorkflowExecutionError):
+        await orchestrator.execute_workflow(
             agent_work_order_id="wo-test",
             repository_url="https://github.com/owner/repo",
             sandbox_type=SandboxType.GIT_BRANCH,
             user_request="Test feature",
             selected_commands=["create-branch", "planning", "execute"],
-    )
+        )
```
```python
await orchestrator.execute_workflow(
    agent_work_order_id="wo-test",
    repository_url="https://github.com/owner/repo",
    sandbox_type=SandboxType.GIT_BRANCH,
    user_request="Test feature",
    selected_commands=["invalid-command"],
)
```
execute_workflow with unknown commands should raise.
The unknown-command path (Line 339) is also expected to raise WorkflowExecutionError, yet the test never asserts that behavior. Add pytest.raises(WorkflowExecutionError) around the call so the suite fails if the orchestrator stops propagating the error.
```diff
-    await orchestrator.execute_workflow(
-        agent_work_order_id="wo-test",
-        repository_url="https://github.com/owner/repo",
-        sandbox_type=SandboxType.GIT_BRANCH,
-        user_request="Test feature",
-        selected_commands=["invalid-command"],
-    )
+    with pytest.raises(WorkflowExecutionError):
+        await orchestrator.execute_workflow(
+            agent_work_order_id="wo-test",
+            repository_url="https://github.com/owner/repo",
+            sandbox_type=SandboxType.GIT_BRANCH,
+            user_request="Test feature",
+            selected_commands=["invalid-command"],
+        )
```
- add trailing slashes to agent-work-orders endpoints to prevent FastAPI mount() redirects
- add defensive null check for repository_url in detail view
- fix backend routes to use relative paths with app.mount()
- resolves ERR_NAME_NOT_RESOLVED when accessing agent work orders
- add trailing slashes to prevent FastAPI mount() 307 redirects
- add defensive null check for repository_url in detail view
- fixes ERR_NAME_NOT_RESOLVED when browser follows redirect to archon-server
This reverts commit c2a568e.
- Change from fixed backend/frontend ports to 10-port ranges per work order
- Support 20 concurrent work orders (200 ports: 9000-9199)
- Add port availability checking with flexible allocation
- Make git_worktree default sandbox type
- Standardize API routes with /api/ prefix
- Add comprehensive port allocation tests
- Update environment file generation with PORT_0-PORT_9 variables
- Maintain backward compatibility with BACKEND_PORT/FRONTEND_PORT aliases
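The arithmetic behind this layout is simple; a hypothetical sketch follows (the PORT_0/BACKEND_PORT alias mapping is an assumption, not confirmed by the commit message):

```python
BASE_PORT = 9000
PORTS_PER_WORK_ORDER = 10
MAX_WORK_ORDERS = 20  # 20 * 10 = 200 ports, i.e. 9000-9199

def port_block(work_order_index: int) -> dict[str, int]:
    """Map a work-order slot to its 10-port block."""
    assert 0 <= work_order_index < MAX_WORK_ORDERS
    start = BASE_PORT + work_order_index * PORTS_PER_WORK_ORDER
    ports = {f"PORT_{i}": start + i for i in range(PORTS_PER_WORK_ORDER)}
    ports["BACKEND_PORT"] = ports["PORT_0"]    # assumed alias
    ports["FRONTEND_PORT"] = ports["PORT_1"]   # assumed alias
    return ports
```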
- Sort imports consistently
- Remove unused imports (pytest, MagicMock, patch, etc.)
- Update to datetime.UTC alias from timezone.utc
- Fix formatting and organization issues
- Move sse-starlette from base dependencies to agent-work-orders group
- Keep structlog in agent-work-orders group (already there)
- Update lockfile accordingly
Actionable comments posted: 53
♻️ Duplicate comments (12)
python/src/agent_work_orders/sandbox_manager/git_worktree_sandbox.py (1)
125-143: The current TimeoutError catch is correct for Python 3.12.
The past review suggested catching `asyncio.TimeoutError` instead of `TimeoutError`. However, in Python 3.11+, `asyncio.TimeoutError` was deprecated and is now an alias for the built-in `TimeoutError`. Since this codebase targets Python 3.12 (per coding guidelines), catching `TimeoutError` is the correct approach. The `asyncio.wait_for` function raises the built-in `TimeoutError` in Python 3.12.
229-229: Mutable default list on Pydantic field (shared across instances).
Use Field(default_factory=list).
```diff
-    steps: list[StepExecutionResult] = []
+    steps: list[StepExecutionResult] = Field(default_factory=list)
```

python/src/agent_work_orders/main.py (2)
13-14: Move logging configuration to startup event handler.
Configuring logging at module import time prevents proper test isolation and creates side effects during import. This issue was previously flagged but remains unresolved.
Apply this diff to move logging to a startup event:
```diff
-# Configure logging on startup
-configure_structured_logging(config.LOG_LEVEL)
-
 app = FastAPI(
     title="Agent Work Orders API",
     description="Agent work order system for workflow-based agent execution",
     version="0.1.0",
 )
+
+@app.on_event("startup")
+async def startup_event() -> None:
+    """Configure logging on application startup"""
+    configure_structured_logging(config.LOG_LEVEL)
```
22-29: Fix insecure CORS configuration.
The combination of `allow_origins=["*"]` and `allow_credentials=True` is a security vulnerability that was previously flagged but remains unresolved. Browser CORS policies do not allow wildcards when credentials are enabled, exposing the API to credential theft.
Apply this diff to use environment-based origin configuration:
```diff
+from typing import cast
+
+# Get allowed origins from environment or use sensible defaults
+allowed_origins = cast(
+    list[str],
+    config.ALLOWED_ORIGINS.split(",") if hasattr(config, "ALLOWED_ORIGINS") else ["http://localhost:3000"],
+)
+
 # CORS middleware
 app.add_middleware(
     CORSMiddleware,
-    allow_origins=["*"],
+    allow_origins=allowed_origins,
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
 )
```
python/src/agent_work_orders/config.py:ALLOWED_ORIGINS: str = os.getenv("ALLOWED_ORIGINS", "http://localhost:3000")python/tests/agent_work_orders/test_api.py (6)
8-12: Remove unused `AgentWorkflowType` import; API now uses `selected_commands`.
This import is obsolete—`workflow_type` was removed from the API contract in favor of `selected_commands`. Tests using `AgentWorkflowType` will fail validation.
Apply this diff:
```diff
 from src.agent_work_orders.models import (
-    AgentWorkflowType,
     AgentWorkOrderStatus,
     SandboxType,
 )
```
32-38: Replace `workflow_type` with `selected_commands` array.
The API contract changed—`workflow_type` no longer exists. Send `selected_commands` as a list of command strings.
Apply this diff:
```diff
 request_data = {
     "repository_url": "https://github.com/owner/repo",
     "sandbox_type": "git_branch",
-    "workflow_type": "agent_workflow_plan",
+    "selected_commands": ["create-branch", "planning", "execute", "commit", "create-pr"],
     "user_request": "Add user authentication feature",
     "github_issue_number": "42",
 }
```
54-60: Replace `workflow_type` with `selected_commands` array.
Same issue as line 35—the API no longer accepts `workflow_type`.
Apply this diff:
```diff
 request_data = {
     "repository_url": "https://github.com/owner/repo",
     "sandbox_type": "git_branch",
-    "workflow_type": "agent_workflow_plan",
+    "selected_commands": ["create-branch", "planning", "execute", "commit", "create-pr"],
     "user_request": "Fix the login bug where users can't sign in",
 }
```
106-114: Update metadata to use `selected_commands` instead of `workflow_type`.
The mocked metadata doesn't match the API response shape—`workflow_type` no longer exists in work order metadata.
Apply this diff:
```diff
 metadata = {
-    "workflow_type": AgentWorkflowType.PLAN,
+    "selected_commands": ["create-branch", "planning", "execute", "commit", "create-pr"],
     "sandbox_type": SandboxType.GIT_BRANCH,
     "github_issue_number": "42",
     "status": AgentWorkOrderStatus.RUNNING,
     "current_phase": None,
     "created_at": datetime.now(),
     "updated_at": datetime.now(),
 }
```
151-163: Update metadata to use `selected_commands` instead of `workflow_type`.
Same issue as line 107—metadata structure is outdated.
Apply this diff:
```diff
 metadata = {
-    "workflow_type": AgentWorkflowType.PLAN,
+    "selected_commands": ["create-branch", "planning", "execute", "commit", "create-pr"],
     "sandbox_type": SandboxType.GIT_BRANCH,
     "github_issue_number": "42",
     "status": AgentWorkOrderStatus.COMPLETED,
     "current_phase": None,
     "created_at": datetime.now(),
     "updated_at": datetime.now(),
     "github_pull_request_url": "https://github.com/owner/repo/pull/42",
     "git_commit_count": 5,
     "git_files_changed": 10,
     "error_message": None,
 }
```
200-209: Update metadata to use `selected_commands` instead of `workflow_type`. Same issue as the previous tests; the metadata still references the removed `workflow_type` field. Apply this diff:
```diff
 metadata = {
-    "workflow_type": AgentWorkflowType.PLAN,
+    "selected_commands": ["create-branch", "planning", "execute", "commit", "create-pr"],
     "sandbox_type": SandboxType.GIT_BRANCH,
     "status": AgentWorkOrderStatus.RUNNING,
     "current_phase": None,
     "created_at": datetime.now(),
     "updated_at": datetime.now(),
     "git_commit_count": 3,
     "git_files_changed": 7,
 }
```

python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py (1)
226-237: Status remains RUNNING if the workflow completes without the create-branch command. Lines 232-237 only update the status to COMPLETED when `branch_name` exists (i.e., when create-branch was executed). If a user runs a custom workflow like `["planning", "execute"]` without create-branch or create-pr, the work order will never transition to COMPLETED status. Apply this diff to unconditionally mark the workflow COMPLETED after the loop:
```diff
-        # Calculate git stats for workflows that complete without PR
-        branch_name = context.get("create-branch")
-        if branch_name:
-            git_stats = await self._calculate_git_stats(
-                branch_name, sandbox.working_dir
-            )
-            await self.state_repository.update_status(
-                agent_work_order_id,
-                AgentWorkOrderStatus.COMPLETED,
-                git_commit_count=git_stats["commit_count"],
-                git_files_changed=git_stats["files_changed"],
-            )
+        # Calculate git stats for workflows that complete without PR
+        branch_name = context.get("create-branch")
+        git_stats = {"commit_count": 0, "files_changed": 0}
+        if branch_name:
+            git_stats = await self._calculate_git_stats(
+                branch_name, sandbox.working_dir
+            )
+
+        await self.state_repository.update_status(
+            agent_work_order_id,
+            AgentWorkOrderStatus.COMPLETED,
+            git_commit_count=git_stats["commit_count"],
+            git_files_changed=git_stats["files_changed"],
+        )
```

python/src/agent_work_orders/api/routes.py (1)
83-98: Critical: Enum serialization issue remains unfixed. This is the same issue flagged in the previous review. Lines 85 and 87 store enum instances directly in `metadata`, but `FileStateRepository.create` will fail with `TypeError: Object of type SandboxType is not JSON serializable` when it attempts `json.dump`. Additionally, `datetime.now()` on line 89 may cause similar serialization issues. Apply this diff to serialize enums and datetime as primitives:
```diff
 # Create metadata
 metadata = {
-    "sandbox_type": request.sandbox_type,
+    "sandbox_type": request.sandbox_type.value,
     "github_issue_number": request.github_issue_number,
-    "status": AgentWorkOrderStatus.PENDING,
+    "status": AgentWorkOrderStatus.PENDING.value,
     "current_phase": None,
-    "created_at": datetime.now(),
-    "updated_at": datetime.now(),
+    "created_at": datetime.now().isoformat(),
+    "updated_at": datetime.now().isoformat(),
     "github_pull_request_url": None,
     "git_commit_count": 0,
     "git_files_changed": 0,
     "error_message": None,
 }
```

You'll also need to update all read paths (lines 147, 149, 151-152, 196, 198-201) to reconstruct enums and datetime objects from these primitive values.
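For illustration, those read paths could share one small helper (a sketch only; the helper name `metadata_from_primitives` is hypothetical, and the assumption that the enums live in `src.agent_work_orders.models` is based on how the tests in this PR import them):

```python
from datetime import datetime

from src.agent_work_orders.models import (
    AgentWorkflowPhase,
    AgentWorkOrderStatus,
    SandboxType,
)


def metadata_from_primitives(raw: dict) -> dict:
    """Rebuild enum and datetime fields from the JSON-safe primitives on disk."""
    return {
        **raw,
        "sandbox_type": SandboxType(raw["sandbox_type"]),
        "status": AgentWorkOrderStatus(raw["status"]),
        "current_phase": AgentWorkflowPhase(raw["current_phase"]) if raw.get("current_phase") else None,
        "created_at": datetime.fromisoformat(raw["created_at"]),
        "updated_at": datetime.fromisoformat(raw["updated_at"]),
    }
```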
🧹 Nitpick comments (47)
archon-ui-main/src/features/shared/api/apiClient.ts (1)
70-70: Clarify the purpose of the `_method` variable. The underscore prefix typically indicates an intentionally unused variable in TypeScript. However, if the HTTP method information is needed for context, logging, or error handling elsewhere in the function, the naming is misleading. If the variable is truly unused, consider removing it to reduce clutter.
Verify whether `_method` is used outside the visible code. If not, apply this diff:

```diff
-const _method = options.method?.toUpperCase() || "GET";
 const hasBody = options.body !== undefined && options.body !== null;
```

If the method is still needed elsewhere, retain it with a clearer name:

```diff
-const _method = options.method?.toUpperCase() || "GET";
+const httpMethod = options.method?.toUpperCase() || "GET";
```

python/src/agent_work_orders/utils/port_allocation.py (7)
39-47: Use a stable hash fallback; Python's `hash()` is process-randomized. `hash()` varies across processes (PYTHONHASHSEED), breaking determinism. Use a stable hash (e.g., blake2b) for the fallback.
Apply within this range:
```diff
     try:
         # Take first 8 alphanumeric chars and convert from base 36
         id_chars = ''.join(c for c in work_order_id[:8] if c.isalnum())
-        slot = int(id_chars, 36) % MAX_CONCURRENT_WORK_ORDERS
-    except ValueError:
-        # Fallback to simple hash if conversion fails
-        slot = hash(work_order_id) % MAX_CONCURRENT_WORK_ORDERS
+        if not id_chars:
+            raise ValueError("empty id segment")
+        slot = int(id_chars, 36) % MAX_CONCURRENT_WORK_ORDERS
+    except ValueError:
+        # Stable fallback: 64-bit blake2b digest
+        digest = hashlib.blake2b(work_order_id.encode("utf-8"), digest_size=8).digest()
+        slot = int.from_bytes(digest, "big") % MAX_CONCURRENT_WORK_ORDERS
```

Outside this range, add the import:
```diff
+import hashlib
```
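As a quick check of the stability claim (a minimal, self-contained sketch; `stable_slot` and the `wo-abc123` ID are illustrative, not code from this PR):

```python
import hashlib


def stable_slot(work_order_id: str, max_slots: int = 20) -> int:
    # blake2b with a fixed digest size returns the same bytes in every process,
    # unlike hash(), which is randomized per interpreter via PYTHONHASHSEED
    digest = hashlib.blake2b(work_order_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % max_slots


# Prints the same slot on every interpreter run
print(stable_slot("wo-abc123"))
```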
63-69: Avoid DNS lookups and an unnecessary timeout in the port probe. Bind to 127.0.0.1 and drop `settimeout`; bind is non-blocking, so this is faster and clearer.
```diff
-    try:
-        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
-            s.settimeout(1)
-            s.bind(('localhost', port))
-            return True
+    try:
+        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+            s.bind(("127.0.0.1", port))
+            return True
```
195-195: Use the constant for default attempts. Keep the deprecated shim aligned with the core config.
```diff
-def find_next_available_ports(work_order_id: str, max_attempts: int = 20) -> tuple[int, int]:
+def find_next_available_ports(work_order_id: str, max_attempts: int = MAX_CONCURRENT_WORK_ORDERS) -> tuple[int, int]:
```
149-156: Write `.ports.env` atomically to avoid partial files. Use a temp file plus `os.replace`; also set the encoding and newline convention explicitly.
```diff
-    ports_env_path = os.path.join(worktree_path, ".ports.env")
+    ports_env_path = os.path.join(worktree_path, ".ports.env")
+    tmp_path = ports_env_path + ".tmp"
-    with open(ports_env_path, "w") as f:
+    with open(tmp_path, "w", encoding="utf-8", newline="\n") as f:
         # Header
         f.write("# Port range allocated to this work order\n")
         f.write("# Each work order gets 10 consecutive ports for flexibility\n")
         f.write("# CLI tools can ignore ports, microservices can use multiple\n\n")
@@
-        f.write(f"VITE_BACKEND_URL=http://localhost:{available_ports[0]}\n")
+        f.write(f"VITE_BACKEND_URL=http://localhost:{available_ports[0]}\n")
+
+    os.replace(tmp_path, ports_env_path)
```

Also applies to: 175-175
34-38: The docstring example implies a fixed mapping; clarify that it's slot-dependent. Avoid suggesting that specific IDs map to fixed ranges.
```diff
-    Example:
-        wo-abc123 -> (9000, 9009)  # 10 ports
-        wo-def456 -> (9010, 9019)  # 10 ports
-        wo-xyz789 -> (9020, 9029)  # 10 ports
+    Example:
+        Slot 0  -> (9000, 9009)  # 10 ports
+        Slot 1  -> (9010, 9019)  # 10 ports
+        ...
+        Slot 19 -> (9190, 9199)  # 10 ports
+    Note: slot is derived from the work_order_id (base-36 modulo 20).
```
162-166: Optional: add aggregate metadata for consumers. A count and a CSV list help simple parsers avoid guessing the last index.
```diff
     # Individual numbered ports for easy access
     f.write("# Individual ports (use PORT_0, PORT_1, etc.)\n")
+    f.write(f"AVAILABLE_PORTS_COUNT={len(available_ports)}\n")
+    if available_ports:
+        f.write("AVAILABLE_PORTS_CSV=" + ",".join(str(p) for p in available_ports) + "\n")
     for i, port in enumerate(available_ports):
         f.write(f"PORT_{i}={port}\n")
```
101-111: TOCTOU caveat: the availability check is advisory only; consider a reservation. Between probing and the actual service bind, another process can seize a port. Consider a lightweight reservation (e.g., a lockfile in the worktree or an OS-level exclusive bind keeper that lives until consumers start) to reduce collisions. I can provide a small helper if desired.
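To make the lockfile idea concrete, here is a minimal sketch (the helper name `reserve_port` and the `.port-<n>.lock` naming are illustrative assumptions, not part of this PR). Atomic `O_CREAT | O_EXCL` creation means only one process can hold a given port's lock:

```python
import os


def reserve_port(worktree_path: str, port: int) -> bool:
    """Best-effort reservation via a per-port lockfile.

    Returns True if this process now holds the reservation; the caller
    removes the file once the real service has bound the port.
    """
    lock_path = os.path.join(worktree_path, f".port-{port}.lock")
    try:
        # O_CREAT | O_EXCL fails atomically if the file already exists (POSIX)
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(fd, str(os.getpid()).encode("utf-8"))
        os.close(fd)
        return True
    except FileExistsError:
        # Another work order already reserved this port
        return False
```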
python/src/agent_work_orders/utils/worktree_operations.py (1)
76-76: Consider using structured logging parameters instead of f-strings. F-strings in log messages (lines 76, 84, 91, 100, 103) prevent structured logging tools from parsing and indexing fields. For `structlog`, pass values as keyword arguments for better observability. Example refactor for line 76:
- logger.info(f"Base repository exists at {base_repo_path}, fetching latest") + logger.info("Base repository exists, fetching latest", base_repo_path=base_repo_path)Also applies to: 84-84, 91-91, 100-100, 103-103
archon-ui-main/src/features/agent-work-orders/components/CreateWorkOrderDialog.tsx (2)
92-95: Preserve canonical command order when toggling.Current toggle appends, changing execution order. Keep selection sorted by ALL_COMMANDS.
- const toggleCommand = (command: WorkflowStep) => { - setSelectedCommands((prev) => (prev.includes(command) ? prev.filter((c) => c !== command) : [...prev, command])); - }; + const toggleCommand = (command: WorkflowStep) => { + setSelectedCommands((prev) => { + const next = prev.includes(command) ? prev.filter((c) => c !== command) : [...prev, command]; + return ALL_COMMANDS.filter((c) => next.includes(c)); // preserve canonical order + }); + };
42-43: Biome/style nits: trailing commas and line length.
- Add trailing commas in arrays/objects (e.g., ALL_COMMANDS and preset arrays).
- Some className lines likely exceed 120 chars; let Biome wrap/split as configured.
Also applies to: 96-107, 118-171, 201-216
python/src/agent_work_orders/models.py (3)
116-125: Harden validation: de-duplicate commands while preserving order. This prevents accidental repeats like `["execute", "execute"]`.
```diff
     def validate_commands(cls, v: list[str]) -> list[str]:
         """Validate that all commands are valid WorkflowStep values"""
         valid = {step.value for step in WorkflowStep}
         for cmd in v:
             if cmd not in valid:
                 raise ValueError(f"Invalid command: {cmd}. Must be one of {valid}")
-        return v
+        seen: set[str] = set()
+        deduped: list[str] = []
+        for cmd in v:
+            if cmd not in seen:
+                seen.add(cmd)
+                deduped.append(cmd)
+        return deduped
```
6-6: Use timezone-aware UTC timestamps. Naive datetimes cause ambiguity in logs and serialization.
```diff
-from datetime import datetime
+from datetime import datetime, timezone
@@
-    timestamp: datetime = Field(default_factory=datetime.now)
+    timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
```

Also applies to: 222-222
91-92: Add non-negative constraints for the git counters. A cheap guardrail.
```diff
-    git_commit_count: int = 0
-    git_files_changed: int = 0
+    git_commit_count: int = Field(0, ge=0)
+    git_files_changed: int = Field(0, ge=0)
```

archon-ui-main/package.json (1)
57-58: Consider updating dependencies to current stable versions. The specified versions are secure (no known advisories for react-hook-form v7.54.2 or @hookform/resolvers v3.10.0), but both are significantly outdated:
- react-hook-form v7.54.2 is 11 versions behind the latest stable v7.65.0 (as of October 10, 2025)
- @hookform/resolvers v3.10.0 is 2 major versions behind v5.2.2 (as of October 23, 2025)
The major version gap for @hookform/resolvers (v3 → v5) may include breaking changes. Consider upgrading to the latest versions to benefit from recent fixes and improvements.
archon-ui-main/src/pages/AgentWorkOrdersPage.tsx (1)
10-12: Add an explicit return type annotation. Per the coding guidelines requiring strict TypeScript mode, the function component should have an explicit return type annotation.
Apply this diff to add the return type:
```diff
-function AgentWorkOrdersPage() {
+function AgentWorkOrdersPage(): React.JSX.Element {
   return <AgentWorkOrdersView />;
 }
```

As per coding guidelines.
docker-compose.yml (1)
150-198: Ensure `depends_on` waits for service readiness, not just startup. The `depends_on` at lines 161-162 starts the `archon-agent-work-orders` container as soon as `archon-server` starts, but doesn't wait for it to be healthy. This can cause connection failures if `archon-server` hasn't finished initializing. Apply this diff to add the `service_healthy` condition:

```diff
     depends_on:
-      - archon-server
+      archon-server:
+        condition: service_healthy
```

This ensures `archon-agent-work-orders` only starts after `archon-server` passes its healthcheck.
16-20: Consider defining a version constant to avoid duplication. The version string "0.1.0" is duplicated in both the FastAPI app initialization (line 19) and the health endpoint (line 41). Define it once for easier maintenance.
Apply this diff to introduce a version constant:
```diff
+# Version constant
+VERSION = "0.1.0"
+
 app = FastAPI(
     title="Agent Work Orders API",
     description="Agent work order system for workflow-based agent execution",
-    version="0.1.0",
+    version=VERSION,
 )

 @app.get("/health")
 async def health() -> dict:
     """Health check endpoint"""
     return {
         "status": "healthy",
         "service": "agent-work-orders",
-        "version": "0.1.0",
+        "version": VERSION,
     }
```

Also applies to: 35-42
python/Dockerfile.agent-work-orders (2)
73-74: Replace the Python-based health check with curl for faster startup. Using Python to import urllib adds unnecessary startup overhead to each health check probe. The curl tool is already installed (line 28) and available in the PATH.
```diff
-HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
-    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:${AGENT_WORK_ORDERS_PORT}/health')"
+HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
+    CMD curl -f http://localhost:${AGENT_WORK_ORDERS_PORT}/health || exit 1
```
32-37: Add error handling for the GitHub CLI setup to fail fast on installation errors. The multi-step GitHub CLI setup (GPG key import, sources list update, package install) can partially fail with unclear error messages. Adding `set -e` or explicit error checks would catch issues earlier.

```diff
-RUN apt-get update && apt-get install -y \
+RUN set -e && apt-get update && apt-get install -y \
     git \
     curl \
     ca-certificates \
     wget \
     gnupg \
     && curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | gpg --dearmor -o /usr/share/keyrings/githubcli-archive-keyring.gpg \
     && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
     && apt-get update \
     && apt-get install -y gh \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
```

Alternatively, verify and test the GitHub CLI setup independently to ensure it works across different base image versions.
python/tests/agent_work_orders/test_config.py (2)
12-177: Inconsistent module reloading pattern may indicate test reliability issues. The tests show an inconsistent pattern for when `importlib.reload()` is used:
- Some environment override tests reload the config module (docker_compose mode, CLAUDE_CLI_PATH, LOG_LEVEL, CORS_ORIGINS, FILE_STATE_DIRECTORY)
- Others don't (local mode, explicit URL overrides, STATE_STORAGE_TYPE)
This inconsistency suggests uncertainty about which configuration values are read at module import time versus runtime. If a value is cached at import time, tests without reload may pass incorrectly (false positives). If reload isn't needed, the extra complexity hurts maintainability.
Consider standardizing the pattern:
- Determine which config values require module reload for proper testing
- Apply reload consistently only where needed
- Add comments explaining when reload is necessary
- Consider using a fixture to handle reload + cleanup for better test isolation
39-177: Consider adding cleanup for module reload to ensure test isolation. Multiple tests use `importlib.reload(config_module)` to test environment variable overrides. While pytest handles environment variable cleanup via `@patch.dict`, the reloaded module state might persist between tests, potentially causing test order dependencies or issues with parallel execution. Consider using a fixture to handle module reload and cleanup:
```python
@pytest.fixture
def reload_config():
    """Fixture to reload config module and ensure cleanup"""
    import src.agent_work_orders.config as config_module

    importlib.reload(config_module)
    yield config_module
    # Cleanup: reload again with clean environment to reset module state
    importlib.reload(config_module)
```

Then use it in tests like:
```python
def test_config_docker_service_discovery(reload_config):
    from src.agent_work_orders.config import AgentWorkOrdersConfig
    # ... rest of test
```

archon-ui-main/src/features/agent-work-orders/components/WorkOrderList.tsx (2)
32-46: Consider server-side search for better scalability. The search filtering is performed client-side after fetching all work orders for the selected status. While this works well for smaller datasets, it may become inefficient with large numbers of work orders. Consider adding a search parameter to the `useWorkOrders` hook to enable server-side filtering if you expect the dataset to grow significantly.
64-70: Consider enhancing error handling. The error state could provide a better user experience by including a retry mechanism or more detailed error information.
Example enhancement:
```tsx
if (isError) {
  return (
    <div className="text-center py-12">
      <p className="text-red-400 mb-4">Failed to load work orders</p>
      <button
        onClick={() => refetch()}
        className="px-4 py-2 bg-blue-600 hover:bg-blue-700 rounded-lg text-white"
      >
        Retry
      </button>
    </div>
  );
}
```

Note: This would require exposing a `refetch` function from the `useWorkOrders` hook.
44-62: LGTM! The test thoroughly validates command listing with explicit membership checks for each expected command.
Optionally, lines 58-61 could use set comparison for more concise validation:
```diff
-    assert len(commands) == 3
-    assert "agent_workflow_plan" in commands
-    assert "agent_workflow_build" in commands
-    assert "agent_workflow_test" in commands
+    assert set(commands) == {
+        "agent_workflow_plan",
+        "agent_workflow_build",
+        "agent_workflow_test"
+    }
```

archon-ui-main/src/features/agent-work-orders/components/StepHistoryTimeline.tsx (2)
46-48: Add error handling for date parsing. The `new Date(step.timestamp)` call could fail if the timestamp is invalid or malformed, potentially causing a runtime error. Consider adding a fallback:
```diff
-  const timeAgo = formatDistanceToNow(new Date(step.timestamp), {
-    addSuffix: true,
-  });
+  const timeAgo = (() => {
+    try {
+      return formatDistanceToNow(new Date(step.timestamp), { addSuffix: true });
+    } catch {
+      return "unknown time";
+    }
+  })();
```
42-111: Enhance accessibility with semantic HTML and ARIA labels. The timeline currently uses non-semantic `<div>` elements and relies solely on visual indicators (colors, icons) for conveying step status, which creates accessibility barriers for screen reader users and those with color vision deficiencies.
Consider these improvements:
- Use semantic HTML: replace the outer `<div>` with `<ol>` and each step container with `<li>` to create a proper ordered list structure.
- Add ARIA labels to status indicators:

```tsx
<div
  role="img"
  aria-label={step.success ? "Success" : "Failed"}
  className={...}
>
```

- Add visually hidden text for status context:

```tsx
<span className="sr-only">
  Step {step.success ? "completed successfully" : "failed"}
</span>
```

- Consider adding `aria-current="step"` to the current/active step for better navigation context.

archon-ui-main/src/features/agent-work-orders/views/WorkOrderDetailView.tsx (2)
109-116: Consider improving link accessibility. The repository and PR links use generic text ("repository URL" and "View PR") which could be improved for screen readers.
Consider more descriptive link text:
```diff
 <a
   href={workOrder.repository_url}
   target="_blank"
   rel="noopener noreferrer"
   className="text-blue-400 hover:text-blue-300 underline break-all"
+  aria-label={`Open repository ${repoName} in new tab`}
 >
   {workOrder.repository_url}
 </a>
```

And for the PR link:

```diff
 <a
   href={workOrder.github_pull_request_url}
   target="_blank"
   rel="noopener noreferrer"
   className="text-blue-400 hover:text-blue-300 underline break-all"
+  aria-label="View pull request in GitHub"
 >
   View PR
 </a>
```

Also applies to: 129-136
180-184: Consider extracting the steps-completed calculation. The inline filter logic for computing completed steps could be extracted for better readability.
```diff
+  const completedSteps = stepHistory.steps.filter((s) => s.success).length;
+  const totalSteps = stepHistory.steps.length;
+
 <div>
   <p className="text-sm text-gray-400">Steps Completed</p>
   <p className="text-white text-lg font-semibold">
-    {stepHistory.steps.filter((s) => s.success).length} / {stepHistory.steps.length}
+    {completedSteps} / {totalSteps}
   </p>
 </div>
```

archon-ui-main/src/features/agent-work-orders/services/agentWorkOrdersService.ts (1)
49-52: Fix URL construction to avoid trailing-slash inconsistency. The current URL construction results in `${baseUrl}/` when no status filter is provided, and `${baseUrl}/?status=...` when a filter is present. The trailing slash before query parameters is unconventional and creates inconsistency with other methods. Apply this diff to fix the URL construction:
```diff
 async listWorkOrders(statusFilter?: AgentWorkOrderStatus): Promise<AgentWorkOrder[]> {
   const baseUrl = getBaseUrl();
-  const params = statusFilter ? `?status=${statusFilter}` : "";
-  return await callAPIWithETag<AgentWorkOrder[]>(`${baseUrl}/${params}`);
+  const params = statusFilter ? `?status=${statusFilter}` : "";
+  return await callAPIWithETag<AgentWorkOrder[]>(`${baseUrl}${params}`);
 },
```

python/src/server/api_routes/agent_work_orders_proxy.py (1)
74-74: Consider making the timeout configurable. The 30-second timeout is hardcoded, which reduces flexibility for different deployment environments or service SLAs. While 30 seconds is reasonable for a proxy, making it configurable would improve maintainability.
Consider reading the timeout from configuration:
```python
# In config/service_discovery.py or a new config module
AGENT_WORK_ORDERS_TIMEOUT = float(os.getenv("AGENT_WORK_ORDERS_TIMEOUT", "30.0"))

# Then in the proxy:
async with httpx.AsyncClient(timeout=AGENT_WORK_ORDERS_TIMEOUT) as client:
```

archon-ui-main/src/features/agent-work-orders/hooks/__tests__/useAgentWorkOrderQueries.test.tsx (4)
70-70: Avoid `as never` type casting; use proper typing. The `as never` type assertion completely bypasses TypeScript's type checking and could hide type mismatches between mock data and the actual service return type.
Consider one of these approaches:
Option 1: Use proper typing

```diff
-    vi.mocked(agentWorkOrdersService.listWorkOrders).mockResolvedValue(mockWorkOrders as never);
+    vi.mocked(agentWorkOrdersService.listWorkOrders).mockResolvedValue(mockWorkOrders as Awaited<ReturnType<typeof agentWorkOrdersService.listWorkOrders>>);
```

Option 2: If the type is too complex, use `as any` with a comment

```diff
-    vi.mocked(agentWorkOrdersService.listWorkOrders).mockResolvedValue(mockWorkOrders as never);
+    // Type assertion needed for test mock - actual service returns more complete type
+    vi.mocked(agentWorkOrdersService.listWorkOrders).mockResolvedValue(mockWorkOrders as any);
```

This applies to all similar type assertions in this file (lines 70, 95, 133, 190, 249).
Also applies to: 95-95
47-110: Consider adding error scenario tests. The current tests only cover success paths. Consider adding tests for error scenarios to ensure proper error handling:
- Service throws an error
- Network failure
- Invalid response data
Example error test:
it("should handle errors when fetching work orders", async () => { const { agentWorkOrdersService } = await import("../../services/agentWorkOrdersService"); const { useWorkOrders } = await import("../useAgentWorkOrderQueries"); const mockError = new Error("Failed to fetch work orders"); vi.mocked(agentWorkOrdersService.listWorkOrders).mockRejectedValue(mockError); const wrapper = ({ children }: { children: React.ReactNode }) => ( <QueryClientProvider client={queryClient}>{children}</QueryClientProvider> ); const { result } = renderHook(() => useWorkOrders(), { wrapper }); await waitFor(() => expect(result.current.isError).toBe(true)); expect(result.current.error).toEqual(mockError); });This pattern should be applied to
useWorkOrder,useStepHistory, anduseCreateWorkOrdertests as well.
221-264: Add a mutation error scenario test. Mutation hooks require error handling tests to ensure the UI can properly respond to failures.
Add a test for mutation errors:
it("should handle errors when creating work order", async () => { const { agentWorkOrdersService } = await import("../../services/agentWorkOrdersService"); const { useCreateWorkOrder } = await import("../useAgentWorkOrderQueries"); const mockError = new Error("Failed to create work order"); vi.mocked(agentWorkOrdersService.createWorkOrder).mockRejectedValue(mockError); const wrapper = ({ children }: { children: React.ReactNode }) => ( <QueryClientProvider client={queryClient}>{children}</QueryClientProvider> ); const { result } = renderHook(() => useCreateWorkOrder(), { wrapper }); const mockRequest = { repository_url: "https://github.com/test/repo", sandbox_type: "git_branch" as const, user_request: "Test", }; result.current.mutate(mockRequest); await waitFor(() => expect(result.current.isError).toBe(true)); expect(result.current.error).toEqual(mockError); });
72-74: Optional: Extract the wrapper to reduce duplication. The wrapper component is duplicated across all test cases. Consider extracting it to reduce repetition:
describe("useWorkOrders", () => { let queryClient: QueryClient; let wrapper: React.ComponentType<{ children: React.ReactNode }>; beforeEach(() => { queryClient = new QueryClient({ defaultOptions: { queries: { retry: false }, }, }); wrapper = ({ children }: { children: React.ReactNode }) => ( <QueryClientProvider client={queryClient}>{children}</QueryClientProvider> ); vi.clearAllMocks(); }); it("should fetch work orders without filter", async () => { // ... test implementation uses wrapper directly ... const { result } = renderHook(() => useWorkOrders(), { wrapper }); }); });This pattern could be applied to all describe blocks (lines 72-74, 97-99, 135-137, 151-153, 192-194, 208-210, 251-253).
python/tests/agent_work_orders/test_sse_streams.py (2)
52-52: Move the datetime import to module level for consistency. Importing `datetime` inside the test function is inconsistent with the module-level imports. Consider moving it to the top of the file alongside the other imports. Apply this diff to move the import to the module level:
```diff
 import asyncio
 import json
-from datetime import UTC
+from datetime import UTC, datetime
```

Then remove the import from inside the function:
```diff
 def test_get_current_timestamp():
     """Test timestamp generation in ISO format"""
     timestamp = get_current_timestamp()

     # Should be valid ISO format
     assert isinstance(timestamp, str)
     assert "T" in timestamp

     # Should be recent (within last second)
-    from datetime import datetime
-
     parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
```
223-256: Consider reducing the hardcoded sleep to minimize test duration. The `0.6`-second sleep at line 242 might introduce flakiness in CI environments and adds to test execution time. Consider reducing this to a smaller value (e.g., `0.1` or `0.2` seconds) if the streaming implementation polls frequently enough. Apply this diff to reduce the sleep duration:
```diff
 async def add_new_log():
     # Wait a bit then add new log
-    await asyncio.sleep(0.6)
+    await asyncio.sleep(0.1)
     buffer.add_log("wo-123", "info", "new_event")
```

Makefile (1)
85-86: Use a consistent host variable pattern for service URLs. Lines 85–86 use hardcoded `localhost` for `ARCHON_SERVER_URL` and `ARCHON_MCP_URL`, whereas elsewhere in the file (lines 96, 101, 113–114) the pattern is `${HOST:-localhost}`. While `localhost` is fine for pure-local dev, consistency matters for maintainability and future remote scenarios. Suggested fix:
```diff
-	export ARCHON_SERVER_URL=http://localhost:$${ARCHON_SERVER_PORT:-8181}; \
-	export ARCHON_MCP_URL=http://localhost:$${ARCHON_MCP_PORT:-8051}; \
+	export ARCHON_SERVER_URL=http://$${HOST:-localhost}:$${ARCHON_SERVER_PORT:-8181}; \
+	export ARCHON_MCP_URL=http://$${HOST:-localhost}:$${ARCHON_MCP_PORT:-8051}; \
```

archon-ui-main/src/features/agent-work-orders/components/WorkOrderCard.tsx (1)
48-60: Reformat the long className to meet the 120-character limit. Line 59's className prop exceeds the 120-character line length specified in the Biome formatting guidelines for the features directory.
As per coding guidelines.
Apply this diff to break the className into multiple lines:
```diff
       role={onClick ? "button" : undefined}
       tabIndex={onClick ? 0 : undefined}
-      className="bg-gray-800 bg-opacity-50 backdrop-blur-sm border border-gray-700 rounded-lg p-4 hover:border-blue-500 transition-all cursor-pointer"
+      className={
+        "bg-gray-800 bg-opacity-50 backdrop-blur-sm border border-gray-700 rounded-lg p-4 " +
+        "hover:border-blue-500 transition-all cursor-pointer"
+      }
     >
```

python/tests/agent_work_orders/test_workflow_operations.py (5)
22-55: Assert the async executor was awaited in success-path tests. This adds a cheap correctness guard and prevents silent no-op mocks.
```diff
@@ def test_run_create_branch_step_success():
     mock_executor.build_command.assert_called_once()
+    mock_executor.execute_async.assert_awaited_once()
@@ def test_run_planning_step_success():
     mock_command_loader.load_command.assert_called_once_with("planning")
+    mock_executor.execute_async.assert_awaited_once()
@@ def test_run_execute_step_success():
     mock_command_loader.load_command.assert_called_once_with("execute")
+    mock_executor.execute_async.assert_awaited_once()
@@ def test_run_commit_step_success():
     assert result.agent_name == COMMITTER
     mock_command_loader.load_command.assert_called_once_with("commit")
+    mock_executor.execute_async.assert_awaited_once()
@@ def test_run_create_pr_step_success():
     assert "github.com" in result.output
     mock_command_loader.load_command.assert_called_once_with("create-pr")
+    mock_executor.execute_async.assert_awaited_once()
@@ def test_run_review_step_success():
     assert result.agent_name == REVIEWER
     mock_command_loader.load_command.assert_called_once_with("prp-review")
+    mock_executor.execute_async.assert_awaited_once()
```

Also applies to: 89-123, 160-191, 213-242, 245-278, 301-330
58-86: Strengthen failure-path assertions (step/agent) and guard loader calls when preconditions fail. This makes failures self-describing and verifies the early-return logic.
```diff
@@ def test_run_create_branch_step_failure():
     assert result.success is False
     assert result.error_message == "Branch creation failed"
     assert result.step == WorkflowStep.CREATE_BRANCH
+    assert result.agent_name == BRANCH_CREATOR
@@ def test_run_execute_step_missing_plan_file():
     assert result.success is False
     assert "No plan file" in result.error_message
+    assert result.step == WorkflowStep.EXECUTE
+    assert result.agent_name == IMPLEMENTOR
+    mock_command_loader.load_command.assert_not_called()
@@ def test_run_create_pr_step_missing_branch():
     assert result.success is False
     assert "No branch name" in result.error_message
+    assert result.step == WorkflowStep.CREATE_PR
+    assert result.agent_name == PR_CREATOR
+    mock_command_loader.load_command.assert_not_called()
@@ def test_run_review_step_missing_plan():
     assert result.success is False
     assert "No plan file" in result.error_message
+    assert result.step == WorkflowStep.REVIEW
+    assert result.agent_name == REVIEWER
+    mock_command_loader.load_command.assert_not_called()
```

Also applies to: 192-210, 281-298, 332-350
70-72: Use a consistent loader return shape in the failure test. This avoids accidental attribute errors if the runner reads command.file_path before executing.
```diff
-    mock_command_loader = MagicMock()
-    mock_command_loader.load_command = MagicMock(return_value=MagicMock())
+    mock_command_loader = MagicMock()
+    mock_command_loader.load_command = MagicMock(
+        return_value=MagicMock(file_path="create-branch.md")
+    )
```
352-394: Make the context-passing test assert call ordering and counts. This validates the stitching flow rather than just the presence of a key.
```diff
@@ async def test_context_passing_between_steps():
     assert branch_result.success is True
     assert planning_result.success is True
     assert "create-branch" in context
+    # Validate call order and counts
+    assert len(mock_executor.build_command.call_args_list) == 2
+    # load_command called for create-branch then planning
+    assert mock_command_loader.load_command.call_args_list[0][0][0] == "create-branch"
+    assert mock_command_loader.load_command.call_args_list[1][0][0] == "planning"
```
1-2: Optional: DRY up the setup with fixtures/parametrize. This reduces duplication and improves readability.
Example fixtures:
```python
import pytest
from unittest.mock import AsyncMock, MagicMock


@pytest.fixture
def success_executor():
    exec = MagicMock()
    exec.build_command = MagicMock(return_value=("cli command", "prompt"))
    exec.execute_async = AsyncMock()
    return exec


@pytest.fixture
def loader():
    ld = MagicMock()
    ld.load_command = MagicMock(return_value=MagicMock(file_path="cmd.md"))
    return ld
```

Parametrize similar success cases over (command_name, step_enum, agent_const).
python/src/agent_work_orders/utils/log_buffer.py (1)
116-116: String-based timestamp comparison is fragile. While lexicographic comparison works for ISO format timestamps, this approach depends on consistent formatting. Consider parsing timestamps to datetime objects for robust comparison, or document the assumption that all timestamps are in ISO format.
Example for more robust comparison:
```diff
 if since:
-    logs = [log for log in logs if log.get("timestamp", "") > since]
+    from datetime import datetime
+    since_dt = datetime.fromisoformat(since.replace('Z', '+00:00'))
+    logs = [log for log in logs
+            if datetime.fromisoformat(log.get("timestamp", "").replace('Z', '+00:00')) > since_dt]
```

Note: The current implementation is acceptable if timestamp format consistency is guaranteed by `add_log`.
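To illustrate the guarantee that note relies on (a sketch only; the real `add_log` signature in this PR may differ), `add_log` can normalize every entry to a fixed-width, timezone-aware UTC ISO timestamp at write time, which keeps the plain string comparison chronologically correct:

```python
from datetime import datetime, timezone


def _utc_iso_now() -> str:
    # timespec="microseconds" gives a fixed width, so lexicographic
    # order matches chronological order for these strings
    return datetime.now(timezone.utc).isoformat(timespec="microseconds")


def add_log(buffer: list[dict], work_order_id: str, level: str, event: str) -> None:
    buffer.append({
        "work_order_id": work_order_id,
        "level": level,
        "event": event,
        "timestamp": _utc_iso_now(),
    })
```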
python/src/agent_work_orders/utils/structured_logger.py (1)
164-177: Consider resolving the type ignore with an explicit cast. The `type: ignore[no-any-return]` comment suggests structlog's typing is incomplete. While acceptable, you could eliminate it with an explicit cast for stronger type safety.

```diff
+from typing import cast
+
 def get_logger(name: str | None = None) -> structlog.stdlib.BoundLogger:
     """Get a structured logger instance.

     Args:
         name: Optional name for the logger

     Returns:
         Configured structlog logger

     Examples:
         logger = get_logger(__name__)
         logger.info("operation_completed", duration_ms=123)
     """
-    return structlog.get_logger(name)  # type: ignore[no-any-return]
+    return cast(structlog.stdlib.BoundLogger, structlog.get_logger(name))
```

python/src/agent_work_orders/api/routes.py (1)
290-343: Consider adding explicit error handling for consistency. While `log_buffer.get_logs` likely won't raise exceptions, adding an explicit try/except with structured logging would make error handling consistent with other endpoints and help debug issues if the log buffer ever fails.

```diff
+    try:
         # Get logs from buffer
         log_entries = log_buffer.get_logs(
             work_order_id=agent_work_order_id,
             level=level,
             step=step,
             limit=limit,
             offset=offset,
         )

         return {
             "agent_work_order_id": agent_work_order_id,
             "log_entries": log_entries,
             "total": log_buffer.get_log_count(agent_work_order_id),
             "limit": limit,
             "offset": offset,
         }
+    except Exception as e:
+        logger.error(
+            "agent_logs_get_failed",
+            agent_work_order_id=agent_work_order_id,
+            error=str(e),
+            exc_info=True,
+        )
+        raise HTTPException(status_code=500, detail=f"Failed to get logs: {e}") from e
```
```ts
const ALL_COMMANDS: WorkflowStep[] = ["create-branch", "planning", "execute", "commit", "create-pr"];
```
Add missing "prp-review" and align presets (also add trailing commas per Biome).
"prp-review" is one of the 6 commands in this PR but isn’t in ALL_COMMANDS, so it can’t be selected. Also align presets (exclude both "create-pr" and "prp-review" in "no-pr").
```diff
-const ALL_COMMANDS: WorkflowStep[] = ["create-branch", "planning", "execute", "commit", "create-pr"];
+const ALL_COMMANDS: WorkflowStep[] = ["create-branch", "planning", "execute", "commit", "create-pr", "prp-review",];
...
-      case "planning":
-        setSelectedCommands(["create-branch", "planning"]);
+      case "planning":
+        setSelectedCommands(["create-branch", "planning",]);
         break;
       case "no-pr":
-        setSelectedCommands(["create-branch", "planning", "execute", "commit"]);
+        setSelectedCommands(["create-branch", "planning", "execute", "commit",]);
         break;
```

Based on PR objectives.
Also applies to: 96-107
🤖 Prompt for AI Agents
In
archon-ui-main/src/features/agent-work-orders/components/CreateWorkOrderDialog.tsx
around lines 42-43 (and also apply same changes to lines 96-107), ALL_COMMANDS
is missing the "prp-review" entry and the presets are misaligned: add
"prp-review" into the ALL_COMMANDS list, update the "no-pr" preset to exclude
both "create-pr" and "prp-review", and ensure list items end with trailing
commas to satisfy Biome formatting.
```tsx
  };

  return (
    <Dialog open={open} onOpenChange={handleClose}>
```
Only close on onOpenChange(false).
Passing handleClose triggers on any open change. Gate on the boolean to avoid unintended resets.
```diff
-      <Dialog open={open} onOpenChange={handleClose}>
+      <Dialog open={open} onOpenChange={(isOpen) => { if (!isOpen) handleClose(); }}>
```

🤖 Prompt for AI Agents
In
archon-ui-main/src/features/agent-work-orders/components/CreateWorkOrderDialog.tsx
around line 111, the dialog currently calls handleClose for any onOpenChange
event; update the handler so it only invokes handleClose when the boolean open
parameter is false (i.e., if (!open) or if (open === false) then call
handleClose()), otherwise do nothing — ensure the handler signature accepts the
boolean open value and remove any unconditional calls to handleClose so state
resets only happen on close.
```tsx
className={`w-8 h-8 rounded-full flex items-center justify-center border-2 ${
  step.success ? "bg-green-500 border-green-400" : "bg-red-500 border-red-400"
} ${isCurrent ? "animate-pulse" : ""}`}
```
Avoid dynamically constructing Tailwind classes.
The component constructs Tailwind classes dynamically using template literals (lines 54-56, 65, 78-82), which violates the coding guideline. Dynamically constructed classes can cause issues with Tailwind's JIT compiler and purging.
As per coding guidelines.
Use conditional full class strings instead. Apply this pattern:
```diff
-        className={`w-8 h-8 rounded-full flex items-center justify-center border-2 ${
-          step.success ? "bg-green-500 border-green-400" : "bg-red-500 border-red-400"
-        } ${isCurrent ? "animate-pulse" : ""}`}
+        className={
+          step.success
+            ? isCurrent
+              ? "w-8 h-8 rounded-full flex items-center justify-center border-2 bg-green-500 border-green-400 animate-pulse"
+              : "w-8 h-8 rounded-full flex items-center justify-center border-2 bg-green-500 border-green-400"
+            : isCurrent
+              ? "w-8 h-8 rounded-full flex items-center justify-center border-2 bg-red-500 border-red-400 animate-pulse"
+              : "w-8 h-8 rounded-full flex items-center justify-center border-2 bg-red-500 border-red-400"
+        }
```

Alternatively, consider using a helper library like clsx or classnames with full class strings:
```tsx
import { clsx } from "clsx";

className={clsx(
  "w-8 h-8 rounded-full flex items-center justify-center border-2",
  step.success ? "bg-green-500 border-green-400" : "bg-red-500 border-red-400",
  isCurrent && "animate-pulse"
)}
```

Apply similar fixes to lines 65 and 78-82.
Also applies to: 65-65, 78-82
```tsx
export function WorkOrderCard({ workOrder, onClick }: WorkOrderCardProps) {
  const statusStyle = STATUS_STYLES[workOrder.status];
  const repoName = workOrder.repository_url.split("/").slice(-2).join("/");
  const timeAgo = formatDistanceToNow(new Date(workOrder.created_at), {
    addSuffix: true,
  });
```
Add fallback for repository name extraction.
The repository name extraction using .split("/").slice(-2).join("/") assumes a well-formed URL. If the URL has fewer than 2 path segments or is malformed, this could produce unexpected results.
Consider adding a fallback or validation:
```diff
-  const repoName = workOrder.repository_url.split("/").slice(-2).join("/");
+  const urlParts = workOrder.repository_url.split("/").filter(Boolean);
+  const repoName = urlParts.length >= 2
+    ? urlParts.slice(-2).join("/")
+    : workOrder.repository_url;
```
```tsx
  const filteredWorkOrders = useMemo(() => {
    if (!workOrders) return [];

    return workOrders.filter((wo) => {
      const matchesSearch =
        searchQuery === "" ||
        wo.repository_url.toLowerCase().includes(searchQuery.toLowerCase()) ||
        wo.agent_work_order_id.toLowerCase().includes(searchQuery.toLowerCase());

      return matchesSearch;
    });
  }, [workOrders, searchQuery]);
```
Add null-safety checks for search filter fields.
The search filter calls toLowerCase() on repository_url and agent_work_order_id without null-safety checks. If these fields can be null or undefined, this will cause a runtime error.
Apply this diff to add defensive null checks:
```diff
   const filteredWorkOrders = useMemo(() => {
     if (!workOrders) return [];
     return workOrders.filter((wo) => {
       const matchesSearch =
         searchQuery === "" ||
-        wo.repository_url.toLowerCase().includes(searchQuery.toLowerCase()) ||
-        wo.agent_work_order_id.toLowerCase().includes(searchQuery.toLowerCase());
+        wo.repository_url?.toLowerCase().includes(searchQuery.toLowerCase()) ||
+        wo.agent_work_order_id?.toLowerCase().includes(searchQuery.toLowerCase());
       return matchesSearch;
     });
   }, [workOrders, searchQuery]);
```
🤖 Prompt for AI Agents
In archon-ui-main/src/features/agent-work-orders/components/WorkOrderList.tsx
around lines 35 to 46, the search filter calls toLowerCase() on repository_url
and agent_work_order_id without null-safety; modify the filter to defensively
handle null/undefined by coercing values to empty string (or checking existence)
before calling toLowerCase, e.g. derive safeRepository = (item.repository_url ||
"").toLowerCase() and safeId = (item.agent_work_order_id || "").toLowerCase(),
then use those in comparisons so toLowerCase is never invoked on null/undefined.
```python
        # Check all ranges are valid
        for start, end in ranges:
            assert end - start + 1 == 10
            assert PORT_BASE <= start < PORT_BASE + (MAX_CONCURRENT_WORK_ORDERS * PORT_RANGE_SIZE)
```
Fix the upper bound check for start port.
Same boundary issue as line 29: the upper bound allows start values that would result in end exceeding the allocated port space.
Apply this diff:
```diff
-            assert PORT_BASE <= start < PORT_BASE + (MAX_CONCURRENT_WORK_ORDERS * PORT_RANGE_SIZE)
+            assert PORT_BASE <= start <= PORT_BASE + (MAX_CONCURRENT_WORK_ORDERS - 1) * PORT_RANGE_SIZE
```
🤖 Prompt for AI Agents
In python/tests/agent_work_orders/test_port_allocation.py around line 52, the
upper-bound check for the `start` port is incorrect and allows `start` values
that make `end = start + size - 1` exceed the permitted port range; update the
conditional to require start <= (max_port - size + 1) (or start < = max_start)
so that end never surpasses the allocated port space, mirroring the fix applied
at line 29.
```python
        # It's theoretically possible all hash to same slot, but unlikely with very different IDs
        # The important thing is the function works, not that it always distributes perfectly
        assert len(ranges) == 5  # We got 5 results
```
Test doesn't verify different slot assignments.
The test name claims to verify "that the hash function can produce different slot assignments," but it only checks that 5 results were returned, which is always true. This doesn't confirm that any different slots were actually assigned.
Consider verifying that at least some ranges differ:
```diff
-    # It's theoretically possible all hash to same slot, but unlikely with very different IDs
-    # The important thing is the function works, not that it always distributes perfectly
-    assert len(ranges) == 5  # We got 5 results
+    # Check that we got 5 results and at least some are in different slots
+    assert len(ranges) == 5
+    unique_ranges = set(ranges)
+    # With very different IDs, should get multiple unique ranges
+    assert len(unique_ranges) > 1, "Hash function should distribute IDs across different slots"
```
🤖 Prompt for AI Agents
In python/tests/agent_work_orders/test_port_allocation.py around lines 54-56,
the test currently only asserts that 5 results were returned but does not verify
that the hash function produced different slot assignments; update the test to
assert that the returned assignments are not all identical by checking that
there is more than one unique slot (e.g., assert len(set(slot for slot in
results_slots)) > 1) or by asserting that at least one returned range differs
from another so the test guarantees different slot assignments.
```python
        assert PORT_BASE <= start < PORT_BASE + (MAX_CONCURRENT_WORK_ORDERS * PORT_RANGE_SIZE)
        assert PORT_BASE < end <= PORT_BASE + (MAX_CONCURRENT_WORK_ORDERS * PORT_RANGE_SIZE)
```
Fix boundary checks for port ranges.
Both lines have off-by-one errors in the upper bounds:
- Line 294: Same issue as lines 29 and 52 - allows start values too high
- Line 295: Uses `<=` but should use `<`, because the maximum valid end port is `PORT_BASE + MAX_CONCURRENT_WORK_ORDERS * PORT_RANGE_SIZE - 1`, not `PORT_BASE + MAX_CONCURRENT_WORK_ORDERS * PORT_RANGE_SIZE`
Apply this diff:
```diff
-        assert PORT_BASE <= start < PORT_BASE + (MAX_CONCURRENT_WORK_ORDERS * PORT_RANGE_SIZE)
-        assert PORT_BASE < end <= PORT_BASE + (MAX_CONCURRENT_WORK_ORDERS * PORT_RANGE_SIZE)
+        assert PORT_BASE <= start <= PORT_BASE + (MAX_CONCURRENT_WORK_ORDERS - 1) * PORT_RANGE_SIZE
+        assert PORT_BASE + PORT_RANGE_SIZE - 1 <= end < PORT_BASE + (MAX_CONCURRENT_WORK_ORDERS * PORT_RANGE_SIZE)
```
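To make the off-by-one concrete (using the constants implied elsewhere in this PR: a 9000 base, 10-port ranges, 20 slots; treat the literal values as assumptions):

```python
PORT_BASE = 9000
PORT_RANGE_SIZE = 10
MAX_CONCURRENT_WORK_ORDERS = 20

# The highest slot is 19, so the last valid range is (9190, 9199)
max_start = PORT_BASE + (MAX_CONCURRENT_WORK_ORDERS - 1) * PORT_RANGE_SIZE  # 9190
max_end = PORT_BASE + MAX_CONCURRENT_WORK_ORDERS * PORT_RANGE_SIZE - 1      # 9199
assert (max_start, max_end) == (9190, 9199)

# The original `end <=` bound would also have accepted 9200,
# one port past the allocated space
```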
🤖 Prompt for AI Agents
In python/tests/agent_work_orders/test_port_allocation.py around lines 294 to
295, the upper-bound checks for port ranges are off-by-one: adjust the start and
end boundary comparisons so they do not allow values equal to PORT_BASE +
MAX_CONCURRENT_WORK_ORDERS * PORT_RANGE_SIZE. Change the start check to enforce
< (or <= with a -1) against PORT_BASE + MAX_CONCURRENT_WORK_ORDERS *
PORT_RANGE_SIZE - 1 (matching lines 29 and 52 fix), and change the end check
from <= to < so the maximum valid end port is PORT_BASE +
MAX_CONCURRENT_WORK_ORDERS * PORT_RANGE_SIZE - 1.
```python
@pytest.mark.unit
@patch.dict("os.environ", {"SERVICE_DISCOVERY_MODE": "local"})
def test_startup_logs_local_mode(caplog):
    """Test startup logs service discovery mode"""
    from src.agent_work_orders.config import config

    # Verify config is set to local mode
    assert config.SERVICE_DISCOVERY_MODE == "local"
```
Test doesn't verify logging despite its name. The test name `test_startup_logs_local_mode` suggests it validates logging output, but it only checks the configuration value. The `caplog` parameter is declared but never used.
The test name test_startup_logs_local_mode suggests it validates logging output, but it only checks the configuration value. The caplog parameter is declared but never used.
Consider either:
- Renaming to
test_config_local_modeto match actual behavior, or - Adding actual log verification using the
caplogfixture
```diff
-def test_startup_logs_local_mode(caplog):
-    """Test startup logs service discovery mode"""
+def test_config_local_mode():
+    """Test service discovery mode configuration"""
     from src.agent_work_orders.config import config

     # Verify config is set to local mode
     assert config.SERVICE_DISCOVERY_MODE == "local"
```
🤖 Prompt for AI Agents
In python/tests/agent_work_orders/test_server.py around lines 179 to 187, the
test named test_startup_logs_local_mode declares the caplog fixture but never
inspects logs — it only asserts a config value; either rename the test to
test_config_local_mode to reflect its current behavior, or modify the test to
actually verify logging by setting caplog.set_level(logging.INFO) (or the
appropriate level) before startup, invoking the startup code, and asserting the
expected log message(s) appear in caplog.records or caplog.text; ensure to
import logging if needed and keep assertions specific to the expected log
content.
```python
@pytest.mark.unit
@patch.dict("os.environ", {"SERVICE_DISCOVERY_MODE": "docker_compose"})
def test_startup_logs_docker_mode(caplog):
    """Test startup logs docker_compose mode"""
    import importlib

    import src.agent_work_orders.config as config_module
    importlib.reload(config_module)
    from src.agent_work_orders.config import AgentWorkOrdersConfig

    # Create fresh config instance with env var
    config = AgentWorkOrdersConfig()

    # Verify config is set to docker_compose mode
    assert config.SERVICE_DISCOVERY_MODE == "docker_compose"
```
Test has multiple issues: misleading name, unused parameter, and problematic module reload.
This test has several concerns:
- The test name suggests log verification but only checks configuration
- The `caplog` parameter is unused
- The `importlib.reload` pattern can cause side effects and affect test isolation
The module reload at lines 193-196 is particularly problematic as it mutates global module state during test execution, which can cause unpredictable behavior in other tests.
Consider refactoring to:
```diff
-@patch.dict("os.environ", {"SERVICE_DISCOVERY_MODE": "docker_compose"})
-def test_startup_logs_docker_mode(caplog):
-    """Test startup logs docker_compose mode"""
-    import importlib
-
-    import src.agent_work_orders.config as config_module
-    importlib.reload(config_module)
-    from src.agent_work_orders.config import AgentWorkOrdersConfig
-
-    # Create fresh config instance with env var
-    config = AgentWorkOrdersConfig()
-
-    # Verify config is set to docker_compose mode
-    assert config.SERVICE_DISCOVERY_MODE == "docker_compose"
+@patch.dict("os.environ", {"SERVICE_DISCOVERY_MODE": "docker_compose"}, clear=False)
+def test_config_docker_compose_mode():
+    """Test service discovery mode configuration with docker_compose"""
+    from src.agent_work_orders.config import AgentWorkOrdersConfig
+
+    # Create fresh config instance that reads from patched env
+    config = AgentWorkOrdersConfig()
+
+    # Verify config reads the environment variable correctly
+    assert config.SERVICE_DISCOVERY_MODE == "docker_compose"
```

If you need to test startup logging specifically, create a separate test that actually verifies log output using `caplog`.
🤖 Prompt for AI Agents
In python/tests/agent_work_orders/test_server.py around lines 189 to 203, the
test currently has a misleading name, an unused caplog parameter, and uses
importlib.reload which mutates global module state; rename the test to reflect
that it only verifies configuration, remove the unused caplog parameter, and
delete the importlib.reload calls — instead import the module normally and
assert the configuration values directly; if you need to verify startup logging,
add a separate test that actually captures logs with caplog by invoking the
explicit startup function (or running the component in a controlled/isolated
way) rather than reloading the module.
```python
        work_order = AgentWorkOrder(
            agent_work_order_id=state.agent_work_order_id,
            repository_url=state.repository_url,
            sandbox_identifier=state.sandbox_identifier,
            git_branch_name=state.git_branch_name,
            agent_session_id=state.agent_session_id,
            sandbox_type=metadata["sandbox_type"],
            github_issue_number=metadata["github_issue_number"],
            status=metadata["status"],
            current_phase=metadata["current_phase"],
            created_at=metadata["created_at"],
            updated_at=metadata["updated_at"],
            github_pull_request_url=metadata.get("github_pull_request_url"),
            git_commit_count=metadata.get("git_commit_count", 0),
            git_files_changed=metadata.get("git_files_changed", 0),
            error_message=metadata.get("error_message"),
        )
```
Reconstruct enums and datetime from primitives when reading metadata.
Once lines 83-95 are fixed to store enum .value primitives, these lines will fail because they expect enum instances but will receive strings. You must reconstruct the enums and datetime objects from the stored primitive values.
Apply this diff to reconstruct enums and datetime objects:
# Build full model
work_order = AgentWorkOrder(
agent_work_order_id=state.agent_work_order_id,
repository_url=state.repository_url,
sandbox_identifier=state.sandbox_identifier,
git_branch_name=state.git_branch_name,
agent_session_id=state.agent_session_id,
- sandbox_type=metadata["sandbox_type"],
+ sandbox_type=SandboxType(metadata["sandbox_type"]),
github_issue_number=metadata["github_issue_number"],
- status=metadata["status"],
- current_phase=metadata["current_phase"],
- created_at=metadata["created_at"],
- updated_at=metadata["updated_at"],
+ status=AgentWorkOrderStatus(metadata["status"]),
+ current_phase=AgentWorkflowPhase(metadata["current_phase"]) if metadata["current_phase"] else None,
+ created_at=datetime.fromisoformat(metadata["created_at"]),
+ updated_at=datetime.fromisoformat(metadata["updated_at"]),
github_pull_request_url=metadata.get("github_pull_request_url"),
git_commit_count=metadata.get("git_commit_count", 0),
git_files_changed=metadata.get("git_files_changed", 0),
error_message=metadata.get("error_message"),
    )

Note: You'll need to add SandboxType to the imports at the top of the file.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In python/src/agent_work_orders/api/routes.py around lines 141 to 157, the code
currently expects enum instances and datetimes but will receive primitive
strings/values from metadata; update the deserialization to reconstruct enums
and datetimes from those primitives (e.g., replace direct usage with
SandboxType(stored_value) or the appropriate EnumClass(stored_value) for each
enum field, and parse datetime strings with datetime.fromisoformat(stored_iso)
or datetime.fromtimestamp(...) if timestamps were stored), and add SandboxType
to the file imports at the top; ensure null/absent checks remain (only
reconstruct when a primitive is present) to avoid raising on None.
work_order = AgentWorkOrder(
    agent_work_order_id=state.agent_work_order_id,
    repository_url=state.repository_url,
    sandbox_identifier=state.sandbox_identifier,
    git_branch_name=state.git_branch_name,
    agent_session_id=state.agent_session_id,
    sandbox_type=metadata["sandbox_type"],
    github_issue_number=metadata["github_issue_number"],
    status=metadata["status"],
    current_phase=metadata["current_phase"],
    created_at=metadata["created_at"],
    updated_at=metadata["updated_at"],
    github_pull_request_url=metadata.get("github_pull_request_url"),
    git_commit_count=metadata.get("git_commit_count", 0),
    git_files_changed=metadata.get("git_files_changed", 0),
    error_message=metadata.get("error_message"),
)
work_orders.append(work_order)
Reconstruct enums and datetime in list endpoint.
Same issue as the get endpoint: once metadata stores primitives, this code must reconstruct enums and datetime objects from stored values.
Apply the same pattern as the get endpoint:
work_order = AgentWorkOrder(
agent_work_order_id=state.agent_work_order_id,
repository_url=state.repository_url,
sandbox_identifier=state.sandbox_identifier,
git_branch_name=state.git_branch_name,
agent_session_id=state.agent_session_id,
- sandbox_type=metadata["sandbox_type"],
+ sandbox_type=SandboxType(metadata["sandbox_type"]),
github_issue_number=metadata["github_issue_number"],
- status=metadata["status"],
- current_phase=metadata["current_phase"],
- created_at=metadata["created_at"],
- updated_at=metadata["updated_at"],
+ status=AgentWorkOrderStatus(metadata["status"]),
+ current_phase=AgentWorkflowPhase(metadata["current_phase"]) if metadata["current_phase"] else None,
+ created_at=datetime.fromisoformat(metadata["created_at"]),
+ updated_at=datetime.fromisoformat(metadata["updated_at"]),
github_pull_request_url=metadata.get("github_pull_request_url"),
git_commit_count=metadata.get("git_commit_count", 0),
git_files_changed=metadata.get("git_files_changed", 0),
error_message=metadata.get("error_message"),
    )

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In python/src/agent_work_orders/api/routes.py around lines 190 to 207, the list
endpoint currently returns metadata with primitives from storage but does not
reconstruct enum and datetime types; update the list endpoint to mirror the get
endpoint by mapping stored primitive values back to their Enum instances (e.g.,
MyEnum(value)) and parsing ISO datetime strings into datetime objects (e.g.,
datetime.fromisoformat or dateutil.parser.parse) for each item before returning.
Locate where stored metadata dicts are collected, apply the same reconstruction
logic used in the get endpoint for enum fields and datetime fields (including
nested fields if present), and ensure the response contains the restored Enum
and datetime objects rather than raw strings/ints.
if not state.git_branch_name:
    # No branch yet, return minimal snapshot
    current_phase = metadata.get("current_phase")
    return GitProgressSnapshot(
        agent_work_order_id=agent_work_order_id,
        current_phase=current_phase if current_phase else AgentWorkflowPhase.PLANNING,
        git_commit_count=0,
        git_files_changed=0,
        latest_commit_message=None,
        git_branch_name=None,
    )

# TODO Phase 2+: Get actual progress from sandbox
# For MVP, return metadata values
current_phase = metadata.get("current_phase")
return GitProgressSnapshot(
    agent_work_order_id=agent_work_order_id,
    current_phase=current_phase if current_phase else AgentWorkflowPhase.PLANNING,
    git_commit_count=metadata.get("git_commit_count", 0),
    git_files_changed=metadata.get("git_files_changed", 0),
    latest_commit_message=None,
    git_branch_name=state.git_branch_name,
)
Handle current_phase enum reconstruction.
If current_phase is stored as a string primitive (after fixing serialization), lines 259 and 271 will fail because GitProgressSnapshot expects an AgentWorkflowPhase enum instance, not a string.
Apply this diff to safely reconstruct the enum:
if not state.git_branch_name:
# No branch yet, return minimal snapshot
current_phase = metadata.get("current_phase")
+ phase = AgentWorkflowPhase(current_phase) if current_phase else AgentWorkflowPhase.PLANNING
return GitProgressSnapshot(
agent_work_order_id=agent_work_order_id,
- current_phase=current_phase if current_phase else AgentWorkflowPhase.PLANNING,
+ current_phase=phase,
git_commit_count=0,
git_files_changed=0,
latest_commit_message=None,
git_branch_name=None,
)
# TODO Phase 2+: Get actual progress from sandbox
# For MVP, return metadata values
current_phase = metadata.get("current_phase")
+phase = AgentWorkflowPhase(current_phase) if current_phase else AgentWorkflowPhase.PLANNING
return GitProgressSnapshot(
agent_work_order_id=agent_work_order_id,
- current_phase=current_phase if current_phase else AgentWorkflowPhase.PLANNING,
+ current_phase=phase,
git_commit_count=metadata.get("git_commit_count", 0),
git_files_changed=metadata.get("git_files_changed", 0),
latest_commit_message=None,
git_branch_name=state.git_branch_name,
 )

Committable suggestion skipped: line range outside the PR's diff.
try:
    while True:
        # Poll for new logs
        new_logs = log_buffer.get_logs_since(
            work_order_id=work_order_id,
            since_timestamp=last_timestamp,
            level=level_filter,
            step=step_filter,
        )

        # Yield new logs
        for log_entry in new_logs:
            yield format_log_event(log_entry)
            last_timestamp = log_entry["timestamp"]

        # Send heartbeat comment every 15 seconds to keep connection alive
        heartbeat_counter += 1
        if heartbeat_counter >= heartbeat_interval:
            yield {"comment": "keepalive"}
            heartbeat_counter = 0

        # Non-blocking sleep before next poll
        await asyncio.sleep(0.5)

except asyncio.CancelledError:
    # Client disconnected - clean exit
    pass
Add comprehensive error handling with logging.
The try-except block only catches asyncio.CancelledError, but operations like log_buffer.get_logs(), log_buffer.get_logs_since(), and format_log_event() may raise other exceptions. Any unhandled exception will crash the entire stream and abruptly disconnect the client without logging the failure.
As per coding guidelines, streaming events should "continue processing but log errors" and "preserve full stack traces" with exc_info=True. Apply this pattern:
try:
while True:
- # Poll for new logs
- new_logs = log_buffer.get_logs_since(
- work_order_id=work_order_id,
- since_timestamp=last_timestamp,
- level=level_filter,
- step=step_filter,
- )
-
- # Yield new logs
- for log_entry in new_logs:
- yield format_log_event(log_entry)
- last_timestamp = log_entry["timestamp"]
+ try:
+ # Poll for new logs
+ new_logs = log_buffer.get_logs_since(
+ work_order_id=work_order_id,
+ since_timestamp=last_timestamp,
+ level=level_filter,
+ step=step_filter,
+ )
+
+ # Yield new logs
+ for log_entry in new_logs:
+ yield format_log_event(log_entry)
+ last_timestamp = log_entry["timestamp"]
+ except Exception as e:
+ # Log error but continue streaming
+ import logging
+ logger = logging.getLogger(__name__)
+ logger.error(
+ f"Error streaming logs for work_order_id={work_order_id}: {e}",
+ exc_info=True
+ )
+ # Optionally yield an error event to the client
+ # yield {"data": json.dumps({"error": str(e), "timestamp": get_current_timestamp()})}
        # Send heartbeat comment every 15 seconds to keep connection alive

Based on coding guidelines.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In python/src/agent_work_orders/api/sse_streams.py around lines 69 to 95, the
current try/except only handles asyncio.CancelledError but other calls
(log_buffer.get_logs, log_buffer.get_logs_since, format_log_event, etc.) can
raise exceptions which would terminate the stream and drop the client; wrap the
streaming loop operations in a broader try/except Exception block that logs the
full stack trace (use logger.exception or logger.error(..., exc_info=True)) and
continues processing so the stream remains alive, while still allowing
asyncio.CancelledError to propagate (re-raise it) to support cancellation.
while True:
    # Poll for new logs
    new_logs = log_buffer.get_logs_since(
        work_order_id=work_order_id,
        since_timestamp=last_timestamp,
        level=level_filter,
        step=step_filter,
    )

    # Yield new logs
    for log_entry in new_logs:
        yield format_log_event(log_entry)
        last_timestamp = log_entry["timestamp"]

    # Send heartbeat comment every 15 seconds to keep connection alive
    heartbeat_counter += 1
    if heartbeat_counter >= heartbeat_interval:
        yield {"comment": "keepalive"}
        heartbeat_counter = 0

    # Non-blocking sleep before next poll
    await asyncio.sleep(0.5)
Add completion detection to prevent resource leaks.
The infinite polling loop continues indefinitely with no mechanism to detect when a work order has completed. This can lead to orphaned SSE connections consuming server resources, especially if clients fail to disconnect or if the work order finishes before the client closes the connection.
Consider adding a termination condition:
try:
while True:
+ # Check if work order is complete (fetch status from DB/state)
+ # if work_order_completed:
+ # break
+
# Poll for new logs
         new_logs = log_buffer.get_logs_since(

Alternatively, implement a timeout or maximum iteration count, or coordinate with the work order state management to signal completion.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In python/src/agent_work_orders/api/sse_streams.py around lines 70 to 91, the
SSE polling loop never terminates which can leak connections; update the loop to
detect work-order completion and break/cleanup by 1) querying the work order
state each iteration (or subscribing to a watcher) and breaking when it reaches
a terminal state, 2) sending a final SSE event (e.g., "completed" or "failed")
before closing, 3) adding a configurable timeout or max-iteration safeguard as a
fallback, and 4) wrapping the loop in try/finally to ensure the SSE
response/connection is closed and any resources are released on exit.
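A minimal sketch of that terminal-state check, written as a standalone async generator; get_work_order_status, poll_logs, and the terminal status strings are illustrative stand-ins, not names from this PR:

import asyncio
import time

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}
MAX_STREAM_SECONDS = 3600  # fallback safeguard against orphaned connections


async def stream_until_terminal(work_order_id, get_work_order_status, poll_logs):
    """Yield log events until the work order finishes or the deadline passes."""
    deadline = time.monotonic() + MAX_STREAM_SECONDS
    try:
        while time.monotonic() < deadline:
            for event in poll_logs(work_order_id):
                yield event
            status = await get_work_order_status(work_order_id)
            if status in TERMINAL_STATUSES:
                # Final event lets the client close the EventSource cleanly
                yield {"event": "complete", "data": status}
                break
            await asyncio.sleep(0.5)
    finally:
        # Runs on normal exit, deadline expiry, and client cancellation alike
        pass

Injecting the status lookup keeps the sketch decoupled from any particular state repository interface.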
| """ | ||
|
|
||
| MAX_LOGS_PER_WORK_ORDER = 1000 | ||
| CLEANUP_THRESHOLD_HOURS = 1 |
Consider increasing CLEANUP_THRESHOLD_HOURS for long-running workflows.
A 1-hour cleanup threshold may be too aggressive for agent work orders, which can run for extended periods. Active work orders could lose their log history prematurely. Consider increasing this to 4-8 hours or making it configurable.
🤖 Prompt for AI Agents
In python/src/agent_work_orders/utils/log_buffer.py at line 24 the
CLEANUP_THRESHOLD_HOURS constant is set to 1 hour which is too short for
long-running agent work orders; change this by increasing the default to a safer
value (e.g., 4 hours) or make it configurable: replace the hard-coded constant
with a value read from config/environment (e.g., AGENT_LOG_CLEANUP_HOURS) parsed
as an int with a fallback default of 4, update any related docstring/comments to
reflect the new default and behavior, and adjust/update/add unit tests that
assume the original 1-hour threshold.
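A minimal sketch of the configurable variant; the AGENT_LOG_CLEANUP_HOURS variable name and the 4-hour default are the suggestions from the comment above, not existing configuration:

import os

MAX_LOGS_PER_WORK_ORDER = 1000

# Long-running work orders keep their log history by default; operators
# can shorten or extend the window via the environment.
CLEANUP_THRESHOLD_HOURS = int(os.environ.get("AGENT_LOG_CLEANUP_HOURS", "4"))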
async def cleanup_loop() -> None:
    while True:
        await asyncio.sleep(interval_seconds)
        removed = self.cleanup_old_work_orders()
        if removed > 0:
            # Note: We don't log here to avoid circular dependency
            # The cleanup is logged by the caller if needed
            pass

self._cleanup_task = asyncio.create_task(cleanup_loop())
Add exception handling to prevent silent task failure.
The cleanup loop has no exception handling. If cleanup_old_work_orders() raises an exception, the background task will crash silently and cleanup will permanently stop. This could lead to unbounded memory growth.
Apply this diff to add error handling:
async def cleanup_loop() -> None:
while True:
- await asyncio.sleep(interval_seconds)
- removed = self.cleanup_old_work_orders()
- if removed > 0:
- # Note: We don't log here to avoid circular dependency
- # The cleanup is logged by the caller if needed
- pass
+ try:
+ await asyncio.sleep(interval_seconds)
+ removed = self.cleanup_old_work_orders()
+ if removed > 0:
+ # Note: We don't log here to avoid circular dependency
+ # The cleanup is logged by the caller if needed
+ pass
+ except Exception:
+ # Silently continue - avoid crashing the cleanup task
+ # The buffer will still accept new logs, just won't clean up old ones
+            pass

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In python/src/agent_work_orders/utils/log_buffer.py around lines 208 to 217, the
cleanup background loop currently calls cleanup_old_work_orders() with no
exception handling which can cause the task to crash silently; wrap the call in
a try/except around each iteration, catch Exception as e, log the exception with
a clear message (including the exception details and stacktrace), and ensure the
loop continues (do not re-raise); keep the existing sleep/await delay behavior
so cleanup resumes on the next interval.
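A minimal sketch combining both suggestions (survive failures and still record them), using the stdlib logger rather than structlog to sidestep the circular-dependency concern noted in the original comment; start_cleanup_loop is an illustrative free function, not the module's actual API:

import asyncio
import logging

logger = logging.getLogger(__name__)


def start_cleanup_loop(buffer, interval_seconds: float) -> asyncio.Task:
    """Schedule periodic cleanup that logs failures instead of dying."""

    async def cleanup_loop() -> None:
        while True:
            await asyncio.sleep(interval_seconds)
            try:
                buffer.cleanup_old_work_orders()
            except Exception:
                # Keep the task alive; preserve the stack trace for operators
                logger.exception("log buffer cleanup iteration failed")

    return asyncio.create_task(cleanup_loop())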
class BufferProcessor:
    """Custom structlog processor to route logs to WorkOrderLogBuffer.

    Only buffers logs that have 'work_order_id' in their context.
    This ensures we only store logs for active work orders.
    """

    def __init__(self, buffer: WorkOrderLogBuffer) -> None:
        """Initialize processor with a log buffer.

        Args:
            buffer: The WorkOrderLogBuffer instance to write logs to
        """
        self.buffer = buffer

    def __call__(
        self, logger: Any, method_name: str, event_dict: MutableMapping[str, Any]
    ) -> MutableMapping[str, Any]:
        """Process log event and add to buffer if it has work_order_id.

        Args:
            logger: The logger instance
            method_name: The log level method name
            event_dict: Dictionary containing log event data

        Returns:
            Unmodified event_dict (pass-through processor)
        """
        work_order_id = event_dict.get("work_order_id")
        if work_order_id:
            # Extract core fields
            level = event_dict.get("level", method_name)
            event = event_dict.get("event", "")
            timestamp = event_dict.get("timestamp", "")

            # Get all extra fields (everything except core fields)
            extra = {
                k: v
                for k, v in event_dict.items()
                if k not in ("work_order_id", "level", "event", "timestamp")
            }

            # Add to buffer
            self.buffer.add_log(
                work_order_id=work_order_id,
                level=level,
                event=event,
                timestamp=timestamp,
                **extra,
            )

        return event_dict
Add error handling around buffer operations.
If buffer.add_log() raises an exception, the logging pipeline could fail or crash. As per coding guidelines, logging infrastructure should continue processing even when individual operations fail.
Apply this diff to add defensive error handling:
# Add to buffer
- self.buffer.add_log(
- work_order_id=work_order_id,
- level=level,
- event=event,
- timestamp=timestamp,
- **extra,
- )
+ try:
+ self.buffer.add_log(
+ work_order_id=work_order_id,
+ level=level,
+ event=event,
+ timestamp=timestamp,
+ **extra,
+ )
+ except Exception:
+ # Log buffer failure should not crash the logging pipeline
+ # Use structlog's internal logger to avoid recursion
+ import logging
+ logging.getLogger(__name__).exception(
+ "Failed to add log to buffer",
+ extra={"work_order_id": work_order_id}
+ )
     return event_dict

As per coding guidelines
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In python/src/agent_work_orders/utils/structured_logger.py around lines 16 to
67, add defensive error handling around buffer.add_log() calls so that any
exception from buffer operations does not propagate and crash the logging
pipeline; wrap the buffer.add_log(...) invocation in a try/except Exception as e
block, catch all exceptions, and record the failure using the module/logger
(e.g., logger.exception or logger.error with exc_info=True) with a clear message
including contextual info, then continue without re-raising so logging remains
resilient.
| def configure_structured_logging(log_level: str = "INFO") -> None: | ||
| """Configure structlog with console rendering. | ||
|
|
||
| Event naming convention: {module}_{noun}_{verb_past_tense} | ||
| Examples: | ||
| - agent_work_order_created | ||
| - git_branch_created | ||
| - workflow_phase_started | ||
| - sandbox_cleanup_completed | ||
|
|
||
| Args: | ||
| log_level: Minimum log level (DEBUG, INFO, WARNING, ERROR) | ||
| """ | ||
| structlog.configure( | ||
| processors=[ | ||
| structlog.contextvars.merge_contextvars, | ||
| structlog.stdlib.add_log_level, | ||
| structlog.processors.TimeStamper(fmt="iso"), | ||
| structlog.processors.StackInfoRenderer(), | ||
| structlog.processors.format_exc_info, | ||
| structlog.dev.ConsoleRenderer(), | ||
| ], | ||
| wrapper_class=structlog.stdlib.BoundLogger, | ||
| logger_factory=structlog.stdlib.LoggerFactory(), | ||
| cache_logger_on_first_use=True, | ||
| ) | ||
|
|
The log_level parameter is accepted but never used.
The function signature accepts a log_level parameter but doesn't apply it to the structlog configuration. This breaks the expected contract and prevents users from controlling log verbosity.
Apply this diff to use the log_level parameter:
def configure_structured_logging(log_level: str = "INFO") -> None:
"""Configure structlog with console rendering.
Event naming convention: {module}_{noun}_{verb_past_tense}
Examples:
- agent_work_order_created
- git_branch_created
- workflow_phase_started
- sandbox_cleanup_completed
Args:
log_level: Minimum log level (DEBUG, INFO, WARNING, ERROR)
"""
+ import logging
+ logging.basicConfig(level=log_level)
+
structlog.configure(
processors=[
structlog.contextvars.merge_contextvars,
- structlog.stdlib.add_log_level,
+ structlog.stdlib.filter_by_level,
+ structlog.stdlib.add_log_level,
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.StackInfoRenderer(),
structlog.processors.format_exc_info,
structlog.dev.ConsoleRenderer(),
],
wrapper_class=structlog.stdlib.BoundLogger,
logger_factory=structlog.stdlib.LoggerFactory(),
cache_logger_on_first_use=True,
    )

🤖 Prompt for AI Agents
In python/src/agent_work_orders/utils/structured_logger.py around lines 70-96,
the log_level parameter is accepted but ignored; update the function to apply it
by converting the incoming log_level (string or int) to a logging level
constant, call logging.basicConfig(level=level) and set the root logger and any
relevant loggers to that level (e.g., logging.getLogger().setLevel(level)), then
proceed with structlog.configure (keeping existing processors) so structlog uses
the configured stdlib level; ensure you validate/normalize the input (raise or
default on invalid values) and avoid overriding other handlers unexpectedly.
def configure_structured_logging_with_buffer(
    log_level: str, buffer: WorkOrderLogBuffer
) -> None:
    """Configure structlog with both console rendering and log buffering.

    This configuration enables SSE streaming by routing logs to the buffer
    while maintaining console output for local development.

    Args:
        log_level: Minimum log level (DEBUG, INFO, WARNING, ERROR)
        buffer: WorkOrderLogBuffer instance to store logs for streaming

    Examples:
        buffer = WorkOrderLogBuffer()
        configure_structured_logging_with_buffer("INFO", buffer)
    """
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.stdlib.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.StackInfoRenderer(),
            structlog.processors.format_exc_info,
            BufferProcessor(buffer),
            structlog.dev.ConsoleRenderer(),
        ],
        wrapper_class=structlog.stdlib.BoundLogger,
        logger_factory=structlog.stdlib.LoggerFactory(),
        cache_logger_on_first_use=True,
    )
The log_level parameter is accepted but never used.
Same issue as configure_structured_logging: the log_level parameter is not applied to the configuration. This prevents log level filtering from working.
Apply this diff to use the log_level parameter:
def configure_structured_logging_with_buffer(
log_level: str, buffer: WorkOrderLogBuffer
) -> None:
"""Configure structlog with both console rendering and log buffering.
This configuration enables SSE streaming by routing logs to the buffer
while maintaining console output for local development.
Args:
log_level: Minimum log level (DEBUG, INFO, WARNING, ERROR)
buffer: WorkOrderLogBuffer instance to store logs for streaming
Examples:
buffer = WorkOrderLogBuffer()
configure_structured_logging_with_buffer("INFO", buffer)
"""
+ import logging
+ logging.basicConfig(level=log_level)
+
structlog.configure(
processors=[
structlog.contextvars.merge_contextvars,
+ structlog.stdlib.filter_by_level,
structlog.stdlib.add_log_level,
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.StackInfoRenderer(),
structlog.processors.format_exc_info,
BufferProcessor(buffer),
structlog.dev.ConsoleRenderer(),
],
wrapper_class=structlog.stdlib.BoundLogger,
logger_factory=structlog.stdlib.LoggerFactory(),
cache_logger_on_first_use=True,
    )

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In python/src/agent_work_orders/utils/structured_logger.py around lines 98 to
128, the function accepts a log_level parameter but never applies it to the
logger configuration; update the function to use the passed log_level when
configuring handlers and the root logger (e.g., pass the log_level into
dictConfig's 'level' fields or call logger.setLevel(log_level) on the created
logger/root), ensure any string/int conversion is handled and that log_level
defaults remain intact, and mirror the same approach used in
configure_structured_logging so log level filtering actually takes effect.
- Add WorkOrderLogsPanel with SSE streaming support
- Add RealTimeStats component for live metrics
- Add useWorkOrderLogs hook for SSE log streaming
- Add useLogStats hook for real-time statistics
- Update WorkOrderDetailView to display logs panel
- Add comprehensive tests for new components
- Configure Vite test environment
Actionable comments posted: 4
♻️ Duplicate comments (2)
archon-ui-main/src/features/agent-work-orders/views/WorkOrderDetailView.tsx (2)
48-51: Harden repository name parsing (use URL, trim trailing slash). Current split is brittle (trailing slashes, query). Use URL parsing. Apply this diff:

- const repoName = workOrder.repository_url
-   ? workOrder.repository_url.split("/").slice(-2).join("/")
-   : "Unknown Repository";
+ const repoName = (() => {
+   try {
+     if (!workOrder.repository_url) return "Unknown Repository";
+     const u = new URL(workOrder.repository_url);
+     const parts = u.pathname.replace(/\/+$/, "").split("/").filter(Boolean);
+     return parts.slice(-2).join("/") || "Unknown Repository";
+   } catch {
+     return "Unknown Repository";
+   }
+ })();

Based on learnings.
95-106: Avoid dynamic Tailwind construction; map status to full class names. Replace nested ternary with a deterministic mapping.
Apply this diff:
+// at top-level imports
+// import type { AgentWorkOrderStatus } from "../types";
+
+const statusColorMap: Record<"completed" | "failed" | "running" | "pending", string> = {
+  completed: "text-green-400",
+  failed: "text-red-400",
+  running: "text-blue-400",
+  pending: "text-gray-400",
+};
@@
-                <p
-                  className={`text-lg font-semibold ${
-                    workOrder.status === "completed"
-                      ? "text-green-400"
-                      : workOrder.status === "failed"
-                        ? "text-red-400"
-                        : workOrder.status === "running"
-                          ? "text-blue-400"
-                          : "text-gray-400"
-                  }`}
-                >
+                <p className={`text-lg font-semibold ${statusColorMap[workOrder.status] || "text-gray-400"}`}>
                   {workOrder.status.charAt(0).toUpperCase() + workOrder.status.slice(1)}
                 </p>

As per coding guidelines.
🧹 Nitpick comments (13)
archon-ui-main/vite.config.ts (2)
300-304: Eliminate target URL duplication. The target URL is defined twice: once in the `target` property (line 300) and again within the `configure` function (line 304). If the service name or port changes, both locations must be updated, creating a maintenance risk. Apply this diff to eliminate the duplication:

  '/api/agent-work-orders': {
    target: isDocker ? 'http://archon-agent-work-orders:8053' : 'http://localhost:8053',
    changeOrigin: true,
    secure: false,
    configure: (proxy, options) => {
-     const targetUrl = isDocker ? 'http://archon-agent-work-orders:8053' : 'http://localhost:8053';
+     const targetUrl = options.target;
      proxy.on('error', (err, req, res) => {
298-329: Consider extracting shared proxy configuration logic. The error handling and logging setup for `/api/agent-work-orders` (lines 305-313) is nearly identical to the existing `/api` proxy (lines 320-328). While functional, this duplication increases maintenance overhead. Consider extracting a helper function:

function createProxyLogger(serviceName: string, targetUrl: string) {
  return (proxy: any) => {
    proxy.on('error', (err: any, req: any, res: any) => {
      console.log(`🚨 [VITE PROXY ERROR - ${serviceName}]:`, err.message);
      console.log('🚨 [VITE PROXY ERROR] Target:', targetUrl);
      console.log('🚨 [VITE PROXY ERROR] Request:', req.url);
    });
    proxy.on('proxyReq', (proxyReq: any, req: any, res: any) => {
      console.log(`🔄 [VITE PROXY - ${serviceName}] Forwarding:`, req.method, req.url, 'to', `${targetUrl}${req.url}`);
    });
  };
}

Then use it in the configure functions:

'/api/agent-work-orders': {
  // ...
  configure: (proxy, options) => {
    createProxyLogger('Agent Work Orders', options.target)(proxy);
  }
},
'/api': {
  // ...
  configure: (proxy, options) => {
    createProxyLogger('API', `http://${proxyHost}:${port}`)(proxy);
  }
}

archon-ui-main/src/features/agent-work-orders/components/WorkOrderLogsPanel.tsx (4)
37-49: Reuse shared relative-time util instead of duplicating logic. Avoid drift by importing the existing helper.
Apply this diff:
@@
-import { ChevronDown, ChevronUp, RefreshCw, Trash2 } from "lucide-react";
+import { ChevronDown, ChevronUp, RefreshCw, Trash2 } from "lucide-react";
 import { useCallback, useEffect, useRef, useState } from "react";
 import { Button } from "@/features/ui/primitives/button";
 import { useWorkOrderLogs } from "../hooks/useWorkOrderLogs";
 import type { LogEntry } from "../types";
+import { formatRelativeTime } from "@/features/projects/shared/api";
@@
-/**
- * Format timestamp to relative time
- */
-function formatRelativeTime(timestamp: string): string {
-  const now = Date.now();
-  const logTime = new Date(timestamp).getTime();
-  const diffSeconds = Math.floor((now - logTime) / 1000);
-
-  if (diffSeconds < 60) return `${diffSeconds}s ago`;
-  if (diffSeconds < 3600) return `${Math.floor(diffSeconds / 60)}m ago`;
-  if (diffSeconds < 86400) return `${Math.floor(diffSeconds / 3600)}h ago`;
-  return `${Math.floor(diffSeconds / 86400)}d ago`;
-}

Based on learnings.
Also applies to: 8-13
86-90: Auto-scroll does not trigger on new logs; include `logs` in the effect deps. Currently only runs when `autoScroll` toggles, not when logs append. Apply this diff:

-  useEffect(() => {
+  useEffect(() => {
     if (autoScroll && scrollContainerRef.current) {
       scrollContainerRef.current.scrollTop = scrollContainerRef.current.scrollHeight;
     }
-  }, [autoScroll]);
+  }, [autoScroll, logs]);
22-35: Tighten typing for `getLogLevelColor`. Constrain `level` to the domain type. Apply this diff:

-function getLogLevelColor(level: string): string {
+function getLogLevelColor(level: LogEntry["level"]): string {
150-161: Add explicit accessible names to controls (a11y). Relying on `title` is weaker; add `aria-label`s for robust screen reader support and stable tests. Apply this diff:

           <select
             value={levelFilter || ""}
             onChange={(e) => setLevelFilter((e.target.value as "info" | "warning" | "error" | "debug") || undefined)}
+            aria-label="Filter logs by level"
             className="bg-white/5 border border-white/10 rounded px-2 py-1 text-xs text-gray-300 hover:bg-white/10 transition-colors"
           >
@@
           <Button
             variant="ghost"
             size="sm"
             onClick={() => setAutoScroll(!autoScroll)}
+            aria-label="Toggle auto-scroll"
             className={autoScroll ? "text-cyan-400" : "text-gray-500"}
             title={autoScroll ? "Auto-scroll enabled" : "Auto-scroll disabled"}
           >
@@
-          <Button variant="ghost" size="sm" onClick={clearLogs} title="Clear logs">
+          <Button variant="ghost" size="sm" onClick={clearLogs} aria-label="Clear logs" title="Clear logs">
             <Trash2 className="w-4 h-4" />
           </Button>
@@
-          {connectionState === "error" && (
-            <Button variant="ghost" size="sm" onClick={reconnect} title="Reconnect">
+          {connectionState === "error" && (
+            <Button variant="ghost" size="sm" onClick={reconnect} aria-label="Reconnect to log stream" title="Reconnect">
               <RefreshCw className="w-4 h-4" />
             </Button>
           )}

Also applies to: 163-171, 174-176, 179-183
archon-ui-main/src/features/agent-work-orders/components/RealTimeStats.tsx (3)
38-47: Deduplicate relative time helper; use shared util for consistency. Use the existing helper and remove the local implementation to keep UX consistent across features.
Apply:
@@
-import { Activity, Clock, TrendingUp } from "lucide-react";
+import { Activity, Clock, TrendingUp } from "lucide-react";
+import { formatRelativeTime as formatRelative } from "@/features/projects/shared/api";
@@
-function formatRelativeTime(timestamp: string): string {
-  const now = new Date().getTime();
-  const logTime = new Date(timestamp).getTime();
-  const diffSeconds = Math.floor((now - logTime) / 1000);
-
-  if (diffSeconds < 1) return "just now";
-  if (diffSeconds < 60) return `${diffSeconds}s ago`;
-  if (diffSeconds < 3600) return `${Math.floor(diffSeconds / 60)}m ago`;
-  return `${Math.floor(diffSeconds / 3600)}h ago`;
-}
@@
-          <span className="text-gray-500 ml-2 text-xs">{formatRelativeTime(stats.lastActivity)}</span>
+          <span className="text-gray-500 ml-2 text-xs">{formatRelative(stats.lastActivity)}</span>

Also applies to: 146-146, 8-12
82-175: Adopt Radix UI primitives instead of raw divs/tailwind. Per feature guidelines, wrap content using primitives from src/features/ui/primitives (e.g., Card, Separator, Badge) for consistency and accessibility.
As per coding guidelines
77-80: Consider a lightweight placeholder instead of returning null. Render a small Card skeleton or “Awaiting logs…” to avoid layout jumps when runs haven’t started.
archon-ui-main/src/features/agent-work-orders/hooks/useLogStats.ts (1)
108-111: Differentiate “failed” vs “has errors” to prevent false failure states. Treat terminal failure via explicit events; track non-terminal errors separately.
Proposed change (API additive):
 export interface LogStats {
@@
   hasFailed: boolean;
+  /** Whether any error-level logs exist (non-terminal) */
+  hasErrors?: boolean;
 }
@@
-  const hasFailed = logs.some(
-    (log) => log.event === "workflow_failed" || log.event === "agent_work_order_failed" || log.level === "error",
-  );
+  const hasFailed = logs.some(
+    (log) => log.event === "workflow_failed" || log.event === "agent_work_order_failed",
+  );
+  const hasErrors = logs.some((log) => log.level === "error");
@@
   return {
@@
     hasCompleted,
-    hasFailed,
+    hasFailed,
+    hasErrors,
   };

If backend semantics already equate any error to failure, confirm and keep current logic. Based on learnings
Also applies to: 4-34, 112-124
archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts (3)
90-106: Clear any pending retry before establishing a new connection. Prevents overlapping reconnect attempts after manual reconnect or filter changes.
   const connect = useCallback(() => {
     const url = buildUrl();
     if (!url) return;
+    // Cancel any pending scheduled reconnect
+    if (retryTimeoutRef.current) {
+      clearTimeout(retryTimeoutRef.current);
+      retryTimeoutRef.current = null;
+    }
@@
       if (autoReconnect && workOrderId) {
         reconnectAttemptRef.current += 1;
         const delay = Math.min(retryDelayRef.current * 2 ** (reconnectAttemptRef.current - 1), MAX_RETRY_DELAY);
-
         retryTimeoutRef.current = setTimeout(() => {
           connect();
         }, delay);
       }

Also applies to: 139-146
195-202: Include filters in the effect dependency list for clarity. Make the reconnection-on-filter-change effect explicit and resilient to refactorings.
-  }, [workOrderId, connect]);
+  }, [workOrderId, levelFilter, stepFilter, connect]);
49-58: Align with frontend guidance: prefer TanStack Query + smart HTTP polling over streaming. Confirm if SSE is an approved exception for this feature. If not, replace SSE with polling via TanStack Query (with since/cursor and refetchInterval).
Example skeleton:
import { useQuery } from "@tanstack/react-query";

export function useWorkOrderLogsPolling({ workOrderId, levelFilter, stepFilter }: UseWorkOrderLogsOptions) {
  const qs = new URLSearchParams();
  if (levelFilter) qs.append("level", levelFilter);
  if (stepFilter) qs.append("step", stepFilter);
  return useQuery({
    queryKey: ["work-order-logs", workOrderId, levelFilter, stepFilter],
    queryFn: async () => {
      if (!workOrderId) return [];
      const res = await fetch(`${API_BASE_URL}/agent-work-orders/${workOrderId}/logs?${qs.toString()}`);
      if (!res.ok) throw new Error(`Failed to fetch logs: ${res.status}`);
      return (await res.json()) as LogEntry[];
    },
    refetchInterval: (data) => (data?.length ? 1500 : 5000),
    refetchOnWindowFocus: false,
  });
}

If SSE stays, document the exception in the feature README and provide an HTTP polling fallback for environments where EventSource is unavailable. As per coding guidelines
Also applies to: 100-151, 170-214
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- archon-ui-main/src/features/agent-work-orders/components/RealTimeStats.tsx (1 hunks)
- archon-ui-main/src/features/agent-work-orders/components/WorkOrderLogsPanel.tsx (1 hunks)
- archon-ui-main/src/features/agent-work-orders/components/__tests__/RealTimeStats.test.tsx (1 hunks)
- archon-ui-main/src/features/agent-work-orders/components/__tests__/WorkOrderLogsPanel.test.tsx (1 hunks)
- archon-ui-main/src/features/agent-work-orders/hooks/__tests__/useWorkOrderLogs.test.ts (1 hunks)
- archon-ui-main/src/features/agent-work-orders/hooks/useLogStats.ts (1 hunks)
- archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts (1 hunks)
- archon-ui-main/src/features/agent-work-orders/types/index.ts (1 hunks)
- archon-ui-main/src/features/agent-work-orders/views/WorkOrderDetailView.tsx (1 hunks)
- archon-ui-main/vite.config.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (12)
archon-ui-main/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
archon-ui-main/src/**/*.{ts,tsx}: Frontend TypeScript must use strict mode with no implicit any
Use TanStack Query for all data fetching; avoid prop drilling
Use database values directly in the frontend; avoid mapping layers between BE and FE types
Use TanStack Query for all data fetching; avoid WebSockets in the frontend (use smart HTTP polling)
Files:
- archon-ui-main/src/features/agent-work-orders/components/__tests__/WorkOrderLogsPanel.test.tsx
- archon-ui-main/src/features/agent-work-orders/hooks/__tests__/useWorkOrderLogs.test.ts
- archon-ui-main/src/features/agent-work-orders/components/RealTimeStats.tsx
- archon-ui-main/src/features/agent-work-orders/components/__tests__/RealTimeStats.test.tsx
- archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts
- archon-ui-main/src/features/agent-work-orders/views/WorkOrderDetailView.tsx
- archon-ui-main/src/features/agent-work-orders/hooks/useLogStats.ts
- archon-ui-main/src/features/agent-work-orders/components/WorkOrderLogsPanel.tsx
- archon-ui-main/src/features/agent-work-orders/types/index.ts
archon-ui-main/src/features/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use Biome in features: 120 character line length, double quotes, and trailing commas
archon-ui-main/src/features/**/*.{ts,tsx}: Biome formatting in features: 120-character line length, double quotes, and trailing commas
Do not construct Tailwind classes dynamically
Use database enum/string values directly in the frontend (no mapping layer); maintain end-to-end type safety
Files:
archon-ui-main/src/features/agent-work-orders/components/__tests__/WorkOrderLogsPanel.test.tsxarchon-ui-main/src/features/agent-work-orders/hooks/__tests__/useWorkOrderLogs.test.tsarchon-ui-main/src/features/agent-work-orders/components/RealTimeStats.tsxarchon-ui-main/src/features/agent-work-orders/components/__tests__/RealTimeStats.test.tsxarchon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.tsarchon-ui-main/src/features/agent-work-orders/views/WorkOrderDetailView.tsxarchon-ui-main/src/features/agent-work-orders/hooks/useLogStats.tsarchon-ui-main/src/features/agent-work-orders/components/WorkOrderLogsPanel.tsxarchon-ui-main/src/features/agent-work-orders/types/index.ts
archon-ui-main/src/features/*/components/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Place new UI components under src/features/[feature]/components
Files:
- archon-ui-main/src/features/agent-work-orders/components/__tests__/WorkOrderLogsPanel.test.tsx
- archon-ui-main/src/features/agent-work-orders/components/RealTimeStats.tsx
- archon-ui-main/src/features/agent-work-orders/components/__tests__/RealTimeStats.test.tsx
- archon-ui-main/src/features/agent-work-orders/components/WorkOrderLogsPanel.tsx
archon-ui-main/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
archon-ui-main/**/*.{ts,tsx}: TypeScript: strict mode; no implicit any
Never return null to indicate failure in the frontend; throw an error with details
Files:
- archon-ui-main/src/features/agent-work-orders/components/__tests__/WorkOrderLogsPanel.test.tsx
- archon-ui-main/src/features/agent-work-orders/hooks/__tests__/useWorkOrderLogs.test.ts
- archon-ui-main/vite.config.ts
- archon-ui-main/src/features/agent-work-orders/components/RealTimeStats.tsx
- archon-ui-main/src/features/agent-work-orders/components/__tests__/RealTimeStats.test.tsx
- archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts
- archon-ui-main/src/features/agent-work-orders/views/WorkOrderDetailView.tsx
- archon-ui-main/src/features/agent-work-orders/hooks/useLogStats.ts
- archon-ui-main/src/features/agent-work-orders/components/WorkOrderLogsPanel.tsx
- archon-ui-main/src/features/agent-work-orders/types/index.ts
archon-ui-main/src/**/*.test.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Frontend tests use Vitest with React Testing Library
Files:
- archon-ui-main/src/features/agent-work-orders/components/__tests__/WorkOrderLogsPanel.test.tsx
- archon-ui-main/src/features/agent-work-orders/hooks/__tests__/useWorkOrderLogs.test.ts
- archon-ui-main/src/features/agent-work-orders/components/__tests__/RealTimeStats.test.tsx
archon-ui-main/src/features/**/components/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use Radix UI primitives from src/features/ui/primitives/ in feature components
Files:
- archon-ui-main/src/features/agent-work-orders/components/__tests__/WorkOrderLogsPanel.test.tsx
- archon-ui-main/src/features/agent-work-orders/components/RealTimeStats.tsx
- archon-ui-main/src/features/agent-work-orders/components/__tests__/RealTimeStats.test.tsx
- archon-ui-main/src/features/agent-work-orders/components/WorkOrderLogsPanel.tsx
**/*.{ts,tsx,py}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,py}: Do not include keywords SIMPLIFIED, ENHANCED, LEGACY, CHANGED, or REMOVED in code comments
Code comments should describe functionality and reasoning only; do not mention beta status or global rules
Files:
- archon-ui-main/src/features/agent-work-orders/components/__tests__/WorkOrderLogsPanel.test.tsx
- archon-ui-main/src/features/agent-work-orders/hooks/__tests__/useWorkOrderLogs.test.ts
- archon-ui-main/vite.config.ts
- archon-ui-main/src/features/agent-work-orders/components/RealTimeStats.tsx
- archon-ui-main/src/features/agent-work-orders/components/__tests__/RealTimeStats.test.tsx
- archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts
- archon-ui-main/src/features/agent-work-orders/views/WorkOrderDetailView.tsx
- archon-ui-main/src/features/agent-work-orders/hooks/useLogStats.ts
- archon-ui-main/src/features/agent-work-orders/components/WorkOrderLogsPanel.tsx
- archon-ui-main/src/features/agent-work-orders/types/index.ts
archon-ui-main/src/features/*/hooks/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use feature-scoped TanStack Query hooks in src/features/[feature]/hooks
Files:
- archon-ui-main/src/features/agent-work-orders/hooks/__tests__/useWorkOrderLogs.test.ts
- archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts
- archon-ui-main/src/features/agent-work-orders/hooks/useLogStats.ts
archon-ui-main/src/features/**/hooks/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use TanStack Query hooks from src/features/[feature]/hooks/ for data fetching
Files:
- archon-ui-main/src/features/agent-work-orders/hooks/__tests__/useWorkOrderLogs.test.ts
- archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts
- archon-ui-main/src/features/agent-work-orders/hooks/useLogStats.ts
archon-ui-main/vite.config.{ts,js}
📄 CodeRabbit inference engine (CLAUDE.md)
Configure development API access via Vite proxy (do not hardcode dev API URLs in frontend code)
Files:
archon-ui-main/vite.config.ts
archon-ui-main/src/features/*/types/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Define shared types under src/features/[feature]/types
Files:
archon-ui-main/src/features/agent-work-orders/types/index.ts
archon-ui-main/src/features/**/types/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Define TypeScript types under src/features/[feature]/types/
Files:
archon-ui-main/src/features/agent-work-orders/types/index.ts
🧠 Learnings (3)
📚 Learning: 2025-10-23T11:01:08.360Z
Learnt from: CR
PR: coleam00/Archon#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-23T11:01:08.360Z
Learning: Applies to archon-ui-main/vite.config.{ts,js} : Configure development API access via Vite proxy (do not hardcode dev API URLs in frontend code)
Applied to files:
archon-ui-main/vite.config.ts
📚 Learning: 2025-10-23T11:01:08.360Z
Learnt from: CR
PR: coleam00/Archon#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-23T11:01:08.360Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : Do not construct Tailwind classes dynamically
Applied to files:
archon-ui-main/src/features/agent-work-orders/views/WorkOrderDetailView.tsx
📚 Learning: 2025-10-23T11:01:08.360Z
Learnt from: CR
PR: coleam00/Archon#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-23T11:01:08.360Z
Learning: Applies to archon-ui-main/src/features/**/types/**/*.{ts,tsx} : Define TypeScript types under src/features/[feature]/types/
Applied to files:
archon-ui-main/src/features/agent-work-orders/types/index.ts
🧬 Code graph analysis (8)
archon-ui-main/src/features/agent-work-orders/components/__tests__/WorkOrderLogsPanel.test.tsx (3)
  archon-ui-main/src/features/agent-work-orders/components/WorkOrderLogsPanel.tsx (1): WorkOrderLogsPanel (70-225)
  archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts (1): useWorkOrderLogs (49-214)
  archon-ui-main/src/features/agent-work-orders/types/index.ts (1): LogEntry (145-187)
archon-ui-main/src/features/agent-work-orders/hooks/__tests__/useWorkOrderLogs.test.ts (2)
  archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts (1): useWorkOrderLogs (49-214)
  archon-ui-main/src/features/agent-work-orders/types/index.ts (1): LogEntry (145-187)
archon-ui-main/src/features/agent-work-orders/components/RealTimeStats.tsx (3)
  archon-ui-main/src/features/projects/shared/api.ts (1): formatRelativeTime (6-17)
  archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts (1): useWorkOrderLogs (49-214)
  archon-ui-main/src/features/agent-work-orders/hooks/useLogStats.ts (1): useLogStats (42-125)
archon-ui-main/src/features/agent-work-orders/components/__tests__/RealTimeStats.test.tsx (4)
  archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts (1): useWorkOrderLogs (49-214)
  archon-ui-main/src/features/agent-work-orders/hooks/useLogStats.ts (1): useLogStats (42-125)
  archon-ui-main/src/features/agent-work-orders/components/RealTimeStats.tsx (1): RealTimeStats (49-176)
  archon-ui-main/src/features/agent-work-orders/types/index.ts (1): LogEntry (145-187)
archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts (2)
  archon-ui-main/src/features/agent-work-orders/types/index.ts (2): LogEntry (145-187), SSEConnectionState (192-192)
  archon-ui-main/src/config/api.ts (1): API_BASE_URL (35-35)
archon-ui-main/src/features/agent-work-orders/views/WorkOrderDetailView.tsx (6)
  archon-ui-main/src/features/agent-work-orders/hooks/useAgentWorkOrderQueries.ts (2): useWorkOrder (61-80), useStepHistory (89-110)
  archon-ui-main/src/features/ui/primitives/button.tsx (1): Button (11-130)
  archon-ui-main/src/features/agent-work-orders/components/RealTimeStats.tsx (1): RealTimeStats (49-176)
  archon-ui-main/src/features/agent-work-orders/components/WorkOrderProgressBar.tsx (1): WorkOrderProgressBar (28-97)
  archon-ui-main/src/features/agent-work-orders/components/StepHistoryTimeline.tsx (1): StepHistoryTimeline (27-112)
  archon-ui-main/src/features/agent-work-orders/components/WorkOrderLogsPanel.tsx (1): WorkOrderLogsPanel (70-225)
archon-ui-main/src/features/agent-work-orders/hooks/useLogStats.ts (1)
  archon-ui-main/src/features/agent-work-orders/types/index.ts (1): LogEntry (145-187)
archon-ui-main/src/features/agent-work-orders/components/WorkOrderLogsPanel.tsx (3)
  archon-ui-main/src/features/projects/shared/api.ts (1): formatRelativeTime (6-17)
  archon-ui-main/src/features/agent-work-orders/types/index.ts (1): LogEntry (145-187)
  archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts (1): useWorkOrderLogs (49-214)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Backend Tests (Python + pytest)
🔇 Additional comments (4)
archon-ui-main/vite.config.ts (1)
298-314: Proxy configuration correctly follows coding guidelines. The new `/api/agent-work-orders` proxy is properly configured for development access and correctly placed before the general `/api` proxy to ensure proper route matching. This aligns with the guideline to configure development API access via Vite proxy rather than hardcoding URLs in frontend code. As per coding guidelines.
archon-ui-main/src/features/agent-work-orders/components/__tests__/RealTimeStats.test.tsx (2)
32-55: LGTM for base render guard. Correctly asserts non-render when no logs/hasStarted=false. Mock configuration is clear and isolated per-test.
92-124: Progress bar assertion approach is sound. Querying by inline style width is a pragmatic check for percentage rendering.
archon-ui-main/src/features/agent-work-orders/components/__tests__/WorkOrderLogsPanel.test.tsx (1)
131-156: LGTM for Clear Logs behavior. Verifies the handler is called via the button. Once the component adds explicit `aria-label`s, the query will remain robust.
{stats.progressPct !== null ? (
  <div className="space-y-1">
    <div className="flex items-center gap-2">
      <div className="flex-1 h-2 bg-gray-700 rounded-full overflow-hidden">
        <div
          className="h-full bg-gradient-to-r from-cyan-500 to-blue-500 transition-all duration-500 ease-out"
          style={{ width: `${stats.progressPct}%` }}
        />
      </div>
      <span className="text-sm font-medium text-cyan-400">{stats.progressPct}%</span>
    </div>
  </div>
Clamp and format progress to avoid overflow/NaN.
Ensure width stays in [0,100] and display whole percent.
- <div
- className="h-full bg-gradient-to-r from-cyan-500 to-blue-500 transition-all duration-500 ease-out"
- style={{ width: `${stats.progressPct}%` }}
- />
+ <div
+ className="h-full bg-gradient-to-r from-cyan-500 to-blue-500 transition-all duration-500 ease-out"
+ style={{ width: `${Math.min(100, Math.max(0, stats.progressPct))}%` }}
+ />
@@
- <span className="text-sm font-medium text-cyan-400">{stats.progressPct}%</span>
+ <span className="text-sm font-medium text-cyan-400">{Math.round(stats.progressPct)}%</span>

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
{stats.progressPct !== null ? (
  <div className="space-y-1">
    <div className="flex items-center gap-2">
      <div className="flex-1 h-2 bg-gray-700 rounded-full overflow-hidden">
        <div
          className="h-full bg-gradient-to-r from-cyan-500 to-blue-500 transition-all duration-500 ease-out"
          style={{ width: `${Math.min(100, Math.max(0, stats.progressPct))}%` }}
        />
      </div>
      <span className="text-sm font-medium text-cyan-400">{Math.round(stats.progressPct)}%</span>
    </div>
  </div>
🤖 Prompt for AI Agents
In archon-ui-main/src/features/agent-work-orders/components/RealTimeStats.tsx
around lines 109 to 120, the progressPct value can be NaN or outside [0,100],
causing overflow or bad rendering; clamp the numeric value used for the style
width to the 0–100 range (and coerce non-numeric values to 0), and use a rounded
integer for the displayed percent text (e.g., compute a safeProgress =
Math.min(100, Math.max(0, Number(stats.progressPct) || 0)) and use
`${safeProgress}%` for style and safeProgress (or Math.round) for the span
text). Ensure you only change the computed value, not the original stats object.
// Mock EventSource
class MockEventSource {
  public onopen: ((event: Event) => void) | null = null;
  public onmessage: ((event: MessageEvent) => void) | null = null;
  public onerror: ((event: Event) => void) | null = null;
  public readyState = 0; // CONNECTING
  public url: string;

  constructor(url: string) {
    this.url = url;
    // Simulate connection opening after a tick
    setTimeout(() => {
      this.readyState = 1; // OPEN
      if (this.onopen) {
        this.onopen(new Event("open"));
      }
    }, 0);
  }

  close() {
    this.readyState = 2; // CLOSED
  }

  // Test helper: simulate receiving a message
  simulateMessage(data: string) {
    if (this.onmessage) {
      this.onmessage(new MessageEvent("message", { data }));
    }
  }

  // Test helper: simulate an error
  simulateError() {
    if (this.onerror) {
      this.onerror(new Event("error"));
    }
  }
}

// Replace global EventSource with mock
global.EventSource = MockEventSource as unknown as typeof EventSource;
Fix brittle EventSource mocking; capture instances deterministically to make tests meaningful.
Current tests try to find instances via Object.values(global), which won’t contain constructed instances, producing no-ops or failures (e.g., spying after unmount). Track the last instance in the mock constructor and use it in tests. Also set spies before unmount(), and actually simulate messages for clear/limit tests.
Apply these diffs:
- Capture the created EventSource deterministically
class MockEventSource {
+ static lastInstance: MockEventSource | null = null;
public onopen: ((event: Event) => void) | null = null;
public onmessage: ((event: MessageEvent) => void) | null = null;
public onerror: ((event: Event) => void) | null = null;
public readyState = 0; // CONNECTING
public url: string;
constructor(url: string) {
this.url = url;
+ MockEventSource.lastInstance = this;
// Simulate connection opening after a tick
setTimeout(() => {
this.readyState = 1; // OPEN
if (this.onopen) {
this.onopen(new Event("open"));
}
}, 0);
 }

- Use lastInstance in tests and actually simulate events
- // Get the EventSource instance
- const eventSource = (global.EventSource as unknown as typeof MockEventSource).prototype;
+ // Use the last created EventSource instance
+ const instance = (global.EventSource as unknown as typeof MockEventSource).lastInstance!;

- await act(async () => {
- if (result.current.logs.length === 0) {
- // Access the actual EventSource instance created by the hook
- const instances = Object.values(global).filter(
- (v) => v instanceof MockEventSource,
- ) as MockEventSource[];
- if (instances.length > 0) {
- instances[0].simulateMessage(JSON.stringify(logEntry1));
- instances[0].simulateMessage(JSON.stringify(logEntry2));
- }
- }
- });
+ await act(async () => {
+ instance.simulateMessage(JSON.stringify(logEntry1));
+ instance.simulateMessage(JSON.stringify(logEntry2));
+ });

Malformed JSON test:
-const instances = Object.values(global).filter((v) => v instanceof MockEventSource) as MockEventSource[];
-if (instances.length > 0) {
- await act(async () => {
- instances[0].simulateMessage("{ invalid json }");
- });
-}
+const instance = (global.EventSource as unknown as typeof MockEventSource).lastInstance!;
+await act(async () => {
+ instance.simulateMessage("{ invalid json }");
+});

URL parameters test:
-const instances = Object.values(global).filter((v) => v instanceof MockEventSource) as MockEventSource[];
-if (instances.length > 0) {
- const url = instances[0].url;
- expect(url).toContain("level=error");
- expect(url).toContain("step=planning");
-}
+const url = (global.EventSource as unknown as typeof MockEventSource).lastInstance!.url;
+expect(url).toContain("level=error");
+expect(url).toContain("step=planning");Cleanup/unmount test:
-const instances = Object.values(global).filter((v) => v instanceof MockEventSource) as MockEventSource[];
-const closeSpy = vi.spyOn(instances[0], "close");
+const instance = (global.EventSource as unknown as typeof MockEventSource).lastInstance!;
+const closeSpy = vi.spyOn(instance, "close");
// Unmount hook
unmount();
// EventSource should be closed
 expect(closeSpy).toHaveBeenCalled();

Limit test: actually push >500 logs
-// This test would verify the 500 log limit
-// In practice, we'd need to simulate 501+ messages
-// and verify only the last 500 are kept
-expect(result.current.logs.length).toBeLessThanOrEqual(500);
+const instance = (global.EventSource as unknown as typeof MockEventSource).lastInstance!;
+await act(async () => {
+ for (let i = 0; i < 505; i++) {
+ const entry: LogEntry = {
+ work_order_id: workOrderId,
+ level: "info",
+ event: `e-${i}`,
+ timestamp: new Date().toISOString(),
+ };
+ instance.simulateMessage(JSON.stringify(entry));
+ }
+});
+await waitFor(() => {
+ expect(result.current.logs.length).toBe(500);
+ expect(result.current.logs[0].event).toBe("e-5");
+});

These changes make the tests deterministic and actually validate the hook's behavior.
Also applies to: 102-104, 123-134, 155-170, 187-196, 236-247, 250-263
🤖 Prompt for AI Agents
In
archon-ui-main/src/features/agent-work-orders/hooks/__tests__/useWorkOrderLogs.test.ts
around lines 10 to 50, the EventSource mock is brittle because tests search
global objects for constructed instances and set spies after unmount; update the
mock to store the most recent instance deterministically (e.g., add a static or
global lastInstance assigned in the constructor and expose it), replace all
Object.values(global) searches with direct references to that lastInstance in
tests, ensure spies are attached to lastInstance before calling unmount(), and
use the mock's simulateMessage/simulateError helpers to actually push events
(for the malformed JSON, URL params, cleanup/unmount, and limit tests push >500
messages to validate trimming); finally ensure close/unsubscribe logic uses
lastInstance.close() during cleanup and reset lastInstance between tests.
const retryTimeoutRef = useRef<NodeJS.Timeout | null>(null);
const retryDelayRef = useRef<number>(INITIAL_RETRY_DELAY);
const reconnectAttemptRef = useRef<number>(0);
Fix browser timer type.
NodeJS.Timeout is incorrect in browser builds under strict mode. Use ReturnType<typeof setTimeout> instead.
- const retryTimeoutRef = useRef<NodeJS.Timeout | null>(null);
+ const retryTimeoutRef = useRef<ReturnType<typeof setTimeout> | null>(null);

🤖 Prompt for AI Agents
In archon-ui-main/src/features/agent-work-orders/hooks/useWorkOrderLogs.ts
around lines 60 to 62, the retryTimeoutRef is typed as NodeJS.Timeout which
breaks in browser/strict builds; change its type to use ReturnType<typeof
setTimeout> so it matches the browser timer return type (e.g.
useRef<ReturnType<typeof setTimeout> | null>(null)); update any other timer refs
the same way and ensure you clear timers using clearTimeout with the typed ref.
/** Current workflow phase/step being executed (null if not started) */
current_phase: string | null;
🛠️ Refactor suggestion | 🟠 Major
Use domain enum for current_phase to preserve end‑to‑end type safety.
current_phase is typed as string | null while WorkflowStep exists. This weakens type guarantees across consumers (e.g., progress bar, timeline). Switch to WorkflowStep | null.
Apply this diff:
- /** Current workflow phase/step being executed (null if not started) */
- current_phase: string | null;
+ /** Current workflow phase/step being executed (null if not started) */
+ current_phase: WorkflowStep | null;

As per coding guidelines (use database enum/string values directly in the frontend; maintain end-to-end type safety).
🤖 Prompt for AI Agents
In archon-ui-main/src/features/agent-work-orders/types/index.ts around lines 59
to 61, current_phase is declared as string | null but should use the domain enum
type WorkflowStep to maintain end-to-end type safety; change the type to
WorkflowStep | null (or import/alias WorkflowStep if not in scope), update any
related exports/imports to use the enum/string union used by the backend, and
run type checks to ensure consumers (progress bar, timeline) compile against the
stricter type.
Add ENABLE_AGENT_WORK_ORDERS configuration flag to allow disabling the agent work orders microservice. Service discovery now gracefully handles unavailable services, and health checks return appropriate status when the feature is disabled.

Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
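For illustration only, a minimal sketch of how such a flag and graceful discovery can fit together. The helper names, the AGENT_WORK_ORDERS_URL variable, and the 8053 default are assumptions; only the ENABLE_AGENT_WORK_ORDERS flag and the /health behavior come from the change itself:

import os

import httpx


def get_agent_work_orders_url() -> str | None:
    """Return the service URL, or None when the feature flag disables it."""
    if os.getenv("ENABLE_AGENT_WORK_ORDERS", "false").lower() != "true":
        return None  # disabled is a normal state, not an error
    return os.getenv("AGENT_WORK_ORDERS_URL", "http://localhost:8053")


async def agent_work_orders_health() -> dict[str, bool | str]:
    """Report 'disabled' instead of failing when the flag is off."""
    url = get_agent_work_orders_url()
    if url is None:
        return {"status": "disabled", "available": False}
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            response = await client.get(f"{url}/health")
        ok = response.status_code == 200
        return {"status": "healthy" if ok else "unhealthy", "available": ok}
    except httpx.HTTPError as e:
        return {"status": f"unreachable: {e}", "available": False}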
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/src/server/config/service_discovery.py (1)
104-125: Fix type mismatches and runtime error introduced by breaking change to get_service_url().

The convenience functions get_api_url(), get_mcp_url(), and get_agents_url() (lines 240-256) declare return type str but directly return results from get_service_url(), which now returns str | None. Additionally, the health_check() method (lines 163-165) uses the returned URL in an f-string without checking for None, causing a runtime crash when services are unavailable.

Required fixes:
- Lines 240-256: Update get_api_url(), get_mcp_url(), and get_agents_url() to either (1) change the return type to str | None, or (2) add None checks and raise an exception for required services
- Lines 163-165: Add an `if url is None: return False` check before using url in f"{url}/health"
- Lines 199-204: Update the get_all_services() return type to dict[str, str | None] or filter out None values
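As a rough illustration of that None handling, assuming get_service_url() is the module's own resolver and now returns str | None (exact signatures in service_discovery.py may differ):

import httpx


def get_api_url() -> str:
    """Resolve a required service URL; raise rather than propagate None."""
    url = get_service_url("api")  # returns str | None after the breaking change
    if url is None:
        raise RuntimeError("Required service 'api' could not be resolved via service discovery")
    return url


async def health_check(service_name: str) -> bool:
    """Treat an unresolved URL as unhealthy instead of crashing on the f-string."""
    url = get_service_url(service_name)
    if url is None:
        return False
    async with httpx.AsyncClient(timeout=5.0) as client:
        response = await client.get(f"{url}/health")
    return response.status_code == 200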
♻️ Duplicate comments (3)
python/src/agent_work_orders/server.py (3)
44-77: Fail fast when critical dependencies are unavailable during startup.

The startup validation logs errors for missing Claude CLI and Git but doesn't fail fast. Claude CLI and Git are critical dependencies; the service cannot function without them. Additionally, line 67 catches generic Exception, and logging doesn't use exc_info=True.

As per coding guidelines: "On service startup, missing configuration, DB connection failures, auth/authorization failures, critical dependency outages, or invalid/corrupting data: fail fast and bubble errors."
Based on coding guidelines.
107-115: CORS configuration violates security best practices.

Combining allow_origins=["*"] with allow_credentials=True violates the CORS specification, and browsers will reject this configuration. This is the same issue present in config.py line 39.

Based on coding guidelines.
139-214: Use specific exception types and preserve stack traces in health checks.

The health check catches generic Exception in multiple places (lines 150, 175, 192, 209) and doesn't use exc_info=True for logging. This violates coding guidelines:
- "Use specific exception types; avoid catching generic Exception"
- "Preserve full stack traces in Python logging by using exc_info=True"
Based on coding guidelines.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- .env.example (1 hunks)
- python/src/agent_work_orders/config.py (1 hunks)
- python/src/agent_work_orders/server.py (1 hunks)
- python/src/server/config/service_discovery.py (9 hunks)
- python/tests/agent_work_orders/test_api.py (1 hunks)
- python/tests/agent_work_orders/test_config.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- python/tests/agent_work_orders/test_api.py
- python/tests/agent_work_orders/test_config.py
🧰 Additional context used
📓 Path-based instructions (4)
python/src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
python/src/**/*.py: On service startup, missing configuration, DB connection failures, auth/authorization failures, critical dependency outages, or invalid/corrupting data: fail fast and bubble errors
For batch processing, background tasks, WebSocket events, optional features, and external API calls: continue processing but log errors (with retries/backoff for APIs)
Never accept or persist corrupted data; skip failed items entirely (e.g., zero embeddings, null FKs, malformed JSON)
Error messages must include operation context, IDs/URLs, use specific exception types, preserve full stack traces (logging with exc_info=True), and avoid returning None/null—raise exceptions instead; for batches report success counts and detailed failures
Backend code targets Python 3.12 and adheres to a 120 character line length
Use Ruff for linting (errors, warnings, unused imports) in backend code
Use Mypy for static type checking in backend code
Files:
- python/src/server/config/service_discovery.py
- python/src/agent_work_orders/config.py
- python/src/agent_work_orders/server.py
python/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/**/*.py: Preserve full stack traces in Python logging by using exc_info=True
Use specific exception types; avoid catching generic Exception
Include relevant IDs, URLs, and operation context in error messages
Never return None to indicate failure; raise an exception with details instead
Files:
- python/src/server/config/service_discovery.py
- python/src/agent_work_orders/config.py
- python/src/agent_work_orders/server.py
python/src/server/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/server/**/*.py: External API calls should retry with exponential backoff and then fail with a clear message
Never accept corrupted data: skip failed items entirely; do not store zero embeddings, null foreign keys, or malformed JSON
Use Pydantic validation to raise on invalid data; do not silently accept bad data
Do not hardcode GitHub repo owner/name or URLs; import GITHUB_REPO_OWNER/NAME from python/src/server/config/version.py (or GITHUB_REPO env override)
Files:
python/src/server/config/service_discovery.py
**/*.{ts,tsx,py}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,py}: Do not include keywords SIMPLIFIED, ENHANCED, LEGACY, CHANGED, or REMOVED in code comments
Code comments should describe functionality and reasoning only; do not mention beta status or global rules
Files:
- python/src/server/config/service_discovery.py
- python/src/agent_work_orders/config.py
- python/src/agent_work_orders/server.py
🧬 Code graph analysis (1)
python/src/agent_work_orders/server.py (3)
- python/src/agent_work_orders/utils/structured_logger.py (2): configure_structured_logging_with_buffer (98-127), get_logger (164-177)
- python/src/server/main.py (2): lifespan (69-145), health_check (216-253)
- python/src/agent_work_orders/utils/log_buffer.py (2): start_cleanup_task (196-217), stop_cleanup_task (219-231)
🪛 dotenv-linter (4.0.0)
.env.example
[warning] 44-44: [UnorderedKey] The ARCHON_SERVER_PORT key should go before the HOST key
(UnorderedKey)
[warning] 45-45: [UnorderedKey] The ARCHON_MCP_PORT key should go before the ARCHON_SERVER_PORT key
(UnorderedKey)
[warning] 46-46: [UnorderedKey] The ARCHON_AGENTS_PORT key should go before the ARCHON_MCP_PORT key
(UnorderedKey)
[warning] 49-49: [UnorderedKey] The AGENT_WORK_ORDERS_PORT key should go before the ARCHON_AGENTS_PORT key
(UnorderedKey)
[warning] 50-50: [UnorderedKey] The ARCHON_UI_PORT key should go before the HOST key
(UnorderedKey)
[warning] 51-51: [UnorderedKey] The ARCHON_DOCS_PORT key should go before the ARCHON_MCP_PORT key
(UnorderedKey)
[warning] 78-78: [UnorderedKey] The FILE_STATE_DIRECTORY key should go before the STATE_STORAGE_TYPE key
(UnorderedKey)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Backend Tests (Python + pytest)
🔇 Additional comments (1)
.env.example (1)
30-78: LGTM! Well-documented Agent Work Orders configuration.

The new environment variables are clearly documented with helpful comments explaining their purpose, required scopes, and default values. The logical grouping by feature area is appropriate for a configuration template.
Note: The static analysis warnings about key ordering are cosmetic and can be safely ignored—logical grouping enhances readability in this context.
SERVICE_DISCOVERY_MODE: str = os.getenv("SERVICE_DISCOVERY_MODE", "local")

# CORS configuration
CORS_ORIGINS: str = os.getenv("CORS_ORIGINS", "http://localhost:3737,http://host.docker.internal:3737,*")
CORS wildcard default creates security vulnerability when combined with credentials.
The default value for CORS_ORIGINS includes "*" which will cause CORS violations when combined with allow_credentials=True in server.py (line 112). Browsers reject this combination per the CORS specification.
Apply this diff to use explicit origins by default:
- CORS_ORIGINS: str = os.getenv("CORS_ORIGINS", "http://localhost:3737,http://host.docker.internal:3737,*")
+ CORS_ORIGINS: str = os.getenv("CORS_ORIGINS", "http://localhost:3737,http://host.docker.internal:3737,http://localhost:3000,http://localhost:5173")Based on coding guidelines.
🤖 Prompt for AI Agents
In python/src/agent_work_orders/config.py around line 39, the default
CORS_ORIGINS includes a "*" wildcard which causes a security issue when
allow_credentials=True in server.py; remove the "*" from the default and replace
it with only explicit origins (e.g.
"http://localhost:3737,http://host.docker.internal:3737") so the default does
not use a wildcard, and ensure any environment-provided CORS_ORIGINS follow the
explicit-origin format or are validated at startup.
archon_server_url = os.getenv("ARCHON_SERVER_URL")
archon_mcp_url = os.getenv("ARCHON_MCP_URL")
🛠️ Refactor suggestion | 🟠 Major
Use config methods for service URL resolution.
Lines 80-81, 183, and 200 use os.getenv() directly to retrieve Archon service URLs, bypassing the config module's get_archon_server_url() and get_archon_mcp_url() methods. These methods handle service discovery mode and provide consistent URL resolution logic.
Apply this diff:
# Log service URLs
- archon_server_url = os.getenv("ARCHON_SERVER_URL")
- archon_mcp_url = os.getenv("ARCHON_MCP_URL")
+ archon_server_url = config.get_archon_server_url()
+ archon_mcp_url = config.get_archon_mcp_url()
if archon_server_url:
    logger.info(

And in the health check:
# Check Archon server connectivity (if configured)
- archon_server_url = os.getenv("ARCHON_SERVER_URL")
+ archon_server_url = config.get_archon_server_url()
 if archon_server_url:

 # Check MCP server connectivity (if configured)
- archon_mcp_url = os.getenv("ARCHON_MCP_URL")
+ archon_mcp_url = config.get_archon_mcp_url()
 if archon_mcp_url:
🤖 Prompt for AI Agents
In python/src/agent_work_orders/server.py around lines 80-81 (and also update
usages at ~183 and ~200), replace direct os.getenv("ARCHON_SERVER_URL") and
os.getenv("ARCHON_MCP_URL") calls with the config module helpers
get_archon_server_url() and get_archon_mcp_url(); update the health-check logic
to call these config methods instead of getenv so service-discovery mode and
consistent URL resolution are used throughout. Ensure you import the config
functions if not already imported and use their return values wherever the
ENV-based variables were previously referenced.
- Add archon_configured_repositories table migration with production-ready sandbox type constraints
- Implement SupabaseWorkOrderRepository for CRUD operations with comprehensive error handling
- Add defensive validation in _row_to_model with detailed logging for invalid enum values
- Implement granular exception handling (409 duplicates, 422 validation, 502 GitHub API errors)
- Document async/await pattern for interface consistency across repository implementations
- Add Supabase health check to verify table existence
- Expand test coverage from 10 to 17 tests with error handling and edge case validation
- Add supabase dependency to agent-work-orders group
- Enable ENABLE_AGENT_WORK_ORDERS flag in docker-compose for production deployment
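A hedged sketch of the granular mapping the commit describes (409 duplicates, 422 validation, 502 GitHub API errors). The exception class names below are placeholders, not the PR's actual types:

from fastapi import HTTPException


class DuplicateRepositoryError(Exception):
    """Placeholder: raised on unique-constraint violations (maps to 409)."""


class RepositoryValidationError(Exception):
    """Placeholder: raised on invalid payloads or enum values (maps to 422)."""


class GitHubAPIError(Exception):
    """Placeholder: raised when the upstream GitHub call fails (maps to 502)."""


def to_http_error(exc: Exception) -> HTTPException:
    """Translate repository-layer failures into the statuses listed above."""
    if isinstance(exc, DuplicateRepositoryError):
        return HTTPException(status_code=409, detail=str(exc))
    if isinstance(exc, RepositoryValidationError):
        return HTTPException(status_code=422, detail=str(exc))
    if isinstance(exc, GitHubAPIError):
        return HTTPException(status_code=502, detail=str(exc))
    return HTTPException(status_code=500, detail=f"Unexpected error: {exc}")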
Actionable comments posted: 6
♻️ Duplicate comments (11)
python/src/agent_work_orders/server.py (4)
79-91: Use config URL resolvers instead of getenv for service URLs.

Keep URL resolution consistent with service-discovery logic.
-    archon_server_url = os.getenv("ARCHON_SERVER_URL")
-    archon_mcp_url = os.getenv("ARCHON_MCP_URL")
+    archon_server_url = config.get_archon_server_url()
+    archon_mcp_url = config.get_archon_mcp_url()
@@
-    archon_server_url = os.getenv("ARCHON_SERVER_URL")
+    archon_server_url = config.get_archon_server_url()
@@
-    archon_mcp_url = os.getenv("ARCHON_MCP_URL")
+    archon_mcp_url = config.get_archon_mcp_url()

Also applies to: 183-201
44-76: Fail fast on missing critical dependencies (Claude CLI, git).

Current code only logs and continues. Per guidelines, raise on startup when missing. Also log with stack traces.
As per coding guidelines.
  # Validate Claude CLI is available
  try:
      result = subprocess.run(
          [config.CLAUDE_CLI_PATH, "--version"],
          capture_output=True,
          text=True,
          timeout=5,
      )
      if result.returncode == 0:
          logger.info(
              "Claude CLI validation successful",
              extra={"version": result.stdout.strip()},
          )
      else:
-         logger.error(
-             "Claude CLI validation failed",
-             extra={"error": result.stderr},
-         )
+         error_msg = f"Claude CLI validation failed: {result.stderr}"
+         logger.error("Claude CLI validation failed", extra={"error": result.stderr})
+         raise RuntimeError(error_msg)
  except FileNotFoundError:
-     logger.error(
-         "Claude CLI not found",
-         extra={"path": config.CLAUDE_CLI_PATH},
-     )
- except Exception as e:
-     logger.error(
-         "Claude CLI validation error",
-         extra={"error": str(e)},
-     )
+     logger.error("Claude CLI not found", extra={"path": config.CLAUDE_CLI_PATH})
+     raise
+ except (subprocess.TimeoutExpired, subprocess.SubprocessError, OSError) as e:
+     logger.error("Claude CLI validation error", extra={"error": str(e)}, exc_info=True)
+     raise
@@
- if not shutil.which("git"):
-     logger.error("Git not found in PATH")
+ if not shutil.which("git"):
+     error_msg = "Git not found in PATH - required for repository operations"
+     logger.error(error_msg)
+     raise RuntimeError(error_msg)
107-115: CORS: wildcard origins with credentials is invalid and insecure.

Replace "*" with explicit origins from configuration; keep credentials only with concrete origins.
As per coding guidelines.
-# CORS middleware with permissive settings for development
-cors_origins = os.getenv("CORS_ORIGINS", "*").split(",")
+# CORS middleware - explicit origins; default to local dev
+default_origins = "http://localhost:3000,http://localhost:5173"
+cors_origins = os.getenv("CORS_ORIGINS", default_origins).split(",")
 app.add_middleware(
     CORSMiddleware,
     allow_origins=cors_origins,
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
 )
121-155: Narrow exception handling in health check and log stack traces.

Catch specific exceptions and include exc_info for diagnostics.
As per coding guidelines.
@app.get("/health") async def health_check() -> dict[str, Any]: """Health check endpoint with dependency validation""" + logger = get_logger(__name__) @@ - except Exception as e: + except (FileNotFoundError, subprocess.SubprocessError, OSError) as e: + logger.error("Claude CLI health check failed", extra={"error": str(e)}, exc_info=True) health_status["dependencies"]["claude_cli"] = { "available": False, "error": str(e), } @@ - except Exception as e: + except (FileNotFoundError, subprocess.SubprocessError, OSError) as e: + logger.error("GitHub CLI health check failed", extra={"error": str(e)}, exc_info=True) health_status["dependencies"]["github_cli"] = { "available": False, "authenticated": False, "error": str(e), } @@ - if archon_server_url: + if archon_server_url: try: async with httpx.AsyncClient(timeout=5.0) as client: response = await client.get(f"{archon_server_url}/health") health_status["dependencies"]["archon_server"] = { "available": response.status_code == 200, "url": archon_server_url, } - except Exception as e: + except (httpx.HTTPError, httpx.TimeoutException) as e: + logger.error("Archon server health check failed", extra={"url": archon_server_url, "error": str(e)}, exc_info=True) health_status["dependencies"]["archon_server"] = { "available": False, "url": archon_server_url, "error": str(e), } @@ - if archon_mcp_url: + if archon_mcp_url: try: async with httpx.AsyncClient(timeout=5.0) as client: response = await client.get(f"{archon_mcp_url}/health") health_status["dependencies"]["archon_mcp"] = { "available": response.status_code == 200, "url": archon_mcp_url, } - except Exception as e: + except (httpx.HTTPError, httpx.TimeoutException) as e: + logger.error("MCP server health check failed", extra={"url": archon_mcp_url, "error": str(e)}, exc_info=True) health_status["dependencies"]["archon_mcp"] = { "available": False, "url": archon_mcp_url, "error": str(e), } @@ - if supabase_url: + if supabase_url: try: from .state_manager.repository_config_repository import get_supabase_client - client = get_supabase_client() # Check if archon_configured_repositories table exists response = client.table("archon_configured_repositories").select("id").limit(1).execute() health_status["dependencies"]["supabase"] = { "available": True, "table_exists": True, "url": supabase_url.split("@")[-1] if "@" in supabase_url else supabase_url.split("//")[-1], } - except Exception as e: + except Exception as e: + logger.error("Supabase health check failed", extra={"error": str(e)}, exc_info=True) health_status["dependencies"]["supabase"] = { "available": False, "table_exists": False, "error": str(e), }Also applies to: 161-181, 183-198, 199-215, 216-236
python/src/agent_work_orders/models.py (3)
296-321: REVIEW is unreachable; sequence omits it. Also sequence should respect user selection.
Minimal fix: append REVIEW. Optional: drive the sequence from selected_commands if provided.

 step_sequence = [
     WorkflowStep.CREATE_BRANCH,
     WorkflowStep.PLANNING,
     WorkflowStep.EXECUTE,
     WorkflowStep.COMMIT,
     WorkflowStep.CREATE_PR,
+    WorkflowStep.REVIEW,
 ]

Optional (make sequence dynamic):
 class StepHistory(BaseModel):
@@
-    steps: list[StepExecutionResult] = Field(default_factory=list)
+    steps: list[StepExecutionResult] = Field(default_factory=list)
+    selected_commands: list[WorkflowStep] | None = None
@@
-    step_sequence = [
+    step_sequence = self.selected_commands or [
         WorkflowStep.CREATE_BRANCH,
         WorkflowStep.PLANNING,
         WorkflowStep.EXECUTE,
         WorkflowStep.COMMIT,
         WorkflowStep.CREATE_PR,
+        WorkflowStep.REVIEW,
     ]
110-113: Default must include all 6 commands (missing prp-review).
Align with PR objective: include "prp-review" in the default.

-    selected_commands: list[str] = Field(
-        default=["create-branch", "planning", "execute", "commit", "create-pr"],
+    selected_commands: list[str] = Field(
+        default=["create-branch", "planning", "execute", "commit", "create-pr", "prp-review"],
         description="Commands to run in sequence"
     )
294-294: Mutable default list leaks state across instances.
Use default_factory.

-    steps: list[StepExecutionResult] = []
+    steps: list[StepExecutionResult] = Field(default_factory=list)

python/src/agent_work_orders/api/routes.py (4)
90-101: Serialize enums and datetimes before persisting metadata.
FileStateRepository uses json.dump; storing Enum/datetime causes TypeError.

-    metadata = {
-        "sandbox_type": request.sandbox_type,
+    now_iso = datetime.now().isoformat()
+    metadata = {
+        "sandbox_type": request.sandbox_type.value,
         "github_issue_number": request.github_issue_number,
-        "status": AgentWorkOrderStatus.PENDING,
+        "status": AgentWorkOrderStatus.PENDING.value,
         "current_phase": None,
-        "created_at": datetime.now(),
-        "updated_at": datetime.now(),
+        "created_at": now_iso,
+        "updated_at": now_iso,
         "github_pull_request_url": None,
         "git_commit_count": 0,
         "git_files_changed": 0,
         "error_message": None,
     }
460-476: Reconstruct enums and datetimes when reading metadata. Add missing import.
After serializing primitives, convert back for the response model.

-from ..models import (
+from ..models import (
     AgentPromptRequest,
     AgentWorkflowPhase,
     AgentWorkOrder,
     AgentWorkOrderResponse,
     AgentWorkOrderState,
     AgentWorkOrderStatus,
+    SandboxType,
@@
-    sandbox_type=metadata["sandbox_type"],
+    sandbox_type=SandboxType(metadata["sandbox_type"]),
     github_issue_number=metadata["github_issue_number"],
-    status=metadata["status"],
-    current_phase=metadata["current_phase"],
-    created_at=metadata["created_at"],
-    updated_at=metadata["updated_at"],
+    status=AgentWorkOrderStatus(metadata["status"]),
+    current_phase=AgentWorkflowPhase(metadata["current_phase"]) if metadata["current_phase"] else None,
+    created_at=datetime.fromisoformat(metadata["created_at"]),
+    updated_at=datetime.fromisoformat(metadata["updated_at"]),
509-525: Apply the same reconstruction in list endpoint.
Mirror get() logic for each item.

-    work_order = AgentWorkOrder(
+    work_order = AgentWorkOrder(
         agent_work_order_id=state.agent_work_order_id,
         repository_url=state.repository_url,
         sandbox_identifier=state.sandbox_identifier,
         git_branch_name=state.git_branch_name,
         agent_session_id=state.agent_session_id,
-        sandbox_type=metadata["sandbox_type"],
+        sandbox_type=SandboxType(metadata["sandbox_type"]),
         github_issue_number=metadata["github_issue_number"],
-        status=metadata["status"],
-        current_phase=metadata["current_phase"],
-        created_at=metadata["created_at"],
-        updated_at=metadata["updated_at"],
+        status=AgentWorkOrderStatus(metadata["status"]),
+        current_phase=AgentWorkflowPhase(metadata["current_phase"]) if metadata["current_phase"] else None,
+        created_at=datetime.fromisoformat(metadata["created_at"]),
+        updated_at=datetime.fromisoformat(metadata["updated_at"]),
573-596: Ensure current_phase is an enum, not a string, in git-progress.
Construct AgentWorkflowPhase from the stored primitive.

-    current_phase = metadata.get("current_phase")
+    current_phase = metadata.get("current_phase")
+    phase = AgentWorkflowPhase(current_phase) if current_phase else AgentWorkflowPhase.PLANNING
     return GitProgressSnapshot(
         agent_work_order_id=agent_work_order_id,
-        current_phase=current_phase if current_phase else AgentWorkflowPhase.PLANNING,
+        current_phase=phase,
         git_commit_count=0,
         git_files_changed=0,
         latest_commit_message=None,
         git_branch_name=None,
     )
@@
-    current_phase = metadata.get("current_phase")
+    current_phase = metadata.get("current_phase")
+    phase = AgentWorkflowPhase(current_phase) if current_phase else AgentWorkflowPhase.PLANNING
     return GitProgressSnapshot(
         agent_work_order_id=agent_work_order_id,
-        current_phase=current_phase if current_phase else AgentWorkflowPhase.PLANNING,
+        current_phase=phase,
         git_commit_count=metadata.get("git_commit_count", 0),
         git_files_changed=metadata.get("git_files_changed", 0),
         latest_commit_message=None,
         git_branch_name=state.git_branch_name,
     )
🧹 Nitpick comments (9)
docker-compose.yml (2)
150-199: Ensure dependencies are healthy before starting work-orders; add MCP dependency.

Avoid race conditions by waiting for archon-server and archon-mcp to be healthy.
 archon-agent-work-orders:
   profiles:
     - work-orders  # Only starts when explicitly using --profile work-orders
@@
-  depends_on:
-    - archon-server
+  depends_on:
+    archon-server:
+      condition: service_healthy
+    archon-mcp:
+      condition: service_healthy
188-195: Healthcheck style consistency (optional).

Consider CMD-SHELL (as used elsewhere) for simpler env interpolation and parity.
-  healthcheck:
-    test:
-      [
-        "CMD",
-        "python",
-        "-c",
-        'import urllib.request; urllib.request.urlopen("http://localhost:${AGENT_WORK_ORDERS_PORT:-8053}/health")',
-      ]
+  healthcheck:
+    test: >
+      CMD-SHELL bash -lc 'python - << "PY"
+      import urllib.request
+      urllib.request.urlopen("http://localhost:${AGENT_WORK_ORDERS_PORT:-8053}/health")
+      PY'

python/tests/agent_work_orders/test_repository_config_repository.py (1)
6-9: Remove unused import.

AsyncMock is unused; keep imports minimal to appease Ruff.
-from unittest.mock import AsyncMock, MagicMock, patch
+from unittest.mock import MagicMock, patch

python/src/agent_work_orders/state_manager/repository_config_repository.py (1)
227-234: Use timezone-aware timestamps.

Prefer UTC-aware ISO strings for DB writes.
-from datetime import datetime
+from datetime import datetime, timezone
@@
-    data["last_verified_at"] = datetime.now().isoformat()
+    data["last_verified_at"] = datetime.now(timezone.utc).isoformat()
@@
-    prepared_updates["updated_at"] = datetime.now().isoformat()
+    prepared_updates["updated_at"] = datetime.now(timezone.utc).isoformat()

Also applies to: 280-283
migration/agent_work_orders_repositories.sql (1)
58-61: Redundant unique index.

repository_url already has a UNIQUE constraint, and Postgres creates a unique index for it automatically. This explicit unique index duplicates it; drop it to reduce bloat.
-CREATE UNIQUE INDEX IF NOT EXISTS idx_configured_repositories_url
-    ON archon_configured_repositories(repository_url);
+-- implicit unique index exists via column UNIQUE; explicit index not needed

python/src/agent_work_orders/models.py (2)
116-124: Tighten validation: enforce non-empty, dedupe while preserving order.
Prevents empty runs and accidental duplicates.

-    def validate_commands(cls, v: list[str]) -> list[str]:
+    def validate_commands(cls, v: list[str]) -> list[str]:
         """Validate that all commands are valid WorkflowStep values"""
         valid = {step.value for step in WorkflowStep}
-        for cmd in v:
+        if not v:
+            raise ValueError("selected_commands must contain at least one command")
+        seen: set[str] = set()
+        deduped: list[str] = []
+        for cmd in v:
             if cmd not in valid:
                 raise ValueError(f"Invalid command: {cmd}. Must be one of {valid}")
-        return v
+            if cmd not in seen:
+                seen.add(cmd)
+                deduped.append(cmd)
+        return deduped
21-26: Remove unused AgentWorkflowType.
Legacy artifact from the pre-PR model; drop it to avoid confusion and lint warnings.

python/src/agent_work_orders/api/routes.py (2)
107-116: Surface background task failures.
Unobserved task exceptions get swallowed. Add a done callback to log exceptions.

-    asyncio.create_task(
-        orchestrator.execute_workflow(
+    task = asyncio.create_task(
+        orchestrator.execute_workflow(
             agent_work_order_id=agent_work_order_id,
             repository_url=request.repository_url,
             sandbox_type=request.sandbox_type,
             user_request=request.user_request,
             selected_commands=request.selected_commands,
             github_issue_number=request.github_issue_number,
         )
-    )
+    )
+    task.add_done_callback(lambda t: logger.error(
+        "workflow_task_failed", agent_work_order_id=agent_work_order_id,
+        error=str(t.exception())) if t.exception() else None)

As per coding guidelines.
129-131: Prefer specific exceptions over bare Exception in route handlers.
Catch and map known errors (validation, repository, GitHub) explicitly; fall back to generic last.

As per coding guidelines.
Also applies to: 160-165, 324-333, 361-370, 436-445, 531-534, 599-607
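One way that ordering could look inside a handler; the repository parameter and WorkOrderNotFoundError are stand-ins for illustration, not the PR's actual names:

from fastapi import HTTPException
from pydantic import ValidationError


class WorkOrderNotFoundError(Exception):
    """Stand-in domain error for the sketch."""


async def get_work_order_endpoint(work_order_id: str, repository) -> dict:
    try:
        return await repository.get(work_order_id)
    except WorkOrderNotFoundError as e:
        # Known failures are mapped to precise statuses first...
        raise HTTPException(status_code=404, detail=str(e)) from e
    except ValidationError as e:
        raise HTTPException(status_code=422, detail=str(e)) from e
    except Exception as e:
        # ...and the generic fallback stays last so it cannot shadow them
        raise HTTPException(status_code=500, detail=f"work order {work_order_id}: {e}") from e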
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
python/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
- docker-compose.yml (2 hunks)
- migration/agent_work_orders_repositories.sql (1 hunks)
- python/pyproject.toml (4 hunks)
- python/src/agent_work_orders/api/routes.py (1 hunks)
- python/src/agent_work_orders/models.py (1 hunks)
- python/src/agent_work_orders/server.py (1 hunks)
- python/src/agent_work_orders/state_manager/repository_config_repository.py (1 hunks)
- python/tests/agent_work_orders/test_repository_config_repository.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- python/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (4)
python/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/**/*.py: Preserve full stack traces in Python logging by using exc_info=True
Use specific exception types; avoid catching generic Exception
Include relevant IDs, URLs, and operation context in error messages
Never return None to indicate failure; raise an exception with details instead
Files:
- python/tests/agent_work_orders/test_repository_config_repository.py
- python/src/agent_work_orders/server.py
- python/src/agent_work_orders/models.py
- python/src/agent_work_orders/state_manager/repository_config_repository.py
- python/src/agent_work_orders/api/routes.py
python/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Backend tests use pytest with async support
Files:
python/tests/agent_work_orders/test_repository_config_repository.py
**/*.{ts,tsx,py}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,py}: Do not include keywords SIMPLIFIED, ENHANCED, LEGACY, CHANGED, or REMOVED in code comments
Code comments should describe functionality and reasoning only; do not mention beta status or global rules
Files:
- python/tests/agent_work_orders/test_repository_config_repository.py
- python/src/agent_work_orders/server.py
- python/src/agent_work_orders/models.py
- python/src/agent_work_orders/state_manager/repository_config_repository.py
- python/src/agent_work_orders/api/routes.py
python/src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
python/src/**/*.py: On service startup, missing configuration, DB connection failures, auth/authorization failures, critical dependency outages, or invalid/corrupting data: fail fast and bubble errors
For batch processing, background tasks, WebSocket events, optional features, and external API calls: continue processing but log errors (with retries/backoff for APIs)
Never accept or persist corrupted data; skip failed items entirely (e.g., zero embeddings, null FKs, malformed JSON)
Error messages must include operation context, IDs/URLs, use specific exception types, preserve full stack traces (logging with exc_info=True), and avoid returning None/null—raise exceptions instead; for batches report success counts and detailed failures
Backend code targets Python 3.12 and adheres to a 120 character line length
Use Ruff for linting (errors, warnings, unused imports) in backend code
Use Mypy for static type checking in backend code
Files:
- python/src/agent_work_orders/server.py
- python/src/agent_work_orders/models.py
- python/src/agent_work_orders/state_manager/repository_config_repository.py
- python/src/agent_work_orders/api/routes.py
🧬 Code graph analysis (4)
python/tests/agent_work_orders/test_repository_config_repository.py (2)
- python/src/agent_work_orders/models.py (3): ConfiguredRepository (178-208), SandboxType (27-33), WorkflowStep (43-51)
- python/src/agent_work_orders/state_manager/repository_config_repository.py (7): RepositoryConfigRepository (39-351), list_repositories (126-152), get_repository (154-192), create_repository (194-250), update_repository (252-313), delete_repository (315-351), _row_to_model (66-124)
python/src/agent_work_orders/server.py (3)
- python/src/agent_work_orders/utils/structured_logger.py (2): configure_structured_logging_with_buffer (98-127), get_logger (164-177)
- python/src/agent_work_orders/utils/log_buffer.py (2): start_cleanup_task (196-217), stop_cleanup_task (219-231)
- python/src/agent_work_orders/state_manager/repository_config_repository.py (1): get_supabase_client (19-36)
python/src/agent_work_orders/state_manager/repository_config_repository.py (1)
- python/src/agent_work_orders/models.py (3): ConfiguredRepository (178-208), SandboxType (27-33), WorkflowStep (43-51)
python/src/agent_work_orders/api/routes.py (10)
- python/src/agent_work_orders/github_integration/github_client.py (2): verify_repository_access (24-69), get_repository_info (71-132)
- python/src/agent_work_orders/models.py (14): AgentPromptRequest (135-139), AgentWorkflowPhase (36-40), AgentWorkOrder (68-93), AgentWorkOrderResponse (127-132), AgentWorkOrderState (54-65), AgentWorkOrderStatus (12-18), ConfiguredRepository (178-208), CreateAgentWorkOrderRequest (96-124), CreateRepositoryRequest (211-223), GitHubRepositoryVerificationRequest (153-156), GitHubRepositoryVerificationResponse (159-166), GitProgressSnapshot (142-150), StepHistory (290-320), UpdateRepositoryRequest (226-240)
- python/src/agent_work_orders/state_manager/repository_config_repository.py (5): create_repository (194-250), list_repositories (126-152), update_repository (252-313), delete_repository (315-351), get_repository (154-192)
- python/src/agent_work_orders/state_manager/repository_factory.py (1): create_repository (15-43)
- python/src/agent_work_orders/utils/id_generator.py (1): generate_work_order_id (9-18)
- python/src/agent_work_orders/utils/log_buffer.py (3): WorkOrderLogBuffer (15-252), get_logs (72-125), get_log_count (242-252)
- python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py (2): WorkflowOrchestrator (32-315), execute_workflow (50-283)
- python/src/agent_work_orders/api/sse_streams.py (1): stream_work_order_logs (16-95)
- python/src/agent_work_orders/state_manager/file_state_repository.py (4): create (109-128), get (130-147), delete (322-335), get_step_history (306-320)
- python/src/agent_work_orders/state_manager/work_order_repository.py (3): create (30-43), get (45-60), get_step_history (164-174)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Backend Tests (Python + pytest)
🔇 Additional comments (1)
docker-compose.yml (1)
30-33: AGENT_WORK_ORDERS_PORT is properly read and used by the server.

Verification confirms the environment variable is correctly wired:

- The agent-work-orders server reads the port in python/src/agent_work_orders/server.py line 265 and passes it to uvicorn
- Service discovery reads the port in python/src/server/config/service_discovery.py line 35 for configuration
- The port is mapped through docker-compose and Dockerfile args/env, and actively used for service binding
-- Per-repository preferences
-- Note: default_sandbox_type is intentionally restricted to production-ready types only.
-- Experimental types (git_branch, e2b, dagger) are blocked for safety and stability.
default_sandbox_type TEXT DEFAULT 'git_worktree'
    CHECK (default_sandbox_type IN ('git_worktree', 'full_clone', 'tmp_dir')),
default_commands JSONB DEFAULT '["create-branch", "planning", "execute", "commit", "create-pr"]'::jsonb,
CHECK constraint conflicts with code and tests.
DB permits ('git_worktree','full_clone','tmp_dir'), but code/test use SandboxType.GIT_BRANCH. This will 409/500 on update and fail to read rows. Align the allowed set.
- default_sandbox_type TEXT DEFAULT 'git_worktree'
- CHECK (default_sandbox_type IN ('git_worktree', 'full_clone', 'tmp_dir')),
+ default_sandbox_type TEXT DEFAULT 'git_worktree'
+ CHECK (default_sandbox_type IN ('git_worktree', 'git_branch')),

🤖 Prompt for AI Agents
In migration/agent_work_orders_repositories.sql around lines 37 to 43, the CHECK
constraint for default_sandbox_type currently allows only
('git_worktree','full_clone','tmp_dir') which conflicts with code/tests that use
SandboxType.GIT_BRANCH; modify the CHECK to include 'git_branch' (and, if
desired, any other production-safe values used by code) and ensure the DEFAULT
remains consistent (or change the DEFAULT to a value present in the updated
set); apply a migration to alter the constraint so existing rows with
'git_branch' are accepted without causing 409/500 errors.
-- Policy 2: Authenticated users can read and update (for frontend operations)
CREATE POLICY "Allow authenticated users to read and update archon_configured_repositories"
    ON archon_configured_repositories
    FOR ALL
    TO authenticated
    USING (true);
RLS policy is overly permissive.
Policy claims “read and update” but FOR ALL gives authenticated users DELETE/INSERT. Split into SELECT and UPDATE only.
-CREATE POLICY "Allow authenticated users to read and update archon_configured_repositories"
- ON archon_configured_repositories
- FOR ALL
- TO authenticated
- USING (true);
+CREATE POLICY "Allow authenticated users to read archon_configured_repositories"
+ ON archon_configured_repositories
+ FOR SELECT
+ TO authenticated
+ USING (true);
+
+CREATE POLICY "Allow authenticated users to update archon_configured_repositories"
+ ON archon_configured_repositories
+ FOR UPDATE
+ TO authenticated
+ USING (true)
+ WITH CHECK (true);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| -- Policy 2: Authenticated users can read and update (for frontend operations) | |
| CREATE POLICY "Allow authenticated users to read and update archon_configured_repositories" | |
| ON archon_configured_repositories | |
| FOR ALL | |
| TO authenticated | |
| USING (true); | |
| -- Policy 2: Authenticated users can read and update (for frontend operations) | |
| CREATE POLICY "Allow authenticated users to read archon_configured_repositories" | |
| ON archon_configured_repositories | |
| FOR SELECT | |
| TO authenticated | |
| USING (true); | |
| CREATE POLICY "Allow authenticated users to update archon_configured_repositories" | |
| ON archon_configured_repositories | |
| FOR UPDATE | |
| TO authenticated | |
| USING (true) | |
| WITH CHECK (true); |
🤖 Prompt for AI Agents
In migration/agent_work_orders_repositories.sql around lines 98 to 104, the RLS
policy uses FOR ALL which wrongly grants authenticated users INSERT and DELETE
in addition to SELECT and UPDATE; replace the single FOR ALL policy with two
explicit policies: one FOR SELECT TO authenticated USING (true) and one FOR
UPDATE TO authenticated USING (true), ensuring no INSERT or DELETE policies are
created and removing the original FOR ALL policy.
default_sandbox_type: SandboxType = Field(
    default=SandboxType.GIT_WORKTREE,
    description="Default sandbox type for work orders: git_worktree (default), full_clone, or tmp_dir"
)
Docstring mismatch for SandboxType choices.
Values mention full_clone/tmp_dir, which are not in the enum. Update text to match: git_branch, git_worktree (default), e2b, dagger.
🤖 Prompt for AI Agents
In python/src/agent_work_orders/models.py around lines 193 to 196, the Field
description for default_sandbox_type lists invalid choices (full_clone/tmp_dir);
update the description to list the actual SandboxType enum members and mark the
default correctly — e.g. "Default sandbox type for work orders: git_branch,
git_worktree (default), e2b, or dagger". Ensure the text exactly matches the
enum member names and notes git_worktree as the default.
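Assuming the enum members the comment lists, the corrected field might read roughly like this (SandboxType and Field as already imported in models.py):

default_sandbox_type: SandboxType = Field(
    default=SandboxType.GIT_WORKTREE,
    description="Default sandbox type for work orders: git_branch, git_worktree (default), e2b, or dagger",
)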
default_commands: list[WorkflowStep] = Field(
    default=[
        WorkflowStep.CREATE_BRANCH,
        WorkflowStep.PLANNING,
        WorkflowStep.EXECUTE,
        WorkflowStep.COMMIT,
        WorkflowStep.CREATE_PR,
    ],
    description="Default workflow commands for work orders"
)
Repository default_commands also missing REVIEW.
Keep repository defaults consistent with the full 6-step workflow.
default_commands: list[WorkflowStep] = Field(
default=[
WorkflowStep.CREATE_BRANCH,
WorkflowStep.PLANNING,
WorkflowStep.EXECUTE,
WorkflowStep.COMMIT,
WorkflowStep.CREATE_PR,
+ WorkflowStep.REVIEW,
],
description="Default workflow commands for work orders"
)
🤖 Prompt for AI Agents
In python/src/agent_work_orders/models.py around lines 197 to 206, the
repository default_commands list omits the REVIEW step and should include all
six workflow steps; update the default list to include WorkflowStep.REVIEW in
the appropriate position (at the end of the sequence, after CREATE_PR, matching the diff above) so the default
matches the full 6-step workflow and update the description if needed to reflect
the complete workflow.
# Convert default_sandbox_type from string to SandboxType enum
sandbox_type_raw = row.get("default_sandbox_type", "git_worktree")
try:
    sandbox_type = SandboxType(sandbox_type_raw)
except ValueError as e:
    self._logger.error(
        "invalid_sandbox_type_in_database",
        repository_id=repository_id,
        invalid_type=sandbox_type_raw,
        error=str(e),
        exc_info=True
    )
    raise ValueError(
        f"Database contains invalid sandbox type for repository {repository_id}: {sandbox_type_raw}"
    ) from e
Model↔DB enum mismatch will break reads/writes.
SandboxType supports ["git_branch","git_worktree","e2b","dagger"], but migration CHECK allows ["git_worktree","full_clone","tmp_dir"]. This will raise ValueError on "full_clone"/"tmp_dir" and block "git_branch". Align before merge (update migration to allow git_branch, or update enums and code to match migration).
🤖 Prompt for AI Agents
In python/src/agent_work_orders/state_manager/repository_config_repository.py
around lines 96 to 111, the code casts the DB value to SandboxType but the DB
migration allows values ("git_worktree","full_clone","tmp_dir") while the enum
supports ("git_branch","git_worktree","e2b","dagger"), causing ValueError for
"full_clone"/"tmp_dir" and blocking "git_branch"; fix by aligning the model and
DB: either (A) update the DB migration to include "git_branch" and remove or
alias "full_clone"/"tmp_dir", or (B) update SandboxType to include the migration
values (full_clone, tmp_dir) and add a deterministic mapping/normalization layer
in this code that converts legacy names (e.g., "full_clone"->"git_branch" or
vice versa) before casting, and add unit tests to cover all current DB values to
prevent regressions.
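If option (B) were taken, the normalization layer could be as small as the sketch below; the legacy-name aliases are assumptions that would need team agreement before adoption:

# Hypothetical aliases from legacy DB strings to current SandboxType values;
# the real mapping must be agreed on before adopting this.
_LEGACY_SANDBOX_ALIASES: dict[str, str] = {
    "full_clone": "git_branch",
    "tmp_dir": "git_worktree",
}


def normalize_sandbox_type(raw: str) -> SandboxType:
    """Map legacy DB strings onto the current enum before casting."""
    canonical = _LEGACY_SANDBOX_ALIASES.get(raw, raw)
    return SandboxType(canonical)  # still raises ValueError for truly unknown values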
async def test_update_repository_success(repository_instance, mock_supabase_client):
    """Test updating a repository"""
    mock_data = [{
        "id": "repo-1",
        "repository_url": "https://github.com/test/repo1",
        "display_name": "test/repo1",
        "owner": "test",
        "default_branch": "main",
        "is_verified": True,
        "last_verified_at": datetime.now().isoformat(),
        "default_sandbox_type": "git_branch",  # Updated value (valid enum)
        "default_commands": ["create-branch", "execute"],  # Updated value
        "created_at": datetime.now().isoformat(),
        "updated_at": datetime.now().isoformat(),
    }]
    mock_supabase_client.execute.return_value = MagicMock(data=mock_data)

    repository = await repository_instance.update_repository(
        "repo-1",
        default_sandbox_type=SandboxType.GIT_BRANCH,
        default_commands=[WorkflowStep.CREATE_BRANCH, WorkflowStep.EXECUTE],
    )
Enum vs DB constraint mismatch will fail in production.
Test uses SandboxType.GIT_BRANCH as valid, but the migration blocks git_branch (allows full_clone/tmp_dir instead). This will 500 at runtime when persisting/reading. Align enums and DB before merge. See migration lines 37-43.
🤖 Prompt for AI Agents
In python/tests/agent_work_orders/test_repository_config_repository.py around
lines 191-213, the test sets default_sandbox_type=SandboxType.GIT_BRANCH which
conflicts with the DB migration that only allows "full_clone" and "tmp_dir";
update the test to use one of the allowed enum values (e.g.,
SandboxType.FULL_CLONE or SandboxType.TMP_DIR) or alternatively update the
application enum/migration to include "git_branch" so they match, and ensure the
value written/read from the DB maps to the exact string the migration expects.
Actionable comments posted: 13
♻️ Duplicate comments (4)
python/src/agent_work_orders/server.py (4)
80-92: Use config URL helpers for Archon services + specific exceptions and logging.

Replace direct getenv with config helpers and log with exc_info; limit except clauses to httpx types.
- # Log service URLs - archon_server_url = os.getenv("ARCHON_SERVER_URL") - archon_mcp_url = os.getenv("ARCHON_MCP_URL") + # Log service URLs + archon_server_url = config.get_archon_server_url() + archon_mcp_url = config.get_archon_mcp_url() @@ - archon_server_url = os.getenv("ARCHON_SERVER_URL") + archon_server_url = config.get_archon_server_url() if archon_server_url: try: async with httpx.AsyncClient(timeout=5.0) as client: response = await client.get(f"{archon_server_url}/health") health_status["dependencies"]["archon_server"] = { "available": response.status_code == 200, "url": archon_server_url, } - except Exception as e: + except (httpx.HTTPError, httpx.TimeoutException) as e: + logger.error("archon_server_health_check_failed", url=archon_server_url, error=str(e), exc_info=True) health_status["dependencies"]["archon_server"] = { "available": False, "url": archon_server_url, "error": str(e), } @@ - archon_mcp_url = os.getenv("ARCHON_MCP_URL") + archon_mcp_url = config.get_archon_mcp_url() if archon_mcp_url: try: async with httpx.AsyncClient(timeout=5.0) as client: response = await client.get(f"{archon_mcp_url}/health") health_status["dependencies"]["archon_mcp"] = { "available": response.status_code == 200, "url": archon_mcp_url, } - except Exception as e: + except (httpx.HTTPError, httpx.TimeoutException) as e: + logger.error("mcp_server_health_check_failed", url=archon_mcp_url, error=str(e), exc_info=True) health_status["dependencies"]["archon_mcp"] = { "available": False, "url": archon_mcp_url, "error": str(e), }Also applies to: 183-199, 208-224
45-79: Fail fast on missing critical dependencies (Claude CLI, Git).

Startup only logs errors; per guidelines this must raise and stop the service. Also narrow exceptions and include exc_info.
@@
-    # Validate Claude CLI is available
+    # Validate Claude CLI is available (fail fast)
     try:
         result = subprocess.run(
             [config.CLAUDE_CLI_PATH, "--version"],
             capture_output=True,
             text=True,
             timeout=5,
         )
         if result.returncode == 0:
             logger.info(
                 "Claude CLI validation successful",
-                extra={"version": result.stdout.strip()},
+                version=result.stdout.strip(),
             )
         else:
-            logger.error(
-                "Claude CLI validation failed",
-                extra={"error": result.stderr},
-            )
+            error_msg = f"Claude CLI validation failed: {result.stderr}"
+            logger.error("claude_cli_validation_failed", error=result.stderr, exc_info=True)
+            raise RuntimeError(error_msg)
     except FileNotFoundError:
-        logger.error(
-            "Claude CLI not found",
-            extra={"path": config.CLAUDE_CLI_PATH},
-        )
-    except Exception as e:
-        logger.error(
-            "Claude CLI validation error",
-            extra={"error": str(e)},
-        )
+        logger.error("claude_cli_not_found", path=config.CLAUDE_CLI_PATH, exc_info=True)
+        raise
+    except (subprocess.TimeoutExpired, subprocess.SubprocessError, OSError) as e:
+        logger.error("claude_cli_validation_error", error=str(e), exc_info=True)
+        raise
@@
-    # Validate git is available
-    if not shutil.which("git"):
-        logger.error("Git not found in PATH")
+    # Validate git is available (fail fast)
+    if not shutil.which("git"):
+        logger.error("git_not_found", path="PATH", exc_info=True)
+        raise RuntimeError("Git not found in PATH - required for repository operations")
     else:
-        logger.info("Git validation successful")
+        logger.info("Git validation successful")
108-116: CORS: avoid wildcard with credentials.

allow_origins=["*"] + allow_credentials=True violates CORS and browsers reject it. Default to explicit localhost origins; keep credentials only with explicit origins.
-# CORS middleware with permissive settings for development
-cors_origins = os.getenv("CORS_ORIGINS", "*").split(",")
+# CORS middleware - configure allowed origins via env; default to localhost for dev
+default_origins = "http://localhost:3000,http://localhost:5173"
+cors_origins = os.getenv("CORS_ORIGINS", default_origins).split(",")
+allow_credentials = "*" not in cors_origins
 app.add_middleware(
     CORSMiddleware,
     allow_origins=cors_origins,
-    allow_credentials=True,
+    allow_credentials=allow_credentials,
     allow_methods=["*"],
     allow_headers=["*"],
 )
122-156: Health checks: avoid generic Exception and preserve stack traces.

Narrow exceptions and log with exc_info=True to retain the traceback.
@@
-    try:
+    logger = get_logger(__name__)
+    try:
         result = subprocess.run(
@@
-    except Exception as e:
+    except (FileNotFoundError, subprocess.SubprocessError, OSError, ValueError) as e:
+        logger.error("claude_cli_health_check_failed", error=str(e), exc_info=True)
         health_status["dependencies"]["claude_cli"] = {
             "available": False,
             "error": str(e),
         }
@@
-    try:
+    try:
         result = subprocess.run(
@@
-    except Exception as e:
+    except (FileNotFoundError, subprocess.SubprocessError, OSError) as e:
+        logger.error("github_cli_health_check_failed", error=str(e), exc_info=True)
         health_status["dependencies"]["github_cli"] = {
             "available": False,
             "authenticated": False,
             "error": str(e),
         }
@@
-    supabase_url = os.getenv("SUPABASE_URL")
+    supabase_url = os.getenv("SUPABASE_URL")
     if supabase_url:
         try:
             from .state_manager.repository_config_repository import get_supabase_client
@@
-        except Exception as e:
+        except (ValueError, Exception) as e:
+            logger.error("supabase_health_check_failed", url=supabase_url, error=str(e), exc_info=True)
             health_status["dependencies"]["supabase"] = {
                 "available": False,
                 "table_exists": False,
                 "error": str(e),
             }

Also applies to: 162-176, 225-245
🧹 Nitpick comments (12)
docker-compose.yml (1)
188-199: Inconsistent healthcheck command formatting.

The healthcheck command uses regular double quotes for the URL, whereas archon-server (line 60) and archon-mcp (line 106) use escaped single quotes (''). While both approaches work due to Docker Compose variable substitution, maintaining consistency improves readability and reduces cognitive load.

Consider aligning the quoting style with existing healthcheck patterns:
-        'import urllib.request; urllib.request.urlopen("http://localhost:${AGENT_WORK_ORDERS_PORT:-8053}/health")',
+        'python -c "import urllib.request; urllib.request.urlopen(''http://localhost:${AGENT_WORK_ORDERS_PORT:-8053}/health'')"',

migration/agent_work_orders_state.sql (3)
60-63: Prefer DOUBLE PRECISION and UTC defaults for temporal fields.
- Use DOUBLE PRECISION instead of FLOAT for clarity in Postgres.
- Consider defaulting executed_at to now() AT TIME ZONE 'UTC' if client omits it.
-    duration_seconds FLOAT NOT NULL,  -- Execution duration
+    duration_seconds DOUBLE PRECISION NOT NULL,  -- Execution duration
-    executed_at TIMESTAMP WITH TIME ZONE NOT NULL,  -- When step was executed
+    executed_at TIMESTAMP WITH TIME ZONE NOT NULL,  -- When step was executed

Optionally:
ALTER TABLE archon_agent_work_order_steps
    ALTER COLUMN executed_at SET DEFAULT NOW();

Would you like me to apply the DEFAULT now() change, or do you prefer client-sourced timestamps only?
90-97: Add composite index to match query pattern and enforce step ordering uniqueness.

Your reads filter by agent_work_order_id and order by step_order. Add an (agent_work_order_id, step_order) index and an optional uniqueness constraint:
+CREATE INDEX IF NOT EXISTS idx_awos_work_order_id_step_order
+    ON archon_agent_work_order_steps(agent_work_order_id, step_order);
+
+ALTER TABLE archon_agent_work_order_steps
+    ADD CONSTRAINT uq_awos_order UNIQUE (agent_work_order_id, step_order);
217-229: Verification DO blocks raise EXCEPTION; consider WARN/NOTICE for idempotent migrations.

Raising EXCEPTION aborts the migration in shared environments (e.g., partially provisioned dev DBs). Prefer RAISE WARNING/NOTICE or separate smoke tests.
Also applies to: 231-244, 245-257, 258-270, 271-284, 285-297, 298-310, 311-324
python/src/agent_work_orders/state_manager/supabase_repository.py (2)
248-252: Avoid manually setting updated_at; let the DB trigger manage it.

Remove application-side updated_at writes to prevent timezone inconsistency and race conditions with the trigger.
- "updated_at": datetime.now().isoformat(),Also applies to: 309-311, 343-345
197-199: Use explicit None check for enum filters. Truthiness of Enums is always True. Use is not None:
-        if status_filter:
+        if status_filter is not None:
             query = query.eq("status", status_filter.value)
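For illustration, a quick standalone check of the enum-truthiness pitfall (plain Python, not tied to this codebase):

```python
from enum import Enum

class Status(Enum):
    PENDING = "pending"

# Enum members are truthy by default, so `if status_filter:` cannot tell
# "no filter supplied" apart from "filter set"; only `is not None` can.
assert bool(Status.PENDING) is True

def apply_filter(status_filter: Status | None) -> str:
    return f"status = {status_filter.value}" if status_filter is not None else "no filter"

print(apply_filter(Status.PENDING))  # status = pending
print(apply_filter(None))            # no filter
```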
25-30: Fail-fast on invalid STATE_STORAGE_TYPE values? You already fall back to memory with a warning. If this service runs in prod, consider failing fast instead to avoid silent misconfigurations. Optional:
-    else:
-        logger.warning(
-            "unknown_storage_type",
-            storage_type=storage_type,
-            fallback="memory"
-        )
-        return WorkOrderRepository()
+    else:
+        raise ValueError(f"Unknown STATE_STORAGE_TYPE={storage_type!r}; expected 'supabase' | 'file' | 'memory'")
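A minimal sketch of the full fail-fast factory, assuming the repository classes and env var named in this review (module paths taken from the code graph below):

```python
import os

from .file_state_repository import FileStateRepository
from .supabase_repository import SupabaseWorkOrderRepository
from .work_order_repository import WorkOrderRepository

def create_repository():
    """Select a state backend, raising on misconfiguration instead of
    silently falling back to the in-memory store."""
    storage_type = os.getenv("STATE_STORAGE_TYPE", "memory").lower()
    if storage_type == "supabase":
        return SupabaseWorkOrderRepository()
    if storage_type == "file":
        return FileStateRepository()
    if storage_type == "memory":
        return WorkOrderRepository()
    raise ValueError(f"Unknown STATE_STORAGE_TYPE={storage_type!r}; expected 'supabase' | 'file' | 'memory'")
```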
48-53: Filter None identifiers and make results deterministic. Avoid including None/empty sandbox identifiers and return a stable order for reproducibility.
-    work_orders = await repository.list()
-    database_identifiers = {state.sandbox_identifier for state, _ in work_orders}
+    work_orders = await repository.list()
+    database_identifiers = {
+        state.sandbox_identifier
+        for state, _ in work_orders
+        if getattr(state, "sandbox_identifier", None)
+    }
@@
-    orphans = filesystem_worktrees - database_identifiers
+    orphans = sorted(filesystem_worktrees - database_identifiers)
@@
-        orphans=list(orphans)[:10],  # Log first 10 to avoid spam
+        orphans=orphans[:10],  # Log first 10 to avoid spam
@@
-    return [str(worktree_base / name) for name in orphans]
+    return [str(worktree_base / name) for name in orphans]

Also applies to: 62-62
89-95: Skip entries without sandbox identifiers and sort output. Don't flag records lacking sandbox_identifier as dangling; return a sorted list for consistent logs.
-    dangling = []
-    for state, _ in work_orders:
-        worktree_path = worktree_base / state.sandbox_identifier
+    dangling: list[str] = []
+    for state, _ in work_orders:
+        if not getattr(state, "sandbox_identifier", None):
+            continue
+        worktree_path = worktree_base / state.sandbox_identifier
         if not worktree_path.exists():
             dangling.append(state.agent_work_order_id)
@@
-    return dangling
+    return sorted(dangling)

Also applies to: 102-102
39-46: Consider symlink and name-pattern hardening. To reduce accidental deletions, also skip symlinks (unlink instead) and optionally enforce a name pattern (e.g., prefix) for valid worktrees. I can add this guard if you specify the canonical sandbox_identifier pattern; a sketch of one possible guard follows.
Also applies to: 84-94, 140-151
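A sketch of such a guard; the `wo-` name pattern is a placeholder, since the canonical sandbox_identifier format isn't specified here:

```python
import re
import shutil
from pathlib import Path

# Placeholder pattern; substitute the real sandbox_identifier convention.
WORKTREE_NAME_PATTERN = re.compile(r"^wo-[a-z0-9-]+$")

def remove_worktree_safely(path: Path, base: Path) -> bool:
    """Delete a worktree only when it is a direct, pattern-matching child of base.

    Symlinks are unlinked rather than followed, so their targets survive.
    Returns True when something was removed.
    """
    base = base.resolve()
    if path.is_symlink():
        if path.parent.resolve() == base and WORKTREE_NAME_PATTERN.match(path.name):
            path.unlink()
            return True
        return False
    resolved = path.resolve(strict=False)
    if resolved.parent != base or not WORKTREE_NAME_PATTERN.match(resolved.name):
        return False
    if resolved.is_dir():
        shutil.rmtree(resolved)
        return True
    return False
```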
python/src/agent_work_orders/server.py (2)
34-40: Use structured logging fields instead of extra dict. Pass fields directly; this aligns with structlog configuration.
-    logger.info(
-        "Starting Agent Work Orders service",
-        extra={
-            "port": os.getenv("AGENT_WORK_ORDERS_PORT", "8053"),
-            "service_discovery_mode": os.getenv("SERVICE_DISCOVERY_MODE", "local"),
-        },
-    )
+    logger.info(
+        "Starting Agent Work Orders service",
+        port=os.getenv("AGENT_WORK_ORDERS_PORT", "8053"),
+        service_discovery_mode=os.getenv("SERVICE_DISCOVERY_MODE", "local"),
+    )
246-254: Degrade status when DB is unhealthy under Supabase storage. Reflect database failure in overall status.
-    critical_deps_ok = (
-        health_status["dependencies"].get("claude_cli", {}).get("available", False)
-        and health_status["dependencies"].get("git", {}).get("available", False)
-    )
+    critical_deps_ok = (
+        health_status["dependencies"].get("claude_cli", {}).get("available", False)
+        and health_status["dependencies"].get("git", {}).get("available", False)
+    )
+    if config.STATE_STORAGE_TYPE.lower() == "supabase":
+        critical_deps_ok = critical_deps_ok and health_status.get("database", {}).get("status") == "healthy"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- docker-compose.yml (2 hunks)
- migration/AGENT_WORK_ORDERS.md (1 hunks)
- migration/agent_work_orders_state.sql (1 hunks)
- python/src/agent_work_orders/README.md (1 hunks)
- python/src/agent_work_orders/database/__init__.py (1 hunks)
- python/src/agent_work_orders/database/client.py (1 hunks)
- python/src/agent_work_orders/server.py (1 hunks)
- python/src/agent_work_orders/state_manager/repository_factory.py (1 hunks)
- python/src/agent_work_orders/state_manager/supabase_repository.py (1 hunks)
- python/src/agent_work_orders/utils/state_reconciliation.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- migration/AGENT_WORK_ORDERS.md
🧰 Additional context used
📓 Path-based instructions (3)
python/src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
python/src/**/*.py: On service startup, missing configuration, DB connection failures, auth/authorization failures, critical dependency outages, or invalid/corrupting data: fail fast and bubble errors
For batch processing, background tasks, WebSocket events, optional features, and external API calls: continue processing but log errors (with retries/backoff for APIs)
Never accept or persist corrupted data; skip failed items entirely (e.g., zero embeddings, null FKs, malformed JSON)
Error messages must include operation context, IDs/URLs, use specific exception types, preserve full stack traces (logging with exc_info=True), and avoid returning None/null—raise exceptions instead; for batches report success counts and detailed failures
Backend code targets Python 3.12 and adheres to a 120 character line length
Use Ruff for linting (errors, warnings, unused imports) in backend code
Use Mypy for static type checking in backend code
Files:
- python/src/agent_work_orders/utils/state_reconciliation.py
- python/src/agent_work_orders/state_manager/supabase_repository.py
- python/src/agent_work_orders/server.py
- python/src/agent_work_orders/database/client.py
- python/src/agent_work_orders/state_manager/repository_factory.py
- python/src/agent_work_orders/database/__init__.py
python/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/**/*.py: Preserve full stack traces in Python logging by using exc_info=True
Use specific exception types; avoid catching generic Exception
Include relevant IDs, URLs, and operation context in error messages
Never return None to indicate failure; raise an exception with details instead
Files:
- python/src/agent_work_orders/utils/state_reconciliation.py
- python/src/agent_work_orders/state_manager/supabase_repository.py
- python/src/agent_work_orders/server.py
- python/src/agent_work_orders/database/client.py
- python/src/agent_work_orders/state_manager/repository_factory.py
- python/src/agent_work_orders/database/__init__.py
**/*.{ts,tsx,py}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,py}: Do not include keywords SIMPLIFIED, ENHANCED, LEGACY, CHANGED, or REMOVED in code comments
Code comments should describe functionality and reasoning only; do not mention beta status or global rules
Files:
- python/src/agent_work_orders/utils/state_reconciliation.py
- python/src/agent_work_orders/state_manager/supabase_repository.py
- python/src/agent_work_orders/server.py
- python/src/agent_work_orders/database/client.py
- python/src/agent_work_orders/state_manager/repository_factory.py
- python/src/agent_work_orders/database/__init__.py
🧬 Code graph analysis (5)
python/src/agent_work_orders/utils/state_reconciliation.py (1)
python/src/agent_work_orders/state_manager/supabase_repository.py (1)
update_status(219-290)
python/src/agent_work_orders/state_manager/supabase_repository.py (2)
python/src/agent_work_orders/database/client.py (1)
get_agent_work_orders_client(17-42)
python/src/agent_work_orders/models.py (1)
AgentWorkOrderState(54-65)
python/src/agent_work_orders/server.py (4)
python/src/agent_work_orders/database/client.py (1)
check_database_health(45-74)
python/src/agent_work_orders/utils/structured_logger.py (2)
configure_structured_logging_with_buffer(98-127)
get_logger(164-177)
python/src/agent_work_orders/utils/log_buffer.py (2)
start_cleanup_task(196-217)
stop_cleanup_task(219-231)
python/src/agent_work_orders/state_manager/repository_config_repository.py (1)
get_supabase_client(19-36)
python/src/agent_work_orders/state_manager/repository_factory.py (3)
python/src/agent_work_orders/state_manager/file_state_repository.py (1)
FileStateRepository(22-343)
python/src/agent_work_orders/state_manager/supabase_repository.py (1)
SupabaseWorkOrderRepository(31-484)
python/src/agent_work_orders/state_manager/work_order_repository.py (1)
WorkOrderRepository(16-174)
python/src/agent_work_orders/database/__init__.py (1)
python/src/agent_work_orders/database/client.py (2)
check_database_health(45-74)
get_agent_work_orders_client(17-42)
🪛 markdownlint-cli2 (0.18.1)
python/src/agent_work_orders/README.md
136-136: Bare URL used
(MD034, no-bare-urls)
177-177: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
181-181: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
185-185: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
261-261: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
276-276: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
409-409: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Backend Tests (Python + pytest)
🔇 Additional comments (4)
docker-compose.yml (2)
30-30: Port variable addition is appropriately wired. The AGENT_WORK_ORDERS_PORT environment variable enables service discovery within the archon-server container.
150-180: No issues found. The configuration is correct and well-documented. The three concerns in the original review comment are not valid:
1. Dependency guarantee via archon-server: When `archon-agent-work-orders` starts using `--profile work-orders`, its dependency on `archon-server` triggers `archon-server` to start. Since `archon-mcp` has no profile restriction (starts by default) and explicitly depends on `archon-server` becoming healthy, the transitive dependency ensures `archon-mcp` is running and healthy before `archon-agent-work-orders` needs to access it. Docker Compose respects this ordering.
2. GITHUB_PAT_TOKEN documentation: The requirement is clearly documented in `.env.example` with full context: token source, required scopes (repo, workflow), and purpose for gh CLI authentication.
3. Dockerfile.agent-work-orders: Verified to exist at `./python/Dockerfile.agent-work-orders`.

Likely an incorrect or invalid review comment.
python/src/agent_work_orders/database/__init__.py (1)
1-8: LGTM. Concise re-exports and an explicit `__all__` keep the API clean.
python/src/agent_work_orders/server.py (1)
271-280: Based on the gathered evidence, I have sufficient information to conclude. The import path `src.agent_work_orders.server:app` is verified to be correct:

- The README.md at `./src/agent_work_orders/README.md` explicitly documents: `uv run python -m uvicorn src.agent_work_orders.server:app --port 8053 --reload`
- Test files consistently import using this path: `from src.agent_work_orders.server import app`
- The import path is consistent across the codebase (multiple test files, agent executor, API tests)

The path will resolve correctly when uvicorn runs from the repository root with `src/` as the Python package base. The import path is correct; no changes needed.
```sql
-- Apply auto-update trigger for updated_at timestamp
-- Reuses existing update_updated_at_column() function from Archon migrations
CREATE TRIGGER update_agent_work_orders_updated_at
    BEFORE UPDATE ON archon_agent_work_orders
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();
```
Trigger creation is not idempotent; re-running migration will fail. Guard CREATE TRIGGER with an existence check.
Wrap in DO block to only create when missing:
-CREATE TRIGGER update_agent_work_orders_updated_at
- BEFORE UPDATE ON archon_agent_work_orders
- FOR EACH ROW
- EXECUTE FUNCTION update_updated_at_column();
+DO $$
+BEGIN
+ IF NOT EXISTS (
+ SELECT 1 FROM pg_trigger
+ WHERE tgname = 'update_agent_work_orders_updated_at'
+ AND tgrelid = 'archon_agent_work_orders'::regclass
+ ) THEN
+ CREATE TRIGGER update_agent_work_orders_updated_at
+ BEFORE UPDATE ON archon_agent_work_orders
+ FOR EACH ROW
+ EXECUTE FUNCTION update_updated_at_column();
+ END IF;
+END $$;
🤖 Prompt for AI Agents
In migration/agent_work_orders_state.sql around lines 102 to 108, the CREATE
TRIGGER statement is not idempotent and will fail on re-run; wrap trigger
creation in a DO block that checks pg_trigger (or pg_class/pg_proc) for an
existing trigger named update_agent_work_orders_updated_at and only executes the
CREATE TRIGGER when it does not already exist, ensuring the trigger creation is
conditional and safe to re-run.
```sql
-- Policy 2: Authenticated users can read and update (for frontend operations)
CREATE POLICY "Allow authenticated users to read and update archon_agent_work_orders"
    ON archon_agent_work_orders
    FOR ALL
    TO authenticated
    USING (true);

CREATE POLICY "Allow authenticated users to read and update archon_agent_work_order_steps"
    ON archon_agent_work_order_steps
    FOR ALL
    TO authenticated
    USING (true);
```
RLS policies are overly permissive; authenticated can read/write/delete all rows. Lock down by owner and per-operation policies.
Current `FOR ALL TO authenticated USING (true)` grants full table access to any signed‑in user. Add an ownership column/predicate and split policies by operation.
Option A (preferred): introduce a dedicated owner column and secure by auth.uid():
--- a/migration/agent_work_orders_state.sql
+++ b/migration/agent_work_orders_state.sql
@@
CREATE TABLE IF NOT EXISTS archon_agent_work_orders (
...
- status TEXT NOT NULL CHECK (status IN ('pending', 'running', 'completed', 'failed')),
+ status TEXT NOT NULL CHECK (status IN ('pending', 'running', 'completed', 'failed')),
+ owner_user_id UUID NOT NULL, -- who owns this work order
...
);
@@
-ALTER TABLE archon_agent_work_orders ENABLE ROW LEVEL SECURITY;
-ALTER TABLE archon_agent_work_order_steps ENABLE ROW LEVEL SECURITY;
+ALTER TABLE archon_agent_work_orders ENABLE ROW LEVEL SECURITY;
+ALTER TABLE archon_agent_work_order_steps ENABLE ROW LEVEL SECURITY;
@@
-CREATE POLICY "Allow authenticated users to read and update archon_agent_work_orders"
- ON archon_agent_work_orders
- FOR ALL
- TO authenticated
- USING (true);
+-- Authenticated users: read own rows
+CREATE POLICY "awo_select_own"
+ ON archon_agent_work_orders FOR SELECT TO authenticated
+ USING (owner_user_id = auth.uid());
+-- Insert own rows
+CREATE POLICY "awo_insert_own"
+ ON archon_agent_work_orders FOR INSERT TO authenticated
+ WITH CHECK (owner_user_id = auth.uid());
+-- Update own rows
+CREATE POLICY "awo_update_own"
+ ON archon_agent_work_orders FOR UPDATE TO authenticated
+ USING (owner_user_id = auth.uid())
+ WITH CHECK (owner_user_id = auth.uid());
+-- Delete own rows (optional)
+-- CREATE POLICY "awo_delete_own" ON archon_agent_work_orders FOR DELETE TO authenticated USING (owner_user_id = auth.uid());
@@
-CREATE POLICY "Allow authenticated users to read and update archon_agent_work_order_steps"
- ON archon_agent_work_order_steps
- FOR ALL
- TO authenticated
- USING (true);
+-- Steps table policies scope via parent work order ownership
+CREATE POLICY "awos_select_own"
+ ON archon_agent_work_order_steps FOR SELECT TO authenticated
+ USING (EXISTS (
+ SELECT 1 FROM archon_agent_work_orders awo
+ WHERE awo.agent_work_order_id = archon_agent_work_order_steps.agent_work_order_id
+ AND awo.owner_user_id = auth.uid()
+ ));
+CREATE POLICY "awos_insert_own"
+ ON archon_agent_work_order_steps FOR INSERT TO authenticated
+ WITH CHECK (EXISTS (
+ SELECT 1 FROM archon_agent_work_orders awo
+ WHERE awo.agent_work_order_id = archon_agent_work_order_steps.agent_work_order_id
+ AND awo.owner_user_id = auth.uid()
+ ));
+CREATE POLICY "awos_update_own"
+ ON archon_agent_work_order_steps FOR UPDATE TO authenticated
+ USING (EXISTS (
+ SELECT 1 FROM archon_agent_work_orders awo
+ WHERE awo.agent_work_order_id = archon_agent_work_order_steps.agent_work_order_id
+ AND awo.owner_user_id = auth.uid()
+ ))
+ WITH CHECK (EXISTS (
+ SELECT 1 FROM archon_agent_work_orders awo
+ WHERE awo.agent_work_order_id = archon_agent_work_order_steps.agent_work_order_id
+ AND awo.owner_user_id = auth.uid()
+  ));

Option B (no new column): if ownership lives in metadata, scope via JSONB (less ideal, slower):
USING ((metadata->>'owner_user_id')::uuid = auth.uid())

This change prevents cross‑tenant data exposure/modification.
🤖 Prompt for AI Agents
In migration/agent_work_orders_state.sql around lines 128 to 140 the current RLS
policies use FOR ALL TO authenticated USING (true) which permits any signed-in
user to read/write/delete all rows; replace the blanket policies with
per-operation policies scoped to an owner predicate: add an owner column
(preferred) or reference owner in metadata JSONB, then create separate policies
for SELECT (USING (owner_user_id = auth.uid())), INSERT (WITH CHECK
(owner_user_id = auth.uid())), UPDATE (USING (owner_user_id = auth.uid()) AND
WITH CHECK (owner_user_id = auth.uid())), and DELETE (USING (owner_user_id =
auth.uid())); if you cannot add a column, use the metadata JSONB predicate
(metadata->>'owner_user_id')::uuid = auth.uid() in the USING/WITH CHECK clauses
to enforce row-level ownership.
```python
async def check_database_health() -> dict[str, Any]:
    """Check if agent work orders tables exist and are accessible.

    Verifies that both archon_agent_work_orders and archon_agent_work_order_steps
    tables exist and can be queried. This is a lightweight check using limit(0)
    to avoid fetching actual data.

    Returns:
        Dictionary with health check results:
        - status: "healthy" or "unhealthy"
        - tables_exist: True if both tables are accessible, False otherwise
        - error: Error message if check failed (only present when unhealthy)

    Example:
        >>> health = await check_database_health()
        >>> if health["status"] == "healthy":
        ...     print("Database is ready")
    """
    try:
        client = get_agent_work_orders_client()

        # Try to query both tables (limit 0 to avoid fetching data)
        client.table("archon_agent_work_orders").select("agent_work_order_id").limit(0).execute()
        client.table("archon_agent_work_order_steps").select("id").limit(0).execute()

        logger.info("database_health_check_passed", tables=["archon_agent_work_orders", "archon_agent_work_order_steps"])
        return {"status": "healthy", "tables_exist": True}
    except Exception as e:
        logger.error("database_health_check_failed", error=str(e), exc_info=True)
        return {"status": "unhealthy", "tables_exist": False, "error": str(e)}
```
Async health check performs blocking I/O; offload to a thread.
Prevent event loop stalls by moving .execute() calls off the loop:
-from typing import Any
+from typing import Any
+import asyncio
@@
async def check_database_health() -> dict[str, Any]:
@@
- client.table("archon_agent_work_orders").select("agent_work_order_id").limit(0).execute()
- client.table("archon_agent_work_order_steps").select("id").limit(0).execute()
+ await asyncio.to_thread(
+ lambda: client.table("archon_agent_work_orders").select("agent_work_order_id").limit(0).execute()
+ )
+ await asyncio.to_thread(
+ lambda: client.table("archon_agent_work_order_steps").select("id").limit(0).execute()
+    )
🤖 Prompt for AI Agents
In python/src/agent_work_orders/database/client.py around lines 45 to 74, the
health check calls client.table(...).select(...).limit(0).execute()
synchronously which blocks the event loop; change to run these blocking calls on
a thread and await their completion (e.g., use asyncio.to_thread or
loop.run_in_executor to call the execute() invocations), capture/propagate any
exceptions from the awaited thread calls, and keep the same logging/return
behavior so the function remains async and non-blocking.
1. **Reuse existing Archon Supabase credentials** - No new database or credentials needed. The agent work orders service shares the same Supabase project as the main Archon server.

2. **Apply database migration**:
   - Navigate to your Supabase project dashboard at https://app.supabase.com
Wrap bare URL in markdown link format (MD034).
The URL on line 136 should be formatted as a markdown link rather than appearing bare in the text.
- - Navigate to your Supabase project dashboard at https://app.supabase.com
+ - Navigate to your Supabase project dashboard at [https://app.supabase.com](https://app.supabase.com)
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
136-136: Bare URL used
(MD034, no-bare-urls)
🤖 Prompt for AI Agents
In python/src/agent_work_orders/README.md around line 136, a bare URL
"https://app.supabase.com" is present; replace it with markdown link syntax
(e.g. [Supabase dashboard](https://app.supabase.com)) so the URL is not bare and
conforms to MD034, keeping the surrounding sentence text unchanged.
#### Troubleshooting

**Error: "tables_exist": false**
Use markdown headings instead of bold emphasis (MD036).
Lines 177, 181, and 185 use bold text with colons (**Error: ...**:) as section headings. These should be proper markdown headings using ### for consistency with the rest of the document.
-**Error: "tables_exist": false**
-- Migration not applied - see `database/migrations/README.md`
+### Error: "tables_exist": false
+
+Migration not applied - see `database/migrations/README.md`

Apply the same pattern to the other two occurrences on lines 181 and 185.
Also applies to: 181-181, 185-185
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
177-177: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🤖 Prompt for AI Agents
In python/src/agent_work_orders/README.md around lines 177, 181, and 185, the
review flags use of bold text as section headings (e.g., **Error:
"tables_exist": false**) — replace each bolded, colon-terminated line with a
markdown heading (use ### Error: ... ) to match document style; update the three
occurrences at lines 177, 181, and 185 to use "###" headings (retain the exact
error text inside the heading, remove the bold markers, and keep or remove the
trailing colon consistently so headings follow the rest of the document).
| """Repository Factory | ||
|
|
||
| Creates appropriate repository instances based on configuration. | ||
| Supports in-memory (dev/testing), file-based (legacy), and Supabase (production) storage. | ||
| """ |
🧩 Analysis chain
Remove forbidden keyword from comment per coding guidelines.
Guidelines disallow the word “LEGACY” in comments. Rephrase to avoid policy violations.
-Supports in-memory (dev/testing), file-based (legacy), and Supabase (production) storage.
+Supports in-memory (dev/testing), file-based, and Supabase (production) storage.

Please scan other comments for the same forbidden keywords across the repo before merging.
🏁 Script executed:
#!/bin/bash
# Verify the target file content and search for forbidden keywords
# 1. Check the exact content of the target file (lines 1-5)
echo "=== Target file content (lines 1-5) ==="
head -5 python/src/agent_work_orders/state_manager/repository_factory.py
# 2. Search for forbidden keywords in comments across all Python files
echo ""
echo "=== Scanning for forbidden keywords in Python files ==="
forbidden_keywords=("SIMPLIFIED" "ENHANCED" "LEGACY" "CHANGED" "REMOVED")
for keyword in "${forbidden_keywords[@]}"; do
echo ""
echo "--- Occurrences of '$keyword' ---"
# Search in Python files, excluding common exclude patterns
rg -n --type py "(#.*\b$keyword\b|\"\"\".*\b$keyword\b|'''.*\b$keyword\b)" || echo "No matches found"
done
```
Length of output: 536
Remove forbidden keyword "legacy" from docstring.
The docstring contains "legacy" which violates the coding guideline that forbids using LEGACY (and variants) in code comments. The suggested removal is appropriate:
-Supports in-memory (dev/testing), file-based (legacy), and Supabase (production) storage.
+Supports in-memory (dev/testing), file-based, and Supabase (production) storage.

Codebase scan confirms no other instances of forbidden keywords (SIMPLIFIED, ENHANCED, LEGACY, CHANGED, REMOVED) exist in Python file comments or docstrings.
🤖 Prompt for AI Agents
In python/src/agent_work_orders/state_manager/repository_factory.py around lines
1 to 5, the module docstring contains the forbidden keyword "legacy"
("file-based (legacy)"); remove or replace that term—e.g., change "file-based
(legacy)" to "file-based" or "file-based (on-disk)"—so the docstring no longer
includes the banned keyword while preserving the original meaning.
```python
            self.client.table(self.table_name).insert(data).execute()

            self._logger.info(
                "work_order_created",
                agent_work_order_id=work_order.agent_work_order_id,
                repository_url=work_order.repository_url,
            )
        except Exception as e:
```
Async methods perform blocking I/O; offload Supabase .execute() calls to a thread.
Supabase Python client is sync (requests). Calling .execute() in async funcs blocks the event loop and can stall the service under load.
Add a small helper and use it across methods:
@@
-from supabase import Client
+import asyncio
+from supabase import Client
@@
class SupabaseWorkOrderRepository:
@@
def __init__(self) -> None:
@@
self._logger.info("supabase_repository_initialized")
+ async def _exec(self, req):
+ return await asyncio.to_thread(lambda: req.execute())
@@
- self.client.table(self.table_name).insert(data).execute()
+ await self._exec(self.client.table(self.table_name).insert(data))
@@
- response = self.client.table(self.table_name).select("*").eq("agent_work_order_id", agent_work_order_id).execute()
+ response = await self._exec(
+ self.client.table(self.table_name).select("*").eq("agent_work_order_id", agent_work_order_id)
+ )
@@
- response = query.order("created_at", desc=True).execute()
+ response = await self._exec(query.order("created_at", desc=True))
@@
- response = (
- self.client.table(self.table_name)
- .update(updates)
- .eq("agent_work_order_id", agent_work_order_id)
- .execute()
- )
+ response = await self._exec(
+ self.client.table(self.table_name).update(updates).eq("agent_work_order_id", agent_work_order_id)
+ )
@@
- self.client.table(self.table_name).update({
+ response = await self._exec(self.client.table(self.table_name).update({
"git_branch_name": git_branch_name,
- "updated_at": datetime.now().isoformat(),
- }).eq("agent_work_order_id", agent_work_order_id).execute()
+ }).eq("agent_work_order_id", agent_work_order_id))
+ if not response.data:
+ self._logger.warning("work_order_not_found_for_update", agent_work_order_id=agent_work_order_id)
+ return
@@
- self.client.table(self.table_name).update({
+ response = await self._exec(self.client.table(self.table_name).update({
"agent_session_id": agent_session_id,
- "updated_at": datetime.now().isoformat(),
- }).eq("agent_work_order_id", agent_work_order_id).execute()
+ }).eq("agent_work_order_id", agent_work_order_id))
+ if not response.data:
+ self._logger.warning("work_order_not_found_for_update", agent_work_order_id=agent_work_order_id)
+ return
@@
- self.client.table(self.steps_table_name).delete().eq("agent_work_order_id", agent_work_order_id).execute()
+ await self._exec(
+ self.client.table(self.steps_table_name).delete().eq("agent_work_order_id", agent_work_order_id)
+ )
@@
- self.client.table(self.steps_table_name).insert(steps_data).execute()
+ await self._exec(self.client.table(self.steps_table_name).insert(steps_data))
@@
- response = (
- self.client.table(self.steps_table_name)
- .select("*")
- .eq("agent_work_order_id", agent_work_order_id)
- .order("step_order")
- .execute()
- )
+ response = await self._exec(
+ self.client.table(self.steps_table_name)
+ .select("*")
+ .eq("agent_work_order_id", agent_work_order_id)
+ .order("step_order")
+ )Also applies to: 160-168, 195-203, 264-276, 307-312, 342-346, 394-415, 448-454
```python
        except Exception as e:
            self._logger.exception(
                "create_work_order_failed",
                agent_work_order_id=work_order.agent_work_order_id,
                error=str(e),
            )
            raise
```
🧩 Analysis chain
Catching broad Exception everywhere; prefer the Supabase/PostgREST exception types.
Tighten except clauses to known client exceptions (e.g., PostgrestAPIError, APIError) and keep logging with exc_info.
If uncertain which classes are exposed in your version, run this web check and adjust:
Also applies to: 168-175, 210-218, 283-291, 318-324, 352-359, 421-427, 478-484
🌐 Web query:
What exception classes does the current supabase-py/PostgREST Python client raise (e.g., PostgrestAPIError)? Provide canonical names and module paths.
💡 Result:
I can look this up and return the canonical exception class names and module paths. Do you want me to fetch the current source from the supabase/postgrest-py (or supabase‑py monorepo) repository now?
I'll help verify this review comment. Let me gather the necessary information about supabase-py exceptions and inspect the code. Now let me search for more specific information about postgrest-py exception types:
Now let me inspect the actual code file to understand the context better:
Wait, let me adjust that to actually search the repository:
Now let me check the imports and exception handling in the file:
And check all the exception blocks mentioned in the review:
Let me also check what's being imported at the top of the file:
---
🏁 Script executed:
```shell
#!/bin/bash
# Find and display the supabase_repository.py file with all exception handling blocks
find . -name "supabase_repository.py" -type f
```
Length of output: 67
🏁 Script executed:
#!/bin/bash
# Search for supabase_repository.py and display exception handling
fd supabase_repository.py
Length of output: 65
🏁 Script executed:
#!/bin/bash
# Display the file with context around exception handlers
rg -n "except.*Error|except.*Exception|from.*import|import.*" python/src/agent_work_orders/state_manager/supabase_repository.py -B2 -A8 | head -100Length of output: 3111
🏁 Script executed:
#!/bin/bash
# Show all exception handlers in the supabase_repository.py file
rg -n "except\s+Exception" python/src/agent_work_orders/state_manager/supabase_repository.py -B5 -A5Length of output: 3708
🏁 Script executed:
#!/bin/bash
# Display imports and the exception handling blocks
cat python/src/agent_work_orders/state_manager/supabase_repository.py | head -50
Length of output: 1739
Let me search for the exact postgrest-py exception import path one more time to be certain:
Replace broad Exception with postgrest.APIError across all database operation handlers.
The Supabase client raises postgrest.exceptions.APIError, which can be imported from supabase as PostgrestAPIError. Update all exception handlers to catch this specific type instead of the generic Exception.
Affected locations (8 blocks):
- Lines 134–141
- Lines 168–175
- Lines 210–218
- Lines 283–291
- Lines 318–324
- Lines 352–359
- Lines 421–427
- Lines 478–484
Add the import at the top: `from postgrest.exceptions import APIError` (or use `from supabase import APIError as PostgrestAPIError`), then replace each `except Exception as e:` with `except APIError as e:`. The logging with exc_info=True is already correct.
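A minimal sketch of the narrowed handler, assuming `postgrest.exceptions.APIError` is available as verified above (the standalone function wrapper is illustrative, not the repository's actual method):

```python
from postgrest.exceptions import APIError  # raised by supabase-py database calls

def insert_work_order(client, table_name: str, data: dict, logger) -> None:
    try:
        client.table(table_name).insert(data).execute()
    except APIError as e:
        # Database API failures are logged with context and re-raised;
        # unexpected bugs are no longer swallowed by a blanket Exception.
        logger.exception("create_work_order_failed", error=str(e))
        raise
```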
🤖 Prompt for AI Agents
In python/src/agent_work_orders/state_manager/supabase_repository.py around
lines 134-141 (and also update the other mentioned blocks at 168-175, 210-218,
283-291, 318-324, 352-359, 421-427, 478-484), the handlers currently catch broad
Exception; import the PostgREST API error class at the top (e.g., from
postgrest.exceptions import APIError or from supabase import APIError as
PostgrestAPIError) and replace each "except Exception as e:" with "except
APIError as e:" so the code specifically handles database API errors while
leaving logging (exc_info=True) as-is.
```python
        for i, step in enumerate(step_history.steps):
            steps_data.append({
                "agent_work_order_id": agent_work_order_id,
                "step": step.step.value,
                "agent_name": step.agent_name,
                "success": step.success,
                "output": step.output,
                "error_message": step.error_message,
                "duration_seconds": step.duration_seconds,
                "session_id": step.session_id,
                "executed_at": step.timestamp.isoformat(),
                "step_order": i,
            })
```
🧩 Analysis chain
Ensure timestamps are timezone‑aware on write and parsed on read.
- StepExecutionResult.timestamp may be naive; writing isoformat() without tz stores ambiguous times into TIMESTAMPTZ.
- On read, you pass strings directly to the model.
Suggested changes:
+from datetime import datetime, timezone
@@
- "executed_at": step.timestamp.isoformat(),
+ "executed_at": (
+ step.timestamp if step.timestamp.tzinfo else step.timestamp.replace(tzinfo=timezone.utc)
+ ).astimezone(timezone.utc).isoformat(),
@@
- timestamp=row["executed_at"],
+ timestamp=datetime.fromisoformat(row["executed_at"].replace("Z", "+00:00")),Confirm StepExecutionResult.timestamp is a datetime field parsed by Pydantic v2. If so, the read path may already coerce strings; otherwise apply the explicit parse.
Also applies to: 472-475
I'll verify this review comment by examining the codebase to understand the timestamp handling, the StepExecutionResult model, and Pydantic configuration.
🏁 Script executed:
#!/bin/bash
# Find StepExecutionResult class definition
rg -n "class StepExecutionResult" --type pyLength of output: 80
🏁 Script executed:
#!/bin/bash
# Check Pydantic version in project configuration
fd -e "pyproject.toml" -o -e "requirements.txt" -o -e "requirements-*.txt" | xargs grep -l "pydantic\|Pydantic" 2>/dev/null || echo "No requirements found with grep"Length of output: 148
🏁 Script executed:
#!/bin/bash
# Examine the supabase_repository.py file around the flagged lines and the similar pattern at 472-475
sed -n '395,415p;465,480p' python/src/agent_work_orders/state_manager/supabase_repository.py
Length of output: 1710
🏁 Script executed:
#!/bin/bash
# Search for the method containing the read path (where row["executed_at"] is used)
rg -n "executed_at" python/src/agent_work_orders/state_manager/supabase_repository.py -A 2 -B 2Length of output: 481
🏁 Script executed:
#!/bin/bash
# Look for StepExecutionResult usage and import to understand its full module path
rg -n "StepExecutionResult\|from.*step\|import.*step" python/src/agent_work_orders/state_manager/supabase_repository.py -B 1 -A 1 | head -30Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check the full pyproject.toml to understand Pydantic version
cat pyproject.toml | grep -A 10 -B 2 "pydantic\|dependencies"
Length of output: 46
🏁 Script executed:
#!/bin/bash
# Find where StepExecutionResult is defined (likely in a models or schema file)
fd -e "\.py" -path "*/agent_work_orders/*" | xargs rg "class StepExecutionResult" -lLength of output: 207
🏁 Script executed:
#!/bin/bash
# Check the timestamp field in StepExecutionResult
rg -n "class StepExecutionResult" -A 20 --type pyLength of output: 1503
🏁 Script executed:
#!/bin/bash
# Look for the read/parse path to see how timestamps are currently being handled
rg -n "datetime.fromisoformat\|isoformat()\|timezone" python/src/agent_work_orders/state_manager/supabase_repository.pyLength of output: 0
🏁 Script executed:
#!/bin/bash
# Check imports at top of supabase_repository.py file
head -50 python/src/agent_work_orders/state_manager/supabase_repository.py | grep -E "^import|^from"
Length of output: 211
🏁 Script executed:
#!/bin/bash
# Search for how timestamp is currently being read from database (find the model instantiation)
rg -n "StepExecutionResult(" python/src/agent_work_orders/state_manager/supabase_repository.py -A 15 -B 5Length of output: 90
🏁 Script executed:
#!/bin/bash
# Look for all places where executed_at is being used or parsed
rg -n '"executed_at"' python/src/agent_work_orders/state_manager/supabase_repository.py -B 3 -A 3Length of output: 704
I need to verify the Pydantic version and check if there's configuration elsewhere:
🏁 Script executed:
#!/bin/bash
# Look for pyproject.toml in python directory
find . -name "pyproject.toml" -type f 2>/dev/null | head -5Length of output: 23
🏁 Script executed:
#!/bin/bash
# Check models.py imports to confirm Pydantic version
head -30 python/src/agent_work_orders/models.py
Length of output: 599
🏁 Script executed:
#!/bin/bash
# Look for any existing timezone handling in the models file
rg -n "timezone\|utc\|tzinfo" python/src/agent_work_orders/models.pyLength of output: 0
🏁 Script executed:
#!/bin/bash
# Check if datetime.now is used elsewhere and how timestamps are handled
rg -n "datetime.now\|tzinfo\|timezone.utc\|UTC" python/src/agent_work_orders/ --type py | head -20Length of output: 0
🏁 Script executed:
#!/bin/bash
# Find poetry.lock or requirements to determine Pydantic version
find . -name "poetry.lock" -o -name "*requirements*.txt" 2>/dev/null | xargs grep -i "pydantic" 2>/dev/null | head -5Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check the database schema or migrations for the steps table
find . -path "*migration*" -o -path "*schema*" | xargs grep -l "executed_at\|TIMESTAMPTZ" 2>/dev/null | head -5Length of output: 120
🏁 Script executed:
#!/bin/bash
# Look at the supabase client initialization to understand database setup
rg -n "steps_table\|table_name" python/src/agent_work_orders/state_manager/supabase_repository.py -B 2 -A 2 | head -30Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check full models.py file imports section
rg -n "^from|^import" python/src/agent_work_orders/models.py | head -15Length of output: 112
🏁 Script executed:
#!/bin/bash
# Verify what datetime.now actually does - check if it's being wrapped anywhere
rg -n "default_factory" python/src/agent_work_orders/models.py -B 1 -A 1Length of output: 108
Let me verify the schema and Pydantic version:
🏁 Script executed:
#!/bin/bash
# Check the migration file for steps table schema, specifically executed_at column
grep -n "executed_at\|TIMESTAMPTZ" ./migration/agent_work_orders_state.sql -B 2 -A 2Length of output: 789
🏁 Script executed:
#!/bin/bash
# Check pyproject.toml for Pydantic version
grep -n "pydantic" ./python/pyproject.toml -A 1 -B 1Length of output: 523
🏁 Script executed:
#!/bin/bash
# Check the complete steps table schema definition
rg -n "steps" ./migration/agent_work_orders_state.sql -i | head -20Length of output: 1280
🏁 Script executed:
#!/bin/bash
# Get full steps table creation from migration
rg -n "CREATE TABLE.*steps\|executed_at" ./migration/agent_work_orders_state.sql -A 20 -i | head -40Length of output: 0
Apply timezone awareness to StepExecutionResult.timestamp and update read/write paths accordingly.
The issue is real: StepExecutionResult.timestamp uses datetime.now() which produces naive datetime objects. Writing isoformat() without timezone info to the TIMESTAMP WITH TIME ZONE NOT NULL column (line 62 of migration) creates ambiguous times. On read, Pydantic v2 auto-coerces the naive string back to a naive datetime, breaking timezone semantics.
Changes needed:
- Lines 400-412: Ensure timestamps are UTC-aware before writing
- Lines 472-475: Explicitly parse with timezone handling on read
- models.py: Update `default_factory` to use `datetime.now(timezone.utc)` or similar
The suggested fix in the review comment is correct and should be applied.
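A minimal sketch of the model-side portion of that fix (the real `StepExecutionResult` has more fields, which are omitted here):

```python
from datetime import datetime, timezone

from pydantic import BaseModel, Field

class StepExecutionResult(BaseModel):
    # UTC-aware default, so isoformat() always emits an explicit offset
    timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

def to_utc(ts: datetime) -> datetime:
    """Normalize read-back timestamps: treat naive values as UTC."""
    return ts.replace(tzinfo=timezone.utc) if ts.tzinfo is None else ts.astimezone(timezone.utc)
```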
🤖 Prompt for AI Agents
In python/src/agent_work_orders/state_manager/supabase_repository.py around
lines 400-412 and 472-475, StepExecutionResult.timestamp values are being
written as naive ISO strings and read back as naive datetimes; ensure timestamps
are UTC-aware before write and explicitly preserve/parse timezone on read: when
building steps_data (lines ~400-412) convert or replace naive timestamps to
UTC-aware datetimes (e.g., ensure timestamp has tzinfo or call
.astimezone(timezone.utc)) before calling isoformat(), and when reading rows
back (lines ~472-475) parse the stored timestamp strings into timezone-aware
datetimes (treat naive or missing zone as UTC) so Pydantic gets aware datetimes;
also update models.py default_factory for StepExecutionResult.timestamp to use
datetime.now(timezone.utc) (or equivalent) so newly created instances are
UTC-aware by default.
```python
    if fix:
        # Clean up orphaned worktrees
        for orphan_path in orphans:
            try:
                shutil.rmtree(orphan_path)
                actions.append(f"Deleted orphaned worktree: {orphan_path}")
                logger.info("orphaned_worktree_deleted", path=orphan_path)
            except Exception as e:
                actions.append(f"Failed to delete {orphan_path}: {e}")
                logger.error("orphaned_worktree_delete_failed", path=orphan_path, error=str(e), exc_info=True)

        # Update dangling state to mark as failed
        for work_order_id in dangling:
            try:
                await repository.update_status(
                    work_order_id,
                    AgentWorkOrderStatus.FAILED,
                    error_message="Worktree missing - state/filesystem divergence detected during reconciliation"
                )
                actions.append(f"Marked work order {work_order_id} as failed (worktree missing)")
                logger.info("dangling_state_updated", work_order_id=work_order_id)
            except Exception as e:
                actions.append(f"Failed to update {work_order_id}: {e}")
                logger.error("dangling_state_update_failed", work_order_id=work_order_id, error=str(e), exc_info=True)
```
Async blocking and deletion safety.
shutil.rmtree blocks the event loop and lacks path safety checks. Offload to a thread and ensure only direct children of WORKTREE_BASE_DIR are removed; narrow exceptions where possible.
+ # Resolve base once for safety checks
+ worktree_base = Path(config.WORKTREE_BASE_DIR).resolve()
if fix:
# Clean up orphaned worktrees
- for orphan_path in orphans:
- try:
- shutil.rmtree(orphan_path)
- actions.append(f"Deleted orphaned worktree: {orphan_path}")
- logger.info("orphaned_worktree_deleted", path=orphan_path)
- except Exception as e:
- actions.append(f"Failed to delete {orphan_path}: {e}")
- logger.error("orphaned_worktree_delete_failed", path=orphan_path, error=str(e), exc_info=True)
+ import asyncio # local import to avoid module import if fix=False
+ for orphan_path in orphans:
+ try:
+ target = Path(orphan_path).resolve(strict=False)
+ # Only delete direct children under the configured base dir
+ if target.parent != worktree_base:
+ actions.append(f"Skipped unsafe deletion outside base: {orphan_path}")
+ logger.warning(
+ "orphaned_worktree_delete_skipped_unsafe",
+ path=orphan_path,
+ base=str(worktree_base),
+ )
+ continue
+ # Offload blocking I/O
+ await asyncio.to_thread(shutil.rmtree, target)
+ actions.append(f"Deleted orphaned worktree: {orphan_path}")
+ logger.info("orphaned_worktree_deleted", path=orphan_path)
+ except (FileNotFoundError, PermissionError, OSError) as e:
+ actions.append(f"Failed to delete {orphan_path}: {e}")
+ logger.error(
+ "orphaned_worktree_delete_failed",
+ path=orphan_path,
+ error=str(e),
+ exc_info=True,
+ )
@@
- for work_order_id in dangling:
+ for work_order_id in dangling:
try:
- await repository.update_status(
+ # Avoid overwriting terminal states
+ current = await repository.get(work_order_id)
+ terminal = {
+ AgentWorkOrderStatus.COMPLETED,
+ AgentWorkOrderStatus.CANCELLED,
+ AgentWorkOrderStatus.FAILED,
+ }
+ if current and getattr(current[0], "status", None) in terminal:
+ actions.append(f"Skipped status update for terminal work order {work_order_id}")
+ logger.info(
+ "dangling_state_skipped_terminal",
+ work_order_id=work_order_id,
+ status=current[0].status.value if current else None,
+ )
+ continue
+ await repository.update_status(
work_order_id,
AgentWorkOrderStatus.FAILED,
error_message="Worktree missing - state/filesystem divergence detected during reconciliation"
)
actions.append(f"Marked work order {work_order_id} as failed (worktree missing)")
logger.info("dangling_state_updated", work_order_id=work_order_id)
- except Exception as e:
+ except Exception as e:
actions.append(f"Failed to update {work_order_id}: {e}")
logger.error("dangling_state_update_failed", work_order_id=work_order_id, error=str(e), exc_info=True)Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In python/src/agent_work_orders/utils/state_reconciliation.py around lines
140-164, the code calls shutil.rmtree directly (blocking) and deletes paths
without verifying they are safe; change to offload removal to a thread via
asyncio.to_thread or loop.run_in_executor and await it, validate each
orphan_path before deletion by resolving and ensuring its parent is exactly
WORKTREE_BASE_DIR (or it's a direct child) to avoid deleting arbitrary
locations, use shutil.rmtree only inside the worker thread, and narrow exception
handling to specific exceptions (FileNotFoundError, PermissionError, OSError)
with distinct log messages while still catching unexpected exceptions as a last
resort; update the actions/log messages to reflect awaited async deletion and
failure reasons.
Pull Request
Summary
Refactors the Agent Work Orders workflow orchestrator from a hardcoded 11-step atomic operation sequence to a user-selectable 6-command compositional architecture. This enables users to run full workflows or partial command sequences (e.g., planning-only, or resume from implementation).
Key Improvements:
Changes Made
Core Refactoring
- (`create-branch`, `planning`, `execute`, `commit`, `create-pr`, `prp-review`)
- `workflow_type` field, added `selected_commands` parameter with validation
- `selected_commands` array instead of `workflow_type`

Cleanup & Maintenance
- `.claude/commands`

Tests
- `test_workflow_orchestrator.py` (375 lines)
- `test_models.py` for new model structure
- `test_workflow_operations.py` for new command functions

Command Files
- `create-branch.md`, `create-pr.md`
- `commit.md`, `prime.md`
- `.claude/` and `python/.claude/`

Type of Change
Affected Services
Testing
Test Evidence
Checklist
Breaking Changes
API Changes
- `workflow_type` field from `CreateAgentWorkOrderRequest`
- `selected_commands` field (list[str], defaults to full workflow)
- `selected_commands` or accept the default

Model Changes
- `workflow_type` field from `AgentWorkOrder` model
- `WorkflowStep` enum values (incompatible with old step history)

Configuration Changes
- `ENABLE_TEST_PHASE`, `ENABLE_REVIEW_PHASE`, `MAX_TEST_RETRY_ATTEMPTS`, `MAX_REVIEW_RETRY_ATTEMPTS`

Migration Steps
For existing deployments:
- `selected_commands` instead of `workflow_type`
- `WorkflowStep` values
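As an illustration of the client-side change, a hedged sketch of the new request shape (the endpoint path and repository URL are assumptions; the command names come from this PR):

```python
import httpx

# Previously clients sent {"workflow_type": ...}; now they pick commands explicitly,
# and a subset of the list runs a partial workflow.
payload = {
    "repository_url": "https://github.com/example/repo",  # hypothetical repo
    "selected_commands": [
        "create-branch", "planning", "execute",
        "commit", "create-pr", "prp-review",
    ],
}

# Hypothetical route on the agent work orders service (port 8053 per this PR).
response = httpx.post("http://localhost:8053/agent-work-orders", json=payload)
response.raise_for_status()
```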
Additional Notes

Validation Results
See `VALIDATION_RESULTS.md` (not committed, per .gitignore) for complete validation report including:

Statistics
Code Quality
- `selected_commands` parameter

Future Enhancements
This architecture enables:
Summary by CodeRabbit
New Features
Documentation
Chores
Tests