feat(web-evals): add task log viewing, export failed logs, and new run options #9637

hannesrudolph · 2025-11-26T23:57:36Z

Summary

This PR enhances the web-evals application with several new features:

Task Log Viewing

Added a full-screen dialog to view task logs with syntax highlighting
Logs are color-coded for timestamps, log levels (INFO, WARN, ERROR, DEBUG), and task identifiers
Copy to clipboard functionality for easy sharing
Click on any completed task row to view its log
ESC key closes the log dialog
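
As a rough illustration of the color-coding described above, a log line can be split into typed segments that the UI then maps to styled elements. This is a minimal sketch, not the PR's code: the segment kinds, the `tokenizeLogLine` name, and the assumed log format (ISO-8601 timestamps plus the four level names) are all hypothetical, and task identifiers are left out because their format is not shown here.

```typescript
// Hypothetical segment kinds; the real log format is not shown in this PR text.
type SegmentKind = "timestamp" | "level" | "text";

interface Segment {
  kind: SegmentKind;
  text: string;
}

// Assumed patterns: ISO-8601 timestamps and the four levels named in the PR.
const TOKEN_SOURCE =
  "(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?Z?)|\\b(INFO|WARN|ERROR|DEBUG)\\b";

// Split one log line into typed segments. A UI can render each segment as its
// own element, so the log text itself is never interpreted as HTML.
function tokenizeLogLine(line: string): Segment[] {
  const segments: Segment[] = [];
  const re = new RegExp(TOKEN_SOURCE, "g");
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = re.exec(line)) !== null) {
    if (match.index > last) {
      segments.push({ kind: "text", text: line.slice(last, match.index) });
    }
    // Group 1 is the timestamp alternative; otherwise the level matched.
    segments.push({ kind: match[1] !== undefined ? "timestamp" : "level", text: match[0] });
    last = match.index + match[0].length;
  }
  if (last < line.length) {
    segments.push({ kind: "text", text: line.slice(last) });
  }
  return segments;
}
```

Concatenating the `text` fields of the returned segments reproduces the original line, so nothing is lost by tokenizing before rendering.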

Export Failed Logs

Added "Export Failed Logs" option in the run dropdown menu
Downloads a zip file containing all failed task logs for a run
Only available when the run has at least one failed task

New Run Options

Added "Use Multiple Native Tool Calls" checkbox for all providers (Roo, OpenRouter, Other)
Added "Reasoning Effort" dropdown for Roo Code Cloud provider (None, Low, Medium, High)
Improved job token field with tooltip showing how to generate tokens
Added validation requiring job token for Roo Code Cloud provider
Added "Iterations per Exercise" slider (1-10) to run each exercise multiple times
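
The validation rules listed above could look roughly like the following. This is only a sketch under assumptions: the `NewRunOptions` shape, the provider identifiers, and the error messages are invented for illustration and are not taken from the form's actual schema.

```typescript
// Hypothetical option shape; field and provider names are illustrative only.
type Provider = "roo" | "openrouter" | "other";

interface NewRunOptions {
  provider: Provider;
  jobToken?: string;
  iterations: number;
}

// Returns a list of human-readable problems; an empty list means the form is valid.
function validateNewRun(options: NewRunOptions): string[] {
  const errors: string[] = [];
  // The job token is only mandatory for the Roo Code Cloud provider.
  if (options.provider === "roo" && !options.jobToken?.trim()) {
    errors.push("A job token is required for the Roo Code Cloud provider.");
  }
  // The slider range described in the PR is 1-10.
  if (!Number.isInteger(options.iterations) || options.iterations < 1 || options.iterations > 10) {
    errors.push("Iterations per exercise must be a whole number from 1 to 10.");
  }
  return errors;
}
```

For example, `validateNewRun({ provider: "roo", iterations: 3 })` reports the missing token, while the same call with a non-empty `jobToken` returns an empty list.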

Iterations Support

New iterations slider in new run form to run each exercise multiple times (1-10)
Task table displays iteration number for repeated exercises (e.g., "go/hello-world (#2)")
Database migration adds iteration column to tasks table
Supports comparing results across multiple runs of the same exercise
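
To make the iterations behavior concrete, the sketch below expands a list of exercises into one task per iteration and builds a display label. The names `planTasks` and `taskLabel` are hypothetical, and the `(#n)` suffix shown whenever a run has more than one iteration is an assumption about the display rule, not something confirmed by the PR.

```typescript
interface Exercise {
  language: string;
  exercise: string;
}

interface PlannedTask extends Exercise {
  iteration: number; // 1-based, assumed to mirror the new `iteration` column
}

// Create one task per (exercise, iteration) pair.
function planTasks(exercises: Exercise[], iterations: number): PlannedTask[] {
  const tasks: PlannedTask[] = [];
  for (const { language, exercise } of exercises) {
    for (let iteration = 1; iteration <= iterations; iteration++) {
      tasks.push({ language, exercise, iteration });
    }
  }
  return tasks;
}

// Label a task for the table; the suffix is only added for multi-iteration runs.
function taskLabel(task: PlannedTask, iterations: number): string {
  const base = `${task.language}/${task.exercise}`;
  return iterations > 1 ? `${base} (#${task.iteration})` : base;
}
```

With three iterations of `go/hello-world`, the second planned task would be labeled `go/hello-world (#2)`.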

Docker Configuration

Added log file mount in docker-compose.yml so web container can access task logs
Added PRODUCTION_DATABASE_URL environment variable
Added docker-compose.override.yml for local development

Dependencies

Added archiver package for zip file generation
Added @types/archiver for TypeScript support

Important

Enhance web-evals with task log viewing, export failed logs, new run options, and database schema updates for iterations.

Task Log Viewing:
- Added full-screen dialog for task logs with syntax highlighting in run.tsx.
- Logs color-coded for timestamps, log levels, and task identifiers.
- Copy to clipboard feature for logs.
Export Failed Logs:
- Added API endpoint in route.ts to export failed task logs as a zip file.
- UI option in run.tsx to trigger log export.
New Run Options:
- Added "Use Multiple Native Tool Calls" and "Reasoning Effort" options in new-run.tsx.
- Added "Iterations per Exercise" slider in new-run.tsx.
Iterations Support:
- Updated createRun in runs.ts to handle multiple iterations per exercise.
- Database migration in 0004_sloppy_black_knight.sql to add iteration column to tasks table.
Docker Configuration:
- Updated docker-compose.yml and docker-compose.override.yml to mount log files for web access.
Dependencies:
- Added archiver and @types/archiver for zip file generation in package.json.

This description was generated automatically for commit 13016ee.

roomote · 2025-11-26T23:57:56Z


Review updated for the latest commit. One issue in the task log viewer JSX/highlighting remains open.

Escape or safely render task log lines instead of using dangerouslySetInnerHTML or unsafe HTML so logs cannot inject HTML or scripts.
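
One way to act on this review comment is to escape the raw log text before any highlighting markup is added. The sketch below is illustrative only: `escapeHtml`, `highlightLevel`, and the `log-level` class name are hypothetical and do not come from the PR's `formatLogContent`.

```typescript
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace the five characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Escape first, then wrap: the only markup in the result is markup added here.
function highlightLevel(line: string): string {
  return escapeHtml(line).replace(
    /\b(INFO|WARN|ERROR|DEBUG)\b/g,
    '<span class="log-level">$1</span>',
  );
}
```

Rendering segments as React elements avoids building HTML strings at all, which is generally the more robust of the two options the comment mentions.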

Previous reviews

382d25f: Review #1

ed51c0e: Review #2

70f6c8c: Review #3

dd20ee1: Review #4

898dc13: Review #5


apps/web-evals/src/app/runs/[id]/run.tsx

apps/web-evals/src/app/api/runs/[id]/logs/[taskId]/route.ts

apps/web-evals/src/app/api/runs/[id]/logs/failed/route.ts

…n options

- Add task log viewing dialog with syntax highlighting and copy to clipboard
- Add export failed logs functionality (downloads zip file)
- Add 'Use Multiple Native Tool Calls' option for all providers
- Add reasoning effort dropdown for Roo Code Cloud provider
- Improve job token field with tooltip and validation
- Mount log files in docker-compose for web access
- Add archiver dependency for zip exports

…d logs export

- Add /api/runs/[id]/logs/[taskId] route to retrieve individual task logs
- Add /api/runs/[id]/logs/failed route to export failed task logs as zip
- Add archiver dependency for zip file generation
- Remove redundant ESC key handler (Radix Dialog handles this)
- Fixes missing functionality from original PR

- Fix XSS vulnerability in formatLogContent by escaping HTML before injecting spans
- Use async fs.readFile instead of sync fs.readFileSync to avoid blocking event loop
- Add path sanitization to prevent path traversal attacks in log file APIs
- Add defense-in-depth path validation to ensure resolved paths stay within LOG_BASE_PATH
- Add archiver error handler to properly handle archive generation errors
- Fix event listener ordering: register 'end' handler before calling finalize()
- Add empty zip detection: return 404 error if no log files found on disk
- Fix toolProtocol not being applied for 'other' provider in new-run.tsx
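
The two path-related items in this commit (sanitizing segments, then checking the resolved path stays under the base directory) can be sketched as follows. This is a dependency-free illustration for POSIX-style paths, not the route's actual code: the function names and directory layout are hypothetical, and a real Node.js route would more likely rely on `path.resolve` than on the hand-written `normalize` used here.

```typescript
// Replace anything outside a conservative allow-list so a segment cannot
// contain separators, and neutralise bare "." and ".." segments.
function sanitizeSegment(segment: string): string {
  const cleaned = segment.replace(/[^a-zA-Z0-9._-]/g, "_");
  return cleaned === "." || cleaned === ".." ? "_" : cleaned;
}

// Normalise a POSIX-style absolute path by resolving "." and ".." segments.
function normalize(absolutePath: string): string {
  const out: string[] = [];
  for (const part of absolutePath.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") out.pop();
    else out.push(part);
  }
  return "/" + out.join("/");
}

// Defense in depth: the normalised candidate must sit strictly under the base.
// The trailing separator stops "/logs-old" from passing as a child of "/logs".
function isInsideBase(basePath: string, candidatePath: string): boolean {
  const base = normalize(basePath);
  const prefix = base === "/" ? "/" : base + "/";
  return normalize(candidatePath).startsWith(prefix);
}

// Build a log path from untrusted segments, or return null if it escapes the base.
function resolveLogPath(basePath: string, runId: string, fileName: string): string | null {
  const candidate = `${basePath}/${sanitizeSegment(runId)}/${sanitizeSegment(fileName)}`;
  return isInsideBase(basePath, candidate) ? normalize(candidate) : null;
}
```

Sanitizing alone already blocks traversal in this sketch; the containment check is kept as a second, independent guard in case the sanitizer is later loosened.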

- Add iterations slider (1-10) to new run form
- Add iteration column to tasks table schema
- Add ESC key handler to close task log dialog
- Update run display to show iteration number for repeated tasks
- Add database migration for iteration column

apps/web-evals/src/actions/runs.ts

Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com>

…log highlighting

- Fixed malformed JSX in formatLogContent function (duplicate nested div elements)
- Replaced HTML string injection with proper React elements for XSS-safe syntax highlighting
- Addresses review feedback about dangerouslySetInnerHTML security concern

apps/web-evals/src/app/runs/[id]/run.tsx

* fix: Filter native tools by mode restrictions (RooCodeInc#9246)

* fix: filter native tools by mode restrictions

Native tools are now filtered based on mode restrictions before being sent to the API, matching the behavior of XML tools. Previously, all native tools were sent to the API regardless of mode, causing the model to attempt using disallowed tools.

Changes:
- Created filterNativeToolsForMode() and filterMcpToolsForMode() utility functions
- Extracted filtering logic from Task.ts into dedicated module
- Applied same filtering approach used for XML tools in system prompt
- Added comprehensive test coverage (10 tests)

Impact:
- Model only sees tools allowed by current mode
- No more failed tool attempts due to mode restrictions
- Consistent behavior between XML and Native protocols
- Better UX with appropriate tool suggestions per mode

* refactor: eliminate repetitive tool checking using group-based approach

- Add getAvailableToolsInGroup() helper to check tools by group instead of individually
- Refactor filterNativeToolsForMode() to reuse getToolsForMode() instead of duplicating logic
- Simplify capabilities.ts by using group-based checks (60% reduction)
- Refactor rules.ts to use group helper (56% reduction)
- Remove debug console.log statements
- Update tests and snapshots

Benefits:
- Eliminates code duplication
- Leverages existing TOOL_GROUPS structure
- More maintainable - new tools in groups work automatically
- All tests passing (26/26)

* fix: add fallback to default mode when mode config not found

Ensures the agent always has functional tools even if:
- A custom mode is deleted while tasks still reference it
- Mode configuration becomes corrupted
- An invalid mode slug is provided

Without this fallback, the agent would have zero tools (not even ask_followup_question or attempt_completion), completely breaking it.
* Fix broken share button (RooCodeInc#9253)

fix(webview-ui): make Share button popover work by forwarding ref in LucideIconButton
- Convert LucideIconButton to forwardRef so Radix PopoverTrigger(asChild) receives a focusable element
- Enables Share popover and shareCurrentTask flow
- Verified with ShareButton/TaskActions Vitest suites

* Add GPT-5.1 models and clean up reasoning effort logic (RooCodeInc#9252)

* Reasoning effort: capability-driven; add disable/none/minimal; remove GPT-5 minimal special-casing; document UI semantics; remove temporary logs
* Remove Unused supportsReasoningNone
* Roo reasoning: omit field on 'disable'; UI: do not flip enableReasoningEffort when selecting 'disable'
* Update packages/types/src/model.ts

Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com>

* Update webview-ui/src/components/settings/SimpleThinkingBudget.tsx

Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com>

---------

Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com>

* fix: make line_ranges optional in read_file tool schema (RooCodeInc#9254)

The OpenAI tool schema required both 'path' and 'line_ranges' in FileEntry, but the TypeScript type definition marks lineRanges as optional. This caused the AI to fail when trying to read files without specifying line_ranges.

Changes:
- Updated read_file tool schema to only require 'path' parameter
- line_ranges remains available but optional, matching TypeScript types
- Aligns with implementation which treats lineRanges as optional throughout

Fixes issue where read_file tool kept failing with missing parameters.
* fix: prevent consecutive user messages on streaming retry (RooCodeInc#9249)
* feat(openai): OpenAI Responses: model-driven prompt caching and generic reasoning options refactor (RooCodeInc#9259)
* revert out of scope changes from RooCodeInc#9252 (RooCodeInc#9258)
* Revert "refactor(task): switch to <feedback> wrapper to prevent focus drift after context-management event (condense/truncate)" (RooCodeInc#9261)
* Release v3.32.0 (RooCodeInc#9264)
* Changeset version bump (RooCodeInc#9265)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Matt Rubens <[email protected]>

* [FIX] Fix OpenAI Native handling of encrypted reasoning blocks to prevent error when condensing (RooCodeInc#9263)
* fix: prevent duplicate tool_result blocks in native protocol mode for read_file (RooCodeInc#9272)

When read_file encountered errors (e.g., file not found), it would call handleError() which internally calls pushToolResult(), then continue to call pushToolResult() again with the final XML. In native protocol mode, this created two tool_result blocks with the same tool_call_id, causing 400 errors on subsequent API calls.

This fix replaces handleError() with task.say() for error notifications. The agent still receives error details through the XML in the single final pushToolResult() call.

This change works for both protocols:
- Native: Only one tool_result per tool_call_id (fixes duplicate issue)
- XML: Only one text block with complete XML (cleaner than before)

Agent visibility preserved: Errors are included in the XML response sent to the agent via pushToolResult().

Tests: All 44 tests passing. Updated test to verify say() is called.
* Fix duplicate tool blocks causing 'tool has already been used' error (RooCodeInc#9275) * feat(openai-native): add abort controller for request cancellation (RooCodeInc#9276) * Disable XML parser for native tool protocol (RooCodeInc#9277) * Release v3.32.1 (RooCodeInc#9278) * Changeset version bump (RooCodeInc#9280) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <[email protected]> * refactor: centralize toolProtocol configuration checks (RooCodeInc#9279) * refactor: centralize toolProtocol configuration checks - Created src/utils/toolProtocol.ts with getToolProtocolFromSettings() utility - Replaced all direct vscode.workspace.getConfiguration() calls with centralized utility - Updated 6 files to use the new utility function - All tests pass and TypeScript compilation succeeds * refactor: use isNativeProtocol function from types package * fix: format tool responses for native protocol (RooCodeInc#9270) * fix: format tool responses for native protocol - Add toolResultFormatting utilities for protocol detection - ReadFileTool now builds both XML and native formats - Native format returns clean, readable text without XML tags - Legacy conversation history conversion is protocol-aware - All tests passing (55 total) * refactor: use isNativeProtocol from @roo-code/types Remove duplicate implementation and import from types package instead * fix: prevent duplicate tool_result blocks in native tool protocol (RooCodeInc#9248) * Merge remote-tracking branch 'upstream/main' into roo-to-main * Fix duplicate import (RooCodeInc#9281) * chore(core): remove unused TelemetryEventName import * feat: implement dynamic tool protocol resolution with proper precedence hierarchy (RooCodeInc#9286) Co-authored-by: Roo Code <[email protected]> * web: Roo Code Cloud Provider pricing page and changes elsewhere (RooCodeInc#9195) Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> 
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <[email protected]> * feat(zgsm): add abort signal handling for streaming responses * Move the native tool call toggle to experimental settings (RooCodeInc#9297) Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> Co-authored-by: daniel-lxs <[email protected]> * fix: Replace broken badgen.net badges with shields.io (RooCodeInc#9318) Co-authored-by: Roo Code <[email protected]> * fix: preserve tool blocks for native protocol in conversation history (RooCodeInc#9319) * feat: add git status to environment details (RooCodeInc#9310) * feat: Move Import/Export to Modes view toolbar (RooCodeInc#8686) Cleanup of Mode Edit view (RooCodeInc#9077) * Add max git status files to evals settings (RooCodeInc#9322) * Release: v1.86.0 (RooCodeInc#9323) * fix: prevent infinite loop when attempt_completion succeeds (RooCodeInc#9325) * feat: add tool protocol selector to advanced settings (RooCodeInc#9324) Co-authored-by: Roo Code <[email protected]> Co-authored-by: Matt Rubens <[email protected]> * Remove experimental setting for native tool calls (RooCodeInc#9333) * Fix the type of the list files recursive parameter (RooCodeInc#9337) * fix: use VSCode theme color for outline button borders (RooCodeInc#9336) Co-authored-by: Roo Code <[email protected]> Co-authored-by: Bruno Bergher <[email protected]> * feat: update cloud agent CTA to point to setup page (RooCodeInc#9338) Co-authored-by: Roo Code <[email protected]> * Improve Google Gemini defaults, temperature, and cost reporting (RooCodeInc#9327) * fix: sync parser state with profile/model changes (RooCodeInc#9355) * feat: enable native tool calling for openai-native provider (RooCodeInc#9348) Co-authored-by: daniel-lxs <[email protected]> * Add Gemini 3 Pro Preview model (RooCodeInc#9357) * fix: pass tool protocol parameter to lineCountTruncationError (RooCodeInc#9358) * Remove the 
Roo model defaults (RooCodeInc#9340) * chore: add changeset and announcement for v3.33.0 (RooCodeInc#9360) * Changeset version bump (RooCodeInc#9362) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <[email protected]> * fix: resolve native tool protocol race condition causing 400 errors (RooCodeInc#9363) * Retry eval tasks if API instability detected (RooCodeInc#9365) * fix: exclude XML tool examples from MODES section when native protocol enabled (RooCodeInc#9367) * Add native tool calling support to OpenAI-compatible (RooCodeInc#9369) * Add native tool calling support to OpenAI-compatible * Fix OpenAI strict mode schema validation by adding converter methods to BaseProvider - Add convertToolsForOpenAI() and convertToolSchemaForOpenAI() methods to BaseProvider - These methods ensure all properties are in required array and convert nullable types - Remove line_ranges from required array in read_file tool (converter handles it) - Update OpenAiHandler and BaseOpenAiCompatibleProvider to use helper methods - Eliminates code duplication across multiple tool usage sites - Fixes: OpenAI completion error: 400 Invalid schema for function 'read_file' --------- Co-authored-by: daniel-lxs <[email protected]> * fix: ensure no XML parsing when protocol is native (RooCodeInc#9371) * fix: ensure no XML parsing when protocol is native * refactor: remove redundant non-null assertions * fix: gemini maxOutputTokens and reasoning config (RooCodeInc#9375) * fix: gemini maxOutputTokens and reasoning config * test: tighten gemini reasoning typings * fix: Update tools to return structured JSON for native protocol (RooCodeInc#9373) * feat: add toolProtocol property to PostHog tool usage telemetry (RooCodeInc#9374) Co-authored-by: Roo Code <[email protected]> Co-authored-by: Matt Rubens <[email protected]> Co-authored-by: daniel-lxs <[email protected]> * fix: Include nativeArgs in tool repetition detection 
(RooCodeInc#9377) * fix: Include nativeArgs in tool repetition detection Fixes false positive 'stuck in a loop' error for native protocol tools like read_file that store parameters in nativeArgs instead of params. Previously, the ToolRepetitionDetector only compared the params object, which was empty for native protocol tools. This caused all read_file calls to appear identical, triggering false loop detection even when reading different files. Changes: - Updated serializeToolUse() to include nativeArgs in comparison - Added comprehensive tests for native protocol scenarios - Maintains backward compatibility with XML protocol tools Closes: Issue reported in Discord about read_file loop detection * Try to use safe-stable-stringify in the tool repetition detector --------- Co-authored-by: Matt Rubens <[email protected]> * Fix Gemini thought signature validation and token counting errors (RooCodeInc#9380) Co-authored-by: Matt Rubens <[email protected]> * Release v3.33.1 (RooCodeInc#9383) * Changeset version bump (RooCodeInc#9384) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <[email protected]> * fix: preserve user images in native tool call results (RooCodeInc#9401) * feat: migrate PostHog client to ph.roocode.com (RooCodeInc#9402) Co-authored-by: Roo Code <[email protected]> Co-authored-by: Matt Rubens <[email protected]> * feat: enable native tool calling for gemini provider (RooCodeInc#9343) Co-authored-by: daniel-lxs <[email protected]> * Add a RCC credit balance display (RooCodeInc#9386) * Add a RCC credit balance display * Replace the provider docs with the balance when logged in * PR feedback --------- Co-authored-by: Matt Rubens <[email protected]> * perf: reduce excessive getModel() calls & implement disk cache fallback (RooCodeInc#9410) * Show zero price for free models (RooCodeInc#9419) * Release v3.33.2 (RooCodeInc#9420) * Changeset version bump (RooCodeInc#9421) Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <[email protected]> * Improve read_file tool description with examples (RooCodeInc#9422) * Improve read_file tool description with examples - Add explicit JSON structure documentation - Include three concrete examples (single file, with line ranges, multiple files) - Clarify that 'path' is required and 'line_ranges' is optional - Better explain line range format (1-based inclusive) This addresses agent confusion by providing clear examples similar to the XML tool definition. * Make read_file tool dynamic based on partialReadsEnabled setting - Convert read_file from static export to createReadFileTool() factory function - Add getNativeTools() function that accepts partialReadsEnabled parameter - Create buildNativeToolsArray() helper to encapsulate tool building logic - Update Task.ts to build native tools dynamically using maxReadFileLine setting - When partialReadsEnabled is false, line_ranges parameter is excluded from schema - Examples and descriptions adjust based on whether line ranges are supported This matches the behavior of the XML tool definition which dynamically adjusts its documentation based on settings, reducing confusion for agents. 
* Fix Marketplace crash by removing wildcard activation event (RooCodeInc#9423) * Revert "Fix Marketplace crash by removing wildcard activation event" (RooCodeInc#9432) * Fix OpenAI Native parallel tool calls for native protocol (RooCodeInc#9433) Fixes an issue where using the OpenAI Native provider together with Native Tool Calling could cause OpenAI’s Responses API to fail with errors like: * feat: add Google Gemini 3 Pro Image Preview to image generation models (RooCodeInc#9440) Co-authored-by: Roo Code <[email protected]> Co-authored-by: Matt Rubens <[email protected]> * fix: prevent duplicate environment_details when resuming cancelled tasks (RooCodeInc#9442) - Filter out complete environment_details blocks before appending fresh ones - Check for both opening and closing tags to ensure we're matching complete blocks - Prevents stale environment data from being kept during task resume - Add tests to verify deduplication logic and edge cases * Update glob to ^11.1.0 (RooCodeInc#9449) * chore: update tar-fs to 3.1.1 via pnpm override (RooCodeInc#9450) Co-authored-by: Roo Code <[email protected]> * Store reasoning in conversation history for all providers (RooCodeInc#9451) * Fix preserveReasoning flag to control API reasoning inclusion (RooCodeInc#9453) * feat: store reasoning in conversation history for all providers * refactor: address review feedback - Move comments inside else block - Combine reasoning checks into single if block - Make comments more concise * refactor: make comments more concise * Fix preserveReasoning flag to control API reasoning inclusion Changes: 1. Removed hardcoded <think> tag logic in streaming - Previously hardcoded reasoning into assistant message text - Now passes reasoning to addToApiConversationHistory as parameter 2. 
Updated buildCleanConversationHistory to respect preserveReasoning flag - When preserveReasoning: true → reasoning block included in API requests - When preserveReasoning: false/undefined → reasoning stripped from API - Reasoning stored in history for all cases 3. Added temporary debug logs to base-openai-compatible-provider.ts - Shows preserveReasoning flag value - Logs reasoning blocks in incoming messages - Logs <think> tags in converted messages sent to API * Fix: Use api.getModel() directly instead of cachedStreamingModel Addresses review comment: cachedStreamingModel is set during streaming but buildCleanConversationHistory is called before streaming starts. Using the cached value could cause stale model info when switching models between requests. Now directly uses this.api.getModel().info.preserveReasoning to ensure we always check the current model's flag, not a potentially stale cached value. * Clean up comments in Task.ts Removed outdated comment regarding model's preserveReasoning flag. 
* fix: remove unnecessary reasoningBlock variable in task reasoning logic * fix: send tool_result blocks for skipped tools in native protocol (RooCodeInc#9457) * fix: improve markdown formatting and add reasoning support (RooCodeInc#9458) * feat: implement Minimax as Anthropic-compatible provider (RooCodeInc#9455) * fix: improve search and replace symbol parsing (RooCodeInc#9456) * chore: add changeset for v3.33.3 (RooCodeInc#9459) * Changeset version bump (RooCodeInc#9460) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <[email protected]> * feat(terminal): add inline shell integration with user input support * Release: v1.87.0 (RooCodeInc#9477) * refactor(terminal): remove inline shell integration callback and improve terminal process handling * fix: add fallback to yield tool calls regardless of finish_reason (RooCodeInc#9476) * Improvements to base openai compatible (RooCodeInc#9462) Co-authored-by: Roo Code <[email protected]> * Browser Use 2.0 (RooCodeInc#8941) Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> Co-authored-by: daniel-lxs <[email protected]> * fix: resolve apply_diff performance regression from PR RooCodeInc#9456 (RooCodeInc#9474) * fix: implement model cache refresh to prevent stale disk cache (RooCodeInc#9478) * fix: Make cancel button immediately responsive during streaming (RooCodeInc#9448) * Test a provider-oriented welcome screen (RooCodeInc#9484) Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> * (feat): Add Baseten Provider (RooCodeInc#9461) Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> Co-authored-by: AlexKer <[email protected]> Co-authored-by: Matt Rubens <[email protected]> * fix: copy model-level capabilities to OpenRouter endpoint models (RooCodeInc#9483) * Pin the Roo provider to the top of the list (RooCodeInc#9485) * Wait to experiment until state is hydrated 
(RooCodeInc#9488) * Change baseten default model to glm for now (RooCodeInc#9489) * Revert "Wait to experiment until state is hydrated" (RooCodeInc#9491) * Try to fix build (RooCodeInc#9490) * Update webview-ui Vite config (RooCodeInc#9493) * Enhance native tool descriptions with examples and clarifications (RooCodeInc#9486) * Revert "Revert "Wait to experiment until state is hydrated"" (RooCodeInc#9494) * chore: add changeset and announcement for v3.34.0 (RooCodeInc#9495) * Changeset version bump (RooCodeInc#9496) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <[email protected]> * Enable the Roo Code Cloud provider in evals (RooCodeInc#9492) * Show the prompt for image gen (RooCodeInc#9505) * feat(chat): conditionally render UpdateTodoListToolBlock based on alwaysAllowUpdateTodoList flag * fix(web-evals): update checkbox handler in new-run component * Remove double todo list (RooCodeInc#9517) * Track cloud synced messages (RooCodeInc#9518) * 3.34.1 (RooCodeInc#9522) * Changeset version bump (RooCodeInc#9523) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <[email protected]> * fix: support reasoning_details format for Gemini 3 models (RooCodeInc#9506) * feat: update Cerebras models (RooCodeInc#9527) Co-authored-by: Roo Code <[email protected]> * fix: ensure XML parser state matches tool protocol on config update (RooCodeInc#9535) * fix: flush LiteLLM cache when credentials change on refresh (RooCodeInc#9536) * Add Roo Code Cloud as an imagegen provider (RooCodeInc#9528) * fix: gracefully skip unsupported content blocks in Gemini transformer (RooCodeInc#9537) Co-authored-by: Matt Rubens <[email protected]> * feat: add claude-opus-4.5 to OpenRouter prompt caching and reasoning budget models (RooCodeInc#9540) * feat: add claude-opus-4.5 to Anthropic and Vertex providers (RooCodeInc#9541) * Release v3.34.2 (RooCodeInc#9545) 
* Changeset version bump (RooCodeInc#9546) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <[email protected]> * Add support for Roo Code Cloud as an embeddings provider (RooCodeInc#9543) * Switch from asdf to mise-en-place in bare-metal evals setup script (RooCodeInc#9548) * feat: implement streaming for native tool calls (RooCodeInc#9542) * Add Opus 4.5 to Claude Code provider (RooCodeInc#9560) * Fix ask_followup_question streaming issue and add missing tool cases (RooCodeInc#9561) * feat(auth): enhance login status logging and use dynamic plugin version * refactor: remove disable provider and add client id headers * test(webview): remove disable provider tests across multiple test files * fix: enable caching for Opus 4.5 model (RooCodeInc#9568) Added claude-opus-4-5-20251101 to the cache control switch statements to enable prompt caching, matching the behavior of other Claude models. Fixes RooCodeInc#9567 Co-authored-by: Roo Code <[email protected]> * feat: Add contact links to About Roo Code settings page (RooCodeInc#9570) * feat: add contact links to About settings page * Tweaks * i18n * Update webview-ui/src/components/settings/__tests__/About.spec.tsx Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> --------- Co-authored-by: Roo Code <[email protected]> Co-authored-by: Bruno Bergher <[email protected]> Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com> * feat: add Claude Opus 4.5 model to Bedrock provider (RooCodeInc#9572) Co-authored-by: Roo Code <[email protected]> Co-authored-by: Matt Rubens <[email protected]> * chore: add changeset for v3.34.3 (RooCodeInc#9578) * Changeset version bump (RooCodeInc#9579) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matt Rubens <[email protected]> * fix: preserve dynamic MCP tool names in native mode API history (RooCodeInc#9559) * 
fix: preserve tool_use blocks in summary message during condensing with native tools (RooCodeInc#9582)
* Add support for images api (RooCodeInc#9587)
* Make it clear that BFL Flux 2 is free (RooCodeInc#9588)
* Add BFL models to openrouter (RooCodeInc#9589)
* chore: add changeset for v3.34.4 (RooCodeInc#9590)
* Changeset version bump (RooCodeInc#9591)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Matt Rubens <[email protected]>
* feat: set native tools as default for minimax-m2 and claude-haiku-4.5 (RooCodeInc#9586)
* feat: enable multiple native tool calls per turn with failure guardrails (RooCodeInc#9273)
* feat: add Bedrock Opus 4.5 to global inference model list (RooCodeInc#9595)
Co-authored-by: Roo Code <[email protected]>
* fix: update API handler when toolProtocol changes (RooCodeInc#9599)
* Make single file read only apply to xml tools (RooCodeInc#9600)
* Revert "Add support for Roo Code Cloud as an embeddings provider" (RooCodeInc#9602)
* feat(web-evals): enhance dashboard with dynamic tool columns and UX improvements (RooCodeInc#9592)
Co-authored-by: Roo Code <[email protected]>
* fix(webview): pass taskId to finishSubTask when canceling or deleting tasks
* chore: add changeset for v3.34.5 (RooCodeInc#9603)
* Changeset version bump (RooCodeInc#9604)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Matt Rubens <[email protected]>
* Feature/bedrock embeddings support (RooCodeInc#9475)
* feat: add AWS Bedrock support for codebase indexing
- Add bedrock as a new EmbedderProvider type
- Add AWS Bedrock embedding model profiles (titan-embed-text models)
- Create BedrockEmbedder class with support for Titan and Cohere models
- Add Bedrock configuration support to config manager and interfaces
- Update service factory to create BedrockEmbedder instances
- Add comprehensive tests for BedrockEmbedder
- Add localization strings for Bedrock support
Closes RooCodeInc#8658
* fix: add missing bedrockOptions to loadConfiguration return type
* Fix various issues that the original PR missed.
* Remove debug logs
* Rename AWS Bedrock -> Amazon Bedrock
* Remove some 'as any's
* Revert README changes
* Add translations
* More translations
* Remove leftover code from a debugging session.
* fix: add bedrock to codebaseIndexModelsSchema and update brace-expansion override
- Add bedrock provider to codebaseIndexModelsSchema type definition to fix empty model dropdown in UI
- Update pnpm override for brace-expansion from '>=2.0.2' to '^2.0.2' to resolve ESM/CommonJS compatibility issues
* Improvements to AWS Bedrock embeddings support
- Enhanced bedrock.ts embedder implementation
- Added comprehensive test coverage in bedrock.spec.ts
- Updated config-manager.ts for better Bedrock configuration handling
- Improved service-factory.ts integration
- Updated embeddingModels.ts with Bedrock models
- Enhanced CodeIndexPopover.tsx UI for Bedrock options
- Added auto-populate test for CodeIndexPopover
- Updated pnpm-lock.yaml dependencies
* Restore openrouter config
* Remove debug log
* Fix config-manager.spec.ts unit test.
* Add translations for "optional"
* Revert unnecessary change related to open ia embedder
---------
Co-authored-by: Roo Code <[email protected]>
Co-authored-by: Matt Rubens <[email protected]>
Co-authored-by: Smartsheet-JB-Brown <[email protected]>
* fix: restore content undefined check in WriteToFileTool.handlePartial() (RooCodeInc#9614)
* fix: exclude access_mcp_resource tool when MCP has no resources (RooCodeInc#9615)
* fix: prevent model cache from persisting empty API responses (RooCodeInc#9623)
* fix: update default settings for inline terminal and codebase indexing (RooCodeInc#9622)
Co-authored-by: Roo Code <[email protected]>
* feat(mistral): add native tool calling support (RooCodeInc#9625)
* feat: wire MULTIPLE_NATIVE_TOOL_CALLS experiment to OpenAI parallel_tool_calls (RooCodeInc#9621)
* feat(bedrock): allow global inference selection when cross-region is enabled (RooCodeInc#9616)
Co-authored-by: Roo Code <[email protected]>
* fix: defer new_task tool_result until subtask completes for native protocol (RooCodeInc#9628)
* fix: convert line_ranges strings to lineRanges objects in native tool calls (RooCodeInc#9627)
* fix: filter non-Anthropic content blocks before sending to Vertex API (RooCodeInc#9618)
* Add fine grained tool streaming for OpenRouter Anthropic (RooCodeInc#9629)
* Release v3.34.6 (RooCodeInc#9631)
* Changeset version bump (RooCodeInc#9632)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Matt Rubens <[email protected]>
* fix: OpenRouter GPT-5 strict schema validation for read_file tool (RooCodeInc#9633)
* fix: create parent directories early in write_to_file to prevent ENOENT errors (RooCodeInc#9640)
* Fix openrouter tool calls (RooCodeInc#9642)
* fix(claude-code): disable native tools and temperature support (RooCodeInc#9643)
* Enable native tool calling for z.ai (RooCodeInc#9645)
* Moonshot native tool call support (RooCodeInc#9646)
* Support native tools in the anthropic provider (RooCodeInc#9644)
Co-authored-by: Roo Code <[email protected]>
* Add 'taking you to cloud' screen after provider welcome (RooCodeInc#9652)
* chore: add changeset for v3.34.7 (RooCodeInc#9651)
* Changeset version bump (RooCodeInc#9654)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Matt Rubens <[email protected]>
* fix: race condition in new_task tool for native protocol (RooCodeInc#9655)
The pendingNewTaskToolCallId was being set AFTER startSubtask() returned. However, startSubtask() contains a 500ms delay during which the subtask could complete. If the subtask completed during this window, completeSubtask() would be called before pendingNewTaskToolCallId was set, causing it to fall through to the XML protocol path and add a text message instead of a proper tool_result block, breaking the API conversation structure.
This fix moves the pendingNewTaskToolCallId assignment to happen BEFORE calling startSubtask(), ensuring the ID is set before the subtask starts. If the subtask creation fails, the pending ID is cleared.
* chore: add changeset for v3.34.8 (RooCodeInc#9657)
* Changeset version bump (RooCodeInc#9658)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Matt Rubens <[email protected]>
* feat: add model-specific tool customization via excludedTools and includedTools (RooCodeInc#9641)
* feat: add model-specific tool customization via excludedTools and includedTools
- Add excludedTools and includedTools to ModelInfo schema
- Implement applyModelToolCustomization helper to filter tools based on model config
- Integrate model tool filtering into filterNativeToolsForMode for native protocol
- Add comprehensive tests for tool customization functionality
- Wire up modelInfo through buildNativeToolsArray and Task.ts
This allows providers to override which native tools are available on a per-model basis via MODEL_DEFAULTS, enabling better control over tool selection for models with specific needs.
* feat: add customTools for opt-in only tools
- Add customTools array to ToolGroupConfig for defining opt-in only tools
- Update getToolsForMode() to exclude customTools from default tool set
- Modify applyModelToolCustomization() to include customTools only via includedTools
- Add tests for customTools functionality
- Add comprehensive documentation with usage examples
customTools allows defining tools that are NOT available by default, even when a mode includes their group. These tools are only available when explicitly included via a model's includedTools configuration.
This enables:
- Gradual rollout of experimental tools
- Model-specific specialized capabilities
- Safe experimentation without affecting default tool sets
* Add assertions for customTools tests per review feedback
* test: add tests for including customTools via includedTools
* Update src/core/prompts/tools/__tests__/filter-tools-for-mode.spec.ts
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* feat(web-evals): add task log viewing, export failed logs, and new run options (RooCodeInc#9637)
Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com>
* Metadata‑driven subtasks (no UI changes): automatic parent resume and single‑open safety (RooCodeInc#9090)
* feat: add search_and_replace tool for batch text replacements (RooCodeInc#9549)
Co-authored-by: daniel-lxs <[email protected]>
* feat: enable native tool support for DeepSeek and Doubao models (RooCodeInc#9671)
Add supportsNativeTools: true to DeepSeek and Doubao model definitions, enabling native OpenAI-compatible tool calling for these providers. Both providers already extend OpenAiHandler which has built-in support for native tools, so this change is all that's needed to enable the feature.
* feat: add native tool support to Requesty provider (RooCodeInc#9672)
- Import resolveToolProtocol and TOOL_PROTOCOL from @roo-code/types
- Add tools and tool_choice to completion params when native protocol is enabled
- Handle tool_call_partial chunks in streaming response
- Add comprehensive tests for native tool support
* Include tool format in environment details (RooCodeInc#9661)
* feat(groq): enable native tool support for models that support function calling (RooCodeInc#9673)
Co-authored-by: Matt Rubens <[email protected]>
* feat: add native tools support for OpenAI-compatible providers (RooCodeInc#9676)
Co-authored-by: Matt Rubens <[email protected]>
* feat: enable native tool calls for Vertex Gemini models (RooCodeInc#9678)
Add supportsNativeTools: true to all Gemini-based models in the Vertex provider. The VertexHandler extends GeminiHandler which already has full native tool handling logic implemented.
Models updated:
- gemini-3-pro-preview
- gemini-2.5-flash-preview-05-20:thinking
- gemini-2.5-flash-preview-05-20
- gemini-2.5-flash
- gemini-2.5-flash-preview-04-17:thinking
- gemini-2.5-flash-preview-04-17
- gemini-2.5-pro-preview-03-25
- gemini-2.5-pro-preview-05-06
- gemini-2.5-pro-preview-06-05
- gemini-2.5-pro
- gemini-2.5-pro-exp-03-25
- gemini-2.0-pro-exp-02-05
- gemini-2.0-flash-001
- gemini-2.0-flash-lite-001
- gemini-2.0-flash-thinking-exp-01-21
- gemini-1.5-flash-002
- gemini-1.5-pro-002
- gemini-2.5-flash-lite-preview-06-17
* fix: display install count in millions instead of thousands (RooCodeInc#9677)
Co-authored-by: Roo Code <[email protected]>
* feat: add apply_patch native tool (RooCodeInc#9663)
Co-authored-by: daniel-lxs <[email protected]>
* feat: add debug buttons to view API and UI history (RooCodeInc#9684)
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* test(workflow): update test expectations after removing run_test functionality
---------
Co-authored-by: Daniel <[email protected]>
Co-authored-by: Hannes Rudolph <[email protected]>
Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com>
Co-authored-by: Matt Rubens <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Roo Code <[email protected]>
Co-authored-by: Bruno Bergher <[email protected]>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: daniel-lxs <[email protected]>
Co-authored-by: Chris Estreich <[email protected]>
Co-authored-by: John Richmond <[email protected]>
Co-authored-by: Alex Ker <[email protected]>
Co-authored-by: AlexKer <[email protected]>
Co-authored-by: Seb Duerr <[email protected]>
Co-authored-by: George Goranov <[email protected]>
Co-authored-by: Smartsheet-JB-Brown <[email protected]>
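
The commit message above describes how excludedTools, includedTools, and customTools interact (RooCodeInc#9641). A minimal sketch of that filtering rule follows. It is not the actual applyModelToolCustomization implementation: the function name, parameter shapes, and the choice to let exclusions win over inclusions are all assumptions made for illustration.

```typescript
// Hypothetical shape: the commit only says these two fields were added to ModelInfo.
interface ModelToolConfigSketch {
  excludedTools?: string[];
  includedTools?: string[];
}

// Sketch of the described rule (assumed semantics, not the real helper):
//  - customTools are opt-in only, so they are dropped from the default set
//  - a custom tool is enabled only if the model lists it in includedTools
//  - excludedTools are removed last (assumption: exclusion wins on conflict)
function applyModelToolCustomizationSketch(
  defaultTools: string[],
  customTools: string[],
  model: ModelToolConfigSketch = {},
): string[] {
  const optIn = new Set(customTools);
  const enabled = new Set<string>();

  defaultTools.forEach((tool) => {
    if (!optIn.has(tool)) enabled.add(tool);
  });
  (model.includedTools ?? []).forEach((tool) => {
    if (optIn.has(tool)) enabled.add(tool);
  });
  (model.excludedTools ?? []).forEach((tool) => {
    enabled.delete(tool);
  });

  // Set preserves insertion order, so the result order is deterministic.
  return Array.from(enabled);
}

// Usage: apply_patch is treated as an opt-in tool here purely for illustration.
const toolsForModel = applyModelToolCustomizationSketch(
  ["read_file", "write_to_file"],
  ["apply_patch"],
  { includedTools: ["apply_patch"], excludedTools: ["write_to_file"] },
);
console.log(toolsForModel);
```

Without a model configuration, the sketch returns only the default tools, which matches the commit's statement that customTools stay unavailable unless a model explicitly includes them.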

hannesrudolph requested review from cte, jr and mrubens as code owners November 26, 2025 23:57

github-project-automation bot moved this to Triage in Roo Code Roadmap Nov 26, 2025

github-project-automation bot added this to Roo Code Roadmap and Roo Code Roadmap Nov 26, 2025

github-project-automation bot moved this to New in Roo Code Roadmap Nov 26, 2025

dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and enhancement (New feature or request) labels Nov 26, 2025

hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label Nov 27, 2025

roomote bot reviewed Nov 27, 2025

apps/web-evals/src/app/runs/[id]/run.tsx (outdated, resolved)

dosubot bot added the size:XL (This PR changes 500-999 lines, ignoring generated files.) label and removed the size:L (This PR changes 100-499 lines, ignoring generated files.) label Nov 27, 2025

ellipsis-dev bot reviewed Nov 27, 2025

apps/web-evals/src/app/api/runs/[id]/logs/[taskId]/route.ts (outdated, resolved)

apps/web-evals/src/app/api/runs/[id]/logs/[taskId]/route.ts (outdated, resolved)

apps/web-evals/src/app/api/runs/[id]/logs/failed/route.ts (resolved)

hannesrudolph added 4 commits November 27, 2025 17:46

hannesrudolph force-pushed the feature/web-evals-updates branch from 70f6c8c to ed51c0e Compare November 28, 2025 00:47

dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) label and removed the size:XL (This PR changes 500-999 lines, ignoring generated files.) label Nov 28, 2025

ellipsis-dev bot reviewed Nov 28, 2025

apps/web-evals/src/actions/runs.ts (resolved)

hannesrudolph and others added 2 commits November 27, 2025 20:21

Update apps/web-evals/src/app/runs/[id]/run.tsx

382d25f

Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com>

roomote bot reviewed Nov 28, 2025

apps/web-evals/src/app/runs/[id]/run.tsx (outdated, resolved)

mrubens approved these changes Nov 28, 2025

dosubot bot added the lgtm (This PR has been approved by a maintainer) label Nov 28, 2025

hannesrudolph merged commit 3f0a697 into main Nov 28, 2025
13 checks passed

hannesrudolph deleted the feature/web-evals-updates branch November 28, 2025 04:15

github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Nov 28, 2025

github-project-automation bot moved this from New to Done in Roo Code Roadmap Nov 28, 2025
