[Agent Builder] Add todo list tool and per-round TodosStep UI #265578
Implements a persistent task management feature for the AI agent:
- New `platform_core_todo_write` built-in tool that lets the research agent create and update a todo list during a conversation round
- `TodoStateManager` tracks in-memory todo state per execution, initialized from the conversation's persisted state
- `TodosStep` added to `ConversationRoundStep` union so each round carries its own snapshot of todos, visible when scrolling through history
- Incomplete todos are carried over from the previous round at the start of each new round (`carried_over: true`); if all todos are complete, nothing is carried over
- Frontend: `TodosStepDisplay` renders carried-over todos as a collapsed "To-dos N" header; once the agent calls `TodoWrite` the component expands with a CSS animation
- `addOrUpdateTodosStep` in `use_conversation_actions` handles both the optimistic carryover on round start and the live expansion when the tool fires
- Tool and system prompt (based on opencode's task management instructions) are gated behind the `agentBuilder:experimentalFeatures` UI setting
- `todos: false` added to `experimentalFeatures` in test mocks
- Adds explicit rule that marking a task in_progress and completing the previous one can happen in the same TodoWrite call
- Carries over todo_panel.tsx and use_todo_list.ts (unused legacy files, kept for reference)
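The per-execution state tracking and full-list replacement described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `TodoStateManager`: the class name `TodoStateManagerSketch`, its method names, and the simplified `TodoItem` (priority omitted) are assumptions made for the example.

```typescript
// Simplified, hypothetical shapes; the PR's TodoItem also carries a priority field.
type TodoStatus = 'pending' | 'in_progress' | 'completed' | 'cancelled';

interface TodoItem {
  content: string;
  status: TodoStatus;
}

// Sketch of an in-memory, per-execution store. The real TodoStateManager may differ.
class TodoStateManagerSketch {
  private todos: TodoItem[];

  // Seeded from whatever todo state the conversation persisted (assumed to be a plain array).
  constructor(persisted: TodoItem[] = []) {
    this.todos = persisted.map((t) => ({ ...t }));
  }

  // Each write replaces the full list, so one call can complete a task and start the next.
  set(next: TodoItem[]): void {
    this.todos = next.map((t) => ({ ...t }));
  }

  // Returns a copy so callers cannot mutate the stored state.
  get(): TodoItem[] {
    return this.todos.map((t) => ({ ...t }));
  }
}

const manager = new TodoStateManagerSketch([
  { content: 'Search alerts for "Virus"', status: 'in_progress' },
  { content: 'Search alerts for "malware"', status: 'pending' },
]);

// A single write completes the first task and marks the second in_progress.
manager.set([
  { content: 'Search alerts for "Virus"', status: 'completed' },
  { content: 'Search alerts for "malware"', status: 'in_progress' },
]);

const statuses = manager.get().map((t) => t.status);
```

Because every write replaces the whole list, the store needs no merge logic, which matches the "full-list replacement" semantic mentioned in the PR description.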
Conflict resolution:
- x-pack/platform/plugins/shared/agent_builder/server/services/execution/run_agent/utils/select_tools.ts: Combined the upstream `Promise<SelectToolsResult>` return type and the new per-tool `withOrigin(...)` tagging with the branch's `todoStateManager` parameter and `todoTools` block. Todo tools are tagged `ToolOrigin.internal` to match the convention used for the similar builtin filestore tools.

Follow-up adjustment from the resolution:
- x-pack/platform/plugins/shared/agent_builder/server/services/execution/run_agent/utils/select_tools.test.ts (new file from upstream): Pass a `todoStateManager` mock to the two `selectTools` invocations so the newly required parameter type-checks.

Verified locally: type_check (agent_builder plugin, agent-builder-server, agent-builder-common) green; jest (select_tools, chat_message_text) green; eslint clean on touched files.
(frontend files review only)
Nice work Kenneth @KDKHD 🚀 super neat feature and something that will bring big benefit, especially after we have design input. Looking forward to testing this out further.
How do you see this growing in the future? Where do you envision this going to (just off the top of your head what does a good todo list experience look like to you and what are the low hanging fruits)? I think the frontend part of it is fine at the moment with the carryover logic into the next optimistic round, but I am conscious of this growing in complexity on the FE.
FYI one of us will have a conflict depending on whose PR gets merged first, yours or mine. I can help you resolve the conflict if mine gets merged first. It refactors all the streaming logic. But anyway, it should be 1 or 2 file changes and just a few lines; Claude could handle it with ease.
| /** | ||
| * Returns the todo list to carry over from the previous round, or undefined if nothing should carry over. | ||
| * Carryover only happens when at least one item is still incomplete (pending / in_progress). | ||
| * When carried over, both complete and incomplete items are included so the full plan is visible. | ||
| */ | ||
| const carriedOverTodos = (todos: TodoItem[] | undefined): TodoItem[] | undefined => { | ||
| if (!todos?.length) return undefined; | ||
| const hasIncomplete = todos.some((t) => t.status !== 'completed' && t.status !== 'cancelled'); | ||
| return hasIncomplete ? todos : undefined; | ||
| }; |
It would be nice to move this to somewhere shared and use it on the frontend too in use_conversation_actions so that it's not duplicated and so that there is no future drift.
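One way to act on this suggestion is a single helper that both the server (`add_round_complete_event.ts`) and the frontend (`use_conversation_actions`) import. The sketch below assumes a simplified `TodoItem` and hypothetical helper names (`isIncompleteTodo`, `getCarriedOverTodos`); where the helper would actually live is not decided in this thread.

```typescript
// Simplified, hypothetical shapes for illustration only.
type TodoStatus = 'pending' | 'in_progress' | 'completed' | 'cancelled';

interface TodoItem {
  content: string;
  status: TodoStatus;
}

// A todo still needs work unless it is completed or cancelled.
const isIncompleteTodo = (t: TodoItem): boolean =>
  t.status !== 'completed' && t.status !== 'cancelled';

// Returns the full list when at least one item is incomplete, otherwise undefined.
// In a real shared package this function would be exported.
const getCarriedOverTodos = (todos: TodoItem[] | undefined): TodoItem[] | undefined => {
  if (!todos || todos.length === 0) return undefined;
  return todos.some(isIncompleteTodo) ? todos : undefined;
};

const allDone: TodoItem[] = [
  { content: 'Search "Virus"', status: 'completed' },
  { content: 'Search "macOS"', status: 'cancelled' },
];
const mixed: TodoItem[] = [
  { content: 'Search "Virus"', status: 'completed' },
  { content: 'Search "malware"', status: 'pending' },
];

const carriedFromDone = getCarriedOverTodos(allDone);
const carriedFromMixed = getCarriedOverTodos(mixed);
```

With one definition of "incomplete", the server-side fallback step and the frontend's optimistic carryover cannot drift apart.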
pgayvallet left a comment
Looking good. A few comments and nits (I'll be faster for the second review, I promise)
| @@ -78,6 +77,7 @@ export enum ConversationRoundStepType { | |||
| reasoning = 'reasoning', | |||
| compaction = 'compaction', | |||
| backgroundAgentComplete = 'background_agent_complete', | |||
| todos = 'todos', | |||
NIT: I would rename that to update_todos to be slightly more explicit
| export const todoTools = { | ||
| write: platformCoreTool('todo_write'), | ||
| } as const; |
Let's add it to the internalTools map instead, we have too many different records already (and we can just call it write_todos - no need for prefixing)
| export type TodoStatus = 'pending' | 'in_progress' | 'completed' | 'cancelled'; | ||
| export type TodoPriority = 'high' | 'medium' | 'low'; | ||
|
|
||
| export interface TodoItem { | ||
| content: string; | ||
| status: TodoStatus; | ||
| priority: TodoPriority; | ||
| } |
Given tasks are naturally ordered, I wonder if the priority really matters, especially given it's not surfaced in the UI. Do you think it's still worth having, or should we just remove it for now?
| import { useState } from 'react'; | ||
| import type { TodoItem } from '@kbn/agent-builder-common/chat/conversation'; | ||
|
|
||
| export const useTodoList = (initialTodos?: TodoItem[]) => { | ||
| const [todos, setTodos] = useState<TodoItem[]>(initialTodos ?? []); | ||
| return { todos, setTodos }; | ||
| }; |
|
|
||
| export const getTodoInstructions = (): string => `## TASK MANAGEMENT | ||
|
|
||
| You have access to the TodoWrite tool to plan and track your work. Use it frequently so the user can see what you are doing and how you are progressing. |
need to replace TodoWrite with the actual tool name here (other references in the prompt too)
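A possible way to keep the prompt and the registered tool name in sync is to interpolate the tool id into the prompt text. In this sketch the constant `TODO_WRITE_TOOL_ID` and the `toolId` parameter are hypothetical; the value comes from the PR description, and the final id may change following the naming suggestion in this review.

```typescript
// Hypothetical constant; the PR description names the tool `platform_core_todo_write`.
const TODO_WRITE_TOOL_ID = 'platform_core_todo_write';

// Sketch: the prompt references the tool by its real id instead of a hard-coded "TodoWrite".
const getTodoInstructions = (toolId: string = TODO_WRITE_TOOL_ID): string => `## TASK MANAGEMENT

You have access to the ${toolId} tool to plan and track your work. Use it frequently so the user can see what you are doing and how you are progressing.`;

const prompt = getTodoInstructions();
```

If the tool is later renamed, only the constant changes and every reference in the prompt follows.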
| export const TodoPanel: React.FC<TodoPanelProps> = ({ todos }) => { | ||
| if (todos.length === 0) return null; | ||
|
|
||
| const grouped = PRIORITY_ORDER.reduce<Record<TodoPriority, TodoItem[]>>( | ||
| (acc, p) => { | ||
| acc[p] = todos.filter((t) => t.priority === p); | ||
| return acc; | ||
| }, | ||
| { high: [], medium: [], low: [] } | ||
| ); |
This (and so the whole file) looks unused
| const prevTodosStep = draft?.rounds?.at(-1)?.steps?.filter(isTodosStep)?.at(-1); | ||
| const carryoverTodos = prevTodosStep?.todos?.some( | ||
| (t) => t.status !== 'completed' && t.status !== 'cancelled' | ||
| ) | ||
| ? prevTodosStep.todos | ||
| : undefined; |
NIT: extract as a util / move to the same place where isTodosStep lives?
| tags: ['internal'], | ||
| handler: async ({ todos }, context) => { | ||
| todoStateManager.set(todos); | ||
| context.events.sendUiEvent('todos_updated', { todos }); |
(just thinking) so I don't think we really needed this, because we could have listened to tool call events for that particular tool directly instead (in use_subscribe_to_chat_events.ts), but this works too.
| @@ -286,6 +295,14 @@ const createRound = ({ | |||
| const thinkingCompleteEvent = events.find(isThinkingCompleteEvent); | |||
| const promptRequestEvents = events.filter(isPromptRequestEvent); | |||
|
|
|||
| // Collect todos_updated UI events; only the last snapshot is stored as a round step | |||
| const lastTodosData = events.reduce<TodoItem[] | undefined>((last, e) => { | |||
| if (isToolUiEvent<'todos_updated', { todos: TodoItem[] }>(e, 'todos_updated')) { | |||
the todos_updated ui event name is used in various places, let's have a const for it
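A shared constant could look roughly like the sketch below, which also shows the "keep only the last snapshot" reduction from the diff above. The constant name `TODOS_UPDATED_UI_EVENT`, the simplified `UiEventSketch` shape, and the helper names are assumptions; the real `isToolUiEvent` guard and event types are richer.

```typescript
// Hypothetical constant shared by the tool handler, the round builder, and the frontend.
const TODOS_UPDATED_UI_EVENT = 'todos_updated' as const;

interface TodoItem {
  content: string;
  status: string;
}

// Simplified, hypothetical event shape for illustration only.
interface UiEventSketch {
  name: string;
  data: { todos?: TodoItem[] };
}

// Type guard keyed on the shared constant instead of a repeated string literal.
const isTodosUpdatedEvent = (
  e: UiEventSketch
): e is UiEventSketch & { data: { todos: TodoItem[] } } =>
  e.name === TODOS_UPDATED_UI_EVENT && Array.isArray(e.data.todos);

// Later events overwrite earlier ones, so only the final snapshot survives.
const lastTodosSnapshot = (events: UiEventSketch[]): TodoItem[] | undefined =>
  events.reduce<TodoItem[] | undefined>(
    (last, e) => (isTodosUpdatedEvent(e) ? e.data.todos : last),
    undefined
  );

const events: UiEventSketch[] = [
  { name: TODOS_UPDATED_UI_EVENT, data: { todos: [{ content: 'a', status: 'pending' }] } },
  { name: 'other_event', data: {} },
  { name: TODOS_UPDATED_UI_EVENT, data: { todos: [{ content: 'a', status: 'completed' }] } },
];

const lastSnapshot = lastTodosSnapshot(events);
```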
| const carriedTodos = carriedOverTodos(initialTodos); | ||
| const todosForStep = lastTodosData ?? carriedTodos; |
NIT: avoid unnecessary call
const todosForStep = lastTodosData ?? carriedOverTodos(initialTodos);
cce0031 to cb7db01
…bana into feature-agent-builder-todo-list
…er-todo-list # Conflicts: # x-pack/platform/packages/shared/agent-builder/agent-builder-common/index.ts # x-pack/platform/plugins/shared/agent_builder/server/services/execution/run_agent/prompts/answer_agent.ts
…bana into feature-agent-builder-todo-list
/ci
@elasticmachine merge upstream
💛 Build succeeded, but was flaky
## Summary

Part of elastic/search-team#13832

<img width="960" height="702" alt="image" src="https://github.com/user-attachments/assets/2995f47e-7f33-463f-a426-2c07c337c811" />

https://github.com/user-attachments/assets/13cd178c-2f8c-4d30-afea-75ef460a2125

- Adds a new `platform_core_todo_write` built-in tool that lets the research agent plan and track work during a conversation round
- Introduces `TodosStep` as a member of `ConversationRoundStep` so each round permanently records its todo snapshot, visible when scrolling through conversation history
- Incomplete todos carry over automatically from the previous round; if all todos are completed, nothing carries over
- Frontend collapses carried-over todos to a "To-dos N" header and expands with a CSS animation when the agent calls `TodoWrite` during the round
- Tool and system prompt are gated behind the existing `agentBuilder:experimentalFeatures` UI setting

## Changes

**Common packages**
- `agent-builder-common`: Added `TodosStep`, `isTodosStep`, `TodosStepData` (with `carried_over` flag) to the `ConversationRoundStep` union; exported todo tool constants
- `agent-builder-server`: Added `todos: boolean` to `ExperimentalFeatures`; new `TodoStateManager` (in-memory per-execution, initialized from conversation state); exported from runner index

**Server / execution**
- New built-in tool `server/services/tools/builtin/todo/todo_write.ts`: replaces the full todo list on each call
- `select_tools.ts`: todo tool gated behind `experimentalFeatures.todos`
- `run_chat_agent.ts`: captures `initialTodos` before the round starts, passes to `addRoundCompleteEvent`
- `add_round_complete_event.ts`: uses `initialTodos` as fallback `TodosStep` with `carried_over: true` when the agent never called `TodoWrite`; helper `carriedOverTodos` skips carryover when all todos are done
- `run_agent.ts` / `runner.ts`: wires `todos` experimental flag and `todoStateManager`
- `research_agent.ts`: includes `getTodoInstructions()` prompt section when `experimentalFeatures.todos` is enabled
- New `prompts/utils/todos.ts`: opencode-inspired task management prompt explaining when/how to use the tool, including the full-list replacement semantic

**Frontend**
- New `todos_step_display.tsx`: collapsed header-only view when `carried_over`, full expanded list with `expandIn` keyframe animation on mount when not carried over
- `round_layout.tsx`: renders `TodosStepDisplay` per-round between the thinking section and the response
- `use_conversation_actions.ts`: `addOptimisticRound` seeds the new round with carried-over todos; new `addOrUpdateTodosStep` action flips `carried_over → false`, triggering the expand animation
- `use_subscribe_to_chat_events.ts`: wired to call `addOrUpdateTodosStep` on todo events
- `types.ts` (conversation client): `TodosStep` added to `PersistentConversationRoundStep` for ES serialization

**Tests**
- `chat_message_text.test.tsx`: added `addOrUpdateTodosStep: jest.fn()` to conversation actions mock
- `test_utils/runner.ts`: added `todos: false` to `experimentalFeatures` mock

## Test plan

- [ ] Enable `agentBuilder:experimentalFeatures` in Kibana advanced settings
- [ ] Start a new conversation with a multi-step task; confirm the agent creates todos and they appear in the round, e.g. "Can you please search for security alerts with these keywords (using security.alerts), one by one: Virus, malware, macOS. Create a to-do list for each."
- [ ] Start a second round with incomplete todos (by sending something like "Add windows to the list"); confirm they carry over collapsed ("To-dos N"), then expand when the agent calls `TodoWrite`
- [ ] Start a new round that does not modify todos (e.g. say "Thanks"); confirm todos were carried over
- [ ] Ask the agent to "Complete the todos sequentially"; observe all todos being marked as completed
- [ ] Say "Thanks" to the agent; observe that todos are not carried over because they are all complete
- [ ] Refresh the page; confirm the todo state and collapsed/expanded display is preserved from persisted round data
- [ ] Disable the experimental setting; confirm the todo tool is not available and the todo prompt section is no longer part of the prompt (check inside of Langsmith or other observability tooling)

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
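The frontend behaviour described under "Frontend" (a carried-over step that flips to `carried_over: false` when the tool fires) can be sketched as a pure function over a round's steps. The step shapes here are simplified assumptions, the function is not the PR's actual `addOrUpdateTodosStep`, and it assumes at most one todos step per round.

```typescript
interface TodoItem {
  content: string;
  status: string;
}

// Simplified, hypothetical step shapes; the real union has more members and fields.
interface TodosStepSketch {
  type: 'todos';
  todos: TodoItem[];
  carried_over: boolean;
}
interface ReasoningStepSketch {
  type: 'reasoning';
  reasoning: string;
}
type StepSketch = TodosStepSketch | ReasoningStepSketch;

const isTodosStep = (s: StepSketch): s is TodosStepSketch => s.type === 'todos';

// Replaces an existing todos step or appends a new one; either way the result
// is no longer "carried over", which is what would trigger the expand animation.
const addOrUpdateTodosStepSketch = (steps: StepSketch[], todos: TodoItem[]): StepSketch[] => {
  const updated: TodosStepSketch = { type: 'todos', todos, carried_over: false };
  return steps.some(isTodosStep)
    ? steps.map((s) => (isTodosStep(s) ? updated : s))
    : [...steps, updated];
};

// Optimistic round seeded with carried-over todos, then updated by a tool call.
const seeded: StepSketch[] = [
  { type: 'todos', todos: [{ content: 'Search "malware"', status: 'pending' }], carried_over: true },
];
const afterToolCall = addOrUpdateTodosStepSketch(seeded, [
  { content: 'Search "malware"', status: 'in_progress' },
]);
const appended = addOrUpdateTodosStepSketch([], [{ content: 'New task', status: 'pending' }]);
```

Returning a new array rather than mutating keeps the sketch independent of any particular state library; the real action may use a draft-based update instead.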