[Agent Builder] Add todo list tool and per-round TodosStep UI#265578

Merged
KDKHD merged 14 commits into elastic:main from KDKHD:feature-agent-builder-todo-list
May 11, 2026
Member

@KDKHD KDKHD commented Apr 24, 2026

Summary

Part of https://github.com/elastic/search-team/issues/13832

image
Screen.Recording.2026-04-24.at.17.39.10.mov
  • Adds a new platform_core_todo_write built-in tool that lets the research agent plan and track work during a conversation round
  • Introduces TodosStep as a member of ConversationRoundStep so each round permanently records its todo snapshot — visible when scrolling through conversation history
  • Incomplete todos carry over automatically from the previous round; if all todos are completed, nothing carries over
  • Frontend collapses carried-over todos to a "To-dos N" header and expands with a CSS animation when the agent calls TodoWrite during the round
  • Tool and system prompt are gated behind the existing agentBuilder:experimentalFeatures UI setting

Changes

Common packages

  • agent-builder-common: Added TodosStep, isTodosStep, TodosStepData (with carried_over flag) to the ConversationRoundStep union; exported todo tool constants
  • agent-builder-server: Added todos: boolean to ExperimentalFeatures; new TodoStateManager (in-memory per-execution, initialized from conversation state); exported from runner index
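
To make the common-package change concrete, here is a minimal sketch of how a `TodosStep` could sit in the `ConversationRoundStep` union alongside an `isTodosStep` type guard. The names come from the description above, but the exact fields are assumptions (`ReasoningStep` is only a stand-in for the other union members), so read it as an illustration rather than the merged code.

```typescript
// Illustrative shapes only: names follow the PR description, but the exact
// fields of the real types are assumptions.
type TodoStatus = 'pending' | 'in_progress' | 'completed' | 'cancelled';

interface TodoItem {
  content: string;
  status: TodoStatus;
}

interface TodosStep {
  type: 'todos';
  todos: TodoItem[];
  carried_over: boolean;
}

// Stand-in for the other members of the union (hypothetical shape).
interface ReasoningStep {
  type: 'reasoning';
  reasoning: string;
}

type ConversationRoundStep = TodosStep | ReasoningStep;

// Type guard that narrows a round step to TodosStep.
const isTodosStep = (step: ConversationRoundStep): step is TodosStep =>
  step.type === 'todos';

const steps: ConversationRoundStep[] = [
  { type: 'reasoning', reasoning: 'Plan the searches' },
  { type: 'todos', todos: [{ content: 'Search "Virus"', status: 'pending' }], carried_over: false },
];

// filter() with a type guard returns TodosStep[], so no casts are needed.
const todosSteps = steps.filter(isTodosStep);
const latestTodosStep: TodosStep | undefined = todosSteps[todosSteps.length - 1];
```

Using a type guard keeps call sites free of casts when they pick the latest todo snapshot out of a round's steps.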

Server / execution

  • New built-in tool server/services/tools/builtin/todo/todo_write.ts — replaces the full todo list on each call
  • select_tools.ts: todo tool gated behind experimentalFeatures.todos
  • run_chat_agent.ts: captures initialTodos before the round starts, passes to addRoundCompleteEvent
  • add_round_complete_event.ts: uses initialTodos as fallback TodosStep with carried_over: true when agent never called TodoWrite; helper carriedOverTodos skips carryover when all todos are done
  • run_agent.ts / runner.ts: wires todos experimental flag and todoStateManager
  • research_agent.ts: includes getTodoInstructions() prompt section when experimentalFeatures.todos is enabled
  • New prompts/utils/todos.ts: opencode-inspired task management prompt explaining when/how to use the tool, including the full-list replacement semantic
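
The full-list replacement semantic is easy to get wrong, so here is a small sketch of it. This `TodoStateManager` is a hypothetical stand-in: only `set` appears in the snippets under review, while the constructor argument and `get` are assumptions made for the example.

```typescript
type TodoStatus = 'pending' | 'in_progress' | 'completed' | 'cancelled';

interface TodoItem {
  content: string;
  status: TodoStatus;
}

// Hypothetical in-memory manager. Only `set` is visible in the PR snippets;
// the constructor signature and `get` are assumed for illustration.
class TodoStateManager {
  private todos: TodoItem[];

  constructor(initialTodos: TodoItem[] = []) {
    this.todos = [...initialTodos];
  }

  // Replaces the whole list: the caller always sends the complete plan.
  set(todos: TodoItem[]): void {
    this.todos = [...todos];
  }

  get(): TodoItem[] {
    return [...this.todos];
  }
}

// Initialized from the conversation's persisted state.
const manager = new TodoStateManager([{ content: 'Search "Virus"', status: 'pending' }]);

// A later tool call resends every item, not only the ones that changed.
manager.set([
  { content: 'Search "Virus"', status: 'completed' },
  { content: 'Search "malware"', status: 'in_progress' },
]);

const currentTodos = manager.get();
```

Because each call replaces the list, an item the agent leaves out of a call is dropped, which is why the prompt has to spell out the semantic.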

Frontend

  • New todos_step_display.tsx: collapsed header-only view when carried_over, full expanded list with expandIn keyframe animation on mount when not carried over
  • round_layout.tsx: renders TodosStepDisplay per-round between the thinking section and the response
  • use_conversation_actions.ts: addOptimisticRound seeds the new round with carried-over todos; new addOrUpdateTodosStep action flips carried_over → false triggering the expand animation
  • use_subscribe_to_chat_events.ts: wired to call addOrUpdateTodosStep on todo events
  • types.ts (conversation client): TodosStep added to PersistentConversationRoundStep for ES serialization
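
As a rough illustration of the frontend flow, the sketch below models `addOrUpdateTodosStep` as a pure function over a round. The real action most likely mutates a draft inside the conversation state hook, so the `Round` shape and the immutable update style here are assumptions.

```typescript
type TodoStatus = 'pending' | 'in_progress' | 'completed' | 'cancelled';

interface TodoItem {
  content: string;
  status: TodoStatus;
}

interface TodosStep {
  type: 'todos';
  todos: TodoItem[];
  carried_over: boolean;
}

// Stand-in for the other step types (hypothetical shape).
interface ReasoningStep {
  type: 'reasoning';
  reasoning: string;
}

type Step = TodosStep | ReasoningStep;

// Hypothetical, simplified round shape.
interface Round {
  steps: Step[];
}

// Adds a todos step, or replaces the existing one, always with
// carried_over: false so the UI switches from collapsed to expanded.
const addOrUpdateTodosStep = (round: Round, todos: TodoItem[]): Round => {
  const next: TodosStep = { type: 'todos', todos, carried_over: false };
  const hasTodosStep = round.steps.some((s) => s.type === 'todos');
  const steps: Step[] = hasTodosStep
    ? round.steps.map((s) => (s.type === 'todos' ? next : s))
    : [...round.steps, next];
  return { ...round, steps };
};

// The optimistic round starts with todos carried over from the previous round.
const optimisticRound: Round = {
  steps: [
    { type: 'todos', todos: [{ content: 'Search "macOS"', status: 'pending' }], carried_over: true },
  ],
};

// A todo event from the stream replaces the snapshot and clears the flag.
const updatedRound = addOrUpdateTodosStep(optimisticRound, [
  { content: 'Search "macOS"', status: 'in_progress' },
  { content: 'Search "windows"', status: 'pending' },
]);
```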

Tests

  • chat_message_text.test.tsx: added addOrUpdateTodosStep: jest.fn() to conversation actions mock
  • test_utils/runner.ts: added todos: false to experimentalFeatures mock

Test plan

  • Enable agentBuilder:experimentalFeatures in Kibana advanced settings
  • Start a new conversation with a multi-step task; confirm agent creates todos and they appear in the round. e.g. "Can you please search for security alerts with these keywords (using security.alerts), one by one: Virus, malware, macOS. Create a to-do list for each."
  • Start a second round with incomplete todos (by sending something like "Add windows to the list"); confirm they carry over collapsed ("To-dos N"), then expand when agent calls TodoWrite
  • Start a new round that does not modify todos (e.g. say "Thanks") - confirm todos were carried over
  • Ask the agent to "Complete the todos sequentially" - observe all todos being marked as completed
  • Say "Thanks" to the agent - observe that todos are not carried over because they are all complete
  • Refresh the page; confirm the todo state and collapsed/expanded display is preserved from persisted round data
  • Disable the experimental setting; confirm the todo tool is not available, and the todo prompt section is not part of the prompt anymore (check inside of Langsmith or other observability tooling)
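
The rounds in the test plan can also be walked through in code. The sketch below assumes a simplified `completeRound` helper (a hypothetical name) that mirrors the described behaviour: a write during the round wins, otherwise unfinished todos are carried over, and a fully finished list is dropped.

```typescript
type TodoStatus = 'pending' | 'in_progress' | 'completed' | 'cancelled';

interface TodoItem {
  content: string;
  status: TodoStatus;
}

interface TodosStep {
  type: 'todos';
  todos: TodoItem[];
  carried_over: boolean;
}

// Hypothetical helper: decides which todos step, if any, a round records.
const completeRound = (
  previousTodos: TodoItem[] | undefined,
  writtenThisRound: TodoItem[] | undefined
): TodosStep | undefined => {
  if (writtenThisRound) {
    return { type: 'todos', todos: writtenThisRound, carried_over: false };
  }
  const hasIncomplete = previousTodos?.some(
    (t) => t.status !== 'completed' && t.status !== 'cancelled'
  );
  return hasIncomplete && previousTodos
    ? { type: 'todos', todos: previousTodos, carried_over: true }
    : undefined;
};

// Round 1: the agent writes a plan.
const round1 = completeRound(undefined, [
  { content: 'Search "Virus"', status: 'in_progress' },
  { content: 'Search "malware"', status: 'pending' },
]);

// Round 2: the user says "Thanks", no write, unfinished todos carry over.
const round2 = completeRound(round1?.todos, undefined);

// Round 3: the agent completes everything.
const round3 = completeRound(round2?.todos, [
  { content: 'Search "Virus"', status: 'completed' },
  { content: 'Search "malware"', status: 'completed' },
]);

// Round 4: "Thanks" again, nothing left to carry over.
const round4 = completeRound(round3?.todos, undefined);
```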

KDKHD added 6 commits April 24, 2026 17:49
Implements a persistent task management feature for the AI agent:

- New `platform_core_todo_write` built-in tool that lets the research agent create and update a todo list during a conversation round
- `TodoStateManager` tracks in-memory todo state per execution, initialized from the conversation's persisted state
- `TodosStep` added to `ConversationRoundStep` union so each round carries its own snapshot of todos, visible when scrolling through history
- Incomplete todos are carried over from the previous round at the start of each new round (`carried_over: true`); if all todos are complete, nothing is carried over
- Frontend: `TodosStepDisplay` renders carried-over todos as a collapsed "To-dos N" header; once the agent calls `TodoWrite` the component expands with a CSS animation
- `addOrUpdateTodosStep` in `use_conversation_actions` handles both the optimistic carryover on round start and the live expansion when the tool fires
- Tool and system prompt (based on opencode's task management instructions) are gated behind the `agentBuilder:experimentalFeatures` UI setting
- `todos: false` added to `experimentalFeatures` in test mocks
- Adds explicit rule that marking a task in_progress and completing the previous can happen in the same TodoWrite call
- Carries over todo_panel.tsx and use_todo_list.ts (unused legacy files, kept for reference)
Conflict resolution:

- x-pack/platform/plugins/shared/agent_builder/server/services/execution/run_agent/utils/select_tools.ts:
  Combined the upstream `Promise<SelectToolsResult>` return type and the new
  per-tool `withOrigin(...)` tagging with the branch's `todoStateManager` parameter
  and `todoTools` block. Todo tools are tagged `ToolOrigin.internal` to match the
  convention used for the similar builtin filestore tools.

Follow-up adjustment from the resolution:

- x-pack/platform/plugins/shared/agent_builder/server/services/execution/run_agent/utils/select_tools.test.ts (new file from upstream):
  Pass a `todoStateManager` mock to the two `selectTools` invocations so the
  newly required parameter type-checks.

Verified locally: type_check (agent_builder plugin, agent-builder-server,
agent-builder-common) green; jest (select_tools, chat_message_text) green;
eslint clean on touched files.
@KDKHD KDKHD marked this pull request as ready for review April 30, 2026 15:00
@KDKHD KDKHD requested a review from a team as a code owner April 30, 2026 15:00
Contributor

@chrisbmar chrisbmar left a comment

(frontend files review only)

Nice work Kenneth @KDKHD 🚀 super neat feature and something that will bring big benefit, especially after we have design input. Looking forward to testing this out further.

How do you see this growing in the future? Where do you envision this going to (just off the top of your head what does a good todo list experience look like to you and what are the low hanging fruits)? I think the frontend part of it is fine at the moment with the carryover logic into the next optimistic round, but I am conscious of this growing in complexity on the FE.

FYI one of us will have a conflict depending on whose PR gets merged first, yours or mine. I can help you resolve the conflict if mine gets merged first. It refactors all the streaming logic. But anyway, it should be 1 or 2 file changes and just a few lines, Claude could handle it with ease.

Comment on lines +508 to +517
/**
 * Returns the todo list to carry over from the previous round, or undefined if nothing should carry over.
 * Carryover only happens when at least one item is still incomplete (pending / in_progress).
 * When carried over, both complete and incomplete items are included so the full plan is visible.
 */
const carriedOverTodos = (todos: TodoItem[] | undefined): TodoItem[] | undefined => {
  if (!todos?.length) return undefined;
  const hasIncomplete = todos.some((t) => t.status !== 'completed' && t.status !== 'cancelled');
  return hasIncomplete ? todos : undefined;
};
Contributor

It would be nice to move this to somewhere shared and use it on the frontend too in use_conversation_actions so that it's not duplicated and so that there is no future drift.
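
One possible shape for that shared helper, sketched under assumptions: `getCarryoverTodos` and `isIncompleteTodo` are hypothetical names, and the types are redeclared locally so the example stands alone. Both the server fallback and the frontend optimistic round could then call the same function.

```typescript
type TodoStatus = 'pending' | 'in_progress' | 'completed' | 'cancelled';

interface TodoItem {
  content: string;
  status: TodoStatus;
}

interface TodosStep {
  type: 'todos';
  todos: TodoItem[];
  carried_over: boolean;
}

// Hypothetical shared helpers that could live next to isTodosStep.
const isIncompleteTodo = (t: TodoItem): boolean =>
  t.status !== 'completed' && t.status !== 'cancelled';

// Returns the full list when at least one item is unfinished, else undefined.
const getCarryoverTodos = (todos: TodoItem[] | undefined): TodoItem[] | undefined =>
  todos?.some(isIncompleteTodo) ? todos : undefined;

// Server-side use: fallback when the agent never wrote todos in the round.
const initialTodos: TodoItem[] = [
  { content: 'Search "Virus"', status: 'completed' },
  { content: 'Search "malware"', status: 'pending' },
];
const serverFallback = getCarryoverTodos(initialTodos);

// Frontend use: seed the optimistic round from the previous round's last step.
const previousSteps: TodosStep[] = [
  { type: 'todos', todos: [{ content: 'Search "macOS"', status: 'completed' }], carried_over: false },
];
const previousTodosStep = previousSteps[previousSteps.length - 1];
const optimisticTodos = getCarryoverTodos(previousTodosStep?.todos);
```

An empty or missing list yields `undefined` as well, because `some` returns `false` on an empty array, so the behaviour matches the original helper's early return.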

@chrisbmar chrisbmar changed the title [AgentBuilder] Add todo list tool and per-round TodosStep UI [Agent Builder] Add todo list tool and per-round TodosStep UI May 6, 2026
Contributor

@pgayvallet pgayvallet left a comment

Looking good. A few comments and nits (I'll be faster for the second review, I promise)

@@ -78,6 +77,7 @@ export enum ConversationRoundStepType {
  reasoning = 'reasoning',
  compaction = 'compaction',
  backgroundAgentComplete = 'background_agent_complete',
  todos = 'todos',
Contributor

NIT: I would rename that to update_todos to be slightly more explicit

Comment on lines +70 to +72
export const todoTools = {
  write: platformCoreTool('todo_write'),
} as const;
Contributor

@pgayvallet pgayvallet May 6, 2026

Let's add it to the internalTools map instead, we have too many different records already (and we can just call it write_todos - no need for prefixing)

Comment on lines +324 to +331
export type TodoStatus = 'pending' | 'in_progress' | 'completed' | 'cancelled';
export type TodoPriority = 'high' | 'medium' | 'low';

export interface TodoItem {
  content: string;
  status: TodoStatus;
  priority: TodoPriority;
}
Contributor

Given tasks are naturally ordered, I wonder if the priority really matters, especially given it's not surfaced in the UI. Do you think it's still worth having, or should we just remove it for now?

Member Author

Removed it!

Comment on lines +8 to +14
import { useState } from 'react';
import type { TodoItem } from '@kbn/agent-builder-common/chat/conversation';

export const useTodoList = (initialTodos?: TodoItem[]) => {
  const [todos, setTodos] = useState<TodoItem[]>(initialTodos ?? []);
  return { todos, setTodos };
};
Contributor

This looks unused


export const getTodoInstructions = (): string => `## TASK MANAGEMENT

You have access to the TodoWrite tool to plan and track your work. Use it frequently so the user can see what you are doing and how you are progressing.
Contributor

need to replace TodoWrite with the actual tool name here (other references in the prompt too)
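
A sketch of what that could look like: interpolate the tool id into the prompt instead of hardcoding `TodoWrite`. The constant name `TODO_WRITE_TOOL_ID` and the `toolId` parameter are hypothetical; the id value is the one from the PR summary and may differ in the final code if the tool is renamed.

```typescript
// Hypothetical constant; the value is the tool id from the PR summary and
// would change if the tool is renamed during review.
const TODO_WRITE_TOOL_ID = 'platform_core_todo_write';

// Builds the prompt section from the tool id so the prompt text and the
// registered tool cannot drift apart.
const getTodoInstructions = (toolId: string = TODO_WRITE_TOOL_ID): string => `## TASK MANAGEMENT

You have access to the ${toolId} tool to plan and track your work. Use it frequently so the user can see what you are doing and how you are progressing.`;

const promptSection = getTodoInstructions();
```

Deriving the name from a single constant means a later rename only has to touch one place.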

Comment on lines +91 to +100
export const TodoPanel: React.FC<TodoPanelProps> = ({ todos }) => {
  if (todos.length === 0) return null;

  const grouped = PRIORITY_ORDER.reduce<Record<TodoPriority, TodoItem[]>>(
    (acc, p) => {
      acc[p] = todos.filter((t) => t.priority === p);
      return acc;
    },
    { high: [], medium: [], low: [] }
  );
Contributor

This (and so the whole file) looks unused

Comment on lines +157 to +162
const prevTodosStep = draft?.rounds?.at(-1)?.steps?.filter(isTodosStep)?.at(-1);
const carryoverTodos = prevTodosStep?.todos?.some(
  (t) => t.status !== 'completed' && t.status !== 'cancelled'
)
  ? prevTodosStep.todos
  : undefined;
Contributor

NIT: extract as util /move to same place isTodosStep lives?

tags: ['internal'],
handler: async ({ todos }, context) => {
  todoStateManager.set(todos);
  context.events.sendUiEvent('todos_updated', { todos });
Contributor

(just thinking) I don't think we really needed this, because we could have listened to tool call events for that particular tool directly instead (in use_subscribe_to_chat_events.ts), but this works too.

@@ -286,6 +295,14 @@ const createRound = ({
  const thinkingCompleteEvent = events.find(isThinkingCompleteEvent);
  const promptRequestEvents = events.filter(isPromptRequestEvent);

  // Collect todos_updated UI events; only the last snapshot is stored as a round step
  const lastTodosData = events.reduce<TodoItem[] | undefined>((last, e) => {
    if (isToolUiEvent<'todos_updated', { todos: TodoItem[] }>(e, 'todos_updated')) {
Contributor

the todos_updated ui event name is used in various places, let's have a const for it
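
For illustration, a shared constant plus a small guard could look like the sketch below. `TODOS_UPDATED_UI_EVENT`, the `ChatEvent` shapes and `isTodosUpdatedEvent` are all hypothetical; the real code uses the existing `isToolUiEvent` helper, whose event shape is not shown in this thread.

```typescript
type TodoStatus = 'pending' | 'in_progress' | 'completed' | 'cancelled';

interface TodoItem {
  content: string;
  status: TodoStatus;
}

// Hypothetical shared constant, used by both the emitter and the consumers.
const TODOS_UPDATED_UI_EVENT = 'todos_updated' as const;

// Simplified, assumed event shapes for the example.
interface UiEvent {
  kind: 'ui';
  name: string;
  data: unknown;
}

interface OtherEvent {
  kind: 'other';
}

type ChatEvent = UiEvent | OtherEvent;

// Hypothetical guard keyed on the shared constant.
const isTodosUpdatedEvent = (e: ChatEvent): e is UiEvent & { data: { todos: TodoItem[] } } =>
  e.kind === 'ui' && e.name === TODOS_UPDATED_UI_EVENT;

// Keeps only the last snapshot, as the round creation code does.
const getLastTodos = (events: ChatEvent[]): TodoItem[] | undefined =>
  events.reduce<TodoItem[] | undefined>(
    (last, e) => (isTodosUpdatedEvent(e) ? e.data.todos : last),
    undefined
  );

const events: ChatEvent[] = [
  { kind: 'ui', name: TODOS_UPDATED_UI_EVENT, data: { todos: [{ content: 'A', status: 'pending' }] } },
  { kind: 'other' },
  { kind: 'ui', name: TODOS_UPDATED_UI_EVENT, data: { todos: [{ content: 'A', status: 'completed' }] } },
];

const lastTodos = getLastTodos(events);
```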

Comment on lines +357 to +358
const carriedTodos = carriedOverTodos(initialTodos);
const todosForStep = lastTodosData ?? carriedTodos;
Contributor

NIT: avoid unnecessary call

const todosForStep = lastTodosData ?? carriedOverTodos(initialTodos);
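
The suggestion relies on `??` evaluating its right-hand side only when the left-hand side is `null` or `undefined`. A small sketch with a call counter shows the effect; `pickTodosForStep` and `helperCalls` are hypothetical names added for the demonstration.

```typescript
type TodoStatus = 'pending' | 'in_progress' | 'completed' | 'cancelled';

interface TodoItem {
  content: string;
  status: TodoStatus;
}

// Counter added only to observe how often the helper runs.
let helperCalls = 0;

const carriedOverTodos = (todos: TodoItem[] | undefined): TodoItem[] | undefined => {
  helperCalls += 1;
  if (!todos?.length) return undefined;
  const hasIncomplete = todos.some((t) => t.status !== 'completed' && t.status !== 'cancelled');
  return hasIncomplete ? todos : undefined;
};

// Hypothetical wrapper around the suggested one-liner.
const pickTodosForStep = (
  lastTodosData: TodoItem[] | undefined,
  initialTodos: TodoItem[] | undefined
): TodoItem[] | undefined => lastTodosData ?? carriedOverTodos(initialTodos);

const initialTodos: TodoItem[] = [{ content: 'Search "macOS"', status: 'pending' }];
const written: TodoItem[] = [{ content: 'Search "macOS"', status: 'completed' }];

// The agent wrote todos this round: the helper is never invoked.
const fromWrite = pickTodosForStep(written, initialTodos);
const callsAfterWrite = helperCalls;

// No write this round: the helper runs once and the initial todos carry over.
const fromCarryover = pickTodosForStep(undefined, initialTodos);
const callsAfterCarryover = helperCalls;
```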

@KDKHD KDKHD force-pushed the feature-agent-builder-todo-list branch from cce0031 to cb7db01 Compare May 6, 2026 11:51
@KDKHD KDKHD added release_note:skip Skip the PR/issue when compiling release notes backport:skip This PR does not require backporting labels May 6, 2026
KDKHD added 2 commits May 7, 2026 13:38
…er-todo-list

# Conflicts:
#	x-pack/platform/packages/shared/agent-builder/agent-builder-common/index.ts
#	x-pack/platform/plugins/shared/agent_builder/server/services/execution/run_agent/prompts/answer_agent.ts
@KDKHD
Member Author

KDKHD commented May 7, 2026

/ci

@KDKHD KDKHD requested review from chrisbmar and pgayvallet May 8, 2026 13:31
Contributor

@pgayvallet pgayvallet left a comment

LGTM

@KDKHD
Member Author

KDKHD commented May 11, 2026

@elasticmachine merge upstream

Contributor

@chrisbmar chrisbmar left a comment

lets gooo

@kibanamachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Scout Lane #8 - stateful-classic / default / local-stateful-classic - APM integration not installed but setup completed - Admin user
  • [job] [logs] Scout Lane #8 - stateful-classic / default / local-stateful-classic - Profiling is not setup and no data is loaded - Admin users
  • [job] [logs] Scout Lane #8 - stateful-classic / default / local-stateful-classic - Profiling is not setup and no data is loaded - Viewer users

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| agentBuilder | 1528 | 1529 | +1 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| agentBuilder | 1.1MB | 1.1MB | +3.2KB |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/agent-builder-common | 727 | 740 | +13 |
| @kbn/agent-builder-server | 529 | 531 | +2 |
| total | | | +15 |

@KDKHD KDKHD merged commit 4f6b28f into elastic:main May 11, 2026
32 checks passed
clintandrewhall pushed a commit that referenced this pull request May 12, 2026
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
patrykkopycinski pushed a commit to patrykkopycinski/kibana that referenced this pull request May 13, 2026

Labels

backport:skip This PR does not require backporting release_note:skip Skip the PR/issue when compiling release notes v9.5.0

5 participants