Python: Add Handoff orchestration pattern support #1469

moonbox3 · 2025-10-15T01:37:34Z

Motivation and Context

Introduce a first-class handoff workflow pattern where a coordinator agent routes conversations to specialist agents in a cyclical flow back to the user.
Provide a reusable pattern for AI support desks, escalation workflows, and multi-agent routing scenarios.
Preserve full conversation context including metadata across all handoffs while supporting optional checkpointing for resumable workflows.
Enable both single-tier (coordinator-to-specialist) and multi-tier (specialist-to-specialist) routing patterns.
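Here is a framework-independent sketch of the cyclical flow described above. It uses toy stand-ins, not agent_framework types. `Message`, `run_cycle`, and `after_n_user_turns` are hypothetical. The coordinator is reduced to a `route` function. The termination condition is a predicate over the conversation history, checked each time control returns to the user.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Message:
    role: str    # "user" or "assistant"
    author: str  # "user" or the name of the agent that replied
    text: str


# Hypothetical agent shape: read the full history, return one reply.
Agent = Callable[[list[Message]], Message]


def run_cycle(
    route: Callable[[str], str],          # coordinator: user text -> specialist name
    specialists: dict[str, Agent],
    user_inputs: Iterable[str],
    should_terminate: Callable[[list[Message]], bool],
) -> list[Message]:
    """Route each user turn to a specialist, then hand control back to the user."""
    history: list[Message] = []
    for text in user_inputs:
        history.append(Message("user", "user", text))
        target = route(text)  # the coordinator's routing decision
        history.append(specialists[target](history))
        if should_terminate(history):  # checked whenever control returns to the user
            break
    return history


def after_n_user_turns(n: int) -> Callable[[list[Message]], bool]:
    """Example termination condition: stop once the user has spoken n times."""
    return lambda history: sum(m.role == "user" for m in history) >= n


def make_specialist(name: str) -> Agent:
    """Toy specialist that echoes the latest user message."""
    return lambda history: Message("assistant", name, f"{name} handling: {history[-1].text}")


def route(text: str) -> str:
    """Toy coordinator: keyword routing instead of an LLM call."""
    return "billing" if "bill" in text else "security"


specialists = {"billing": make_specialist("billing"), "security": make_specialist("security")}
history = run_cycle(
    route, specialists, ["I can't log in", "What's my bill?", "thanks"], after_n_user_turns(2)
)
```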

Description

Added HandoffBuilder: Fluent API for building cyclical workflows with .coordinator(), .add_handoff(), and .with_termination_condition() methods.
Auto-generated handoff tools: The framework automatically creates handoff_to_<agent> tools and intercepts their invocation via middleware to route between agents.
Conversation preservation: Full conversation history (including ChatMessage.additional_properties) is maintained across all hops, with automatic tool call cleanup during agent transitions to avoid OpenAI API errors.
Multi-tier routing support: Optional .add_handoff() configuration enables specialist-to-specialist handoffs for complex workflows.
Clean API: Removed legacy starting_agent() method in favor of clearer .coordinator() terminology.
Sample updates: Added handoff_simple.py and handoff_specialist_to_specialist.py demonstrating single-tier and multi-tier patterns with scripted response loops.
Comprehensive tests: Regression tests verify metadata preservation, specialist-to-specialist routing, text-based handoff detection, and conversation isolation across multiple runs.
Closes Python: Add support for the HandoffBuilder #497
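The auto-generated handoff tools work roughly like this. Each agent gets one `handoff_to_<agent>` tool name per allowed target. An interceptor turns a call to one of those tools into a routing decision instead of running it. This is a plain-Python sketch. `FunctionCall`, `build_handoff_tools`, and `intercept` are hypothetical names, and they do not reproduce the middleware in `_handoff.py`.

```python
from __future__ import annotations

from dataclasses import dataclass, field

HANDOFF_PREFIX = "handoff_to_"


@dataclass
class FunctionCall:
    name: str
    arguments: dict = field(default_factory=dict)


def build_handoff_tools(routes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each agent to the handoff tool names it may call.

    `routes` is the handoff graph: coordinator -> specialists, plus any
    optional specialist -> specialist edges (multi-tier routing).
    """
    return {
        source: [f"{HANDOFF_PREFIX}{target}" for target in targets]
        for source, targets in routes.items()
    }


def intercept(current_agent: str, call: FunctionCall,
              tools: dict[str, list[str]]) -> str | None:
    """Return the target agent if `call` is a handoff, otherwise None.

    Handoff calls are consumed here instead of being executed as normal
    tools, which is what lets the workflow switch the active agent.
    """
    if not call.name.startswith(HANDOFF_PREFIX):
        return None  # ordinary tool call: let it run as usual
    if call.name not in tools.get(current_agent, []):
        raise ValueError(f"{current_agent} may not call {call.name}")
    return call.name[len(HANDOFF_PREFIX):]


# The coordinator ("triage") can reach either specialist; the security -> billing
# edge stands in for optional specialist-to-specialist routing.
routes = {"triage": ["billing", "security"], "security": ["billing"]}
tools = build_handoff_tools(routes)
```

With the routes kept as an explicit graph, an agent can only hand off along configured edges. This is one way to make specialist-to-specialist routing opt-in.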

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot

Pull Request Overview

This PR introduces a new handoff orchestration pattern for Python workflows that allows triage agents to route conversations to specialists and maintain cyclical user interaction. The handoff pattern provides structured routing between agents with full conversation context preservation and configurable termination conditions.

Key Changes:

Added HandoffBuilder API for creating triage-to-specialist workflows with automatic context management
Implemented conversation state persistence utilities that preserve message metadata across agent handoffs
Created comprehensive samples demonstrating basic usage, context windows, and custom routing resolvers
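The metadata-preservation and tool-call-cleanup points can be made concrete with a minimal sketch. It uses a simplified, hypothetical `Msg` type; this is not the real `ChatMessage`, and it is not the helpers in `_conversation_state.py`. `encode`/`decode` round-trip the history through JSON without dropping `additional_properties`. `strip_tool_traffic` shows one simple cleanup strategy: drop tool results and tool-call requests before the next agent sees the history, so it never receives a tool call without its matching result. The PR's actual cleanup rules may differ.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Msg:
    role: str                                             # "user", "assistant", or "tool"
    text: str = ""
    tool_calls: list[dict] = field(default_factory=list)  # assistant -> tool requests
    tool_call_id: str | None = None                       # set on "tool" results
    additional_properties: dict = field(default_factory=dict)  # arbitrary metadata


def encode(history: list[Msg]) -> str:
    """Serialize the history, metadata included (e.g. for a checkpoint)."""
    return json.dumps([asdict(m) for m in history])


def decode(payload: str) -> list[Msg]:
    """Rebuild Msg objects; additional_properties comes back unchanged."""
    return [Msg(**item) for item in json.loads(payload)]


def strip_tool_traffic(history: list[Msg]) -> list[Msg]:
    """Drop tool results and tool-call requests before handing off.

    Assistant messages keep their text and metadata; messages that only
    carried tool calls are removed entirely.
    """
    cleaned: list[Msg] = []
    for m in history:
        if m.role == "tool":
            continue
        if m.tool_calls:
            if not m.text:
                continue
            m = Msg(role=m.role, text=m.text,
                    additional_properties=dict(m.additional_properties))
        cleaned.append(m)
    return cleaned


history = [
    Msg("user", "I was double charged", additional_properties={"ticket": "T-1"}),
    Msg("assistant", tool_calls=[{"id": "c1", "name": "handoff_to_billing"}]),
    Msg("tool", "ok", tool_call_id="c1"),
    Msg("assistant", "Routing you to billing.", tool_calls=[{"id": "c2", "name": "lookup"}]),
]
restored = decode(encode(history))
handoff_view = strip_tool_traffic(restored)
```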

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

File	Description
`python/packages/core/agent_framework/_workflows/_handoff.py`	Core handoff workflow implementation with coordinator, gateway, and builder classes
`python/packages/core/agent_framework/_workflows/_conversation_state.py`	Utilities for serializing/deserializing chat conversations with metadata preservation
`python/packages/core/agent_framework/_workflows/__init__.py`	Export new handoff classes in public API
`python/packages/core/tests/workflow/test_handoff.py`	Comprehensive test coverage for handoff functionality and metadata preservation
`python/samples/getting_started/workflows/orchestration/handoff_agents.py`	Basic handoff workflow sample with triage and specialist agents
`python/samples/getting_started/workflows/orchestration/handoff_with_context_window.py`	Sample demonstrating rolling context window for token efficiency
`python/samples/getting_started/workflows/orchestration/handoff_with_custom_resolver.py`	Advanced sample using Pydantic models for structured output routing
`python/samples/getting_started/workflows/README.md`	Documentation updates describing handoff pattern and usage tips
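The custom-resolver sample is described as routing on structured (Pydantic) output. Below is a hedged sketch of the same idea using only the standard library. The agent is asked to reply with JSON such as `{"handoff_to": "billing"}`, and a resolver validates that reply against the known participants. `RoutingDecision` and `resolve` are hypothetical names; the sample itself uses Pydantic models.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    handoff_to: str | None  # None means "stay with the current agent"
    reason: str = ""


def resolve(raw: str, participants: set[str]) -> RoutingDecision:
    """Parse an agent's structured reply into a routing decision.

    Malformed JSON or an unknown target falls back to "no handoff" instead of
    raising, so a bad generation never routes to a non-existent agent.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return RoutingDecision(None, "unparseable output")
    if not isinstance(data, dict):
        return RoutingDecision(None, "expected a JSON object")
    target = data.get("handoff_to")
    if target is not None and (not isinstance(target, str) or target not in participants):
        return RoutingDecision(None, f"unknown target {target!r}")
    return RoutingDecision(target, str(data.get("reason", "")))


participants = {"billing", "security", "products"}
decision = resolve('{"handoff_to": "billing", "reason": "invoice question"}', participants)
```

Falling back to "no handoff" is a design choice. Raising instead would surface bad generations more loudly.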

markwallace-microsoft · 2025-10-15T01:39:02Z

Python Test Coverage Report

File	Stmts	Miss	Cover	Missing
packages/core/agent_framework/_workflows
_agent.py	241	48	80%	54, 62–65, 93–94, 234, 242–248, 265, 306–309, 315, 321, 325–326, 329–335, 339–340, 379, 386, 392–393, 399, 411, 443, 450, 471, 478, 482, 484–486, 493
_conversation_state.py	39	28	28%	37–40, 42–49, 51, 53–59, 61–66, 68, 77
_handoff.py	484	147	69%	56, 58, 64–66, 73–74, 76, 78, 83–84, 86, 135, 143–148, 151–152, 166, 175, 194–197, 206–208, 219, 222, 232–243, 245, 251, 257, 295–297, 331, 336–338, 374, 383–388, 390–391, 402, 419, 492–495, 507, 541, 573–575, 578–581, 583–585, 815, 824, 828, 833, 890, 893, 981, 986, 996, 1002–1005, 1013–1014, 1018–1020, 1022–1032, 1034–1035, 1037, 1039, 1054–1055, 1058–1059, 1062, 1082–1088, 1090, 1096, 1130–1131, 1184–1185, 1282, 1290, 1299–1302, 1309, 1344, 1351, 1355, 1359, 1364, 1368, 1373–1374
TOTAL	11667	1954	83%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
1359	98 💤	0 ❌	0 🔥	27.886s ⏱️

james-tn · 2025-10-16T22:03:20Z

Proposal: Lazy Intent Classification for Handoff Workflows

The Key Idea

I'd like to propose an alternative handoff pattern that uses lazy intent classification—only triggered when a specialist agent signals they can't help—rather than having a coordinator mediate every turn.

The core insight: specialists know their own boundaries through their system prompt and signal when they're out of scope. The system then classifies intent and routes to the appropriate specialist.

How It Works

Each specialist has boundary instructions in their prompt:

You are the Billing Specialist for Contoso support.

Your expertise: subscriptions, invoices, payments, account adjustments.

IMPORTANT: If the user asks about anything outside your domain, 
respond with this EXACT phrase:
"This is outside my area. Let me connect you with the right specialist."

Otherwise, handle billing questions directly using your tools.
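Detection relies on the model repeating the phrase exactly. One option is therefore to generate each specialist's boundary instructions from a single shared constant, so the prompt and the detector cannot drift apart. This is a minimal sketch: `HANDOFF_PHRASE`, `boundary_prompt`, and `is_handoff_signal` are hypothetical names, and `boundary_prompt` simply reproduces the prompt above as a template.

```python
HANDOFF_PHRASE = "This is outside my area. Let me connect you with the right specialist."


def boundary_prompt(name: str, topic: str, domains: list[str], company: str = "Contoso") -> str:
    """Build a specialist system prompt that embeds the shared out-of-scope phrase."""
    return (
        f"You are the {name} for {company} support.\n\n"
        f"Your expertise: {', '.join(domains)}.\n\n"
        "IMPORTANT: If the user asks about anything outside your domain,\n"
        "respond with this EXACT phrase:\n"
        f'"{HANDOFF_PHRASE}"\n\n'
        f"Otherwise, handle {topic} questions directly using your tools."
    )


def is_handoff_signal(reply: str) -> bool:
    """Detector built from the same constant as the prompt."""
    return HANDOFF_PHRASE in reply


billing_prompt = boundary_prompt(
    "Billing Specialist",
    "billing",
    ["subscriptions", "invoices", "payments", "account adjustments"],
)
```

Exact substring matching is brittle if the model paraphrases. Possible mitigations include normalizing case and whitespace, or matching a shorter prefix such as "This is outside my area".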

When the system detects this phrase in a response:

Extract the original user request
Run intent classification to determine the correct domain
Route conversation to the new specialist
Transfer relevant context (configurable: N turns or full history)

Otherwise, the specialist communicates directly with the user. No intermediary, no overhead. Classification only happens when needed (first message or handoff signal).
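The four steps above could be combined into a single routing object, sketched here under stated assumptions:

- Each specialist is a hypothetical callable that maps the history to reply text.
- `classify` stands in for the intent-classification LLM call.
- Detection is a plain substring check on a configurable phrase.
- `context_turns` sets how many trailing messages the new specialist receives (`None` for the full history).

`LazyRouter` is illustrative only and is not part of HandoffBuilder.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

Turn = tuple[str, str]                    # (speaker, text)
Specialist = Callable[[list[Turn]], str]  # history -> reply text


@dataclass
class LazyRouter:
    specialists: dict[str, Specialist]
    classify: Callable[[str], str]        # user request -> specialist name
    handoff_phrase: str = "This is outside my area"
    context_turns: int | None = 3         # None = transfer the full history
    active: str | None = None
    history: list[Turn] = field(default_factory=list)

    def send(self, user_text: str) -> str:
        self.history.append(("user", user_text))
        if self.active is None:                      # first message: classify once
            self.active = self.classify(user_text)
        reply = self.specialists[self.active](self.history)
        if self.handoff_phrase in reply:             # specialist signalled out-of-scope
            self.active = self.classify(user_text)   # classify the original request
            context = (self.history if self.context_turns is None
                       else self.history[-self.context_turns:])
            reply = self.specialists[self.active](list(context))
        self.history.append((self.active, reply))
        return reply


REFUSAL = "This is outside my area. Let me connect you with the right specialist."


def billing(history: list[Turn]) -> str:
    return "Your bill is $20." if "bill" in history[-1][1] else REFUSAL


def security(history: list[Turn]) -> str:
    return "Let's reset your password." if "log in" in history[-1][1] else REFUSAL


def classify(text: str) -> str:
    return "billing" if "bill" in text else "security"


router = LazyRouter({"billing": billing, "security": security}, classify)
first = router.send("I can't log in")
second = router.send("What's my bill?")
```

In this sketch the refusal reply is not written to the history. A handoff is also retried only once, so a classifier that picks the same specialist again cannot cause a loop.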

Why This Scales

The critical advantage here is scalability. Each specialist only needs to know their own boundaries—not the capabilities of other specialists. When you add a new specialist, you just define their domain and tools. No need to update coordination logic or make other specialists aware of the new addition. This makes the system scalable to a large number of specialist agents without growing complexity.
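The scalability argument rests on specialists being described independently. One way to reflect that in code is sketched below with hypothetical names (`SpecialistSpec`, `Registry`). Each specialist contributes only its own name and a one-line domain. The classifier's instructions and its set of valid labels are derived from the registry, so adding a specialist is a single `register()` call. Names are assumed to be lowercase, and the security and products domains are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SpecialistSpec:
    name: str    # lowercase label the classifier must return
    domain: str  # one-line description written by the specialist's owner


class Registry:
    def __init__(self) -> None:
        self._specs: dict[str, SpecialistSpec] = {}

    def register(self, spec: SpecialistSpec) -> None:
        if spec.name in self._specs:
            raise ValueError(f"duplicate specialist {spec.name!r}")
        self._specs[spec.name] = spec

    def labels(self) -> set[str]:
        return set(self._specs)

    def classifier_instructions(self) -> str:
        """Built on demand, so it always reflects the current registry."""
        lines = [f"- {s.name}: {s.domain}" for s in self._specs.values()]
        return ("Pick the single specialist best suited to the user's request. "
                "Answer with the name only.\n" + "\n".join(lines))

    def parse_label(self, raw: str) -> str:
        """Validate the classifier's answer against registered names."""
        label = raw.strip().lower()
        if label not in self._specs:
            raise ValueError(f"classifier returned unknown label {raw!r}")
        return label


registry = Registry()
registry.register(SpecialistSpec("billing", "subscriptions, invoices, payments, account adjustments"))
registry.register(SpecialistSpec("security", "login problems, passwords, account lockouts"))
# Adding another specialist later touches nothing else:
registry.register(SpecialistSpec("products", "product features and availability"))
```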

Other benefits:

No coordinator overhead on every turn
Fewer LLM API calls (around 40% reduction in my testing)
Specialists stream directly to users for natural conversation flow
Lower latency since most interactions bypass classification

Flow Comparison

Current pattern (coordinator-mediated):

User: "I can't log in"
  → Coordinator analyzes → Routes to Security
  → Security responds → Coordinator
  → Coordinator requests user input
User: "What's my bill?"
  → Coordinator analyzes → Routes to Billing

Every turn goes through coordinator.

Proposed pattern (lazy classification):

User: "I can't log in"
  → [Classify once] → Security Specialist
  → Security ↔ User (direct)
User: "What's my bill?"
  → Security: "This is outside my area. Let me connect you..."
  → [Detect phrase] → [Classify] → Billing Specialist
  → Billing ↔ User (direct)

Classification only on entry and handoff signals.
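Here is a toy cost model for the two flows. It rests on explicit assumptions, not measurements:

- Coordinator-mediated: every user turn costs one coordinator call plus one specialist call.
- Lazy: every turn costs one specialist call, and the first turn adds one classification call.
- Lazy, on each topic switch: one classification call is added, plus the refused specialist reply that triggered it.

The function names are hypothetical.

```python
def coordinator_mediated_calls(domains: list[str]) -> int:
    """Each user turn: one coordinator call, then one specialist call."""
    return 2 * len(domains)


def lazy_classification_calls(domains: list[str]) -> int:
    """Each turn: one specialist call. First turn: +1 classification.
    Each topic switch: +1 refused specialist reply and +1 classification."""
    if not domains:
        return 0
    calls = len(domains) + 1  # answering specialist per turn + initial classification
    for prev, cur in zip(domains, domains[1:]):
        if cur != prev:
            calls += 2
    return calls


# Domain of each user turn in a hypothetical 10-turn conversation with one switch.
turns = ["security"] * 6 + ["billing"] * 4
print(coordinator_mediated_calls(turns), lazy_classification_calls(turns))  # -> 20 13
```

Under this model, the saving depends on how often users switch topics. For an alternating conversation such as `["security", "billing", "security", "billing"]`, the lazy pattern costs 11 calls against 8. The ~40% reduction reported above comes from the proposer's own testing, not from this model.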

When to Use Each?

Coordinator-mediated works well for:

Complex routing rules requiring centralized control
Multi-tier specialist-to-specialist handoffs
Strong governance/audit requirements
Human-in-the-loop approval workflows

Lazy classification works well for:

Customer support with clear domain specialists
Low latency and natural conversation prioritized
Specialists that can self-identify boundaries via prompts
Token efficiency matters

Implementation Question

Should this be added as an optional mode to HandoffBuilder?

workflow = (
    HandoffBuilder(participants=[billing, security, products])
    .coordinator("billing")  # Starting agent
    .with_routing_mode("lazy_classification")  # vs "coordinator_mediated"
    .with_handoff_phrase("This is outside my area")  # Configurable
    .with_context_transfer(turns=3)  # How much history on handoff
    .build()
)

Or should it be a separate builder class altogether?
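If the proposal were folded into HandoffBuilder, the new options would need validation at build time. Below is a sketch of one way to capture them. It assumes the option names from the proposed snippet (`with_routing_mode`, `with_handoff_phrase`, `with_context_transfer`); these are proposals, not existing HandoffBuilder methods. `RoutingConfig` is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

RoutingMode = Literal["coordinator_mediated", "lazy_classification"]


@dataclass(frozen=True)
class RoutingConfig:
    mode: RoutingMode = "coordinator_mediated"
    handoff_phrase: str | None = None
    context_turns: int | None = None  # None = transfer the full history

    def __post_init__(self) -> None:
        if self.mode not in ("coordinator_mediated", "lazy_classification"):
            raise ValueError(f"unknown routing mode {self.mode!r}")
        if self.mode == "lazy_classification" and not self.handoff_phrase:
            raise ValueError("lazy_classification requires a handoff phrase")
        if self.mode == "coordinator_mediated" and self.handoff_phrase:
            raise ValueError("handoff_phrase only applies to lazy_classification")
        if self.context_turns is not None and self.context_turns < 1:
            raise ValueError("context_turns must be >= 1 (or None for full history)")


cfg = RoutingConfig("lazy_classification", "This is outside my area", 3)
```

A single validated config makes invalid combinations fail when the workflow is built. A separate builder class would instead make them impossible to express.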

Real Results and Reference Implementation

I've implemented this pattern for customer support with 3 domain specialists and seen:

Reduction in LLM API calls vs coordinator-mediated
Reliable handoffs using prompt-based boundary recognition
Easy to scale (specialists don't need to know about each other)

For reference, here is my implementation which does not use the workflow framework:
OpenAIWorkshop/agentic_ai/agents/agent_framework/multi_agent/HANDOFF_README.md

Discussion

Would love feedback on:

Should this be integrated into HandoffBuilder or kept separate?
Is "lazy_classification" a clear name for this mode?
Should the handoff phrase be standardized or configurable?
Any concerns about relying on prompt-based boundary recognition?

The key advantage is simplicity. Most of the time, specialists just talk directly to users. No third-party monitoring system needed. Adding new specialists doesn't require updating coordination logic—just define their domain and boundaries.

What do you think? Does this resonate with use cases you're working on?

moonbox3 · 2025-10-17T00:05:40Z

Thanks for the suggestion, @james-tn. We appreciate it. This looks like it could be a good improvement on the coordinator-style pattern we're introducing here. We wanted to get the "easiest" pattern out first for feedback, and then we can look at evolving it further with improvements like the one you suggested.

* Add Handoff orchestration pattern support
* PR feedback
* Use AOAI client in samples
* Adjust to tool
* Handoff to sub-agent via ai function
* PR feedback
* More cleanup
* Improvements
* PR feedback cleanup
* Add handoff migration sample.
* Remove type ignore
* fix markdown link formatting
* Remove readme link for non-existent sample

Add Handoff orchestration pattern support

8fbc4a1

moonbox3 self-assigned this Oct 15, 2025

Copilot AI review requested due to automatic review settings October 15, 2025 01:37

moonbox3 added the python, squad: workflows, agent orchestration, and workflows labels Oct 15, 2025

markwallace-microsoft added the documentation label Oct 15, 2025

Copilot AI reviewed Oct 15, 2025

PR feedback

f8c3b14

moonbox3 mentioned this pull request Oct 15, 2025

Python: Handoff Workflow Orchestration missing #1157

Closed

Use AOAI client in samples

e333dd9

ekzhu reviewed Oct 15, 2025

moonbox3 marked this pull request as draft October 15, 2025 11:21

moonbox3 changed the title ~~Python: Add Handoff orchestration pattern support~~ [wip] Python: Add Handoff orchestration pattern support Oct 15, 2025

moonbox3 added 2 commits October 16, 2025 08:12

Adjust to tool

f7e105f

Handoff to sub-agent via ai function

07abcbb

ekzhu reviewed Oct 16, 2025

python/packages/core/agent_framework/_workflows/_handoff.py (outdated, resolved)

ekzhu reviewed Oct 16, 2025

python/samples/getting_started/workflows/orchestration/handoff_specialist_to_specialist.py (outdated, resolved)

moonbox3 added 2 commits October 16, 2025 14:12

PR feedback

0b6d984

More cleanup

639735e

moonbox3 marked this pull request as ready for review October 16, 2025 05:19

Merge branch 'main' into workflow-handoff-20251014

a4f0103

moonbox3 changed the title ~~[wip] Python: Add Handoff orchestration pattern support~~ Python: Add Handoff orchestration pattern support Oct 16, 2025

Merge branch 'main' into workflow-handoff-20251014

5223f12

ekzhu reviewed Oct 16, 2025

moonbox3 added 2 commits October 17, 2025 09:21

Improvements

6e4c9c5

Merge branch 'workflow-handoff-20251014' of github.com:moonbox3/agent…

f271af0

…-framework into workflow-handoff-20251014

moonbox3 requested a review from ekzhu October 17, 2025 00:23

ekzhu approved these changes Oct 20, 2025

TaoChenOSU reviewed Oct 20, 2025

moonbox3 added 3 commits October 21, 2025 08:55

Merge branch 'main' into workflow-handoff-20251014

3445ac3

PR feedback cleanup

13a86ba

Add handoff migration sample.

5714fea

TaoChenOSU reviewed Oct 22, 2025

python/packages/core/agent_framework/_workflows/_conversation_state.py (outdated, resolved)

moonbox3 and others added 4 commits October 22, 2025 09:34

Remove type ignore

7462eca

fix markdown link formatting

be475c1

Remove readme link for non-existent sample

00b7ccb

Merge branch 'main' into workflow-handoff-20251014

cc26704

TaoChenOSU approved these changes Oct 22, 2025

moonbox3 added this pull request to the merge queue Oct 22, 2025

victordibia approved these changes Oct 22, 2025

Merged via the queue into microsoft:main with commit b66619a Oct 22, 2025
20 checks passed

6 participants
