Python: Hyperlight: thread-confine sandbox, skip parsing on host callbacks, schema/tool cleanup by eavanvalkenburg · Pull Request #5424 · microsoft/agent-framework

eavanvalkenburg · 2026-04-22T10:33:03Z

Motivation and Context

While running the hyperlight CodeAct sample I hit two issues that hurt the
sandbox path:

1. A transient pyo3_runtime.PanicException: ... is unsendable, but sent to another thread!
   from the WASM Sandbox being touched outside its creating OS thread.
2. The host-tool callback marshalled tool results through repr() →
   Content → ast.literal_eval, which is wasteful, lossy for non-literal
   types, and needless surface area on the sandbox boundary.
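
To make the "lossy for non-literal types" point concrete, here is a minimal standard-library sketch of a repr() → ast.literal_eval round trip. The `round_trip` helper is hypothetical and only stands in for a text-based bridge; it is not the framework's actual marshalling code.

```python
import ast
from datetime import date


def round_trip(value):
    """Hypothetical helper: push a value through repr() and ast.literal_eval()."""
    return ast.literal_eval(repr(value))


# Plain literals (dict, list, str, int, ...) survive the round trip unchanged.
literal_result = round_trip({"count": 3, "tags": ["a", "b"]})

# Non-literal objects do not: repr(date(2026, 4, 22)) is
# "datetime.date(2026, 4, 22)", which literal_eval refuses to evaluate.
try:
    round_trip(date(2026, 4, 22))
    non_literal_error = None
except ValueError as exc:
    non_literal_error = exc
```

Returning the raw Python object from the callback avoids both the extra serialization work and this failure mode.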

While in there I also tightened the public API of
HyperlightExecuteCodeTool and the internal class layout per a small design
review.

Description

Core (agent_framework)

- Added a SKIP_PARSING sentinel that callers can pass to
  FunctionTool.invoke(..., result_parser=SKIP_PARSING) to bypass
  Content-wrapping and return the raw function result. Default behavior is
  unchanged. The signature uses @overload so the return type is Any on
  the sentinel path and list[Content] otherwise. Telemetry still records a
  tool.result value on both paths (str(result) on the sentinel path,
  parsed text otherwise).
- Re-exported SKIP_PARSING from agent_framework.

Hyperlight (agent_framework_hyperlight)

- Confined every interaction with a WasmSandbox to its creating OS thread
  via a per-entry single-worker ThreadPoolExecutor (_SandboxWorker).
  This eliminates the PyO3 unsendable panic when the registry is touched
  from arbitrary asyncio worker threads. _SandboxRegistry.close() shuts
  the workers down deterministically.
- The sandbox host callback now invokes managed tools with
  FunctionTool.invoke(..., result_parser=SKIP_PARSING) and returns the raw
  Python object, so dict / list / primitive results round-trip natively into
  the WASM guest instead of going through repr() ↔ literal_eval.
- Switched the execute_code input schema from a Pydantic BaseModel to a
  plain JSON-schema dict (EXECUTE_CODE_INPUT_SCHEMA), which goes through
  FunctionTool's _schema_supplied fast path and removes the pydantic
  import from this module.
- Split the previously-shared EXECUTE_CODE_INPUT_DESCRIPTION into two
  distinct constants: EXECUTE_CODE_TOOL_DESCRIPTION (tool-level fallback)
  and EXECUTE_CODE_PARAM_DESCRIPTION (parameter-level for code).
- Collapsed the internal _StoredFileMount dataclass into the public
  FileMount NamedTuple: they had the same shape, so the second type was
  redundant.
- Made _SandboxRegistry formally inherit from the SandboxRuntime
  Protocol so the contract is enforced by type checkers.
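
The thread-confinement idea can be sketched with the standard library alone. In the sketch below, `ThreadBoundResource` is a hypothetical stand-in for a thread-affine object such as a PyO3 unsendable class, and `SandboxWorker` is an illustrative name rather than the actual _SandboxWorker implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


class ThreadBoundResource:
    """Hypothetical stand-in for an object that must stay on its creating thread."""

    def __init__(self) -> None:
        self._owner = threading.get_ident()

    def run(self, code: str) -> str:
        if threading.get_ident() != self._owner:
            raise RuntimeError("resource touched outside its creating thread")
        return f"ran: {code}"


class SandboxWorker:
    """Illustrative worker: creation and every later call happen on one thread."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        # max_workers=1 means every submitted job runs on the same OS thread.
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="sandbox")
        self._resource = self._executor.submit(factory).result()

    def call(self, fn: Callable[[Any], Any]) -> Any:
        # Callers on any thread block here while the job runs on the worker thread.
        return self._executor.submit(fn, self._resource).result()

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


worker = SandboxWorker(ThreadBoundResource)
output = worker.call(lambda sandbox: sandbox.run("1 + 1"))
worker.shutdown()
```

From async code, the same worker could be driven with `asyncio.wrap_future(executor.submit(...))` instead of a blocking `.result()`; whether the PR does this is not stated here.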

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

moonbox3 · 2026-04-22T10:37:37Z

Python Test Coverage Report

File	Stmts	Miss	Cover	Missing
packages/core/agent_framework
_tools.py	981	83	91%	219–220, 395, 397, 410, 435–437, 445, 463, 477, 484, 491, 514, 516, 523, 531, 650, 690–692, 694, 700, 752–754, 779, 805, 809, 847–849, 853, 875, 1017–1023, 1059, 1071, 1073, 1075, 1078–1081, 1102, 1106, 1110, 1124–1126, 1467, 1576–1582, 1711, 1715, 1761, 1822–1823, 1938, 1958, 1960, 2016, 2079, 2251–2252, 2272, 2328–2329, 2467–2468, 2535, 2540, 2547
packages/hyperlight/agent_framework_hyperlight
_execute_code_tool.py	495	81	83%	67, 133–134, 149, 151, 164, 182, 192, 217, 222, 229, 235, 243, 251–253, 255–260, 300, 305, 307, 309, 326–327, 336–339, 366–369, 375, 377, 387–388, 420–421, 424–425, 432, 460, 516–522, 600–601, 630, 641–643, 701, 737, 743–745, 774–778, 782–783, 788, 805–809, 813–814, 871–872
TOTAL	29070	3467	88%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
5799	30 💤	0 ❌	0 🔥	1m 34s ⏱️

Copilot

Pull request overview

This PR improves Hyperlight sandbox stability and efficiency by thread-confining WasmSandbox access and allowing host callbacks to return raw Python tool results without Content wrapping/parsing, while also tightening the execute_code tool schema/constants.

Changes:

Added SKIP_PARSING to FunctionTool.invoke(...) to return raw tool results (and re-exported it from agent_framework), with new core tests.
Refactored Hyperlight sandbox execution to route all sandbox lifecycle/run operations through a per-entry single-thread worker to satisfy PyO3 “unsendable” thread affinity.
Simplified/optimized Hyperlight execute_code tool schema and internal types; updated docs and added regression tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
python/packages/core/agent_framework/_tools.py	Introduces `SKIP_PARSING` sentinel and updates `FunctionTool.invoke` behavior/typing.
python/packages/core/agent_framework/__init__.py	Re-exports `SKIP_PARSING` from the public package surface.
python/packages/core/tests/core/test_tools.py	Adds unit coverage for `SKIP_PARSING` behavior and telemetry expectations.
python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py	Adds per-sandbox worker thread confinement, uses `SKIP_PARSING` for host callbacks, and switches execute_code schema/constants.
python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py	Adds regression tests for sandbox thread affinity, registry shutdown, and native tool result passthrough.
python/packages/hyperlight/README.md	Documents sandbox tool result behavior and `result_parser` not running on the sandbox path.

Comments suppressed due to low confidence (1)

python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py:664

HyperlightExecuteCodeTool creates an internal _SandboxRegistry that now owns background worker threads, but the tool doesn't expose any public teardown path to call registry.close(). In long-running processes this can leak threads/resources for the lifetime of the tool. Consider adding a public close() (and/or async context manager) on HyperlightExecuteCodeTool that delegates to the registry when it is a _SandboxRegistry (or more generally when it has a close() attribute), so callers can deterministically release workers and temp dirs.

        self._state_lock = threading.RLock()
        self._registry = _registry or _SandboxRegistry()
        self._default_approval_mode: ApprovalMode = approval_mode or "never_require"
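
One way the suggested teardown path could look is sketched below, assuming a synchronous context manager is acceptable. `FakeRegistry` and `ClosableTool` are hypothetical names used only for illustration; they are not part of HyperlightExecuteCodeTool.

```python
from typing import Any


class FakeRegistry:
    """Hypothetical stand-in for a registry that owns worker threads and temp dirs."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ClosableTool:
    """Illustrative tool wrapper exposing a public close() and context-manager support."""

    def __init__(self, registry: Any) -> None:
        self._registry = registry

    def close(self) -> None:
        # Delegate only when the registry actually offers a close() hook.
        close = getattr(self._registry, "close", None)
        if callable(close):
            close()

    def __enter__(self) -> "ClosableTool":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()


registry = FakeRegistry()
with ClosableTool(registry):
    pass  # run code through the tool here
```

The getattr check keeps the tool usable with injected registries that have nothing to release.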

… fix

- FunctionTool.invoke now takes a boolean skip_parsing flag instead of the SKIP_PARSING sentinel; the sentinel is still accepted as result_parser at construction time to opt out of parsing for every call. The two paths are equivalent.
- _SandboxRegistry.close now invokes any sandbox close/shutdown hook on the entry's own worker thread (PyO3 unsendable), then shuts the worker down, then cleans up the per-entry temporary directories.
- Clarified the _SandboxWorker.shutdown comment to describe the actual ThreadPoolExecutor.shutdown(wait=False, cancel_futures=False) semantics.
- Hyperlight host callback uses skip_parsing=True (the new flag).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

After callable(configured_parser) the sentinel is already excluded; the extra identity check tripped mypy's non-overlapping identity warning.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions

Automated Code Review

Reviewers: 2 | Confidence: 95%

✗ Correctness

The _build_sandbox closure introduced in _create_entry has a critical UnboundLocalError. The original sandbox = _create_sandbox() call was removed from the enclosing scope (old line 581) but was NOT re-added at the top of _build_sandbox(). Because sandbox = _create_sandbox() appears in the except block inside _build_sandbox, Python's scoping rules mark sandbox as local to the entire function. The very first line of _build_sandbox — _configure_sandbox(sandbox=sandbox, ...) — reads this local before any assignment, which will unconditionally raise UnboundLocalError at runtime. The rest of the changes (SKIP_PARSING sentinel, skip_parsing parameter, worker-thread model, input schema switch) look correct.
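
The scoping rule behind this finding can be reproduced in isolation. The sketch below uses hypothetical, simplified versions of `_create_sandbox` and `_configure_sandbox`; only the shape of the bug (an assignment inside the except block making the name local to the whole function) is taken from the review.

```python
def _create_sandbox() -> dict:
    """Hypothetical stand-in that returns a fresh, unconfigured sandbox."""
    return {"configured": False}


def _configure_sandbox(sandbox: dict) -> None:
    """Hypothetical stand-in that marks the sandbox as configured."""
    sandbox["configured"] = True


def build_broken() -> dict:
    try:
        # `sandbox` is assigned further down, so Python treats it as a local
        # for the whole function; this read happens before any assignment.
        _configure_sandbox(sandbox)
    except KeyError:
        sandbox = _create_sandbox()
        _configure_sandbox(sandbox)
    return sandbox


def build_fixed() -> dict:
    # Assign before the first read, then keep the retry path unchanged.
    sandbox = _create_sandbox()
    try:
        _configure_sandbox(sandbox)
    except KeyError:
        sandbox = _create_sandbox()
        _configure_sandbox(sandbox)
    return sandbox


try:
    build_broken()
    broken_error = None
except UnboundLocalError as exc:
    broken_error = exc

fixed_result = build_fixed()
```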

✗ Design Approach

The thread-affinity fix is the right general direction, and the new raw-return path in FunctionTool is a cleaner way to avoid the old repr/literal_eval round-trip. I found one design-level issue in the Hyperlight integration, though: the sandbox callback now invokes the original FunctionTool instance directly, which leaks mutable tool lifecycle state across host and in-sandbox callers.

Suggestions

Keep the new skip_parsing capability in FunctionTool, but apply it to a copied/wrapped tool inside _make_sandbox_callback() rather than invoking the shared FunctionTool instance directly.
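
A rough sketch of the copy-based approach follows. `CountingTool` and its `invocation_count` field are hypothetical stand-ins for whatever mutable lifecycle state the real FunctionTool carries, and `make_sandbox_callback` only mirrors the name of the helper mentioned above.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CountingTool:
    """Hypothetical tool with mutable per-instance state."""

    func: Callable[..., Any]
    invocation_count: int = 0

    def invoke(self, **kwargs: Any) -> Any:
        self.invocation_count += 1
        return self.func(**kwargs)


def make_sandbox_callback(tool: CountingTool) -> Callable[..., Any]:
    # Shallow copy: the sandbox gets its own counters but shares the underlying function.
    sandbox_tool = copy.copy(tool)

    def callback(**kwargs: Any) -> Any:
        return sandbox_tool.invoke(**kwargs)

    return callback


host_tool = CountingTool(func=lambda a, b: a + b)
callback = make_sandbox_callback(host_tool)
sandbox_result = callback(a=1, b=2)
```

With this shape, calls made from inside the sandbox do not change the state observed by host-side callers of the original tool.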

Automated review by moonbox3's agents

github-actions

Automated Code Review

Reviewers: 3 | Confidence: 89%

✗ Correctness

The SKIP_PARSING sentinel and skip_parsing parameter in FunctionTool.invoke() are well-designed and thoroughly tested. The Hyperlight sandbox refactoring to use a dedicated _SandboxWorker per entry correctly enforces the PyO3 unsendable thread-affinity invariant, and _SandboxRegistry.close() properly tears down all per-entry resources. However, the _build_sandbox closure still contains the UnboundLocalError identified in a previously-resolved comment — the initial sandbox = _create_sandbox() call was removed from the enclosing scope but never re-added as the first statement of _build_sandbox, so the happy path always crashes because sandbox is referenced before assignment.

✗ Security Reliability

The PR adds a well-designed SKIP_PARSING sentinel and skip_parsing parameter to FunctionTool.invoke(), and refactors the Hyperlight sandbox to a single-threaded worker model honoring the PyO3 unsendable invariant. The core _tools.py changes are correct: the sentinel is a proper singleton, overload signatures are accurate, and both the observability and non-observability code paths handle skip_parsing consistently. The _SandboxWorker and _SandboxRegistry.close() implementation properly serialize sandbox access and clean up resources. However, _build_sandbox still contains the previously-identified UnboundLocalError that will crash at runtime on every call — the diff does not show the fix being applied.

✗ Design Approach

I found one blocking design issue and one API-shape concern. The thread-affinity fix itself is directionally right, but the new per-entry worker lifecycle is only implemented inside _SandboxRegistry and never owned from the public Hyperlight provider/tool flow, so the patch adds deterministic teardown in a place production callers cannot actually use. Separately, exposing SKIP_PARSING as a public FunctionTool mode makes the core tool contract inconsistent: direct callers can get raw values, but the normal agent execution path still re-normalizes those values back into Content text, so this behaves more like a caller-specific escape hatch than a coherent tool configuration.

Suggestions

SKIP_PARSING is being exposed as a general FunctionTool feature (_tools.py constructor/decorator changes and __init__.py export), but the main framework execution path still feeds invoke() results straight into Content.from_function_result (_tools.py:1443-1451), which stringifies any non-list[Content] result (_types.py:831-851). That means this is not really an end-to-end tool mode; it only changes behavior for specific in-process callers like Hyperlight. A cleaner design would be a caller-side raw API (invoke_raw/internal helper) or a Hyperlight-specific escape hatch, rather than overloading result_parser with backend-specific semantics.

Automated review by moonbox3's agents

* Bump Python package versions for 1.2.0 release

  Released tier bumps 1.1.1 -> 1.2.0 (core, openai, foundry, root) to reflect additive public APIs landed since 1.1.0: functional workflow API (#4238) and FunctionTool SKIP_PARSING sentinel (#5424). All beta packages stamped 1.0.0b260424, alpha packages 1.0.0a260424. All 26 non-core agent-framework-core floors raised to >=1.2.0,<2. CHANGELOG consolidates the never-tagged 1.1.1 entries with the post-merge additions into [1.2.0].

* Update CHANGELOG footer links for 1.2.0

  Advance [Unreleased] comparison base from python-1.1.0 to python-1.2.0 and add a [1.2.0] reference link comparing python-1.1.0...python-1.2.0 so the heading links resolve correctly.

* Fix CHANGELOG: restore [1.1.1] section and add proper [1.2.0]

  Previous commit incorrectly renamed the [1.1.1] header to [1.2.0], which wiped the historical 1.1.1 entries and wrongly attributed them to 1.2.0. This restores [1.1.1] to its origin/main content and adds a new [1.2.0] section above containing only the commits in python-1.1.1..HEAD:

  - #4238 functional workflow API
  - #5142 GitHub Copilot OpenTelemetry
  - #2403 A2A bridge support
  - #5070 oauth_consent_request events in Foundry clients
  - #5447 FoundryAgent hosted agent sessions
  - #5459 hosting server dependency upgrade + types
  - #5389 AG-UI reasoning/multimodal parsing fix
  - #5440 stop [TOOLBOXES] warning spam
  - #5455 user agent prefix fix

  Also corrects the [1.2.0] compare base to python-1.1.1 (not 1.1.0) and adds the missing [1.1.1] reference link.

Copilot AI review requested due to automatic review settings April 22, 2026 10:33

moonbox3 added documentation Improvements or additions to documentation python labels Apr 22, 2026

github-actions Bot changed the title ~~Hyperlight: thread-confine sandbox, skip parsing on host callbacks, schema/tool cleanup~~ Python: Hyperlight: thread-confine sandbox, skip parsing on host callbacks, schema/tool cleanup Apr 22, 2026

Copilot started reviewing on behalf of eavanvalkenburg April 22, 2026 10:34

improved parsing of tool call results and tweaks

21ec19b

eavanvalkenburg force-pushed the hyperlight_perf_improvement branch from 3e20ac7 to 21ec19b Compare April 22, 2026 10:35

Copilot AI reviewed Apr 22, 2026

Comment thread python/packages/core/agent_framework/_tools.py Outdated

Comment thread python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py

Comment thread python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py Outdated

eavanvalkenburg and others added 2 commits April 22, 2026 12:53

github-actions Bot reviewed Apr 22, 2026

Comment thread python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py

Comment thread python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py Outdated

github-actions Bot reviewed Apr 22, 2026

moonbox3 approved these changes Apr 23, 2026

fixed sandbox working on copy of tool

503c04e

eavanvalkenburg enabled auto-merge April 23, 2026 07:46

SergeyMenshykh approved these changes Apr 23, 2026

eavanvalkenburg added this pull request to the merge queue Apr 23, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 23, 2026

eavanvalkenburg added this pull request to the merge queue Apr 23, 2026

Merged via the queue into microsoft:main with commit 58ff4ad Apr 23, 2026
33 checks passed

This was referenced Apr 24, 2026

Bump Microsoft.Agents.AI and Microsoft.Extensions.AI ristoikonen/PDF_Llama#13

Open

Bump Microsoft.Agents.AI.OpenAI from 1.0.0-rc1 to 1.3.0 pinkroosterai/Clarive#52

Open

moonbox3 mentioned this pull request Apr 24, 2026

Python: Bump Python package versions for 1.2.0 release #5468

Merged

4 tasks

eavanvalkenburg merged 4 commits into microsoft:main from eavanvalkenburg:hyperlight_perf_improvement
