Python: Hyperlight: thread-confine sandbox, skip parsing on host callbacks, schema/tool cleanup #5424

Merged
eavanvalkenburg merged 4 commits into microsoft:main from eavanvalkenburg:hyperlight_perf_improvement
Apr 23, 2026

Conversation

@eavanvalkenburg

Motivation and Context

While running the hyperlight CodeAct sample I hit two issues that hurt the
sandbox path:

  1. A transient pyo3_runtime.PanicException: ... is unsendable, but sent to another thread!
    from the WASM Sandbox being touched outside its creating OS thread.
  2. The host-tool callback marshalled tool results through
    repr() → Content → ast.literal_eval, which is wasteful, lossy for
    non-literal types, and needless surface area on the sandbox boundary.
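
To illustrate the second point, here is a minimal stdlib-only sketch of why a repr() / ast.literal_eval round trip is lossy. The roundtrip helper is hypothetical and only mimics the marshalling described above; it is not the code that was in the callback.

```python
import ast
from datetime import date


def roundtrip(value):
    """Hypothetical helper: marshal via repr(), then parse with literal_eval()."""
    return ast.literal_eval(repr(value))


def survives_roundtrip(value) -> bool:
    """Return True if the value comes back equal, False if parsing fails."""
    try:
        return roundtrip(value) == value
    except (ValueError, SyntaxError):
        return False


# Plain literals (dict, list, str, int) survive the round trip.
plain = {"count": 3, "tags": ["a", "b"]}
plain_ok = survives_roundtrip(plain)

# A non-literal value does not: repr() produces a constructor call such as
# "datetime.date(2026, 4, 22)", which literal_eval refuses to evaluate.
dated_ok = survives_roundtrip({"when": date(2026, 4, 22)})
```

Returning the raw object avoids both the extra string building and this class of failure.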

While in there I also tightened the public API of
HyperlightExecuteCodeTool and the internal class layout per a small design
review.

Description

Core (agent_framework)

  • Added a SKIP_PARSING sentinel that callers can pass to
    FunctionTool.invoke(..., result_parser=SKIP_PARSING) to bypass
    Content-wrapping and return the raw function result. Default behavior is
    unchanged. The signature uses @overload declarations so the return type is Any on
    the sentinel path and list[Content] otherwise. Telemetry still records a
    tool.result value on both paths (str(result) on the sentinel path,
    parsed text otherwise).
  • Re-exported SKIP_PARSING from agent_framework.

Hyperlight (agent_framework_hyperlight)

  • Confined every interaction with a WasmSandbox to its creating OS thread
    via a per-entry single-worker ThreadPoolExecutor (_SandboxWorker).
    This eliminates the PyO3 unsendable panic when the registry is touched
    from arbitrary asyncio worker threads. _SandboxRegistry.close() shuts
    the workers down deterministically.
  • The sandbox host callback now invokes managed tools with
    FunctionTool.invoke(..., result_parser=SKIP_PARSING) and returns the raw
    Python object, so dict / list / primitive results round-trip natively into
    the WASM guest instead of going through repr() → literal_eval.
  • Switched the execute_code input schema from a Pydantic BaseModel to a
    plain JSON-schema dict (EXECUTE_CODE_INPUT_SCHEMA), which goes through
    FunctionTool's _schema_supplied fast path and removes the pydantic
    import from this module.
  • Split the previously-shared EXECUTE_CODE_INPUT_DESCRIPTION into two
    distinct constants: EXECUTE_CODE_TOOL_DESCRIPTION (tool-level fallback)
    and EXECUTE_CODE_PARAM_DESCRIPTION (parameter-level for code).
  • Collapsed the internal _StoredFileMount dataclass into the public
    FileMount NamedTuple — they had the same shape, the second type was
    redundant.
  • Made _SandboxRegistry formally inherit from the SandboxRuntime
    Protocol so the contract is enforced by type checkers.
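
The thread-confinement idea from the first bullet can be sketched with the standard library alone. ThreadBoundResource and SandboxWorker below are hypothetical stand-ins for the sandbox and for _SandboxWorker; the sketch assumes only that the resource must be created and used on one OS thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadBoundResource:
    """Stand-in for a thread-affine object such as a PyO3 'unsendable' class."""

    def __init__(self) -> None:
        self._owner = threading.get_ident()

    def run(self, code: str) -> str:
        if threading.get_ident() != self._owner:
            raise RuntimeError("touched outside creating thread")
        return f"ran: {code}"


class SandboxWorker:
    """Hypothetical per-entry worker: one thread creates and uses the resource."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="sandbox")
        # Construct the resource on the worker thread so that thread owns it.
        self._resource = self._executor.submit(ThreadBoundResource).result()

    def run(self, code: str) -> str:
        # Every call hops onto the same single worker thread.
        return self._executor.submit(self._resource.run, code).result()

    def close(self) -> None:
        self._executor.shutdown(wait=True)


def run_from_many_threads(count: int) -> list[str]:
    """Call the worker from several caller threads; every call should succeed."""
    worker = SandboxWorker()
    results: list[str] = []

    def call(index: int) -> None:
        results.append(worker.run(f"job {index}"))

    callers = [threading.Thread(target=call, args=(i,)) for i in range(count)]
    for caller in callers:
        caller.start()
    for caller in callers:
        caller.join()
    worker.close()
    return sorted(results)
```

A single-worker executor also serializes calls, so the resource never sees concurrent access. From asyncio code the same executor could be passed to loop.run_in_executor instead of blocking on .result().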

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot AI review requested due to automatic review settings April 22, 2026 10:33
@moonbox3 moonbox3 added the documentation and python labels Apr 22, 2026
@github-actions github-actions Bot changed the title Hyperlight: thread-confine sandbox, skip parsing on host callbacks, schema/tool cleanup Python: Hyperlight: thread-confine sandbox, skip parsing on host callbacks, schema/tool cleanup Apr 22, 2026
@eavanvalkenburg eavanvalkenburg force-pushed the hyperlight_perf_improvement branch from 3e20ac7 to 21ec19b Compare April 22, 2026 10:35
@moonbox3

moonbox3 commented Apr 22, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| packages/core/agent_framework | | | | |
| _tools.py | 981 | 83 | 91% | 219–220, 395, 397, 410, 435–437, 445, 463, 477, 484, 491, 514, 516, 523, 531, 650, 690–692, 694, 700, 752–754, 779, 805, 809, 847–849, 853, 875, 1017–1023, 1059, 1071, 1073, 1075, 1078–1081, 1102, 1106, 1110, 1124–1126, 1467, 1576–1582, 1711, 1715, 1761, 1822–1823, 1938, 1958, 1960, 2016, 2079, 2251–2252, 2272, 2328–2329, 2467–2468, 2535, 2540, 2547 |
| packages/hyperlight/agent_framework_hyperlight | | | | |
| _execute_code_tool.py | 495 | 81 | 83% | 67, 133–134, 149, 151, 164, 182, 192, 217, 222, 229, 235, 243, 251–253, 255–260, 300, 305, 307, 309, 326–327, 336–339, 366–369, 375, 377, 387–388, 420–421, 424–425, 432, 460, 516–522, 600–601, 630, 641–643, 701, 737, 743–745, 774–778, 782–783, 788, 805–809, 813–814, 871–872 |
| TOTAL | 29070 | 3467 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 5799 | 30 💤 | 0 ❌ | 0 🔥 | 1m 34s ⏱️ |

Copilot AI left a comment

Pull request overview

This PR improves Hyperlight sandbox stability and efficiency by thread-confining WasmSandbox access and allowing host callbacks to return raw Python tool results without Content wrapping/parsing, while also tightening the execute_code tool schema/constants.

Changes:

  • Added SKIP_PARSING to FunctionTool.invoke(...) to return raw tool results (and re-exported it from agent_framework), with new core tests.
  • Refactored Hyperlight sandbox execution to route all sandbox lifecycle/run operations through a per-entry single-thread worker to satisfy PyO3 “unsendable” thread affinity.
  • Simplified/optimized Hyperlight execute_code tool schema and internal types; updated docs and added regression tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| python/packages/core/agent_framework/_tools.py | Introduces SKIP_PARSING sentinel and updates FunctionTool.invoke behavior/typing. |
| python/packages/core/agent_framework/__init__.py | Re-exports SKIP_PARSING from the public package surface. |
| python/packages/core/tests/core/test_tools.py | Adds unit coverage for SKIP_PARSING behavior and telemetry expectations. |
| python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py | Adds per-sandbox worker thread confinement, uses SKIP_PARSING for host callbacks, and switches execute_code schema/constants. |
| python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py | Adds regression tests for sandbox thread affinity, registry shutdown, and native tool result passthrough. |
| python/packages/hyperlight/README.md | Documents sandbox tool result behavior and result_parser not running on the sandbox path. |
Comments suppressed due to low confidence (1)

python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py:664

  • HyperlightExecuteCodeTool creates an internal _SandboxRegistry that now owns background worker threads, but the tool doesn't expose any public teardown path to call registry.close(). In long-running processes this can leak threads/resources for the lifetime of the tool. Consider adding a public close() (and/or async context manager) on HyperlightExecuteCodeTool that delegates to the registry when it is a _SandboxRegistry (or more generally when it has a close() attribute), so callers can deterministically release workers and temp dirs.
        self._state_lock = threading.RLock()
        self._registry = _registry or _SandboxRegistry()
        self._default_approval_mode: ApprovalMode = approval_mode or "never_require"
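
One possible shape for the teardown path suggested above, as a sketch only: ClosableTool and FakeRegistry are hypothetical names, and the assumption is simply that the registry may or may not expose a close() method.

```python
from typing import Any


class FakeRegistry:
    """Hypothetical registry stand-in that records whether close() ran."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ClosableTool:
    """Hypothetical wrapper; not HyperlightExecuteCodeTool itself."""

    def __init__(self, registry: Any) -> None:
        self._registry = registry

    def close(self) -> None:
        # Delegate only when the registry actually exposes a close() hook.
        closer = getattr(self._registry, "close", None)
        if callable(closer):
            closer()

    def __enter__(self) -> "ClosableTool":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()


registry = FakeRegistry()
with ClosableTool(registry):
    pass  # use the tool here; workers are released on exit

# A registry without close() is tolerated rather than raising.
ClosableTool(object()).close()
```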

Comment thread python/packages/core/agent_framework/_tools.py Outdated
Comment thread python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py Outdated
eavanvalkenburg and others added 2 commits April 22, 2026 12:53
… fix

- FunctionTool.invoke now takes a boolean skip_parsing flag instead of the
  SKIP_PARSING sentinel; the sentinel is still accepted as result_parser at
  construction time to opt out of parsing for every call. The two paths are
  equivalent.
- _SandboxRegistry.close now invokes any sandbox close/shutdown hook on the
  entry's own worker thread (PyO3 unsendable), then shuts the worker down,
  then cleans up the per-entry temporary directories.
- Clarified the _SandboxWorker.shutdown comment to describe the actual
  ThreadPoolExecutor.shutdown(wait=False, cancel_futures=False) semantics.
- Hyperlight host callback uses skip_parsing=True (the new flag).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After callable(configured_parser) the sentinel is already excluded; the extra
identity check tripped mypy's non-overlapping identity warning.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@github-actions github-actions Bot left a comment

Automated Code Review

Reviewers: 2 | Confidence: 95%

✗ Correctness

The _build_sandbox closure introduced in _create_entry has a critical UnboundLocalError. The original sandbox = _create_sandbox() call was removed from the enclosing scope (old line 581) but was NOT re-added at the top of _build_sandbox(). Because sandbox = _create_sandbox() appears in the except block inside _build_sandbox, Python's scoping rules mark sandbox as local to the entire function. The very first line of _build_sandbox, _configure_sandbox(sandbox=sandbox, ...), reads this local before any assignment, which will unconditionally raise UnboundLocalError at runtime. The rest of the changes (SKIP_PARSING sentinel, skip_parsing parameter, worker-thread model, input schema switch) look correct.
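
The scoping rule behind this finding can be reproduced in isolation. The functions below are hypothetical and only mirror the shape described in the review (an assignment that appears solely in an except block); they are not the PR's code.

```python
def create_sandbox() -> str:
    return "sandbox"


def configure(sandbox: str) -> str:
    return f"configured {sandbox}"


def make_builders():
    def build_broken() -> str:
        try:
            # 'sandbox' is read here before any assignment in this function.
            return configure(sandbox)
        except ValueError:
            # This assignment makes 'sandbox' local to all of build_broken,
            # so the read above can never fall back to an outer scope.
            sandbox = create_sandbox()
            return configure(sandbox)

    def build_fixed() -> str:
        sandbox = create_sandbox()  # assign first, on the happy path
        try:
            return configure(sandbox)
        except ValueError:
            sandbox = create_sandbox()
            return configure(sandbox)

    return build_broken, build_fixed


build_broken, build_fixed = make_builders()

try:
    build_broken()
    broken_raised = False
except UnboundLocalError:
    # Not caught by 'except ValueError' inside build_broken, so it propagates.
    broken_raised = True

fixed_result = build_fixed()
```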

✗ Design Approach

The thread-affinity fix is the right general direction, and the new raw-return path in FunctionTool is a cleaner way to avoid the old repr/literal_eval round-trip. I found one design-level issue in the Hyperlight integration, though: the sandbox callback now invokes the original FunctionTool instance directly, which leaks mutable tool lifecycle state across host and in-sandbox callers.

Suggestions

  • Keep the new skip_parsing capability in FunctionTool, but apply it to a copied/wrapped tool inside _make_sandbox_callback() rather than invoking the shared FunctionTool instance directly.

Automated review by moonbox3's agents

Comment thread python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py Outdated

@github-actions github-actions Bot left a comment

Automated Code Review

Reviewers: 3 | Confidence: 89%

✗ Correctness

The SKIP_PARSING sentinel and skip_parsing parameter in FunctionTool.invoke() are well-designed and thoroughly tested. The Hyperlight sandbox refactoring to use a dedicated _SandboxWorker per entry correctly enforces the PyO3 unsendable thread-affinity invariant, and _SandboxRegistry.close() properly tears down all per-entry resources. However, the _build_sandbox closure still contains the UnboundLocalError identified in a previously-resolved comment — the initial sandbox = _create_sandbox() call was removed from the enclosing scope but never re-added as the first statement of _build_sandbox, so the happy path always crashes because sandbox is referenced before assignment.

✗ Security Reliability

The PR adds a well-designed SKIP_PARSING sentinel and skip_parsing parameter to FunctionTool.invoke(), and refactors the Hyperlight sandbox to a single-threaded worker model honoring the PyO3 unsendable invariant. The core _tools.py changes are correct: the sentinel is a proper singleton, overload signatures are accurate, and both the observability and non-observability code paths handle skip_parsing consistently. The _SandboxWorker and _SandboxRegistry.close() implementation properly serialize sandbox access and clean up resources. However, _build_sandbox still contains the previously-identified UnboundLocalError that will crash at runtime on every call — the diff does not show the fix being applied.

✗ Design Approach

I found one blocking design issue and one API-shape concern. The thread-affinity fix itself is directionally right, but the new per-entry worker lifecycle is only implemented inside _SandboxRegistry and never owned from the public Hyperlight provider/tool flow, so the patch adds deterministic teardown in a place production callers cannot actually use. Separately, exposing SKIP_PARSING as a public FunctionTool mode makes the core tool contract inconsistent: direct callers can get raw values, but the normal agent execution path still re-normalizes those values back into Content text, so this behaves more like a caller-specific escape hatch than a coherent tool configuration.

Suggestions

  • SKIP_PARSING is being exposed as a general FunctionTool feature (_tools.py constructor/decorator changes and __init__.py export), but the main framework execution path still feeds invoke() results straight into Content.from_function_result (_tools.py:1443-1451), which stringifies any non-list[Content] result (_types.py:831-851). That means this is not really an end-to-end tool mode; it only changes behavior for specific in-process callers like Hyperlight. A cleaner design would be a caller-side raw API (invoke_raw/internal helper) or a Hyperlight-specific escape hatch, rather than overloading result_parser with backend-specific semantics.

Automated review by moonbox3's agents

@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Apr 23, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 23, 2026
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Apr 23, 2026
Merged via the queue into microsoft:main with commit 58ff4ad Apr 23, 2026
33 checks passed
moonbox3 added a commit that referenced this pull request Apr 24, 2026
* Bump Python package versions for 1.2.0 release

Released tier bumps 1.1.1 -> 1.2.0 (core, openai, foundry, root) to
reflect additive public APIs landed since 1.1.0: functional workflow API
(#4238) and FunctionTool SKIP_PARSING sentinel (#5424). All beta packages
stamped 1.0.0b260424, alpha packages 1.0.0a260424. All 26 non-core
agent-framework-core floors raised to >=1.2.0,<2. CHANGELOG consolidates
the never-tagged 1.1.1 entries with the post-merge additions into [1.2.0].

* Update CHANGELOG footer links for 1.2.0

Advance [Unreleased] comparison base from python-1.1.0 to python-1.2.0
and add a [1.2.0] reference link comparing python-1.1.0...python-1.2.0
so the heading links resolve correctly.

* Fix CHANGELOG: restore [1.1.1] section and add proper [1.2.0]

Previous commit incorrectly renamed the [1.1.1] header to [1.2.0], which
wiped the historical 1.1.1 entries and wrongly attributed them to 1.2.0.
This restores [1.1.1] to its origin/main content and adds a new [1.2.0]
section above containing only the commits in python-1.1.1..HEAD:

- #4238 functional workflow API
- #5142 GitHub Copilot OpenTelemetry
- #2403 A2A bridge support
- #5070 oauth_consent_request events in Foundry clients
- #5447 FoundryAgent hosted agent sessions
- #5459 hosting server dependency upgrade + types
- #5389 AG-UI reasoning/multimodal parsing fix
- #5440 stop [TOOLBOXES] warning spam
- #5455 user agent prefix fix

Also corrects the [1.2.0] compare base to python-1.1.1 (not 1.1.0) and
adds the missing [1.1.1] reference link.
