Python: Hyperlight: thread-confine sandbox, skip parsing on host callbacks, schema/tool cleanup#5424
Conversation
3e20ac7 to
21ec19b
Compare
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Pull request overview
This PR improves Hyperlight sandbox stability and efficiency by thread-confining WasmSandbox access and allowing host callbacks to return raw Python tool results without Content wrapping/parsing, while also tightening the execute_code tool schema/constants.
Changes:
- Added
SKIP_PARSINGtoFunctionTool.invoke(...)to return raw tool results (and re-exported it fromagent_framework), with new core tests. - Refactored Hyperlight sandbox execution to route all sandbox lifecycle/run operations through a per-entry single-thread worker to satisfy PyO3 “unsendable” thread affinity.
- Simplified/optimized Hyperlight execute_code tool schema and internal types; updated docs and added regression tests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| python/packages/core/agent_framework/_tools.py | Introduces SKIP_PARSING sentinel and updates FunctionTool.invoke behavior/typing. |
| python/packages/core/agent_framework/init.py | Re-exports SKIP_PARSING from the public package surface. |
| python/packages/core/tests/core/test_tools.py | Adds unit coverage for SKIP_PARSING behavior and telemetry expectations. |
| python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py | Adds per-sandbox worker thread confinement, uses SKIP_PARSING for host callbacks, and switches execute_code schema/constants. |
| python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py | Adds regression tests for sandbox thread affinity, registry shutdown, and native tool result passthrough. |
| python/packages/hyperlight/README.md | Documents sandbox tool result behavior and result_parser not running on the sandbox path. |
Comments suppressed due to low confidence (1)
python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py:664
HyperlightExecuteCodeToolcreates an internal_SandboxRegistrythat now owns background worker threads, but the tool doesn't expose any public teardown path to callregistry.close(). In long-running processes this can leak threads/resources for the lifetime of the tool. Consider adding a publicclose()(and/or async context manager) onHyperlightExecuteCodeToolthat delegates to the registry when it is a_SandboxRegistry(or more generally when it has aclose()attribute), so callers can deterministically release workers and temp dirs.
self._state_lock = threading.RLock()
self._registry = _registry or _SandboxRegistry()
self._default_approval_mode: ApprovalMode = approval_mode or "never_require"
… fix - FunctionTool.invoke now takes a boolean skip_parsing flag instead of the SKIP_PARSING sentinel; the sentinel is still accepted as result_parser at construction time to opt out of parsing for every call. The two paths are equivalent. - _SandboxRegistry.close now invokes any sandbox close/shutdown hook on the entry's own worker thread (PyO3 unsendable), then shuts the worker down, then cleans up the per-entry temporary directories. - Clarified the _SandboxWorker.shutdown comment to describe the actual ThreadPoolExecutor.shutdown(wait=False, cancel_futures=False) semantics. - Hyperlight host callback uses skip_parsing=True (the new flag). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After callable(configured_parser) the sentinel is already excluded; the extra identity check tripped mypy's non-overlapping identity warning. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Automated Code Review
Reviewers: 2 | Confidence: 95%
✗ Correctness
The
_build_sandboxclosure introduced in_create_entryhas a criticalUnboundLocalError. The originalsandbox = _create_sandbox()call was removed from the enclosing scope (old line 581) but was NOT re-added at the top of_build_sandbox(). Becausesandbox = _create_sandbox()appears in theexceptblock inside_build_sandbox, Python's scoping rules marksandboxas local to the entire function. The very first line of_build_sandbox—_configure_sandbox(sandbox=sandbox, ...)— reads this local before any assignment, which will unconditionally raiseUnboundLocalErrorat runtime. The rest of the changes (SKIP_PARSING sentinel,skip_parsingparameter, worker-thread model, input schema switch) look correct.
✗ Design Approach
The thread-affinity fix is the right general direction, and the new raw-return path in
FunctionToolis a cleaner way to avoid the oldrepr/literal_evalround-trip. I found one design-level issue in the Hyperlight integration, though: the sandbox callback now invokes the originalFunctionToolinstance directly, which leaks mutable tool lifecycle state across host and in-sandbox callers.
Suggestions
- Keep the new
skip_parsingcapability inFunctionTool, but apply it to a copied/wrapped tool inside_make_sandbox_callback()rather than invoking the sharedFunctionToolinstance directly.
Automated review by moonbox3's agents
There was a problem hiding this comment.
Automated Code Review
Reviewers: 3 | Confidence: 89%
✗ Correctness
The
SKIP_PARSINGsentinel andskip_parsingparameter inFunctionTool.invoke()are well-designed and thoroughly tested. The Hyperlight sandbox refactoring to use a dedicated_SandboxWorkerper entry correctly enforces the PyO3unsendablethread-affinity invariant, and_SandboxRegistry.close()properly tears down all per-entry resources. However, the_build_sandboxclosure still contains theUnboundLocalErroridentified in a previously-resolved comment — the initialsandbox = _create_sandbox()call was removed from the enclosing scope but never re-added as the first statement of_build_sandbox, so the happy path always crashes becausesandboxis referenced before assignment.
✗ Security Reliability
The PR adds a well-designed SKIP_PARSING sentinel and skip_parsing parameter to FunctionTool.invoke(), and refactors the Hyperlight sandbox to a single-threaded worker model honoring the PyO3 unsendable invariant. The core _tools.py changes are correct: the sentinel is a proper singleton, overload signatures are accurate, and both the observability and non-observability code paths handle skip_parsing consistently. The _SandboxWorker and _SandboxRegistry.close() implementation properly serialize sandbox access and clean up resources. However, _build_sandbox still contains the previously-identified UnboundLocalError that will crash at runtime on every call — the diff does not show the fix being applied.
✗ Design Approach
I found one blocking design issue and one API-shape concern. The thread-affinity fix itself is directionally right, but the new per-entry worker lifecycle is only implemented inside
_SandboxRegistryand never owned from the public Hyperlight provider/tool flow, so the patch adds deterministic teardown in a place production callers cannot actually use. Separately, exposingSKIP_PARSINGas a publicFunctionToolmode makes the core tool contract inconsistent: direct callers can get raw values, but the normal agent execution path still re-normalizes those values back intoContenttext, so this behaves more like a caller-specific escape hatch than a coherent tool configuration.
Suggestions
SKIP_PARSINGis being exposed as a generalFunctionToolfeature (_tools.pyconstructor/decorator changes and__init__.pyexport), but the main framework execution path still feedsinvoke()results straight intoContent.from_function_result(_tools.py:1443-1451), which stringifies any non-list[Content]result (_types.py:831-851). That means this is not really an end-to-end tool mode; it only changes behavior for specific in-process callers like Hyperlight. A cleaner design would be a caller-side raw API (invoke_raw/internal helper) or a Hyperlight-specific escape hatch, rather than overloadingresult_parserwith backend-specific semantics.
Automated review by moonbox3's agents
* Bump Python package versions for 1.2.0 release Released tier bumps 1.1.1 -> 1.2.0 (core, openai, foundry, root) to reflect additive public APIs landed since 1.1.0: functional workflow API (#4238) and FunctionTool SKIP_PARSING sentinel (#5424). All beta packages stamped 1.0.0b260424, alpha packages 1.0.0a260424. All 26 non-core agent-framework-core floors raised to >=1.2.0,<2. CHANGELOG consolidates the never-tagged 1.1.1 entries with the post-merge additions into [1.2.0]. * Update CHANGELOG footer links for 1.2.0 Advance [Unreleased] comparison base from python-1.1.0 to python-1.2.0 and add a [1.2.0] reference link comparing python-1.1.0...python-1.2.0 so the heading links resolve correctly. * Fix CHANGELOG: restore [1.1.1] section and add proper [1.2.0] Previous commit incorrectly renamed the [1.1.1] header to [1.2.0], which wiped the historical 1.1.1 entries and wrongly attributed them to 1.2.0. This restores [1.1.1] to its origin/main content and adds a new [1.2.0] section above containing only the commits in python-1.1.1..HEAD: - #4238 functional workflow API - #5142 GitHub Copilot OpenTelemetry - #2403 A2A bridge support - #5070 oauth_consent_request events in Foundry clients - #5447 FoundryAgent hosted agent sessions - #5459 hosting server dependency upgrade + types - #5389 AG-UI reasoning/multimodal parsing fix - #5440 stop [TOOLBOXES] warning spam - #5455 user agent prefix fix Also corrects the [1.2.0] compare base to python-1.1.1 (not 1.1.0) and adds the missing [1.1.1] reference link.
Motivation and Context
While running the hyperlight CodeAct sample I hit two issues that hurt the
sandbox path:
pyo3_runtime.PanicException: ... is unsendable, but sent to another thread!from the WASM
Sandboxbeing touched outside its creating OS thread.repr()→Content→ast.literal_eval, which is wasteful, lossy for non-literaltypes, and a needless surface area on the sandbox boundary.
While in there I also tightened the public API of
HyperlightExecuteCodeTooland the internal class layout per a small designreview.
Description
Core (
agent_framework)SKIP_PARSINGsentinel that callers can pass toFunctionTool.invoke(..., result_parser=SKIP_PARSING)to bypassContent-wrapping and return the raw function result. Default behavior isunchanged. The signature uses
@overloads so the return type isAnyonthe sentinel path and
list[Content]otherwise. Telemetry still records atool.resultvalue on both paths (str(result)on the sentinel path,parsed text otherwise).
SKIP_PARSINGfromagent_framework.Hyperlight (
agent_framework_hyperlight)WasmSandboxto its creating OS threadvia a per-entry single-worker
ThreadPoolExecutor(_SandboxWorker).This eliminates the PyO3
unsendablepanic when the registry is touchedfrom arbitrary asyncio worker threads.
_SandboxRegistry.close()shutsthe workers down deterministically.
FunctionTool.invoke(..., result_parser=SKIP_PARSING)and returns the rawPython object, so dict / list / primitive results round-trip natively into
the WASM guest instead of going through
repr()↔literal_eval.execute_codeinput schema from a PydanticBaseModelto aplain JSON-schema dict (
EXECUTE_CODE_INPUT_SCHEMA), which goes throughFunctionTool's_schema_suppliedfast path and removes the pydanticimport from this module.
EXECUTE_CODE_INPUT_DESCRIPTIONinto twodistinct constants:
EXECUTE_CODE_TOOL_DESCRIPTION(tool-level fallback)and
EXECUTE_CODE_PARAM_DESCRIPTION(parameter-level forcode)._StoredFileMountdataclass into the publicFileMountNamedTuple — they had the same shape, the second type wasredundant.
_SandboxRegistryformally inherit from theSandboxRuntimeProtocol so the contract is enforced by type checkers.
Contribution Checklist