Fix refcount bug in map input conversion that caused shutdown segfault#27413
Merged
Fix refcount bug in map input conversion that caused shutdown segfault#27413
Conversation
…xruntime_test_python_mlops.py`
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes a critical Python reference counting bug in the map input conversion code that could cause segmentation faults at interpreter shutdown. The bug manifested when error paths in CreateMapMLValue_LoopIntoMap unconditionally decremented references that were sometimes borrowed (should not be decremented) and sometimes owned (should be decremented).
Changes:
- Added explicit reference ownership tracking through a boolean parameter
owns_item_refinCreateMapMLValue_LoopIntoMap - Conditionally decrement
itemreference only when owned on error paths - Updated both caller sites to pass correct ownership flags (false for borrowed refs, true for owned refs)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
xadupre
approved these changes
Feb 23, 2026
tianleiwu
added a commit
that referenced
this pull request
Feb 26, 2026
#27413) Fix Python refcount bug in map input conversion that caused shutdown segfault in `onnxruntime_test_python_mlops.py` ( see #27392). ## Summary This PR fixes a Python reference-count ownership bug in the map conversion path in `onnxruntime/python/onnxruntime_pybind_mlvalue.cc`. In Python 3.14 test runs, the bug could manifest as a segmentation fault after tests completed (typically at interpreter shutdown), even when test assertions passed. . ## Root Cause In `CreateMapMLValue_LoopIntoMap`, error paths decremented `item` unconditionally. - In single-map flow, `item` is a **borrowed reference** (must not be decref'd there). - In iterator/vector-map flow, `item` is an **owned reference** (must be decref'd). The unconditional decref in borrowed-reference flow caused refcount corruption and eventually a crash. ## Fix Add explicit ownership handling for `item`: - `CreateMapMLValue_LoopIntoMap(..., bool owns_item_ref, ...)` - Pass `owns_item_ref = false` from single-map path (`CreateMapMLValue_Map`) - Pass `owns_item_ref = true` from vector-map path (`CreateMapMLValue_VectorMap`) - Only `Py_XDECREF(item)` on error when `owns_item_ref` is true This preserves existing behavior while correcting reference ownership. ## Validation ```bash cd onnxruntime/test/python python onnxruntime_test_python_mlops.py ``` Result: - `OK` - Exit code `0` (no shutdown segfault) ## Notes - Although this became reproducible in Python 3.14, the underlying refcount bug is version-agnostic C-extension undefined behavior.
tianleiwu
added a commit
that referenced
this pull request
Feb 27, 2026
This cherry-picks the following commits for the release: | Commit ID | PR Number | Commit Title | |-----------|-----------|-------------| | decd177 | #27090 | Fix GatherND division by zero when batch dimensions mismatch | | 55f8234 | #27360 | Fix QMoE CPU Operator | | df9146f | #27403 | [MLAS] Adding DynamicQGemm function pointers and ukernel interface | | 0f93853 | #27318 | [js/web] Use embedded WASM module in Blob URL workers when wasmBinary is provided | | b2a6e69 | #27364 | QMoE CPU Performance Update (Up to 4x on 4-bit) | | f501e1d | #27413 | Fix refcount bug in map input conversion that caused shutdown segfault | | b32b205 | #27421 | Fix error where bytes is not assigned for dynamic qgemm pack b size | | 426b006 | #27397 | Fix DllImportResolver | | 0982844 | #27412 | MatmulNBits prepacking scales fix | | 9afb0d2 | #27430 | Fix validation for external data paths for models loaded from bytes | | 71d2cd0 | #27401 | Enable Python 3.14 CI and Upgrade Dependencies | | 79e0676 | #27419 | fix: out of bounds access for resize operation | | 82eb99c | #27459 | Fix SkipLayerNorm fusion incorrectly applied when gamma/beta are not 1D | | 355278a | #27444 | Fix GatherCopyData Integer Truncation Leading to Heap Out-of-Bounds Read/Write | | cf96123 | #27411 | [web] fix usage of wasmBinary together with a blob URL for .mjs | | 1131a86 | #27399 | [web] remove the unhelpful "Unknown CPU vendor" warning. | | ffbbc4f | #27316 | Build Windows ARM64X binaries as part of packaging pipeline | --------- Signed-off-by: Jonathan Clohessy <Jonathan.Clohessy@arm.com> Co-authored-by: patryk-kaiser-ARM <patryk.kaiser@arm.com> Co-authored-by: don <70039285+0-don@users.noreply.github.com> Co-authored-by: Jonathan Clohessy <jonathan.clohessy@arm.com> Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com> Co-authored-by: Lukas Folle <126877803+lukas-folle-snkeos@users.noreply.github.com> Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com> Co-authored-by: Chaya <cha182350@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: Erik <erscor@microsoft.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
This was referenced Mar 9, 2026
This was referenced Mar 17, 2026
This was referenced Mar 30, 2026
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix Python refcount bug in map input conversion that caused shutdown segfault in
onnxruntime_test_python_mlops.py( see #27392).Summary
This PR fixes a Python reference-count ownership bug in the map conversion path in
onnxruntime/python/onnxruntime_pybind_mlvalue.cc.In Python 3.14 test runs, the bug could manifest as a segmentation fault after tests completed (typically at interpreter shutdown), even when test assertions passed. .
Root Cause
In
CreateMapMLValue_LoopIntoMap, error paths decrementeditemunconditionally.itemis a borrowed reference (must not be decref'd there).itemis an owned reference (must be decref'd).The unconditional decref in borrowed-reference flow caused refcount corruption and eventually a crash.
Fix
Add explicit ownership handling for
item:CreateMapMLValue_LoopIntoMap(..., bool owns_item_ref, ...)owns_item_ref = falsefrom single-map path (CreateMapMLValue_Map)owns_item_ref = truefrom vector-map path (CreateMapMLValue_VectorMap)Py_XDECREF(item)on error whenowns_item_refis trueThis preserves existing behavior while correcting reference ownership.
Validation
cd onnxruntime/test/python python onnxruntime_test_python_mlops.pyResult:
OK0(no shutdown segfault)Notes