Skip to content

fix: CoreML EP crash when loading model with external data file#28043

Closed
Scottcjn wants to merge 1 commit intomicrosoft:mainfrom
Scottcjn:fix/coreml-external-data-path
Closed

fix: CoreML EP crash when loading model with external data file#28043
Scottcjn wants to merge 1 commit intomicrosoft:mainfrom
Scottcjn:fix/coreml-external-data-path

Conversation

@Scottcjn
Copy link
Copy Markdown

Fixes #28005

CoreML EP crashes with SIGBUS when loading models that use external data files (.onnx_data). CPU EP works fine which is what tipped me off that it was provider-specific.

The bug is in TensorProtoWithExternalDataToTensorProto() -- it passes model_path straight to ReadExternalDataForTensor() but model_path is the full file path like /path/to/model.onnx not the directory. So GetExternalDataInfo() builds a garbage path like /path/to/model.onnx/model.onnx_data.

Fix is model_path.parent_path() instead. Same thing UnpackInitializerData() already does at line ~2572. Only affects CoreML EP since thats the only caller passing a non-empty path here.

- ORT_RETURN_IF_ERROR(ReadExternalDataForTensor(ten_proto, model_path, unpacked_data));
+ ORT_RETURN_IF_ERROR(ReadExternalDataForTensor(ten_proto, model_path.parent_path(), unpacked_data));

…path

TensorProtoWithExternalDataToTensorProto() passes model_path (full file
path like "/path/to/model.onnx") directly to ReadExternalDataForTensor(),
which expects a directory. This causes GetExternalDataInfo() to construct
invalid paths like "/path/to/model.onnx/model.onnx_data" instead of
"/path/to/model.onnx_data".

The bug only affects the CoreML EP because it's the only production caller
that passes a non-empty model_path. CPU EP uses UnpackInitializerData()
which correctly calls model_path.parent_path().

Fix: use model_path.parent_path(), matching the established pattern at
line ~2572.

Fixes microsoft#28005

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tianleiwu tianleiwu requested a review from Copilot April 14, 2026 03:10
@tianleiwu
Copy link
Copy Markdown
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 4 pipeline(s).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a CoreML EP crash when loading ONNX models that reference external data files by correcting the base path passed to external-data loading utilities.

Changes:

  • Pass model_path.parent_path() (model directory) to ReadExternalDataForTensor() instead of the full model file path.
  • Add explanatory comments clarifying why the parent directory is required.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +362 to +363
// ReadExternalDataForTensor expects the parent directory, matching the
// pattern used by UnpackInitializerData() at line ~2572.
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new comment hard-codes a reference to a specific line number in UnpackInitializerData ("line ~2572"). Line-number references tend to go stale quickly and make the comment misleading over time. Consider rewording to reference the function/behavior instead (e.g., that UnpackInitializerData passes model_path.parent_path()) without citing a line number.

Suggested change
// ReadExternalDataForTensor expects the parent directory, matching the
// pattern used by UnpackInitializerData() at line ~2572.
// ReadExternalDataForTensor expects the parent directory, matching how
// UnpackInitializerData() passes model_path.parent_path().

Copilot uses AI. Check for mistakes.
Comment on lines +360 to +365
// Load the external data into memory.
// model_path is the full model file path (e.g., "/path/to/model.onnx").
// ReadExternalDataForTensor expects the parent directory, matching the
// pattern used by UnpackInitializerData() at line ~2572.
std::vector<uint8_t> unpacked_data;
ORT_RETURN_IF_ERROR(ReadExternalDataForTensor(ten_proto, model_path, unpacked_data));
ORT_RETURN_IF_ERROR(ReadExternalDataForTensor(ten_proto, model_path.parent_path(), unpacked_data));
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix changes how external-data paths are resolved for TensorProtoWithExternalDataToTensorProto when a model is loaded from a file path, but there doesn't appear to be a unit test covering this specific scenario (external data in file + model_path points to the .onnx file, not a directory). Please add a regression test that creates a temp directory containing a dummy model path and an external data file, then verifies TensorProtoWithExternalDataToTensorProto successfully inlines the external data when given the model file path.

Copilot uses AI. Check for mistakes.
@tianleiwu
Copy link
Copy Markdown
Contributor

@copilot apply changes based on the comments in this thread

@tianleiwu
Copy link
Copy Markdown
Contributor

@Scottcjn, thanks for providing the fix.

There is a similar fix in #28062. That looks better since it has regression test.

@Scottcjn
Copy link
Copy Markdown
Author

Closing — this is superseded by upstream's defensive fix in tensorprotoutils.cc, which handles the same bug more safely than my approach.

Upstream's version (now on main):

std::filesystem::path external_data_path = model_path;
std::error_code ec;
if (std::filesystem::is_regular_file(model_path, ec)) {
  external_data_path = model_path.parent_path();
} else if (ec) {
  ec.clear();
}
ORT_RETURN_IF_ERROR(ReadExternalDataForTensor(ten_proto, external_data_path, unpacked_data));

That covers the CoreML EP crash I was fixing (file path → parent_path()) and preserves backward compatibility for callers that already pass a directory — my simpler model_path.parent_path() would have broken the latter. Nothing to contribute; upstream did it right.

Thanks for the review cycles.

@Scottcjn Scottcjn closed this Apr 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CoreML EP: SIGBUS crash when loading model with external data file via CreateSession

3 participants