
ORT 1.24.3 release cherry pick round 3#27501

Merged
tianleiwu merged 2 commits into rel-1.24.3 from tlwu/rel-1.24.3_cherrypikc_round3
Mar 2, 2026

@tianleiwu tianleiwu commented Feb 28, 2026

This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|---|---|---|
| 6e72d31 | #27295 | Remove s_kernel_registry_vitisaiep.reset() in deinitialize_vitisai_ep() |
| 41f9056 | #27391 | Fix O(n²) model load time for TreeEnsemble with categorical feature chains |

…() (#27295)

Remove unnecessary s_kernel_registry_vitisaiep.reset() call in
deinitialize_vitisai_ep() function. The kernel registry will be
repopulated on next initialization, making this reset redundant.
…hains (#27391)

### Description
Profiling shows that CheckIfSubtreesAreEqual is invoked recursively for
many node pairs when loading LightGBM models with categorical features. A
significant portion of this work consists of self-comparisons (left_id
== right_id), so model loading spends effectively O(n²) time comparing
trees to themselves.

This change adds a fast-path for trivial equality, avoiding unnecessary
recursive comparisons.
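
The effect of such a fast path can be sketched on a toy tree. This is only an illustration under stated assumptions, not the ONNX Runtime code: the `Node` layout, `BuildChain`, `SubtreesEqual`, and the `calls` counter are all hypothetical names introduced here, and the real CheckIfSubtreesAreEqual may compare different fields.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical flattened tree node; NOT ONNX Runtime's actual layout.
struct Node {
  bool is_leaf;
  int feature_id;
  float value;        // threshold for branches, weight for leaves
  std::size_t left;   // child indices; unused for leaves
  std::size_t right;
};

// Builds a chain of `depth` branch nodes on the same feature. Each branch has
// a leaf as its left child and the next branch (or a final leaf) as its right
// child, loosely mimicking a chain of BRANCH_EQ tests on one categorical feature.
std::vector<Node> BuildChain(std::size_t depth) {
  std::vector<Node> nodes;
  for (std::size_t i = 0; i < depth; ++i) {
    nodes.push_back(Node{false, 0, static_cast<float>(i), 2 * i + 1, 2 * i + 2});
    nodes.push_back(Node{true, -1, 0.0f, 0, 0});
  }
  nodes.push_back(Node{true, -1, 0.0f, 0, 0});  // leaf that terminates the chain
  return nodes;
}

// Recursive structural comparison. `calls` counts invocations so the cost of
// a comparison can be observed. With `use_fast_path` set, identical indices
// return immediately instead of walking the whole subtree.
bool SubtreesEqual(const std::vector<Node>& nodes, std::size_t a, std::size_t b,
                   bool use_fast_path, std::size_t& calls) {
  ++calls;
  if (use_fast_path && a == b) {
    return true;  // a subtree is trivially equal to itself
  }
  const Node& x = nodes[a];
  const Node& y = nodes[b];
  if (x.is_leaf != y.is_leaf || x.feature_id != y.feature_id || x.value != y.value) {
    return false;
  }
  if (x.is_leaf) {
    return true;
  }
  return SubtreesEqual(nodes, x.left, y.left, use_fast_path, calls) &&
         SubtreesEqual(nodes, x.right, y.right, use_fast_path, calls);
}
```

In this sketch, comparing the root of a chain with itself visits every node of the chain when the fast path is off and returns after a single call when it is on. If such self-comparisons are triggered once per chain node, the work without the fast path grows roughly quadratically with chain length, which is consistent with the load times reported below.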

Example results:
 - model with 7K BRANCH_EQ nodes: 527 ms → 47 ms (~11× faster)
 - model with 106K BRANCH_EQ nodes: 141 s → 80 ms (~1760× faster)

### Motivation and Context
We have some LightGBM-exported models that make heavy use of categorical
features and exhibit extremely slow load times (minutes for a single
2.5 MB model).

Here's a diagram to illustrate the issue:
<img width="1008" height="1229" alt="image"
src="https://github.com/user-attachments/assets/348e16cb-9eec-448f-ac5c-e1edb60e2a3d"
/>

The 106K model has much longer "member of" chains, with chains that lead
into more chains:

<details>
  <summary>"trees"</summary>
  
<img width="1405" height="593" alt="image"
src="https://github.com/user-attachments/assets/12f0c43f-5987-4b33-9001-2a2b526e537f"
/>

  
</details>

Interestingly, we also tried the new onnx.ml opset 5 node that
has MEMBER, but it seems even slower, as it recreates these BRANCH_EQ
chains.
@tianleiwu tianleiwu merged commit dd6a854 into rel-1.24.3 Mar 2, 2026
79 of 81 checks passed
@tianleiwu tianleiwu deleted the tlwu/rel-1.24.3_cherrypikc_round3 branch March 2, 2026 17:21
