
Fix O(n²) model load time for TreeEnsemble with categorical feature chains #27391

Merged
tianleiwu merged 1 commit into microsoft:main from
vektah:trivial-subtree-equality
Mar 1, 2026

vektah commented Feb 19, 2026

Description

Profiling shows that CheckIfSubtreesAreEqual is invoked recursively for many node pairs when loading LightGBM models with categorical features. A significant portion of this work consists of self-comparisons (left_id == right_id), in which a subtree is compared against itself, making model loading effectively O(n²).

This change adds a fast-path for trivial equality, avoiding unnecessary recursive comparisons.
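The effect of the fast path can be seen in a small model of the comparison. The Python sketch below is not the ONNX Runtime C++ code: `Node`, `subtrees_equal`, `build_complete_tree`, and `count_calls` are hypothetical names, and the per-node fields being compared are an assumption. It only illustrates why comparing a subtree with itself visits every node unless identical ids short-circuit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    mode: str                      # "BRANCH_LEQ" or "LEAF" in this sketch
    feature: int = 0
    value: float = 0.0
    true_id: Optional[int] = None
    false_id: Optional[int] = None


def subtrees_equal(nodes: List[Node], left_id: int, right_id: int,
                   fast_path: bool, counter: List[int]) -> bool:
    """Recursive structural comparison; counter[0] counts invocations."""
    counter[0] += 1
    # Fast path: a subtree is trivially equal to itself.
    if fast_path and left_id == right_id:
        return True
    a, b = nodes[left_id], nodes[right_id]
    if (a.mode, a.feature, a.value) != (b.mode, b.feature, b.value):
        return False
    if a.mode == "LEAF":
        return True
    return (subtrees_equal(nodes, a.false_id, b.false_id, fast_path, counter)
            and subtrees_equal(nodes, a.true_id, b.true_id, fast_path, counter))


def build_complete_tree(nodes: List[Node], depth: int) -> int:
    """Append a complete binary tree to `nodes` and return its root id."""
    if depth == 0:
        nodes.append(Node("LEAF", value=float(len(nodes))))
        return len(nodes) - 1
    false_id = build_complete_tree(nodes, depth - 1)
    true_id = build_complete_tree(nodes, depth - 1)
    nodes.append(Node("BRANCH_LEQ", feature=depth, value=0.5,
                      true_id=true_id, false_id=false_id))
    return len(nodes) - 1


def count_calls(nodes: List[Node], root: int, fast_path: bool) -> int:
    """Compare the subtree at `root` with itself and return the call count."""
    counter = [0]
    assert subtrees_equal(nodes, root, root, fast_path, counter)
    return counter[0]


nodes: List[Node] = []
root = build_complete_tree(nodes, depth=6)
slow = count_calls(nodes, root, fast_path=False)  # one call per node
fast = count_calls(nodes, root, fast_path=True)   # a single call
print(f"nodes={len(nodes)} without fast path={slow} with fast path={fast}")
```

Without the fast path, each self-comparison costs as many calls as the subtree has nodes. If such a comparison is repeated for every link of a long chain, the total grows roughly quadratically, which matches the behavior described above.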

Example results:

  • model with 7K BRANCH_EQ nodes: 527 ms → 47 ms (~11× faster)
  • model with 106K BRANCH_EQ nodes: 141 s → 80 ms (~1760× faster)

Motivation and Context

We have some LightGBM-exported models that make heavy use of categorical features and exhibit extremely slow load times (minutes for a single 2.5 MB model).

Here's a diagram to illustrate the issue:
image

The 106K model has much longer "member of" chains, with chains that lead into more chains:

"trees" image

Interestingly we did also try using the new onnx.ml opset 5 node that has MEMBER, but it seems even slower as it recreates these branch EQ chains.
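For readers unfamiliar with these chains, here is a minimal Python sketch of how a categorical test such as `x in {3, 7, 9}` might be expressed as BRANCH_EQ nodes linked through their false branches. The dict layout, the `"MEMBER"` label, and the functions `make_chain`, `evaluate`, and `collapse` are hypothetical and only illustrate the shape of the problem. It assumes that every link in a chain points at the same true branch, which is the situation where a self-comparison (left_id == right_id) arises.

```python
def make_chain(categories, on_match, on_miss):
    """Encode `x in categories` as BRANCH_EQ nodes chained via false branches."""
    chain = []
    for i, cat in enumerate(categories):
        is_last = i == len(categories) - 1
        chain.append({
            "mode": "BRANCH_EQ",
            "value": cat,
            "true": on_match,                         # shared by every link
            "false": on_miss if is_last else i + 1,   # next link or exit
        })
    return chain


def evaluate(chain, x):
    """Walk the chain from node 0 until a non-integer target is reached."""
    node = 0
    while isinstance(node, int):
        current = chain[node]
        node = current["true"] if x == current["value"] else current["false"]
    return node


def collapse(chain):
    """Merge the chain into one membership test if all true branches match."""
    true_targets = {n["true"] for n in chain}
    if len(true_targets) != 1:
        return None  # true branches differ, so the chain cannot be merged
    return {
        "mode": "MEMBER",
        "values": {n["value"] for n in chain},
        "true": chain[0]["true"],
        "false": chain[-1]["false"],
    }


chain = make_chain([3, 7, 9], on_match="match", on_miss="miss")
merged = collapse(chain)
print(evaluate(chain, 7), evaluate(chain, 4), sorted(merged["values"]))
```

In this sketch the shared true branch is recognized by comparing targets directly, which plays the same role as the id check in the fast path: identical ids need no recursive walk.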


vektah commented Feb 20, 2026

@xadupre could you take a look 🙏🏻

@cbourjau

> Interestingly we did also try using the new onnx.ml opset 5 node that has MEMBER, but it seems even slower as it recreates these branch EQ chains.

Thanks so much for looking into this! IIRC, the current v5 reuses large parts of the v3 implementation. I think one could make a good case for eventually rewriting this operator from scratch at some point 🙈.


cbourjau commented Mar 1, 2026

Can this be merged and possibly included in 1.24.3 via #27501?

@tianleiwu

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines
Azure Pipelines successfully started running 4 pipeline(s).

@tianleiwu tianleiwu enabled auto-merge (squash) March 1, 2026 19:11
@tianleiwu tianleiwu merged commit 41f9056 into microsoft:main Mar 1, 2026
88 checks passed
tianleiwu pushed a commit that referenced this pull request Mar 2, 2026
…hains (#27391)

### Description
Profiling shows that CheckIfSubtreesAreEqual is invoked recursively for
many node pairs when loading LightGBM models with categorical features.
A significant portion of this work consists of self-comparisons (left_id
== right_id), in which a subtree is compared against itself, making
model loading effectively O(n²).

This change adds a fast-path for trivial equality, avoiding unnecessary
recursive comparisons.

Example results:
 - model with 7K BRANCH_EQ nodes: 527 ms → 47 ms (~11× faster)
 - model with 106K BRANCH_EQ nodes: 141 s → 80 ms (~1760× faster)

### Motivation and Context
We have some LightGBM-exported models that make heavy use of categorical
features and exhibit extremely slow load times (minutes for a single
2.5 MB model).

Here's a diagram to illustrate the issue:
<img width="1008" height="1229" alt="image"
src="https://github.com/user-attachments/assets/348e16cb-9eec-448f-ac5c-e1edb60e2a3d"
/>

The 106K model has much longer "member of" chains, with chains that lead
into more chains:

<details>
  <summary>"trees"</summary>
  
<img width="1405" height="593" alt="image"
src="https://github.com/user-attachments/assets/12f0c43f-5987-4b33-9001-2a2b526e537f"
/>

  
</details>

Interestingly we did also try using the new onnx.ml opset 5 node that
has MEMBER, but it seems even slower as it recreates these branch EQ
chains.
tianleiwu added a commit that referenced this pull request Mar 2, 2026
This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|-------------|
| 6e72d31 | #27295 | Remove s_kernel_registry_vitisaiep.reset() in deinitialize_vitisai_ep() |
| 41f9056 | #27391 | Fix O(n²) model load time for TreeEnsemble with categorical feature chains |

---------

Co-authored-by: zz002 <zhenzew@amd.com>
Co-authored-by: Adam Scarr <adam@vektah.net>