Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of Set Membership in TreeEnsemble #21222

Closed
wants to merge 11 commits into from

Conversation

bili2002
Copy link
Contributor

@bili2002 bili2002 commented Jul 1, 2024

Description

This PR is a first step towards implementing the new set-membership node mode for categorical splits as discussed in onnx/onnx#5851. It works with the old standard where no explicit label of a node as a set-membership operator is required and it optimizes its runtime. Currently, the implementation has a constraint of a limited number of categories (see below) which will be generalized in the future.

Motivation and Context

The implementation would first merge the chain of nodes with EQ operator. Then, create a mask for the categories included whereas the mask is stored in the threshold. As currently all nodes from the categorical split chain point to the same true node, the merge would preserve the same true node. E.g. to represent that the category 2 is included in the chain, we would set the second bit of the threshold to true. This comes with the constraint of having a limited number of categories. To solve this, I chose not to merge the nodes with EQ operator with a larger category than the threshold type size.

Results

The benchmarks were done through lleaves by building a wheel for onnxruntime and installing it into the environment. Here are the results of the average of 10 runs using an M3 Pro with 36GB of RAM whereas the MTPL2 data set makes heavy use of categorical features:

MTPL2

Batchsize nthread Setup & Runtime Old solution New solution
10000 1 Setup 0.7s 0.7s
Runtime 88.69ms 68.26ms
100000 1 Setup 0.7s 0.7s
Runtime 918.14ms 707.61ms
1000000 1 Setup 0.7s 0.7s
Runtime 6265.22ms 4827.94ms
10000 4 Setup 0.69s 0.7s
Runtime 28.81ms 22.36ms
100000 4 Setup 0.69s 0.7s
Runtime 304.50ms 237.56ms
1000000 4 Setup 0.69s 0.7s
Runtime 2081.54ms 1624.14ms
10000 8 Setup 0.72s 0.75s
Runtime 21.99ms 15.48ms
100000 8 Setup 0.72s 0.75s
Runtime 237.61ms 171.48ms
1000000 8 Setup 0.72s 0.75s
Runtime 1640.76ms 1200.88ms

@bili2002
Copy link
Contributor Author

bili2002 commented Jul 1, 2024

@microsoft-github-policy-service agree company="QuantCo"

@bili2002 bili2002 marked this pull request as ready for review July 8, 2024 09:07
@bili2002 bili2002 changed the title Implementation of set membership Implementation of Set Membership in TreeEnsemble Jul 24, 2024
@bili2002 bili2002 changed the title Implementation of Set Membership in TreeEnsemble Implementation of Set Membership for Categorical splits in TreeEnsemble Jul 24, 2024
@bili2002 bili2002 changed the title Implementation of Set Membership for Categorical splits in TreeEnsemble Implementation of Set Membership in TreeEnsemble Jul 24, 2024
@bili2002
Copy link
Contributor Author

@xadupre Could you take a look, as you seem involved in this part of the codebase?

@cbourjau
Copy link
Contributor

@xadupre @skottmckay might one of you find a movement to take a look at this PR and/or to trigger the CI?

@xadupre
Copy link
Member

xadupre commented Aug 14, 2024

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Windows CPU CI Pipeline, Windows GPU CUDA CI Pipeline

Copy link

Azure Pipelines successfully started running 5 pipeline(s).

@xadupre
Copy link
Member

xadupre commented Aug 14, 2024

Sorry, I missed it. Is it possible to add some comments somewhere in the code for explain the logic used to handle BRANCH_SM nodes?

@bili2002
Copy link
Contributor Author

Hey @xadupre, could you trigger the CI?

@cbourjau
Copy link
Contributor

Is there anything missing from this? I'm sorry @xadupre , I don't mean to spam you!

xadupre
xadupre previously approved these changes Sep 17, 2024
@xadupre
Copy link
Member

xadupre commented Sep 17, 2024

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Windows CPU CI Pipeline, Windows GPU CUDA CI Pipeline

Copy link

Azure Pipelines successfully started running 5 pipeline(s).

@xadupre
Copy link
Member

xadupre commented Sep 17, 2024

/azp run Big Models, Linux Android Emulator QNN CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline

Copy link

Azure Pipelines successfully started running 7 pipeline(s).

@xadupre
Copy link
Member

xadupre commented Sep 17, 2024

/azp run Windows ARM64 QNN CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows x64 QNN CI Pipeline

@xadupre
Copy link
Member

xadupre commented Sep 17, 2024

/azp run onnxruntime-binary-size-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline

Copy link

Azure Pipelines successfully started running 5 pipeline(s).

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@xadupre
Copy link
Member

xadupre commented Oct 4, 2024

It seems test TreeEnsembleSetMembership is failing. Since it is implemented in another PR, maybe you can disable it or just merge the PR into one.

@bili2002
Copy link
Contributor Author

bili2002 commented Oct 7, 2024

The PR-s should work independently so it would be harder to debug if I were to merge them. Can you re-run the pipeline for both PR-s?

@xadupre
Copy link
Member

xadupre commented Nov 5, 2024

Moved to #22333.

@xadupre xadupre closed this Nov 5, 2024
xadupre added a commit that referenced this pull request Nov 22, 2024
### Description
Merges PR #21851, #21222.

Implements TreeEnsemble from ai.onnx.ml==5 (CPU).

---------

Co-authored-by: Bilyana Indzheva <[email protected]>
Co-authored-by: Bilyana Indzheva <[email protected]>
Co-authored-by: Christian Bourjau <[email protected]>
guschmue pushed a commit that referenced this pull request Dec 2, 2024
### Description
Merges PR #21851, #21222.

Implements TreeEnsemble from ai.onnx.ml==5 (CPU).

---------

Co-authored-by: Bilyana Indzheva <[email protected]>
Co-authored-by: Bilyana Indzheva <[email protected]>
Co-authored-by: Christian Bourjau <[email protected]>
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
### Description
Merges PR microsoft#21851, microsoft#21222.

Implements TreeEnsemble from ai.onnx.ml==5 (CPU).

---------

Co-authored-by: Bilyana Indzheva <[email protected]>
Co-authored-by: Bilyana Indzheva <[email protected]>
Co-authored-by: Christian Bourjau <[email protected]>
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
### Description
Merges PR microsoft#21851, microsoft#21222.

Implements TreeEnsemble from ai.onnx.ml==5 (CPU).

---------

Co-authored-by: Bilyana Indzheva <[email protected]>
Co-authored-by: Bilyana Indzheva <[email protected]>
Co-authored-by: Christian Bourjau <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants