Implementation of Set Membership in TreeEnsemble #21222
@microsoft-github-policy-service agree company="QuantCo" |
@xadupre Could you take a look, as you seem involved in this part of the codebase? |
@xadupre @skottmckay might one of you find a moment to take a look at this PR and/or to trigger the CI? |
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Windows CPU CI Pipeline, Windows GPU CUDA CI Pipeline |
Azure Pipelines successfully started running 5 pipeline(s). |
Sorry, I missed it. Is it possible to add some comments somewhere in the code to explain the logic used to handle BRANCH_SM nodes? |
Hey @xadupre, could you trigger the CI? |
Is there anything missing from this? I'm sorry @xadupre , I don't mean to spam you! |
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Windows CPU CI Pipeline, Windows GPU CUDA CI Pipeline |
Azure Pipelines successfully started running 5 pipeline(s). |
/azp run Big Models, Linux Android Emulator QNN CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline |
Azure Pipelines successfully started running 7 pipeline(s). |
/azp run Windows ARM64 QNN CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows x64 QNN CI Pipeline |
/azp run onnxruntime-binary-size-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline |
Azure Pipelines successfully started running 5 pipeline(s). |
Azure Pipelines successfully started running 3 pipeline(s). |
It seems test TreeEnsembleSetMembership is failing. Since it is implemented in another PR, maybe you can disable it or just merge the PR into one. |
The PRs should work independently, and merging them would make debugging harder. Can you re-run the pipeline for both PRs? |
Moved to #22333. |
### Description Merges PR #21851, #21222. Implements TreeEnsemble from ai.onnx.ml==5 (CPU). --------- Co-authored-by: Bilyana Indzheva <[email protected]> Co-authored-by: Bilyana Indzheva <[email protected]> Co-authored-by: Christian Bourjau <[email protected]>
Description
This PR is a first step towards implementing the new `set-membership` node mode for categorical splits, as discussed in onnx/onnx#5851. It works with the old standard, in which a node does not have to be explicitly labeled as a `set-membership` operator, and it optimizes the runtime of such splits. Currently, the implementation is limited in the number of categories it supports (see below); this will be generalized in the future.
Motivation and Context
The implementation first merges the chain of nodes with the `EQ` operator. It then creates a mask for the included categories and stores that mask in the threshold. Because all nodes in a categorical split chain currently point to the same true node, the merged node keeps that true node. For example, to represent that category 2 is included in the chain, the second bit of the threshold is set to true. This limits the number of categories that can be represented. To handle this, nodes with the `EQ` operator whose category is larger than the size of the threshold type are not merged.
Results
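As a side note on the merging step described under Motivation and Context, here is a minimal Python sketch of the mask idea. It is not the code from this PR: the function names, the 32-bit mask width, and the convention that category `c` maps to bit `c - 1` (so that category 2 sets the second bit) are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed width of the threshold type in bits (hypothetical value).
MASK_BITS = 32


def build_membership_mask(
    categories: Iterable[int], mask_bits: int = MASK_BITS
) -> Tuple[int, List[int]]:
    """Fold the categories of a chain of EQ nodes into one bitmask.

    Assumption: category c is stored in bit c - 1, so category 2
    sets the second bit. Categories that do not fit into the mask
    are returned separately and would stay as ordinary EQ nodes.
    """
    mask = 0
    unmerged: List[int] = []
    for category in categories:
        if 1 <= category <= mask_bits:
            mask |= 1 << (category - 1)
        else:
            unmerged.append(category)
    return mask, unmerged


def is_member(mask: int, value: int, mask_bits: int = MASK_BITS) -> bool:
    """Check whether a feature value is one of the merged categories."""
    if not 1 <= value <= mask_bits:
        return False
    return bool((mask >> (value - 1)) & 1)


mask, unmerged = build_membership_mask([2, 5, 40])
print(bin(mask), unmerged)
```

In this sketch, a single bit test replaces one comparison per category in the chain. A `False` result from `is_member` only means the value is not in the mask; a category left in `unmerged` would still be matched by its own `EQ` node.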
The benchmarks were run through lleaves by building a wheel for onnxruntime and installing it into the environment. The results below are averages over 10 runs on an M3 Pro with 36GB of RAM; the MTPL2 data set makes heavy use of categorical features:
MTPL2