Added slot names support for OnnxTransformer by harishsk · Pull Request #4857 · dotnet/machinelearning

harishsk · 2020-02-19T18:59:57Z

This PR adds support for persisting the SlotNames annotation of a column during ONNX export. OnnxTransformer then reads the annotation back and reattaches it to the column when the ONNX model is loaded from disk.

ONNX has no native support for annotations. To work around this, we store the metadata in an otherwise unused part of the graph. For example, suppose an ML.NET model has an output column NGrams that outputs a vector of n-gram counts. In ML.NET this column carries an annotation named SlotNames. When the model is exported to ONNX, we create an additional LabelEncoder node and store the slot names in its keys_strings attribute.

The LabelEncoder is created with an input name of $"mlnet.{column.Name}.unusedInput", an output name of $"mlnet.{column.Name}.unusedOutput", and a node name of $"mlnet.{column.Name}.SlotNames". (All the actual output columns of the ML.NET model are suffixed with ".output".)

When OnnxTransformer loads the graph, it goes through the list of output nodes and creates an output column for each of them in its output schema. For each column, it searches the graph for a node named $"mlnet.{column.Name}.SlotNames". If it finds one, it reads the keys_strings attribute from that node and adds those strings to the column as its SlotNames annotation.
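The round trip described above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not the ML.NET or ONNX API: the `Node` class and the helper functions are hypothetical stand-ins, and only the node, input, and output naming scheme and the two attribute names come from the description. How the ".output" suffix interacts with the lookup is not covered here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX graph node (not the real onnx API)."""
    op_type: str
    name: str
    inputs: List[str]
    outputs: List[str]
    attributes: Dict[str, list] = field(default_factory=dict)


def slot_names_node_name(column: str) -> str:
    # Naming convention from the PR description.
    return f"mlnet.{column}.SlotNames"


def make_slot_names_node(column: str, slot_names: List[str]) -> Node:
    # Export side: park the slot names in a LabelEncoder node whose
    # input and output are not connected to the rest of the graph.
    return Node(
        op_type="LabelEncoder",
        name=slot_names_node_name(column),
        inputs=[f"mlnet.{column}.unusedInput"],
        outputs=[f"mlnet.{column}.unusedOutput"],
        attributes={
            "keys_strings": list(slot_names),
            # One integer per key; present only so the node is well formed.
            "values_int64s": list(range(len(slot_names))),
        },
    )


def read_slot_names(graph: List[Node], column: str) -> Optional[List[str]]:
    # Load side: find the metadata node by name and return its keys,
    # or None when the column has no slot names stored.
    wanted = slot_names_node_name(column)
    for node in graph:
        if node.name == wanted:
            return node.attributes.get("keys_strings")
    return None


graph = [make_slot_names_node("NGrams", ["a", "b", "a|b"])]
recovered = read_slot_names(graph, "NGrams")   # slot names come back unchanged
missing = read_slot_names(graph, "Features")   # no metadata node for this column
```

Looking the node up by name rather than by position means graphs exported without the extra node still load; the column simply ends up without a SlotNames annotation.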

This SlotNames data should then be available as annotations on the column in both ML.NET and Nimbus.

ganik

yaeldekel · 2020-02-20T08:52:53Z

test/BaselineOutput/Common/Onnx/BinaryClassification/BreastCancer/OneHotBagPipeline.txt

-              0.50476193,
-              -0.97911227
+              0.504761934,
+              -0.979112267


How come these changed? #Resolved

Not sure. I see the baseline numbers change off and on when we run the tests locally. I am ignoring these changes because the difference is in the 7th decimal place.
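One way to check whether a baseline diff like this is only a formatting difference is to round both printed values to single precision and compare them. The sketch below assumes the baseline values are 32-bit floats, which the thread does not state; `to_float32` is a helper introduced for illustration.

```python
import struct


def to_float32(value: float) -> float:
    # Round a Python float (double) to the nearest IEEE 754 single.
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Old and new baseline values from the diff above.
pairs = [(0.50476193, 0.504761934), (-0.97911227, -0.979112267)]
same_float32 = [to_float32(old) == to_float32(new) for old, new in pairs]
print(same_float32)
```

If every entry is True, the old and new strings denote the same single-precision value and differ only in how many digits were printed.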


yaeldekel · 2020-02-20T08:54:38Z

test/Microsoft.ML.Tests/OnnxConversionTest.cs

+                    var mlNetSlotNames = mlNetSlots.DenseValues().ToList();
+                    var onnxSlotNames = onnxSlots.DenseValues().ToList();
+                    for (int j = 0; j < mlNetSlots.Length; j++)
+                        Assert.Equal(mlNetSlotNames[j].ToString(), onnxSlotNames[j].ToString());


Equal

nit: I think Assert.Equal also has an overload for IEnumerables.

I tried that already, but the assert fired even when all the strings were equal. Not sure why.


yaeldekel · 2020-02-20T09:08:54Z

src/Microsoft.ML.OnnxConverter/SaveOnnxCommand.cs

+            var labelEncoderOutput = ctx.AddIntermediateVariable(NumberDataViewType.Int64, labelEncoderOutputName, true);
+            var node = ctx.CreateNode(opType, one, labelEncoderOutput, labelEncoderNodeName);
+            node.AddAttribute("keys_strings", slotNamesAsStrings);
+            node.AddAttribute("values_int64s", Enumerable.Range(0, slotNames.Length).Select(x => (long)x));


values_int64s

Why do we need this? #Resolved

These are unused; they are specified only to satisfy ORT (ONNX Runtime).


Added slot names support for OnnxTransformer

eef7621

harishsk requested review from antoniovs1029, ganik and yaeldekel February 19, 2020 18:59

harishsk requested a review from a team as a code owner February 19, 2020 18:59

Updated baselines for failing tests

e2fbac4

ganik approved these changes Feb 19, 2020


harishsk merged commit 8d1809e into dotnet:master Feb 19, 2020

yaeldekel reviewed Feb 20, 2020


antoniovs1029 mentioned this pull request Apr 9, 2020

Avoid propagating some input columns when applying an Onnx model #5012

Closed

harishsk deleted the slotNames branch April 21, 2020 23:59

ghost locked as resolved and limited conversation to collaborators Mar 19, 2022

harishsk merged 2 commits into dotnet:master from harishsk:slotNames
