Added slot names support for OnnxTransformer#4857
harishsk merged 2 commits into dotnet:master from harishsk:slotNames
| 0.50476193, | ||
| -0.97911227 | ||
| 0.504761934, | ||
| -0.979112267 |
Not sure, but I see this occurring off and on: the baseline numbers change when we run them locally. I am ignoring the differences because the change is in the 7th decimal place.
| var mlNetSlotNames = mlNetSlots.DenseValues().ToList(); | ||
| var onnxSlotNames = onnxSlots.DenseValues().ToList(); | ||
| for (int j = 0; j < mlNetSlots.Length; j++) | ||
| Assert.Equal(mlNetSlotNames[j].ToString(), onnxSlotNames[j].ToString()); |
nit: I think Assert.Equal also has an overload for IEnumerables.
I tried that already. But the Assert was firing even when all the strings were equal. Not sure why.
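One possible explanation for the failing assert, offered as an assumption and not something the thread confirms: if the slot names are `ReadOnlyMemory<char>` values, their `Equals` compares the underlying object, index, and length instead of the characters, so two buffers with identical text can still compare unequal. The sketch below compares by content after converting to strings, which is what the test's `ToString()` calls do. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET or xUnit.
public static class SlotNameComparison
{
    // Compares two sequences of slot names by their character content.
    // ReadOnlyMemory<char>.ToString() returns the text the buffer covers.
    public static bool ContentEquals(
        IEnumerable<ReadOnlyMemory<char>> expected,
        IEnumerable<ReadOnlyMemory<char>> actual)
    {
        return expected.Select(x => x.ToString())
                       .SequenceEqual(actual.Select(x => x.ToString()));
    }
}
```

With this kind of projection, the sequence overload of `Assert.Equal` could be applied to the two string sequences instead of looping over indices.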
| var labelEncoderOutput = ctx.AddIntermediateVariable(NumberDataViewType.Int64, labelEncoderOutputName, true); | ||
| var node = ctx.CreateNode(opType, one, labelEncoderOutput, labelEncoderNodeName); | ||
| node.AddAttribute("keys_strings", slotNamesAsStrings); | ||
| node.AddAttribute("values_int64s", Enumerable.Range(0, slotNames.Length).Select(x => (long)x)); |
Why do we need values_int64s? #Resolved
These values are unused; they are specified only to satisfy ORT.
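For context on the reply above: in the ONNX-ML LabelEncoder operator, the `keys_*` and `values_*` attributes are parallel lists, with the i-th key mapped to the i-th value, so a values list of matching length has to accompany `keys_strings`. A minimal sketch of building the two lists follows; the class and method names are hypothetical, and only the `Enumerable.Range` expression is taken from the diff.

```csharp
using System.Linq;

// Hypothetical helper that mirrors the two AddAttribute calls in the diff.
public static class LabelEncoderAttributesSketch
{
    // Returns the parallel keys/values lists: slot name i is paired with the integer i.
    public static (string[] Keys, long[] Values) Build(string[] slotNames)
    {
        var keys = slotNames.ToArray();
        // Placeholder values 0..n-1; the node's output is never consumed.
        var values = Enumerable.Range(0, slotNames.Length).Select(x => (long)x).ToArray();
        return (keys, values);
    }
}
```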
This PR adds support for persisting a column's SlotNames annotation during ONNX export. When the ONNX model is read back from disk, OnnxTransformer reads the slot names and restores the annotation on the column.
ONNX has no native support for annotations. To work around this, we store the metadata in otherwise unused portions of the graph. For example, suppose we have an ML.NET model with an output column NGrams that outputs a vector of n-gram counts. In ML.NET, this column has an annotation named SlotNames. When the model is exported to ONNX, we create an additional LabelEncoder node and store the slot names in its keys_strings attribute.
The LabelEncoder is created with an input name of $"mlnet.{column.Name}.unusedInput", an output name of $"mlnet.{column.Name}.unusedOutput", and a node name of $"mlnet.{column.Name}.SlotNames". (All the actual output columns of the ML.NET model are suffixed with the string ".output".)
When OnnxTransformer loads the graph, it goes through the list of output nodes and creates an output column for each of them in its output schema. For each column, it searches the graph for a node named $"mlnet.{column.Name}.SlotNames". If it finds one, it reads the keys_strings attribute from that node and adds those strings to the column as its SlotNames annotation.
This SlotNames data should then be available as annotations on the column in both ML.NET and Nimbus.
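The export and load steps described above can be sketched with a small in-memory stand-in for the graph. This is an illustration only: `SketchNode` and the helper methods are hypothetical and are not the ONNX protobuf types or the ML.NET implementation, and handling of the ".output" suffix is left out.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an ONNX graph node.
public sealed class SketchNode
{
    public string Name { get; }
    public string OpType { get; }
    public IReadOnlyList<string> KeysStrings { get; }

    public SketchNode(string name, string opType, IReadOnlyList<string> keysStrings)
    {
        Name = name;
        OpType = opType;
        KeysStrings = keysStrings;
    }
}

public static class SlotNamesSketch
{
    // Node naming convention taken from the PR description.
    public static string SlotNamesNodeName(string columnName) =>
        $"mlnet.{columnName}.SlotNames";

    // Export side: keep the slot names in a placeholder LabelEncoder node.
    public static SketchNode CreateSlotNamesNode(string columnName, IEnumerable<string> slotNames) =>
        new SketchNode(SlotNamesNodeName(columnName), "LabelEncoder", slotNames.ToList());

    // Load side: look the node up by name; returns null when the column has no slot names.
    public static IReadOnlyList<string> TryReadSlotNames(IEnumerable<SketchNode> graph, string columnName)
    {
        var node = graph.FirstOrDefault(n => n.Name == SlotNamesNodeName(columnName));
        return node == null ? null : node.KeysStrings;
    }
}
```

Looking the node up by name keeps the lookup independent of graph topology, which fits the description: the placeholder node's input and output are unused.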