Changed Binarizer node to be cast to the type of the predicted label … #4818
harishsk merged 3 commits into dotnet:master from harishsk:binarizerBug
…column's data type
    - var t = InternalDataKindExtensions.ToInternalDataKind(DataKind.Boolean).ToType();
    - node.AddAttribute("to", t);
    + var predictedLabelCol = OutputSchema.GetColumnOrNull(outColumnNames[0]);
    + node.AddAttribute("to", predictedLabelCol.HasValue ? predictedLabelCol.Value.Type.RawType : typeof(bool));
predictedLabelCol.HasValue:
Doesn't it always have a value? Or, if it doesn't, should we be adding the ONNX node at all? #Resolved
I have added an Assert to capture that and fixed the next line.
I know this isn't related to the change in this PR, but is this correct? The binary classifier scorer has a … While I could not find a way in the public API to change which column is used (by default it is actually the score column, even if the probability column exists), there is a public API to change the threshold value: …
Refers to: src/Microsoft.ML.Data/Scorers/BinaryClassifierScorer.cs:203 in fc1925c.
Yep, this is not correct. The predicted label has to be based on the threshold.
…ulting baseline changes
…to the probability column
…column's data type
In BinaryClassifierScorer's SaveAsOnnxCore function, we were always casting the output of the Binarizer to bool. However, in some cases BinaryClassifierScorer outputs a key value (uint), and in that case the output should be cast to uint. This fix makes the cast depend on the type of the predicted label column.
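A minimal, self-contained C# sketch of the exported behavior, assuming the ONNX graph is a Binarizer (outputs 1 when the score is strictly greater than the threshold, else 0) followed by a Cast whose target type comes from the predicted label column. `Binarize`, `CastTo`, and `PredictedLabel` are hypothetical helpers, not ML.NET APIs, and `typeof(uint)` stands in for the raw type of a key-typed predicted label column.

```csharp
using System;

// Hypothetical helpers (not ML.NET APIs) modelling the exported ONNX nodes.

// ONNX Binarizer: 1 when the input is strictly greater than the threshold, else 0.
static float Binarize(float score, float threshold) => score > threshold ? 1f : 0f;

// ONNX Cast: the target type is taken from the predicted label column's raw type
// (bool for a boolean label, uint for a key-typed label) instead of always bool.
static object CastTo(float value, Type target)
{
    if (target == typeof(bool)) return value != 0f; // non-zero casts to true
    if (target == typeof(uint)) return (uint)value;
    throw new NotSupportedException($"No cast sketched for {target}");
}

static object PredictedLabel(float score, float threshold, Type predictedLabelRawType)
{
    // Mirrors the Assert added in review: the predicted label column must exist.
    if (predictedLabelRawType is null)
        throw new ArgumentNullException(nameof(predictedLabelRawType));
    return CastTo(Binarize(score, threshold), predictedLabelRawType);
}

Console.WriteLine(PredictedLabel(0.7f, 0.5f, typeof(bool))); // True
Console.WriteLine(PredictedLabel(0.7f, 0.5f, typeof(uint))); // 1
Console.WriteLine(PredictedLabel(0.3f, 0.5f, typeof(uint))); // 0
```

The point of the change is that the Cast target is read from the output schema rather than hard-coded, so a boolean predicted label and a key-typed one each get an ONNX output of the matching type.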