StopWordsRemovingEstimator export to Onnx by kere-nel · Pull Request #5279 · dotnet/machinelearning

kere-nel · 2020-07-02T18:57:32Z

Exporting StopWordsRemovingEstimator/CustomStopWordsRemovingEstimator to Onnx.
One test currently fails: the edge case where every word in the text is removed. ML.NET produces an empty array, while ONNX produces an array of length 1 containing the empty string. Any suggestions for how to handle this case?

Note: The decision was to address the issue mentioned above in another PR.
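
To make the edge case above concrete, here is a minimal, self-contained sketch. It does not use the ML.NET or ONNX Runtime APIs; `StopWordEdgeCaseSketch`, `RemoveStopWords`, and `NormalizeEmpty` are hypothetical names introduced only for illustration, and the normalization shown is one possible way to reconcile the two outputs, not the fix this PR or its follow-up adopted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordEdgeCaseSketch
{
    // Hypothetical stand-in for stop-word removal: keeps tokens that are not in the stop list.
    // When every token is a stop word, the result is an empty array (the ML.NET-style output).
    public static string[] RemoveStopWords(IEnumerable<string> tokens, ISet<string> stopWords) =>
        tokens.Where(t => !stopWords.Contains(t)).ToArray();

    // Hypothetical normalization: map the ONNX-style result [""] to an empty array
    // so that it compares equal to the ML.NET-style result.
    public static string[] NormalizeEmpty(string[] tokens) =>
        tokens.Length == 1 && tokens[0].Length == 0 ? Array.Empty<string>() : tokens;
}
```

One caveat with this kind of normalization: it cannot distinguish "all words removed" from an input that legitimately holds a single empty token, so a comparison built on it would treat both cases as equal.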

harishsk · 2020-07-02T19:49:40Z

src/Microsoft.ML.Transforms/Text/StopWordsRemovingTransformer.cs

+            {
+                var opType = "Squeeze";
+                var squeezeOutput = ctx.AddIntermediateVariable(null, "SqueezeOutput", true);
+                var node = ctx.CreateNode(opType, srcVariableName, squeezeOutput, ctx.GetNodeName(opType), "");


Is it possible to avoid skipping the shape and type addition? #Resolved

I skipped the shape because the shape of the tokenized word vector is not known before inference time: the tokenizer may produce any number of words. #Resolved

Other text transformers that also infer shape:

https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Transforms/Text/WordTokenizing.cs

https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Transforms/Text/TokenizingByCharacters.cs #Resolved
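
As a rough illustration of the shape discussion above, the sketch below computes the output shape of a `Squeeze` over given axes. It is plain C# with no ML.NET or ONNX dependency. `SqueezeShapeSketch` is a hypothetical helper, the use of `-1` for a dimension unknown until inference is a convention of this sketch only, and the input shape `[1, N]` in the usage note is an assumption about the tokenized vector rather than something stated in the PR. Only non-negative axes are handled.

```csharp
using System;
using System.Linq;

public static class SqueezeShapeSketch
{
    // Returns the shape left after removing the listed axes.
    // Each squeezed axis must exist and have size exactly 1.
    // A dimension of -1 stands for "unknown until inference" in this sketch.
    public static long[] Squeeze(long[] shape, long[] axes)
    {
        foreach (var axis in axes)
        {
            if (axis < 0 || axis >= shape.Length || shape[axis] != 1)
                throw new ArgumentException($"Axis {axis} cannot be squeezed.");
        }

        // Keep every dimension whose index is not listed in axes.
        return shape.Where((dim, index) => !axes.Contains((long)index)).ToArray();
    }
}
```

For example, `SqueezeShapeSketch.Squeeze(new long[] { 1, -1 }, new long[] { 0 })` drops the leading dimension and leaves a single dimension whose size is still unknown, which is why a fixed output shape cannot be declared ahead of time.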

harishsk · 2020-07-02T19:52:01Z

src/Microsoft.ML.Transforms/Text/StopWordsRemovingTransformer.cs

+                var opType = "Squeeze";
+                var squeezeOutput = ctx.AddIntermediateVariable(null, "SqueezeOutput", true);
+                var node = ctx.CreateNode(opType, srcVariableName, squeezeOutput, ctx.GetNodeName(opType), "");
+                node.AddAttribute("axes", new long[] { 0 });


Not in this PR, but in a different PR it may be worth considering changing the default domain for CreateNode to "ai.onnx" instead of "ai.onnx.ml". The latter has very few ops, we mostly use operators from "ai.onnx", and so it makes sense to have that as the default.

harishsk · 2020-07-02T19:56:47Z

Can you check whether the number of elements returned is the same?

codecov · 2020-07-02T20:20:52Z

Codecov Report

Merging #5279 into master will increase coverage by 0.00%.
The diff coverage is 96.80%.

@@           Coverage Diff           @@
##           master    #5279   +/-   ##
=======================================
  Coverage   73.68%   73.68%           
=======================================
  Files        1022     1022           
  Lines      190366   190348   -18     
  Branches    20474    20472    -2     
=======================================
+ Hits       140265   140267    +2     
+ Misses      44568    44548   -20     
  Partials     5533     5533

Flag	Coverage Δ
#Debug	`73.68% <96.80%> (+<0.01%)`	⬆️
#production	`69.42% <94.73%> (+<0.01%)`	⬆️
#test	`87.65% <100.00%> (+0.03%)`	⬆️

Impacted Files	Coverage Δ
...ML.Transforms/Text/StopWordsRemovingTransformer.cs	`86.67% <94.73%> (+0.89%)`	⬆️
test/Microsoft.ML.Tests/OnnxConversionTest.cs	`96.66% <100.00%> (+0.08%)`	⬆️
...c/Microsoft.ML.SamplesUtils/SamplesDatasetUtils.cs	`40.00% <0.00%> (-3.46%)`	⬇️
src/Microsoft.ML.CodeGenerator/Utils.cs	`59.20% <0.00%> (-2.12%)`	⬇️
src/Microsoft.ML.Data/Training/TrainerUtils.cs	`65.86% <0.00%> (-1.01%)`	⬇️
...dardTrainers/Standard/Online/AveragedPerceptron.cs	`89.70% <0.00%> (-0.58%)`	⬇️
...rc/Microsoft.ML.LightGbm/LightGbmRankingTrainer.cs	`88.00% <0.00%> (-0.38%)`	⬇️
...c/Microsoft.ML.Data/DataLoadSave/EstimatorChain.cs	`89.65% <0.00%> (-0.35%)`	⬇️
src/Microsoft.ML.Data/Prediction/Calibrator.cs	`80.45% <0.00%> (-0.27%)`	⬇️
src/Microsoft.ML.Sweeper/AsyncSweeper.cs	`71.23% <0.00%> (-0.20%)`	⬇️
... and 29 more

kere-nel · 2020-07-02T21:54:03Z

Can you check whether the number of elements returned is the same?

I think this is already done as part of the ONNX testing framework:
https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.TestFramework/BaseTestBaseline.cs#L751

harishsk · 2020-07-02T22:19:57Z

Can you reshape the returned vector in SaveAsOnnx to be compatible with ML.NET?

In reply to: 653235748

kere-nel requested a review from a team as a code owner July 2, 2020 18:57

kere-nel requested review from antoniovs1029, harishsk and wangyems July 2, 2020 19:08

harishsk reviewed Jul 2, 2020

kere-nel requested a review from harishsk July 10, 2020 22:47

kere-nel added 3 commits July 10, 2020 16:12

StopWordsRemoving transformer export to onnx

98d9f5b

format changes

809d77f

adding types

f55873c

kere-nel force-pushed the onnx_stop_wrds branch from 86621de to f55873c Compare July 10, 2020 23:13

harishsk approved these changes Jul 10, 2020

kere-nel merged commit 7879849 into dotnet:master Jul 11, 2020

ghost locked as resolved and limited conversation to collaborators Mar 18, 2022

kere-nel merged 3 commits into dotnet:master from kere-nel:onnx_stop_wrds
