
Model comparison v2 #1672

Merged
merged 17 commits into from
Dec 21, 2018
Conversation

tolga-b
Contributor

@tolga-b tolga-b commented Dec 7, 2018

Second version of model comparison for what-if-tool plugin.

Motivation for features / changes
PR #1589 added the ability to load two models and compare their inference scores in the Facets Dive panel for binary classification and in the Datapoint Editor tab for classification.
This PR extends model comparison mode to the Global/Partial PDP, Fairness, and Performance panels. It also adds comparison of regression and multi-class models in all supported panels.
All model-prediction-related features in Facets Dive (Inference Value, Inference Error, etc.) now also support multiple models.

The only feature that does not support multiple models is the closest-counterfactual button, which still computes its results on model 1.

Technical description of changes
Some variables that used to handle single models became arrays indexed by model number.
Some string keys for facets features now have model number appended to them.
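The two patterns above can be sketched roughly as follows (a simplified plain-JS illustration; the exact key format and function shape in the plugin are assumptions, not the actual code):

```javascript
// Single-model values become arrays indexed by model number, e.g. a
// single classification threshold turning into one threshold per model:
const overallThresholds = [0.5, 0.7];  // one entry per loaded model

// Facets feature keys get the model index appended so per-model values
// stay distinct (hypothetical key format):
function strWithModelIndex(baseStr, modelInd, numModels) {
  return numModels === 1 ? baseStr : baseStr + ' ' + (modelInd + 1);
}

console.log(strWithModelIndex('Inference Score', 0, 1));  // 'Inference Score'
console.log(strWithModelIndex('Inference Score', 1, 2));  // 'Inference Score 2'
```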

Add the ability to compare two models for Global PDP and PDP plots:
Now the Global PDP and PDP plots show one line per model per label in the same chart.
(Previously they only showed one line per label.)

Performance and Fairness panel:
Show one ROC curve per model within the same chart. (Previously showed ROC curve for single model.)
Show one threshold and confusion matrix per model. (Previously showed threshold and confusion matrix for single model.)

Add the ability to load two models for regression:
Facets dive features and MSEs in Performance and Fairness panel work for two regression models.

Minor:

  • Do not show the model number in the facets "Inference Score" feature when only one model is present.
  • Categorical PD plots now show the original value as text under the bar (feature name - original value), instead of showing an orange bar for the original feature value and blue bars for the others.
  • Remove forceUpdateFeatureValueThresholds, as it is unnecessary when bound to arrays with "{{}}" in dom-repeat.
  • Fix a bug with the multi-class button in the inference settings panel where clicking it, after the tool was used on a binary model without refreshing the page, could throw an exception.

Screenshots of UI changes
all features now support two models

features old

new performance and fairness tab

old performance and fairness tab

pdp shows multiple models

same for multi-class case

categorical pdp shows each model as separate bar and labels original value in text

all regression features are available in facets

regression performance shows both models
Detailed steps to verify changes work correctly (as executed by you)

Tested all offline demos, multi/single model for binary and multi-class models (using the settings panel multi-class option with binary models).
Tested multiple model regression with age demo by duplicating the first model.
Testing involves visually verifying the functionality of all panels by clicking buttons and verifying the outputs and checking for exceptions in the browser console.

Alternate designs / implementations considered

N/A

@@ -509,6 +514,8 @@ def make_json_formatted_for_single_chart(mutant_features,
was sent for inference. The length of that field should be the same length
of mutant_features.
index_to_mutate: The index of the feature being mutated for this chart.
model_id: Index of the model corresponding to inference_results_proto to
distinguish between multiple models for model comparison
Contributor

this isn't used anywhere in the method?

Contributor Author

you are right, removed it.

<div class="flex">
<template is="dom-repeat" items="{{featureValueThreshold.threshold}}">
Contributor

indent the paper-slider

Contributor Author

done

immediate-value="{{overallThreshold}}" value="[[overallThreshold]]">
</paper-slider>
<template is="dom-repeat" items="{{overallThresholds}}" as="overallThreshold">
<paper-slider class="slider" editable=true min="0" max="1" step="0.01"
Contributor

add label for sliders when more than one model?

Contributor Author

added.

class="conf-matrix"></tf-confusion-matrix>
</tf-confusion-matrix>
<div>
<template is="dom-repeat" items="[[inferenceStats_]]" index-as="modelInd">
Contributor

label conf matrices when more than one model

Contributor Author

added.

// series
self = this;
const mapped = {};
_.forEach(data, function(modelValues, modelInd){
Contributor

I know the code was like this before, but if you replace function() {} with () => {} here and below, you don't need the self = this and can just use this. Also, if you were to use "self", it should be "const self".

Contributor Author

done.
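For readers unfamiliar with the pattern, a minimal standalone demonstration of why the arrow function removes the need for the self = this alias (not the actual element code):

```javascript
const element = {
  prefix: 'model ',
  label(values) {
    // A function(){} callback here would rebind `this` to the callback
    // itself, which is why older code kept `const self = this;`. An
    // arrow function inherits `this` from the enclosing method instead.
    return values.map((v, i) => this.prefix + i + ': ' + v);
  },
};

console.log(element.label(['a', 'b']));  // ['model 0: a', 'model 1: b']
```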

Contributor
@jameswex jameswex left a comment

some quick initial comments


getNumberOfModels_: function(){
return this.modelName.split(',').length;
},
Contributor

It's strange that each time we want this we have to do this string split. I think numModels should be a data member that is set/updated at any appropriate time during inference or after model name setting, and then used without needing recalculating.

Contributor Author

You are right, moved it to variable numModels which gets updated when user clicks Accept in settings pane.
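A rough sketch of the cached approach (plain JS rather than the actual Polymer property; the surrounding structure is an assumption):

```javascript
class ToolState {
  // Called when the user clicks Accept in the settings pane.
  setModelName(modelName) {
    // modelName is a comma-separated list, e.g. 'model1,model2'.
    this.modelName = modelName;
    // Compute the model count once here, instead of re-splitting the
    // string on every access as getNumberOfModels_ did.
    this.numModels = modelName.split(',').length;
  }
}

const state = new ToolState();
state.setModelName('model1,model2');
console.log(state.numModels);  // 2
```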

plotStats.push(inferenceStats.faceted[key]);
plotThresholds.push(modelThresholds[modelInd].threshold)
}
this.plotPr(
Contributor

indenting issue here maybe?

Contributor Author

done.

inferenceMap[this.inferences.indices[i]][0] =
for (let modelNum = 0; modelNum < this.inferences.results.length;
modelNum++){
const result = this.inferences.results[modelNum].regressionResult;
Contributor

indenting issue

Contributor Author

done.

modelInd < self.getNumberOfModels_();
modelInd++) {
item[self.strWithModelIndex_(inferenceLabelStr, modelInd)] =
self.labelVocab[
Contributor

whitespace

Contributor Author

done.

return;
}

// Binary classifier case.
let threshold = this.overallThreshold;
let threshold = this.overallThresholds;
Contributor

probably better if this is called "thresholds" as it is a list

Contributor Author

updated.

Contributor
@jameswex jameswex left a comment

more comments

@tolga-b
Contributor Author

tolga-b commented Dec 13, 2018

Thank you James! Should be all set with the comments.

mutant_examples, serving_bundle)
return make_json_formatted_for_single_chart(mutant_features,
charts = []
for model_id, serving_bundle in enumerate(serving_bundles):
Contributor

remove model_id and enumerate call as model_id isn't used

Contributor Author

good catch, removed!

@@ -222,8 +221,9 @@
}

.pr-line-chart {
height: 150px;
width: 250px;
margin: 6px 6px 6px 6px;
Contributor

if you want margin of 6px on all sides you can just do "margin: 6px" - same rules for padding as well. see https://www.w3schools.com/cssref/pr_margin.asp

Contributor Author

thanks, done.

transform: rotate(270deg);
font-size: 12px;
color: #3c4043;
Contributor

where'd this color come from?

Contributor Author

Mahima added it yesterday.

this.examplesAndInferences[0].inferences == null ||
this.examplesAndInferences[0].inferences[0].length
!= this.numModels ||
(this.inferenceStats_.length != this.numModels &&
Contributor

maybe add comments about this check, as its gotten complex with these new cases

Contributor Author

added description.

// console.log('overallthresholds', this.overallThresholds)
// console.log('inferencestats', this.inferenceStats_)
// console.log('inferences', this.inferences.results)
// console.log('examplesandinferences', this.examplesAndInferences)
Contributor

remove commented-out debug logs

Contributor Author

oops, done!

},
];
// If there is more than 1 model, show model number tooltip
if (data.length >= 2){
Contributor

nit: space before {

Contributor Author

done.

},

formatChartKey: function(labelKey, modelInd, numberOfModels) {
if (numberOfModels == 1){
Contributor

space before {

Contributor Author

done.
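Based on the snippet above, the chart-key formatting presumably behaves like this (only the signature and the single-model early return come from the diff; the appended multi-model text is an assumption):

```javascript
function formatChartKey(labelKey, modelInd, numberOfModels) {
  if (numberOfModels === 1) {
    return labelKey;  // single model: keep the plain label
  }
  // Hypothetical multi-model format, appending the 1-based model number.
  return labelKey + ' (model ' + (modelInd + 1) + ')';
}

console.log(formatChartKey('cat', 0, 1));  // 'cat'
console.log(formatChartKey('cat', 1, 2));  // 'cat (model 2)'
```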

Contributor
@jameswex jameswex left a comment

looks great. some minor comments.

also, have you tested it with multiple multiclass models (can just treat our binary models as multiclass to test that) and multiple regression models?

@tolga-b
Contributor Author

tolga-b commented Dec 17, 2018

Did another pass.
Tested all offline demos, multi/single model for binary and multi-class models (using the settings panel multi-class option with binary models). Tested multiple model regression with age demo by duplicating the first model.

Added a binary check for ROC curve plotting in updateInferenceStats. Before, it only checked for the existence of the chart, but if someone switches from binary to multi-class using the inference settings window without refreshing the page, the chart elements still exist in the DOM, which causes an error in plotPR (thresholds don't exist for non-binary models, and inferenceStats.faceted[key] is an object, not an array, so the map call in plotPR throws an error).

Added "restamp" to the binary/multi-class/regression dom-ifs to delete unused chunks when the model type changes. There is currently an issue with multi-class selection in the inference settings panel where one may trigger a (number of thresholds × number of thresholds, i.e. 100 × 100) confusion matrix computation when switching from a binary to a multi-class model without refreshing the page. Without "restamp" this cripples the GUI after that event; with "restamp", the confusion matrix gets cleared after creation, so the issue is temporary. Deferring the fix to a future PR, as it is independent of model comparison.
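The plotPR failure mode described above can be illustrated with a small sketch (names are assumptions, not the plugin's actual code): accessing .map on a plain object yields undefined and calling it throws, so an array check before plotting avoids the stale-DOM error.

```javascript
// For binary models the faceted stats entry is an array of per-threshold
// stats; for multi-class models it is a plain object, which has no .map.
function prThresholds(facetedEntry) {
  if (!Array.isArray(facetedEntry)) {
    return null;  // non-binary case: skip ROC/PR plotting entirely
  }
  return facetedEntry.map((stat) => stat.threshold);
}

console.log(prThresholds({someKey: 1}));       // null
console.log(prThresholds([{threshold: 0.5}])); // [0.5]
```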

Contributor
@wchargin wchargin left a comment

Thanks for the clear PR description—this is very helpful.

@jameswex jameswex merged commit 33b82c4 into tensorflow:master Dec 21, 2018
3 participants