add root cause localization transformer#4925
add root cause localization transformer#4925harishsk merged 51 commits intodotnet:masterfrom suxi-ms:master
Conversation
mstfbl
left a comment
There was a problem hiding this comment.
Hi @suxi-ms , thank you for your PR! Can you please briefly describe the changes you've made according to our contribution guide?
|
@suxi-ms, please do following thing then we can start code review:
|
| private static IRowMapper Create(IHostEnvironment env, ModelLoadContext ctx, DataViewSchema inputSchema) | ||
| => Create(env, ctx).MakeRowMapper(inputSchema); | ||
|
|
||
| private protected override void SaveModel(ModelSaveContext ctx) |
There was a problem hiding this comment.
SaveModel [](start = 40, length = 9)
Don't you need to save (and load) _beta? #Resolved
There was a problem hiding this comment.
SaveModel [](start = 40, length = 9)
Don't you need to save (and load)
_beta?
Thank you for the suggestion, I have added changeds to save _beta #Resolved
| } | ||
|
|
||
| [Fact] | ||
| public void RootCauseLocalizationWithDT() |
There was a problem hiding this comment.
RootCauseLocalizationWithDT [](start = 20, length = 27)
Please also test serialization/deserialization. #Resolved
There was a problem hiding this comment.
RootCauseLocalizationWithDT [](start = 20, length = 27)
Please also test serialization/deserialization.
Could you help explain more, I did't exactly understand the suggestion. Which part needs serialization/deserialization? #Resolved
There was a problem hiding this comment.
Part of the DTRootCauseLocalizationTransformer class is methods that save the model to a file, and that load the model back from a file. This test should exercise these methods, by using the ml.Model.Save and ml.Model.Load APIs, and checking that the values returned by the deserialized model are the same as the ones returned by the original model.
In reply to: 399045379 [](ancestors = 399045379)
There was a problem hiding this comment.
Part of the
DTRootCauseLocalizationTransformerclass is methods that save the model to a file, and that load the model back from a file. This test should exercise these methods, by using theml.Model.Saveandml.Model.LoadAPIs, and checking that the values returned by the deserialized model are the same as the ones returned by the original model.In reply to: 399045379 [](ancestors = 399045379)
Do I need to save the model, I find in the document, the save menthod is used for training related data? #Resolved
There was a problem hiding this comment.
Sorry, I was unclear. You should save the model, not the data. Using this API:
https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Model/ModelOperationsCatalog.cs#L112
You then need to load the model using this API:
https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Model/ModelOperationsCatalog.cs#L211
and then run the loaded model on the data, and verify that it gives the same outputs as the model before serialization.
Here is a test that contains an example of how to save and load the model:
https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Tests/Transformers/NormalizerTests.cs#L961
(the saving and loading is done in lines 969 and 970).
In reply to: 399795392 [](ancestors = 399795392)
There was a problem hiding this comment.
Part of the
DTRootCauseLocalizationTransformerclass is methods that save the model to a file, and that load the model back from a file. This test should exercise these methods, by using theml.Model.Saveandml.Model.LoadAPIs, and checking that the values returned by the deserialized model are the same as the ones returned by the original model.In reply to: 399045379 [](ancestors = 399045379)
I have tried to use this logic to save and load model,
var modelPath = "temp.zip";
//Save model to a file
ml.Model.Save(model, data.Schema, modelPath);
//Load model from a file
ITransformer serializedModel;
using (var file = File.OpenRead(modelPath))
{
serializedModel = ml.Model.Load(file, out var serializedSchema);
TestCommon.CheckSameSchemas(data.Schema, serializedSchema);
}
However, the Save() method will hit an exception because Save.IsColumnSavable(type) returns false. The root cause is we have a customized dataviewtype, which is neither VectorDataViewType nor PrimitiveDataViewType, the check failed so no columns is added for save. Do you have any guidance on this? #Resolved
There was a problem hiding this comment.
Part of the
DTRootCauseLocalizationTransformerclass is methods that save the model to a file, and that load the model back from a file. This test should exercise these methods, by using theml.Model.Saveandml.Model.LoadAPIs, and checking that the values returned by the deserialized model are the same as the ones returned by the original model.
In reply to: 399045379 [](ancestors = 399045379)I have tried to use this logic to save and load model,
var modelPath = "temp.zip"; //Save model to a file ml.Model.Save(model, data.Schema, modelPath); //Load model from a file ITransformer serializedModel; using (var file = File.OpenRead(modelPath)) { serializedModel = ml.Model.Load(file, out var serializedSchema); TestCommon.CheckSameSchemas(data.Schema, serializedSchema); }However, the
Save()method will hit an exception becauseSave.IsColumnSavable(type)returns false. The root cause is we have a customized dataviewtype, which is neitherVectorDataViewTypenorPrimitiveDataViewType, the check failed so no columns is added for save. Do you have any guidance on this?
Has updated according to your suggestion in email discussion #Resolved
| var src = default(RootCauseLocalizationInput); | ||
| var getSrc = input.GetGetter<RootCauseLocalizationInput>(input.Schema[ColMapNewToOld[iinfo]]); | ||
|
|
||
| disposer = |
There was a problem hiding this comment.
disposer [](start = 16, length = 8)
I think this is not necessary. This is used by components that use resources that are IDisposable, for example, ImageResizer that uses a Bitmap for its computations. #Resolved
There was a problem hiding this comment.
disposer [](start = 16, length = 8)
I think this is not necessary. This is used by components that use resources that are
IDisposable, for example,ImageResizerthat uses aBitmapfor its computations.
Have changed, please help review whether the change is right #Resolved
There was a problem hiding this comment.
There was a problem hiding this comment.
You can also assign
nullinstead.In reply to: 399045482 [](ancestors = 399045482)
Have updated to null #Resolved
| /// <param name="columns">The name of the columns (first item of the tuple), and the name of the resulting output column (second item of the tuple).</param> | ||
| /// <param name="beta">The weight for generating score in output result.</param> | ||
| [BestFriend] | ||
| internal DTRootCauseLocalizationEstimator(IHostEnvironment env, double beta = Defaults.Beta, params (string outputColumnName, string inputColumnName)[] columns) |
There was a problem hiding this comment.
[] [](start = 157, length = 2)
Since this transformer is non-trainable (meaning, the estimator doesn't need to look at the data in order to create a transformer), there is no advantage to supporting multiple input/output columns here - if there is ever a scenario where there are multiple input columns, the user can create an estimator chain instead. #Resolved
There was a problem hiding this comment.
[] [](start = 157, length = 2)
Since this transformer is non-trainable (meaning, the estimator doesn't need to look at the data in order to create a transformer), there is no advantage to supporting multiple input/output columns here - if there is ever a scenario where there are multiple input columns, the user can create an estimator chain instead.
I find that ImageLoading and TextNormalizing are also non-trainable, however they defines this columns parameter, could you help explain when to use it and when not? #Resolved
There was a problem hiding this comment.
[] [](start = 157, length = 2)
Since this transformer is non-trainable (meaning, the estimator doesn't need to look at the data in order to create a transformer), there is no advantage to supporting multiple input/output columns here - if there is ever a scenario where there are multiple input columns, the user can create an estimator chain instead.
I find that ImageLoading and TextNormalizing are also non-trainable, however they defines this columns parameter, could you help explain when to use it and when not?
I have updated the columns to outputColumnand inputColumn, could you help check whether they are right? #Resolved
| _beta = beta; | ||
| } | ||
|
|
||
| // Factory method for SignatureDataTransform. |
There was a problem hiding this comment.
SignatureDataTransform [](start = 30, length = 22)
This (and the matching attribute) is only needed if you want to use this component from the command line. If this is the intention, please add a test for running this using maml. #Resolved
There was a problem hiding this comment.
SignatureDataTransform [](start = 30, length = 22)
This (and the matching attribute) is only needed if you want to use this component from the command line. If this is the intention, please add a test for running this using maml.
I have no idea about the maml, could you help provide more information? #Resolved
There was a problem hiding this comment.
SignatureDataTransform [](start = 30, length = 22)
This (and the matching attribute) is only needed if you want to use this component from the command line. If this is the intention, please add a test for running this using maml.
I have no idea about the maml, could you help provide more information?
maml is not needed and I have removed this method #Resolved
| if (!(col.ItemType is RootCauseLocalizationInputDataViewType) || col.Kind != SchemaShape.Column.VectorKind.Scalar) | ||
| throw Host.ExceptSchemaMismatch(nameof(inputSchema), "input", colInfo.inputColumnName, new RootCauseLocalizationInputDataViewType().ToString(), col.GetTypeString()); | ||
|
|
||
| result[colInfo.outputColumnName] = new SchemaShape.Column(colInfo.outputColumnName, col.Kind, col.ItemType, col.IsKey, col.Annotations); |
There was a problem hiding this comment.
col.Kind, col.ItemType, col.IsKey, col.Annotations [](start = 100, length = 50)
These should be the values for the output column. The Kind should be SchemaShape.Column.VectorKind.Scalar (this should match col.Kind since you are checking above that it is a scalar), but what should the ItemType be? Also, seems like col.IsKey is always false, and are there any annotations that need to be passed from the input to the output? #Resolved
There was a problem hiding this comment.
col.Kind, col.ItemType, col.IsKey, col.Annotations [](start = 100, length = 50)
These should be the values for the output column. The
Kindshould beSchemaShape.Column.VectorKind.Scalar(this should matchcol.Kindsince you are checking above that it is a scalar), but what should theItemTypebe? Also, seems likecol.IsKeyis always false, and are there any annotations that need to be passed from the input to the output?
made some udpates, could you help review whether they are right? #Resolved
There was a problem hiding this comment.
Why was this check removed - col.Kind != SchemaShape.Column.VectorKind.Scalar? Isn't it correct?
In reply to: 399045854 [](ancestors = 399045854)
There was a problem hiding this comment.
Ha
Why was this check removed -
col.Kind != SchemaShape.Column.VectorKind.Scalar? Isn't it correct?In reply to: 399045854 [](ancestors = 399045854)
Have updated. #Resolved
| => new SrCnnAnomalyEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, windowSize, backAddWindowSize, lookaheadWindowSize, averageingWindowSize, judgementWindowSize, threshold, inputColumnName); | ||
|
|
||
| /// <summary> | ||
| /// Create <see cref="DTRootCauseLocalizationEstimator"/>, which localizes root causess using decision tree algorithm. |
There was a problem hiding this comment.
causess [](start = 88, length = 7)
typo #Resolved
There was a problem hiding this comment.
causess [](start = 88, length = 7)
typo
Thank you for pointing it out, has fixed it. #Resolved
| return subDim; | ||
| } | ||
|
|
||
| protected List<RootCauseItem> LocalizeRootCauseByDimension(PointTree anomalyTree, PointTree pointTree, Dictionary<string, Object> anomalyDimension, List<string> aggDims) |
There was a problem hiding this comment.
suggestion: Please consider something along the lines of the following for the GetSubDim function.
return new Dictionary<string, object>(keyList.Select(dim => new KeyValuePair<string, object>(dim, dimension[dim])));
#Resolved
There was a problem hiding this comment.
suggestion: Please consider something along the lines of the following for the GetSubDim function.
return new Dictionary<string, object>(keyList.Select(dim => new KeyValuePair<string, object>(dim, dimension[dim])));
updated #Resolved
| } | ||
|
|
||
| private double Log2(double val) | ||
| { |
There was a problem hiding this comment.
Please consider using the following attribute for the function:
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
``` #Resolved
There was a problem hiding this comment.
Please consider using the following attribute for the function:
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
updated #Resolved
| if (dimension.Key.AnomalyDis.Count > 1) | ||
| { | ||
| if (best == null || (!Double.IsNaN(valueRatioMap[best]) && (best.AnomalyDis.Count != 1 && (isLeavesLevel ? valueRatioMap[best].CompareTo(dimension.Value) <= 0 : valueRatioMap[best].CompareTo(dimension.Value) >= 0)))) | ||
| { |
There was a problem hiding this comment.
suggestion: For readability, can you please avoid the long line and break up the conditions on separate lines? Same comment for line 487. #Resolved
There was a problem hiding this comment.
suggestion: For readability, can you please avoid the long line and break up the conditions on separate lines? Same comment for line 487.
made some updates #Resolved
|
|
||
| //check beta | ||
| if (beta < 0 || beta > 1) { | ||
| host.CheckUserArg(beta >= 0 && beta <= 1, nameof(beta), "Must be in [0,1]"); |
There was a problem hiding this comment.
You don't need this if check. The CheckUserArg is performing the check. #Resolved
There was a problem hiding this comment.
You don't need this if check. The CheckUserArg is performing the check.
will update #Resolved
| @@ -0,0 +1,49 @@ | |||
| At Mircosoft, we develop a decision tree based root cause localization method which helps to find out the root causes for an anomaly incident at a specific timestamp incrementally. | |||
There was a problem hiding this comment.
Typo, "Microsoft." Also, it's a bit nonstandard to use present tense for "we develop." I would expect "we have developed" if the work is completed or "we maintain" if the work is ongoing.
| At Mircosoft, we develop a decision tree based root cause localization method which helps to find out the root causes for an anomaly incident at a specific timestamp incrementally. | ||
|
|
||
| ## Multi-Dimensional Root Cause Localization | ||
| It's a common case that one measure is collected with many dimensions (*e.g.*, Province, ISP) whose values are categorical(*e.g.*, Beijing or Shanghai for dimension Province). When a measure's value deviates from its expected value, this measure encounters anomalies. In such case, operators would like to localize the root cause dimension combinations rapidly and accurately. Multi-dimensional root cause localization is critical to troubleshoot and mitigate such case. |
There was a problem hiding this comment.
Let's use "users" instead of "operators."
|
|
||
| ### Decision Tree | ||
|
|
||
| [Decision tree](https://en.wikipedia.org/wiki/Decision_tree) algorithm chooses the highest information gain to split or construct a decision tree. We use it to choose the dimension which contributes the most to the anomaly. Following are some concepts used in decision tree. |
There was a problem hiding this comment.
It's non-standard to omit articles here. Try something like "The Decision Tree algorithm chooses..." and "Below are some concepts used in decision trees"
|
|
||
| Where $Ent(D^v)$ is the entropy of set points in D for which dimension $a$ is equal to $v$, $|D|$ is the total number of points in dataset $D$. $|D^V|$ is the total number of points in dataset $D$ for which dimension $a$ is equal to $v$. | ||
|
|
||
| For all aggregated dimensions, we calculate the information for each dimension. The greater the reduction in this uncertainty, the more information is gained about D from dimension $a$. |
There was a problem hiding this comment.
D should be in dollar signs?
| { | ||
| public static class LocalizeRootCause | ||
| { | ||
| private static string AGG_SYMBOL = "##SUM##"; |
There was a problem hiding this comment.
What is AGG_SYMBOL for here? I notice that on line 19, you have both AGG_SYMBOL and AggregateType.Sum, and that some of the points have AGG_SYMBOL passed in instead of strings like "DC1."
Can you add a few comments explaining what AGG_SYMBOL is and why it is used?
| /// </example> | ||
| public static SrCnnAnomalyEstimator DetectAnomalyBySrCnn(this TransformsCatalog catalog, string outputColumnName, string inputColumnName, | ||
| int windowSize=64, int backAddWindowSize=5, int lookaheadWindowSize=5, int averageingWindowSize=3, int judgementWindowSize=21, double threshold=0.3) | ||
| int windowSize = 64, int backAddWindowSize = 5, int lookaheadWindowSize = 5, int averageingWindowSize = 3, int judgementWindowSize = 21, double threshold = 0.3) |
There was a problem hiding this comment.
I believe it's spelled "averaging" (without an "e"). If this is an easy change to make, would be a good way to keep our code base looking high quality.
| /// </summary> | ||
| /// <param name="catalog">The anomaly detection catalog.</param> | ||
| /// <param name="src">Root cause's input. The data is an instance of <see cref="Microsoft.ML.TimeSeries.RootCauseLocalizationInput"/>.</param> | ||
| /// <param name="beta">Beta is a weight parameter for user to choose. It is used when score is calculated for each root cause item. The range of beta should be in [0,1]. For a larger beta, root cause point which has a large difference between value and expected value will get a high score. On the contrary, for a small beta, root cause items which has a high relative change will get a high score.</param> |
There was a problem hiding this comment.
You say "on the contrary," but the two scenarios you describe don't seem to be opposites. One is about "relative change" and one is about difference between expected value and actual value. Can you make this explanation clearer?
Additionally, you mention "score," but it's not clear what score is, exactly.
The goal of this pull request is to provide a decision tree based algorithm to localize the root cause of an incident on multi-dimensional time series on a specified timestamp.