Member

@sfilipi sfilipi commented Jun 4, 2018

Adding the EntryPoints.md and the GraphRunner.md files.
EntryPoints.md introduces the entry points, the entry points manifests and classes that are associated with them.
GraphRunner.md introduces and describes the entry points graph structure.

Addresses #390


dnfclas commented Jun 4, 2018

CLA assistant check
All CLA requirements met. #Resolved

@@ -0,0 +1,188 @@
# Overview

An 'entry point', is a representation of a ML.Net type in json format and it is used to serialize and deserialize an ML.Net type in JSON.

Contributor

@TomFinley TomFinley Jun 4, 2018

ML.Net [](start = 43, length = 6)

I think the branding is not the lower case ML.Net but ML.NET. #Closed

@@ -0,0 +1,188 @@
# Overview

An 'entry point', is a representation of a ML.Net type in json format and it is used to serialize and deserialize an ML.Net type in JSON.

Contributor

@TomFinley TomFinley Jun 4, 2018

json [](start = 58, length = 4)

JSON is typically capitalized. #Closed

@@ -0,0 +1,188 @@
# Overview

An 'entry point', is a representation of a ML.Net type in json format and it is used to serialize and deserialize an ML.Net type in JSON.

Contributor

@TomFinley TomFinley Jun 4, 2018

it [](start = 74, length = 2)

Ambiguous, when we say "it" what are we referring to? Entry-points? JSON? An ML.NET type? #Closed


An 'entry point', is a representation of a ML.Net type in json format and it is used to serialize and deserialize an ML.Net type in JSON.
It is also one of the ways ML.Net uses to deserialize experiments, and the recommended way to interface with other languages.
In terms defining experiments w.r.t entry points, experiments are entry points DAGs, and respectively, entry points are experiment graph nodes.

Contributor

@TomFinley TomFinley Jun 4, 2018

w.r.t [](start = 30, length = 5)

If we want to use this initialism it would be "w.r.t." not "w.r.t". #Closed

"OutputData": "$Output_1528136517433",
"Model": "$TransformModel_1528136517433"
}
}

Contributor

@TomFinley TomFinley Jun 4, 2018

Be consistent with usage of spaces vs. tabs above. Prefer spaces. #Closed

@@ -0,0 +1,123 @@
# JSON Graph format

The entry point graph in TLC is an array of _nodes_. Each node is an object with the following fields:

Contributor

@TomFinley TomFinley Jun 4, 2018

entry point [](start = 4, length = 11)

This might be a good place to have a link to EntryPoints.md. #Closed

- _array_ of the above. Represented as a JSON array, maps to a C# array.
- _dictionary_. Currently not implemented. Represented as a JSON object, maps to a C# `Dictionary<string,T>`.
- _component_. Currently not implemented. Represented as a JSON object with 2 fields: _name_:string and _settings_:object.

Contributor

@TomFinley TomFinley Jun 4, 2018

Is this information current? I thought I saw some support for these. Certainly components are supported (not as SubComponent type specifically, but we can use dependency injection through the component factories). #Resolved

Member Author

@sfilipi sfilipi Jun 4, 2018

Ah, my edits to this file are not reflected. Fixing that. #Closed

Member Author

Corrected the component part. Double-checking on the dictionaries and indexing in arrays. I don't think we do that yet.


In reply to: 192866131 [](ancestors = 192866131)

- _TransformModel_
- _PredictorModel_

These must be passed as _variables_. The variable is represented as a JSON string that begins with "$".

Contributor

@TomFinley TomFinley Jun 4, 2018

"$" [](start = 99, length = 3)

For code like this I might prefer `$` to "$". #Closed
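
To make the `$` convention in the quoted line concrete, here is a minimal JavaScript sketch of how a consumer of a graph might tell a variable reference from a literal JSON string. The helper name `isVariable` is hypothetical and not part of ML.NET; the sample values reuse the variable names quoted earlier in this conversation.

```javascript
// Hypothetical helper (not an ML.NET API): a JSON value is treated as a
// variable reference when it is a string that begins with "$".
function isVariable(value) {
  return typeof value === "string" && value.startsWith("$");
}

// Sample outputs object, reusing the variable names quoted earlier in this review.
const outputs = {
  OutputData: "$Output_1528136517433",
  Model: "$TransformModel_1528136517433"
};

// Collect the names of the outputs that are bound to variables.
const boundOutputs = Object.keys(outputs).filter((key) => isVariable(outputs[key]));
console.log(boundOutputs);
```

A plain string such as `"Features"` or a number is not treated as a variable under this check, so only values carrying the `$` prefix take part in wiring nodes together.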


## Example of a JSON entry point manifest object, and the respective entry point graph node
Let's consider the following manifest snippet, describing an entry point _'CVSplit.Split'_:
```

Contributor

@TomFinley TomFinley Jun 4, 2018

You have in the other file been using the javascript type on these code blocks. This is probably a good practice to carry over to this file. #Closed

Contributor

TomFinley commented Jun 4, 2018

JSON Graph format

Maybe "Entry Points JSON Graph Format" might be a more unambiguous title. #Closed


Refers to: docs/code/GraphRunner.md:1 in 5da49a3. [](commit_id = 5da49a3, deletion_comment = False)

## Input and output types
The following types are supported in JSON graphs:

- _string_. Represented as a JSON string, maps to a C# string.

Contributor

@TomFinley TomFinley Jun 4, 2018

string [](start = 55, length = 6)

Should these types when listed here be listed as `string` vs. plain old string, since we are using C# keywords to describe them? (E.g.: `string`, `float`, `double`, `bool`, `enum`, `int`, `long`, etc.) This comment would not apply to things that are actually meant to be interpreted as prose descriptions of the type, e.g., "array." #WontFix


## Variables
The following input/output types can not be represented as a JSON value:
- _DataView_

Contributor

@TomFinley TomFinley Jun 4, 2018

DataView [](start = 2, length = 10)

Is this usage intentional? There is no DataView, but there is an IDataView. Similar for file handles, the models, etc. #Closed

Contributor

@TomFinley TomFinley Jun 4, 2018

Also if these are meant to be actual types, should they not be in ` backticks, since they're meant to be interpreted as code?


In reply to: 192868598 [](ancestors = 192868598)

"src"
],
"Required": false,
"SortOrder": 150.0,

Contributor

@TomFinley TomFinley Jun 4, 2018

"SortOrder": 150.0, [](start = 18, length = 19)

These are kind of poor examples... SortOrder is identical between the two properties here, and in the enclosing scope they are also identical with sort order of 1. :) #Resolved

Member Author

Leaving the 150 sort order intact, since it seems to be the de facto (not sure if intentional, though) default for advanced properties.

Updating the transform used for the example to a better one.


In reply to: 192868923 [](ancestors = 192868923)

This document briefly describes the structure of the entry points, the structure of an entry point manifest, and mentions the ML.Net classes that help construct an entry point
graph.

## `EntryPoint manifest - the definition of an entry point`

Contributor

@TomFinley TomFinley Jun 4, 2018

EntryPoint manifest - the definition of an entry point [](start = 4, length = 54)

This was put in code formatting. Was that intentional? #Closed

Contributor

TomFinley commented Jun 4, 2018

Overview

The header structure of this document is interesting. There is one top level header `# Overview`, but that seems to comprise the entire document, with the remainder of the document being `##` headers. (Which is odd, since we'd expect an overview to be a summary rather than the entire document.) Was the intention that there be another top level header, somewhere? #Closed


Refers to: docs/code/EntryPoints.md:1 in 5da49a3. [](commit_id = 5da49a3, deletion_comment = False)

@sfilipi sfilipi requested a review from GalOshri June 5, 2018 17:24

## Overview

An 'entry point', is a representation of a ML.NET type in JSON format. Entry points are used to serialize and deserialize an ML.NET type in JSON.

Contributor

@TomFinley TomFinley Jun 8, 2018

An 'entry point', is a representation of a ML.NET type in JSON format. [](start = 0, length = 70)

I am not entirely enthusiastic about that description. I think the primary reason why I don't like it is, I think the phrase ML.NET type is misleading, or at least vague. If I were asked what an ML.NET type is I might say something like VBuffer or IDataView, and to me a representation as JSON makes me think that thing is being serialized, which is not the point of entry-points at all.

So maybe, we could replace a lot of this language with something like this (I don't insist on this exact wording):

Entry-points are a way to interface with ML.NET components, by specifying an execution graph of connected inputs and outputs of those components. Both the manifest describing available components and their inputs/outputs, and an "experiment" graph description, are expressed in JSON. Etc. Etc. #Closed


An 'entry point', is a representation of a ML.NET type in JSON format. Entry points are used to serialize and deserialize an ML.NET type in JSON.
It is also the recommended way to interface with other languages.
Defined based on entry points, experiments are entry points DAGs, and respectively, entry points are experiment graph nodes.

Contributor

@TomFinley TomFinley Jun 8, 2018

Defined based on entry points, experiments are entry points DAGs, and respectively...

Could this be rephrased? I'm not quite sure what it is meant to express.

Experiments #Closed


An example of an entry point manifest object, specifically for the MissingValueIndicator transform, is:

```javascript

Contributor

@TomFinley TomFinley Jun 8, 2018

This is how it is actually written out, but I wonder if we could just format it to make it a bit more tolerable. The document is dominated by this ~180 line monstrosity. I think it could be improved significantly by just deleting a bunch of whitespace. For example, the stuff from lines 40 through 65 could look more like this to save a bunch of lines.

"Values": ["I1", "U1", "I2", "U2", "I4", "U4", "I8", "U8",
    "R4", "Num", "R8", "TX", "Text", "TXT", "BL", "Bool",
    "TimeSpan", "TS", "DT", "DateTime", "DZ", "DateTimeZone",
    "UG", "U16"]

Basically I suppose I'd say if it looked more like someone actually wrote it vs. code-generated it would be a lot easier to appreciate and comprehend. I think we can get it to all fit on one page. Sometimes more lengthy cannot be helped, but in general and especially for the first example, I think it's important that it fit on one page. #Closed

"Name": "ResultType",
"Type": {
"Kind": "Enum",
"Values": [ "I1","I2","U2","I4","U4","I8","U8","R4","Num","R8","TX","Text","TXT","BL","Bool","TimeSpan","TS","DT","DateTime","DZ","DateTimeZone","UG","U16"]

Contributor

@TomFinley TomFinley Jun 12, 2018

"Values": [ "I1","I2","U2","I4","U4","I8","U8","R4","Num","R8","TX","Text","TXT","BL","Bool","TimeSpan","TS","DT","DateTime","DZ","DateTimeZone","UG","U16"] [](start = 32, length = 156)

Having judicious linebreaks is fine, just that one per element was a bit much.

Member Author

I condensed all the '[ ] ' to be on the same line as the element. Most of the arrays contain one element.

Keeping this in-line as well for consistency and to keep the graph shorter. I'll fix the spacing before/after the '[' and ']' to be consistent.


In reply to: 194775271 [](ancestors = 194775271)

Contributor

A comprehensible document understandable by its reader is the goal. Syntactic "consistency" isn't a goal. One way this can be incomprehensible is for it to be so long that the reader gets lost in the weeds, as was the case previously (and, frankly, is still the case). The other way it can be incomprehensible is to put everything on one line so structure can't be appreciated.

Think of it in these terms. If you were personally writing out this yourself, I doubt you would structure code in this way.


In reply to: 194776883 [](ancestors = 194776883,194775271)


Entry-points are a way to interface with ML.NET components, by specifying an execution graph of connected inputs and outputs of those components.
Both the manifest describing available components and their inputs/outputs, and an "experiment" graph description, are expressed in JSON.
The recommended way of interacting with ML.NET through other programming languages is by composing, and exchanging pipeline or experiment graphs.

Contributor

@TomFinley TomFinley Jun 12, 2018

through other programming languages [](start = 47, length = 35)

Specifically, non-.NET programming languages. #Closed

Contributor

@TomFinley TomFinley left a comment

Thanks @sfilipi ! Still not wild about the example, but that's OK. If that's the most confusing thing people find about entry-points we ought to consider ourselves lucky I guess. :)


## EntryPoint manifest - the definition of an entry point

An example of an entry point manifest object, specifically for the MissingValueIndicator transform, is:

Contributor

@TomFinley TomFinley Jun 12, 2018

MissingValueIndicator [](start = 67, length = 21)

Consider using code formatting for class names. #Resolved

Contributor

shauheen commented Jun 13, 2018

Is this PR related to #160 ? #Resolved

Member Author

sfilipi commented Jun 14, 2018

Addresses part of it. I keep logging bugs about Entry Points, need to give everybody context about what they are.


In reply to: 397095572 [](ancestors = 397095572)


## Overview

Entry-points are a way to interface with ML.NET components, by specifying an execution graph of connected inputs and outputs of those components.

Contributor

@GalOshri GalOshri Jun 18, 2018

Should we choose either "entry-points" or "entry points"? #Resolved

Both the manifest describing available components and their inputs/outputs, and an "experiment" graph description, are expressed in JSON.
The recommended way of interacting with ML.NET through other, non-.NET programming languages, is by composing, and exchanging pipeline or experiment graphs.

Through the documentaiton, we also refer to them as 'entry points nodes', and not just entry points, and that is because they are used as nodes of the experiemnt graphs.

Contributor

@GalOshri GalOshri Jun 18, 2018

Typo on "documentation" #Resolved

Both the manifest describing available components and their inputs/outputs, and an "experiment" graph description, are expressed in JSON.
The recommended way of interacting with ML.NET through other, non-.NET programming languages, is by composing, and exchanging pipeline or experiment graphs.

Through the documentaiton, we also refer to them as 'entry points nodes', and not just entry points, and that is because they are used as nodes of the experiemnt graphs.

Contributor

@GalOshri GalOshri Jun 18, 2018

Typo on "experiment" #Resolved


Through the documentaiton, we also refer to them as 'entry points nodes', and not just entry points, and that is because they are used as nodes of the experiemnt graphs.
The graph 'variables', the various values of the experiment graph JSON properties serve to describe the relationship between the entry point nodes.
The 'variables' are therefore the edges of the DAG.

Contributor

@GalOshri GalOshri Jun 18, 2018

Introduce the acronym "directed acyclic graph" #Resolved
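
One way to read "variables are the edges of the DAG" is sketched below in JavaScript: each variable that one node writes and another node reads becomes a directed edge from the producer to the consumer. The field names `Name`, `Inputs` and `Outputs`, the node names, and the helper `buildEdges` are assumptions made for illustration, not ML.NET APIs.

```javascript
// Hypothetical sketch: derive DAG edges from the variables shared between nodes.
function buildEdges(nodes) {
  const isVar = (v) => typeof v === "string" && v.startsWith("$");

  // Map each variable to the name of the node that produces it.
  const producerOf = new Map();
  for (const node of nodes) {
    for (const value of Object.values(node.Outputs || {})) {
      if (isVar(value)) producerOf.set(value, node.Name);
    }
  }

  // Every input that names a produced variable yields one edge.
  const edges = [];
  for (const node of nodes) {
    for (const value of Object.values(node.Inputs || {})) {
      if (isVar(value) && producerOf.has(value)) {
        edges.push({ from: producerOf.get(value), to: node.Name, variable: value });
      }
    }
  }
  return edges;
}

// Illustrative two-node graph; the node and variable names are made up.
const exampleNodes = [
  { Name: "Example.Transform", Inputs: { Data: "$raw" }, Outputs: { OutputData: "$clean" } },
  { Name: "Example.Trainer", Inputs: { TrainingData: "$clean" }, Outputs: { PredictorModel: "$model" } }
];
const dagEdges = buildEdges(exampleNodes);
console.log(dagEdges);
```

In this sample, `$clean` is the only variable that is both written and read, so it is the single edge; `$raw` and `$model` connect to nothing inside the graph.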


## EntryPoint manifest - the definition of an entry point

An example of an entry point manifest object, specifically for the `MissingValueIndicator` transform, is:

Contributor

@GalOshri GalOshri Jun 18, 2018

I think the example is not the MissingValueIndicator transform #Resolved

- If the variable is present only in _inputs_, but never in _outputs_, it is a _graph input_. All graph inputs must be provided before
a graph can be run.
- The variable has a type, which is the type of inputs (and, optionally, output) that it appears in. If the type of the variable is
ambiguous, TLC throws an exception.

Contributor

@GalOshri GalOshri Jun 18, 2018

Change to ML.NET? #Resolved
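
The graph-input rule quoted above can be sketched in JavaScript as follows. This is an illustration under stated assumptions: the node field names `Inputs` and `Outputs`, the node names, and the helper `findGraphInputs` are hypothetical, and only top-level string values are inspected (variables nested inside arrays are ignored for brevity).

```javascript
// Hypothetical sketch: a variable that appears in some node's inputs but in
// no node's outputs is a graph input and must be supplied before running.
function findGraphInputs(nodes) {
  const isVar = (v) => typeof v === "string" && v.startsWith("$");
  const consumed = new Set();
  const produced = new Set();

  for (const node of nodes) {
    for (const value of Object.values(node.Inputs || {})) {
      if (isVar(value)) consumed.add(value);
    }
    for (const value of Object.values(node.Outputs || {})) {
      if (isVar(value)) produced.add(value);
    }
  }
  // Keep only the variables that no node produces.
  return [...consumed].filter((name) => !produced.has(name)).sort();
}

// Illustrative graph; the node and variable names are made up.
const sampleGraph = [
  {
    Name: "Example.Transform",
    Inputs: { Data: "$file_data", Column: "Features" },
    Outputs: { OutputData: "$transformed" }
  },
  {
    Name: "Example.Trainer",
    Inputs: { TrainingData: "$transformed" },
    Outputs: { PredictorModel: "$model" }
  }
];
console.log(findGraphInputs(sampleGraph));
```

For the sample graph, the only graph input is `$file_data`: `$transformed` is produced by the first node, and `"Features"` is a literal rather than a variable.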

Contributor

GalOshri commented Jun 18, 2018

Thank you for creating this! Would it be useful to include a bit more information on how to turn an ML.NET component into an entrypoint (the C# code that needs to be added) and how the manifest is created? #Resolved


## How to create an entry point for an existing ML.Net component

The steps to take, to create an entry point for an existing ML.Net component, are:

Contributor

@GalOshri GalOshri Jun 19, 2018

"ML.Net" -> ML.NET (also in the header) #Resolved

parameter.
parameter.

## How to create an entry point for an existing ML.Net component

Contributor

@GalOshri GalOshri Jun 19, 2018

Might be worth linking to an example somewhere in the code. #Resolved

Contributor

@shauheen shauheen left a comment

:shipit:

@sfilipi sfilipi merged commit ecc6857 into dotnet:master Jun 21, 2018
@sfilipi sfilipi deleted the entrypointdoc branch July 5, 2018 18:18
@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022