Adding documentation about entry points, and entry points graphs: EntryPoints.md and GraphRunner.md #295
docs/code/EntryPoints.md
Outdated
| @@ -0,0 +1,188 @@ | |||
| # Overview | |||
|
|
|||
| An 'entry point', is a representation of a ML.Net type in json format and it is used to serialize and deserialize an ML.Net type in JSON. | |||
ML.Net [](start = 43, length = 6)
I think the branding is not the lower case ML.Net but ML.NET. #Closed
docs/code/EntryPoints.md
Outdated
| @@ -0,0 +1,188 @@ | |||
| # Overview | |||
|
|
|||
| An 'entry point', is a representation of a ML.Net type in json format and it is used to serialize and deserialize an ML.Net type in JSON. | |||
json [](start = 58, length = 4)
JSON is typically capitalized. #Closed
docs/code/EntryPoints.md
Outdated
| @@ -0,0 +1,188 @@ | |||
| # Overview | |||
|
|
|||
| An 'entry point', is a representation of a ML.Net type in json format and it is used to serialize and deserialize an ML.Net type in JSON. | |||
it [](start = 74, length = 2)
Ambiguous, when we say "it" what are we referring to? Entry-points? JSON? An ML.NET type? #Closed
docs/code/EntryPoints.md
Outdated
|
|
||
| An 'entry point', is a representation of a ML.Net type in json format and it is used to serialize and deserialize an ML.Net type in JSON. | ||
| It is also one of the ways ML.Net uses to deserialize experiments, and the recommended way to interface with other languages. | ||
| In terms defining experiments w.r.t entry points, experiments are entry points DAGs, and respectively, entry points are experiment graph nodes. |
w.r.t [](start = 30, length = 5)
If we want to use this initialism it would be "w.r.t." not "w.r.t". #Closed
docs/code/EntryPoints.md
Outdated
| "OutputData": "$Output_1528136517433", | ||
| "Model": "$TransformModel_1528136517433" | ||
| } | ||
| } |
Be consistent with usage of spaces vs. tabs above. Prefer spaces. #Closed
docs/code/GraphRunner.md
Outdated
| @@ -0,0 +1,123 @@ | |||
| # JSON Graph format | |||
|
|
|||
| The entry point graph in TLC is an array of _nodes_. Each node is an object with the following fields: | |||
entry point [](start = 4, length = 11)
This might be a good place to have a link to EntryPoints.md. #Closed
| - _array_ of the above. Represented as a JSON array, maps to a C# array. | ||
| - _dictionary_. Currently not implemented. Represented as a JSON object, maps to a C# `Dictionary<string,T>`. | ||
| - _component_. Currently not implemented. Represented as a JSON object with 2 fields: _name_:string and _settings_:object. | ||
|
|
Is this information current? I thought I saw some support for these. Certainly components are supported (not as SubComponent type specifically, but we can use dependency injection through the component factories). #Resolved
Ah, my edits to this file are not reflected. Fixing that. #Closed
corrected the component part. Double-checking on the dictionaries and indexing in arrays. I don't think we do that yet.
In reply to: 192866131 [](ancestors = 192866131)
docs/code/GraphRunner.md
Outdated
| - _TransformModel_ | ||
| - _PredictorModel_ | ||
|
|
||
| These must be passed as _variables_. The variable is represented as a JSON string that begins with "$". |
"$" [](start = 99, length = 3)
For code like this I might prefer `$` to "$". #Closed
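To make the rule in the quoted line concrete, here is a minimal sketch of how a reader might tell a variable reference from a literal JSON value. The helper name `isVariable` is hypothetical and is not an ML.NET API; the two variable names are copied from the example node quoted earlier in this review.

```javascript
// Hypothetical helper, not an ML.NET API: a graph value is treated as a
// variable reference when it is a JSON string that begins with `$`.
function isVariable(value) {
  return typeof value === "string" && value.startsWith("$");
}

// Outputs fragment copied from the example node quoted in this review.
const outputs = {
  OutputData: "$Output_1528136517433",
  Model: "$TransformModel_1528136517433",
};

// Keep only variable references and drop the leading `$` from each name.
const variableNames = Object.values(outputs)
  .filter(isVariable)
  .map((value) => value.slice(1));

console.log(variableNames);
```

Checking the type first matters because numbers, booleans, arrays, and objects are all legal JSON values in a node and none of them can be a variable reference under this rule.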
docs/code/GraphRunner.md
Outdated
|
|
||
| ## Example of a JSON entry point manifest object, and the respective entry point graph node | ||
| Let's consider the following manifest snippet, describing an entry point _'CVSplit.Split'_: | ||
| ``` |
You have in the other file been using the javascript type on these code blocks. This is probably a good practice to carry over to this file. #Closed
docs/code/GraphRunner.md
Outdated
| ## Input and output types | ||
| The following types are supported in JSON graphs: | ||
|
|
||
| - _string_. Represented as a JSON string, maps to a C# string. |
string [](start = 55, length = 6)
Should these types, when listed here, be listed as `string` vs. plain old string, since we are using C# keywords to describe them? (E.g.: `string`, `float`, `double`, `bool`, `enum`, `int`, `long`, etc.) This comment would not apply to things that are actually meant to be interpreted as prose descriptions of the type, e.g., "array." #WontFix
docs/code/GraphRunner.md
Outdated
|
|
||
| ## Variables | ||
| The following input/output types can not be represented as a JSON value: | ||
| - _DataView_ |
DataView [](start = 2, length = 10)
Is this usage intentional? There is no DataView, but there is an IDataView. Similar for file handles, the models, etc. #Closed
Also if these are meant to be actual types, should they not be in ` backticks, since they're meant to be interpreted as code?
In reply to: 192868598 [](ancestors = 192868598)
docs/code/EntryPoints.md
Outdated
| "src" | ||
| ], | ||
| "Required": false, | ||
| "SortOrder": 150.0, |
"SortOrder": 150.0, [](start = 18, length = 19)
These are kind of poor examples... SortOrder is identical between the two properties here, and in the enclosing scope they are also identical with sort order of 1. :) #Resolved
Leaving the 150 sort order intact, since it seems to be the de facto (not sure if intentional, though) default for advanced properties.
Updating the transform used for the example to a better one.
In reply to: 192868923 [](ancestors = 192868923)
docs/code/EntryPoints.md
Outdated
| This document briefly describes the structure of the entry points, the structure of an entry point manifest, and mentions the ML.Net classes that help construct an entry point | ||
| graph. | ||
|
|
||
| ## `EntryPoint manifest - the definition of an entry point` |
EntryPoint manifest - the definition of an entry point [](start = 4, length = 54)
This was put in code formatting. Was that intentional? #Closed
The header structure of this document is interesting. There is one top level header Refers to: docs/code/EntryPoints.md:1 in 5da49a3. [](commit_id = 5da49a3, deletion_comment = False) |
docs/code/EntryPoints.md
Outdated
|
|
||
| ## Overview | ||
|
|
||
| An 'entry point', is a representation of a ML.NET type in JSON format. Entry points are used to serialize and deserialize an ML.NET type in JSON. |
An 'entry point', is a representation of a ML.NET type in JSON format. [](start = 0, length = 70)
I am not entirely enthusiastic about that description. I think the primary reason why I don't like it is, I think the phrase ML.NET type is misleading, or at least vague. If I were asked what an ML.NET type is I might say something like VBuffer or IDataView, and to me a representation as JSON makes me think that thing is being serialized, which is not the point of entry-points at all.
So maybe, we could replace a lot of this language with something like this (I don't insist on this exact wording):
Entry-points are a way to interface with ML.NET components, by specifying an execution graph of connected inputs and outputs of those components. Both the manifest describing available components and their inputs/outputs, and an "experiment" graph description, are expressed in JSON. Etc. Etc. #Closed
docs/code/EntryPoints.md
Outdated
|
|
||
| An 'entry point', is a representation of a ML.NET type in JSON format. Entry points are used to serialize and deserialize an ML.NET type in JSON. | ||
| It is also the recommended way to interface with other languages. | ||
| Defined based on entry points, experiments are entry points DAGs, and respectively, entry points are experiment graph nodes. |
Defined based on entry points, experiments are entry points DAGs, and respectively...
Could this be rephrased? I'm not quite sure what it is meant to express.
Experiments #Closed
|
|
||
| An example of an entry point manifest object, specifically for the MissingValueIndicator transform, is: | ||
|
|
||
| ```javascript |
This is how it is actually written out, but I wonder if we could just format it a bit to make it a bit more tolerable. The document is dominated by this ~180 line monstrosity. I think it could be improved significantly by just deleting a bunch of whitespace... so for example, the stuff from lines 40 through 65 could look more like this to save a bunch of lines.
    "Values": ["I1", "U1", "I2", "U2", "I4", "U4", "I8", "U8",
               "R4", "Num", "R8", "TX", "Text", "TXT", "BL", "Bool",
               "TimeSpan", "TS", "DT", "DateTime", "DZ", "DateTimeZone",
               "UG", "U16"]
Basically I suppose I'd say if it looked more like someone actually wrote it vs. code-generated it, it would be a lot easier to appreciate and comprehend. I think we can get it to all fit on one page. Sometimes more lengthy cannot be helped, but in general and especially for the first example, I think it's important that it fit on one page. #Closed
docs/code/EntryPoints.md
Outdated
| "Name": "ResultType", | ||
| "Type": { | ||
| "Kind": "Enum", | ||
| "Values": [ "I1","I2","U2","I4","U4","I8","U8","R4","Num","R8","TX","Text","TXT","BL","Bool","TimeSpan","TS","DT","DateTime","DZ","DateTimeZone","UG","U16"] |
"Values": [ "I1","I2","U2","I4","U4","I8","U8","R4","Num","R8","TX","Text","TXT","BL","Bool","TimeSpan","TS","DT","DateTime","DZ","DateTimeZone","UG","U16"] [](start = 32, length = 156)
Having judicious linebreaks is fine, just that one per element was a bit much.
I condensed all the '[ ] ' to be on the same line as the element. Most of the arrays contain one element.
Keeping this in-line as well for consistency and to keep the graph shorter. I'll fix the spacing before/after the '[' and ']' to be consistent.
In reply to: 194775271 [](ancestors = 194775271)
A comprehensible document understandable by its reader is the goal. Syntactic "consistency" isn't a goal. One way this can be incomprehensible is for it to be so long that the reader gets lost in the weeds, as was the case previously (and, frankly, is still the case). The other way it can be incomprehensible is to put everything on one line so structure can't be appreciated.
Think of it in these terms. If you were personally writing out this yourself, I doubt you would structure code in this way.
In reply to: 194776883 [](ancestors = 194776883,194775271)
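One way to get the middle ground discussed in this thread (neither one element per line nor everything on a single line) is to wrap the array at a fixed width. The sketch below is only an illustration: `formatStringArray` is a hypothetical helper, not something in ML.NET or in the manifest generator, and the 72-column limit is an arbitrary assumption.

```javascript
// Hypothetical formatting helper (not part of ML.NET): lay out a JSON array
// of strings over several lines, breaking only when a line would exceed
// maxWidth, instead of one element per line or everything on one line.
function formatStringArray(key, values, maxWidth = 72, indent = "    ") {
  if (values.length === 0) return [`${JSON.stringify(key)}: []`];
  const lines = [];
  let current = `${JSON.stringify(key)}: [`;
  let lineIsEmpty = true; // true while `current` holds no element yet
  values.forEach((value, i) => {
    const isLast = i === values.length - 1;
    const token = JSON.stringify(value) + (isLast ? "]" : ",");
    const candidate = lineIsEmpty ? current + token : `${current} ${token}`;
    if (!lineIsEmpty && candidate.length > maxWidth) {
      lines.push(current); // line is full: start a new, indented one
      current = indent + token;
    } else {
      current = candidate;
    }
    lineIsEmpty = false;
  });
  lines.push(current);
  return lines;
}

// Enum values as listed in the review comment above.
const enumValues = [
  "I1", "U1", "I2", "U2", "I4", "U4", "I8", "U8", "R4", "Num", "R8", "TX",
  "Text", "TXT", "BL", "Bool", "TimeSpan", "TS", "DT", "DateTime", "DZ",
  "DateTimeZone", "UG", "U16",
];

console.log(formatStringArray("Values", enumValues).join("\n"));
```

Because only whitespace is added between elements, the wrapped text still parses to the same array, so a change like this affects readability only.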
docs/code/EntryPoints.md
Outdated
|
|
||
| Entry-points are a way to interface with ML.NET components, by specifying an execution graph of connected inputs and outputs of those components. | ||
| Both the manifest describing available components and their inputs/outputs, and an "experiment" graph description, are expressed in JSON. | ||
| The recommended way of interacting with ML.NET through other programming languages is by composing, and exchanging pipeline or experiment graphs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
through other programming languages [](start = 47, length = 35)
Specifically, non-.NET programming languages. #Closed
TomFinley
left a comment
Thanks @sfilipi ! Still not wild about the example, but that's OK. If that's the most confusing thing people find about entry-points we ought to consider ourselves lucky I guess. :)
docs/code/EntryPoints.md
Outdated
|
|
||
| ## EntryPoint manifest - the definition of an entry point | ||
|
|
||
| An example of an entry point manifest object, specifically for the MissingValueIndicator transform, is: |
MissingValueIndicator [](start = 67, length = 21)
Consider using code formatting for class names. #Resolved
|
Is this PR related to #160 ? #Resolved |
|
Addresses part of it. I keep logging bugs about Entry Points, need to give everybody context about what they are. In reply to: 397095572 [](ancestors = 397095572) |
docs/code/EntryPoints.md
Outdated
|
|
||
| ## Overview | ||
|
|
||
| Entry-points are a way to interface with ML.NET components, by specifying an execution graph of connected inputs and outputs of those components. |
Should we choose either "entry-points" or "entry points"? #Resolved
docs/code/EntryPoints.md
Outdated
| Both the manifest describing available components and their inputs/outputs, and an "experiment" graph description, are expressed in JSON. | ||
| The recommended way of interacting with ML.NET through other, non-.NET programming languages, is by composing, and exchanging pipeline or experiment graphs. | ||
|
|
||
| Through the documentaiton, we also refer to them as 'entry points nodes', and not just entry points, and that is because they are used as nodes of the experiemnt graphs. |
Typo on "documentation" #Resolved
docs/code/EntryPoints.md
Outdated
| Both the manifest describing available components and their inputs/outputs, and an "experiment" graph description, are expressed in JSON. | ||
| The recommended way of interacting with ML.NET through other, non-.NET programming languages, is by composing, and exchanging pipeline or experiment graphs. | ||
|
|
||
| Through the documentaiton, we also refer to them as 'entry points nodes', and not just entry points, and that is because they are used as nodes of the experiemnt graphs. |
Typo on "experiment" #Resolved
docs/code/EntryPoints.md
Outdated
|
|
||
| Through the documentaiton, we also refer to them as 'entry points nodes', and not just entry points, and that is because they are used as nodes of the experiemnt graphs. | ||
| The graph 'variables', the various values of the experiment graph JSON properties serve to describe the relationship between the entry point nodes. | ||
| The 'variables' are therefore the edges of the DAG. |
Introduce the acronym: spell out "directed acyclic graph" before using DAG. #Resolved
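The claim in the quoted lines, that variables act as the edges of a directed acyclic graph, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the node shape `{ Name, Inputs, Outputs }` with flat objects of `$`-prefixed strings, unique node names, the `Example.*` entry point names, and the helper names `buildEdges` and `executionOrder`. None of it is taken from the ML.NET graph runner.

```javascript
// Assumed node shape for illustration: { Name, Inputs, Outputs }, where the
// values of Inputs and Outputs are `$`-prefixed variable names and every
// node has a unique Name.
function buildEdges(nodes) {
  // Map each variable to the node that produces it.
  const producerOf = new Map();
  for (const node of nodes) {
    for (const variable of Object.values(node.Outputs)) {
      producerOf.set(variable, node.Name);
    }
  }
  // A variable consumed by one node and produced by another is an edge.
  const edges = [];
  for (const node of nodes) {
    for (const variable of Object.values(node.Inputs)) {
      if (producerOf.has(variable)) {
        edges.push({ from: producerOf.get(variable), to: node.Name, variable });
      }
    }
  }
  return edges;
}

// Kahn's algorithm: repeatedly emit a node whose producers have all run.
// If some nodes can never be emitted, the graph has a cycle and is not a DAG.
function executionOrder(nodes) {
  const edges = buildEdges(nodes);
  const pending = new Map(nodes.map((node) => [node.Name, 0]));
  for (const edge of edges) pending.set(edge.to, pending.get(edge.to) + 1);
  const ready = [...pending].filter(([, count]) => count === 0).map(([name]) => name);
  const order = [];
  while (ready.length > 0) {
    const name = ready.shift();
    order.push(name);
    for (const edge of edges) {
      if (edge.from !== name) continue;
      pending.set(edge.to, pending.get(edge.to) - 1);
      if (pending.get(edge.to) === 0) ready.push(edge.to);
    }
  }
  if (order.length !== nodes.length) throw new Error("graph contains a cycle");
  return order;
}

// Hypothetical three-node experiment, deliberately listed out of order.
const experiment = [
  { Name: "Example.Train", Inputs: { TrainingData: "$clean" }, Outputs: { PredictorModel: "$model" } },
  { Name: "Example.Load", Inputs: { File: "$file" }, Outputs: { Data: "$raw" } },
  { Name: "Example.Transform", Inputs: { Data: "$raw" }, Outputs: { OutputData: "$clean" } },
];

console.log(executionOrder(experiment));
// prints [ 'Example.Load', 'Example.Transform', 'Example.Train' ]
```

Note that `$file` produces no edge because no node outputs it, so it has to be supplied from outside the graph.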
docs/code/EntryPoints.md
Outdated
|
|
||
| ## EntryPoint manifest - the definition of an entry point | ||
|
|
||
| An example of an entry point manifest object, specifically for the `MissingValueIndicator` transform, is: |
I think the example is not the MissingValueIndicator transform #Resolved
docs/code/GraphRunner.md
Outdated
| - If the variable is present only in _inputs_, but never in _outputs_, it is a _graph input_. All graph inputs must be provided before | ||
| a graph can be run. | ||
| - The variable has a type, which is the type of inputs (and, optionally, output) that it appears in. If the type of the variable is | ||
| ambiguous, TLC throws an exception. |
Change to ML.NET? #Resolved
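As an aside on the rule quoted above (a variable that appears only in inputs and never in outputs is a graph input), here is a minimal sketch of that classification. The node shape `{ Name, Inputs, Outputs }`, the `Example.*` entry point names, and the helpers `collectVariables` and `findGraphInputs` are assumptions made for illustration and are not taken from the ML.NET sources.

```javascript
// Assumed node shape for illustration: { Name, Inputs, Outputs }.
// Collect every `$`-prefixed string found anywhere inside a JSON value,
// including inside nested arrays and objects.
function collectVariables(value, found = new Set()) {
  if (typeof value === "string") {
    if (value.startsWith("$")) found.add(value);
  } else if (Array.isArray(value)) {
    value.forEach((item) => collectVariables(item, found));
  } else if (value !== null && typeof value === "object") {
    Object.values(value).forEach((item) => collectVariables(item, found));
  }
  return found;
}

// A variable that is consumed as an input but never produced as an output
// must be supplied by the caller before the graph runs: it is a graph input.
function findGraphInputs(nodes) {
  const consumed = new Set();
  const produced = new Set();
  for (const node of nodes) {
    collectVariables(node.Inputs, consumed);
    collectVariables(node.Outputs, produced);
  }
  return [...consumed].filter((name) => !produced.has(name)).sort();
}

// Hypothetical two-node graph; the entry point names are placeholders.
const graph = [
  { Name: "Example.Load", Inputs: { File: "$file" }, Outputs: { Data: "$raw" } },
  { Name: "Example.Transform", Inputs: { Data: "$raw" }, Outputs: { OutputData: "$clean" } },
];

console.log(findGraphInputs(graph)); // prints [ '$file' ]
```

A runner built along these lines could compare the returned list against the values the caller actually supplied and fail early when one is missing.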
|
Thank you for creating this! Would it be useful to include a bit more information on how to turn an ML.NET component into an entrypoint (the C# code that needs to be added) and how the manifest is created? #Resolved |
docs/code/EntryPoints.md
Outdated
|
|
||
| ## How to create an entry point for an existing ML.Net component | ||
|
|
||
| The steps to take, to create an entry point for an existing ML.Net component, are: |
"ML.Net" -> ML.NET (also in the header) #Resolved
docs/code/EntryPoints.md
Outdated
| parameter. | ||
| parameter. | ||
|
|
||
| ## How to create an entry point for an existing ML.Net component |
Might be worth linking to an example somewhere in the code. #Resolved
shauheen
left a comment
Adding the EntryPoints.md and the GraphRunner.md files.
EntryPoints.md introduces the entry points, the entry points manifests and classes that are associated with them.
GraphRunner.md introduces and describes the entry points graph structure.
Addresses #390