Read and write binary file documentation #2811
Changes from 6 commits
@@ -41,6 +41,7 @@ Please feel free to search this page and use any code that suits your needs. | |
| - [How do I train using cross-validation?](#how-do-i-train-using-cross-validation) | ||
| - [Can I mix and match static and dynamic pipelines?](#can-i-mix-and-match-static-and-dynamic-pipelines) | ||
| - [How can I define my own transformation of data?](#how-can-i-define-my-own-transformation-of-data) | ||
| - [How can I read and write binary data?](#how-can-i-read-and-write-binary-data) | ||
| | ||
| ### General questions about the samples | ||
| | ||
@@ -1022,3 +1023,49 @@ ITransformer loadedModel; | |
| using (var fs = File.OpenRead(modelPath)) | ||
| loadedModel = newContext.Model.Load(fs); | ||
| ``` | ||
| | ||
| ## How can I read and write binary data? | ||
Contributor: Cookbook examples all have corresponding tests in
Contributor (author): I must be missing something when I try to add the example. It's not finding the
Contributor (author): @rogancarr I switched to
| Besides text files, ML.NET can also read and write binary data. Binary files have a few advantages: you don't need to specify a schema when reading them, they can be faster to read, and they are generally smaller than the equivalent text files. | ||
| | ||
| To write binary data, you first need some data to save, specifically an instance of `IDataView`. The snippet below uses the iris data as an example. | ||
| | ||
| ```csharp | ||
| // Data model for the iris data | ||
| public class IrisData | ||
| { | ||
| public float Label { get; set; } | ||
| public float SepalLength { get; set; } | ||
| public float SepalWidth { get; set; } | ||
| public float PetalLength { get; set; } | ||
| public float PetalWidth { get; set; } | ||
| } | ||
| | ||
| // An array of iris data points | ||
| var dataArray = new[] | ||
| { | ||
| new IrisData { Label=1, PetalLength=1, SepalLength=1, PetalWidth=1, SepalWidth=1 }, | ||
| new IrisData { Label=0, PetalLength=2, SepalLength=2, PetalWidth=2, SepalWidth=2 } | ||
| }; | ||
| | ||
| // Create the ML.NET context. | ||
| var context = new MLContext(); | ||
| | ||
| // Create the data view from an IEnumerable. | ||
| // This method will use the definition of IrisData to understand what columns there are | ||
| // in the data view. However, the objects in ML.NET are only "promises" of data since | ||
| // ML.NET operations are lazy. One way to get a look at the data is with Schema Comprehension. | ||
| // Refer to this document for more information - https://github.com/dotnet/machinelearning/blob/master/docs/code/SchemaComprehension.md | ||
| var data = context.Data.LoadFromEnumerable(dataArray); | ||
Contributor: I think that in the cookbook we tend to have high level explanation. Since we already discuss in other places how to create an Assume that we already have an If you see the section for saving and loading a model, it's pretty short and we don't go into how to generate the model in that section.
| | ||
| // Use a FileStream to create a file. Use the stream and the data view in the "SaveAsBinary" method. | ||
| using (var stream = new FileStream("./iris.idv", FileMode.Create)) | ||
| { | ||
| context.Data.SaveAsBinary(data, stream); | ||
| } | ||
| ``` | ||
| | ||
| To read a binary file, use the `context.Data.LoadFromBinary` method and pass it the path of the file. Note that the schema of the data does not need to be defined here; it is read from the file itself. | ||
| | ||
| ```csharp | ||
| var data = context.Data.LoadFromBinary("./iris.idv"); | ||
| ``` | ||
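A minimal round-trip sketch for the cookbook steps above, assuming ML.NET 1.x (where the reader is `LoadFromBinary` and `CreateEnumerable` turns a data view back into objects). It saves the rows to a binary `.idv` file, loads them back without declaring a schema, prints the column names and types recovered from the file, and materializes the rows again. The `BinaryRoundTrip` class and its `SaveAndReload` helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML;

// Same shape as the IrisData class in the cookbook snippet.
public class IrisData
{
    public float Label { get; set; }
    public float SepalLength { get; set; }
    public float SepalWidth { get; set; }
    public float PetalLength { get; set; }
    public float PetalWidth { get; set; }
}

// Hypothetical helper, not part of ML.NET.
public static class BinaryRoundTrip
{
    public static List<IrisData> SaveAndReload(MLContext context, IEnumerable<IrisData> rows, string path)
    {
        // Build an IDataView from the in-memory rows; the columns come from IrisData's properties.
        var data = context.Data.LoadFromEnumerable(rows);

        // Write the data view to a binary .idv file.
        using (var stream = new FileStream(path, FileMode.Create))
        {
            context.Data.SaveAsBinary(data, stream);
        }

        // Load the file back; no schema is supplied because the file stores it.
        var loaded = context.Data.LoadFromBinary(path);
        foreach (var column in loaded.Schema)
        {
            Console.WriteLine($"{column.Name}: {column.Type}");
        }

        // Data views are lazy, so ToList() forces the rows to actually be read.
        return context.Data
            .CreateEnumerable<IrisData>(loaded, reuseRowObject: false)
            .ToList();
    }
}
```

For example, `BinaryRoundTrip.SaveAndReload(new MLContext(), dataArray, "./iris.idv")` should return two `IrisData` objects holding the same values as `dataArray`.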
| @@ -0,0 +1,31 @@ | ||
| using System.Collections.Generic; | ||
| using System.IO; | ||
| using Microsoft.Data.DataView; | ||
| using Microsoft.ML.SamplesUtils; | ||
| | ||
| namespace Microsoft.ML.Samples.Dynamic.DataOperations | ||
| { | ||
| public class SaveAndLoadFromBinary | ||
| { | ||
| public static void Example() | ||
| { | ||
| var mlContext = new MLContext(); | ||
| | ||
| // Get a small dataset as an IEnumerable. | ||
| IEnumerable<DatasetUtils.SampleTemperatureData> enumerableOfData = DatasetUtils.GetSampleTemperatureData(5); | ||
Contributor (author): I'm a
| | ||
| // Load dataset into an IDataView. | ||
| IDataView data = mlContext.Data.LoadFromEnumerable(enumerableOfData); | ||
| | ||
| // Creating a FileStream object to create a file and use | ||
| // the stream to create a binary file. | ||
| using (var stream = new FileStream("./sample-temp-data.idv", FileMode.Create)) | ||
| { | ||
| mlContext.Data.SaveAsBinary(data, stream); | ||
| } | ||
| | ||
| // Load a binary file by file path. | ||
| var binaryData = mlContext.Data.LoadFromBinary("./sample-temp-data.idv"); | ||
| } | ||
| } | ||
| } | ||
@@ -506,6 +506,45 @@ public override Action<InputRow, OutputRow> GetMapping() | |
| } | ||
| } | ||
| | ||
| public class IrisData | ||
Contributor: You can use the
| { | ||
| public float Label { get; set; } | ||
| public float SepalLength { get; set; } | ||
| public float SepalWidth { get; set; } | ||
| public float PetalLength { get; set; } | ||
| public float PetalWidth { get; set; } | ||
| } | ||
| | ||
| [Fact] | ||
| public void ReadAndWriteBinaryData() => | ||
| BinaryData(); | ||
| | ||
| private void BinaryData() | ||
Contributor: You can collapse this together and just have the one function because it's not parameterized.
| { | ||
| // An array of iris data points | ||
| var dataArray = new[] | ||
| { | ||
| new IrisData { Label=1, PetalLength=1, SepalLength=1, PetalWidth=1, SepalWidth=1 }, | ||
| new IrisData { Label=0, PetalLength=2, SepalLength=2, PetalWidth=2, SepalWidth=2 } | ||
| }; | ||
| | ||
| // Create the ML.NET context. | ||
| var context = new MLContext(); | ||
| | ||
| // Create the data view from an IEnumerable. | ||
| // This method will use the definition of IrisData to understand what columns there are | ||
| // in the data view. However, the objects in ML.NET are only "promises" of data since | ||
| // ML.NET operations are lazy. One way to get a look at the data is with Schema Comprehension. | ||
| // Refer to this document for more information - https://github.com/dotnet/machinelearning/blob/master/docs/code/SchemaComprehension.md | ||
| var data = context.Data.LoadFromEnumerable(dataArray); | ||
| | ||
| // Use a FileStream to create a file. Use the stream and the data view in the "SaveAsBinary" method. | ||
| using (var stream = new FileStream("./iris.idv", FileMode.Create)) | ||
Contributor: You can register the temp file for deletion using
| { | ||
| context.Data.SaveAsBinary(data, stream); | ||
| } | ||
| } | ||
| | ||
| private static void RunEndToEnd(MLContext mlContext, IDataView trainData, string modelPath) | ||
| { | ||
| // Construct the learning pipeline. Note that we are now providing a contract name for the custom mapping: | ||
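One way the test in this diff could also verify the round trip, sketched as a standalone xUnit test and assuming ML.NET 1.x: it reads the file back with `LoadFromBinary`, compares every value, and keeps everything in a single `[Fact]` method. Writing to a uniquely named file under `Path.GetTempPath()` is an assumption standing in for whatever output-path helper the test project provides, cleanup is omitted for brevity, and the class name `BinaryDataRoundTripTests` is hypothetical.

```csharp
using System.IO;
using System.Linq;
using Microsoft.ML;
using Xunit;

public class BinaryDataRoundTripTests
{
    // Same shape as the IrisData class added in the test diff.
    public class IrisData
    {
        public float Label { get; set; }
        public float SepalLength { get; set; }
        public float SepalWidth { get; set; }
        public float PetalLength { get; set; }
        public float PetalWidth { get; set; }
    }

    [Fact]
    public void ReadAndWriteBinaryData()
    {
        var dataArray = new[]
        {
            new IrisData { Label = 1, PetalLength = 1, SepalLength = 1, PetalWidth = 1, SepalWidth = 1 },
            new IrisData { Label = 0, PetalLength = 2, SepalLength = 2, PetalWidth = 2, SepalWidth = 2 }
        };

        var context = new MLContext();
        var data = context.Data.LoadFromEnumerable(dataArray);

        // Assumption: a uniquely named file in the system temp folder is an acceptable output location.
        var path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".idv");
        using (var stream = new FileStream(path, FileMode.Create))
        {
            context.Data.SaveAsBinary(data, stream);
        }

        // Read the file back without a schema and turn it into objects again.
        var reloaded = context.Data
            .CreateEnumerable<IrisData>(context.Data.LoadFromBinary(path), reuseRowObject: false)
            .ToArray();

        // Every row and every column should survive the round trip unchanged.
        Assert.Equal(dataArray.Length, reloaded.Length);
        for (int i = 0; i < dataArray.Length; i++)
        {
            Assert.Equal(dataArray[i].Label, reloaded[i].Label);
            Assert.Equal(dataArray[i].SepalLength, reloaded[i].SepalLength);
            Assert.Equal(dataArray[i].SepalWidth, reloaded[i].SepalWidth);
            Assert.Equal(dataArray[i].PetalLength, reloaded[i].PetalLength);
            Assert.Equal(dataArray[i].PetalWidth, reloaded[i].PetalWidth);
        }
    }
}
```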
Would this make sense to be next to the "How do I load data from a text file?" question?
Yes I agree! :)