Add Seed property to MLContext and use as default for data splits#4775
Add Seed property to MLContext and use as default for data splits#4775najeeb-kazmi merged 8 commits intodotnet:masterfrom
Conversation
Should this just use Refers to: src/Microsoft.ML.Core/Environment/HostEnvironmentBase.cs:394 in 9bbccaf. [](commit_id = 9bbccaf, deletion_comment = False) |
Should Refers to: src/Microsoft.ML.Core/Environment/HostEnvironmentBase.cs:136 in 9bbccaf. [](commit_id = 9bbccaf, deletion_comment = False) |
mstfbl
left a comment
There was a problem hiding this comment.
The CI builds are giving the following errors:
2020-02-04T02:22:45.6483640Z /__w/1/s/packages/microsoft.dotnet.apicompat/1.0.0-beta.19225.5/build/Microsoft.DotNet.ApiCompat.targets(72,5): error : InterfacesShouldHaveSameMembers : Interface member 'Microsoft.ML.Runtime.IHostEnvironment.Seed' is present in the implementation but not in the contract. [/__w/1/s/src/Microsoft.ML.Core/Microsoft.ML.Core.csproj]
2020-02-04T02:22:45.6484602Z /__w/1/s/packages/microsoft.dotnet.apicompat/1.0.0-beta.19225.5/build/Microsoft.DotNet.ApiCompat.targets(72,5): error : InterfacesShouldHaveSameMembers : Interface member 'Microsoft.ML.Runtime.IHostEnvironment.Seed.get()' is present in the implementation but not in the contract. [/__w/1/s/src/Microsoft.ML.Core/Microsoft.ML.Core.csproj]
Perhaps you are forgetting to declare int Seed and Seed's get() function somewhere?
This is because you are adding a member to an interface, which is a breaking change. Interfaces, once shipped, can't be changed. #Resolved |
| var origStratCol = samplingKeyColumn; | ||
| samplingKeyColumn = data.Schema.GetTempColumnName(samplingKeyColumn); | ||
| var columnOptions = new HashingEstimator.ColumnOptionsInternal(samplingKeyColumn, origStratCol, 30, (uint)(seed ?? host.Rand.Next())); | ||
| var columnOptions = new HashingEstimator.ColumnOptionsInternal(samplingKeyColumn, origStratCol, 30, (uint)(seed ?? env.Seed)); |
There was a problem hiding this comment.
env.Seed [](start = 135, length = 8)
This can also be null, can't it? #Resolved
There was a problem hiding this comment.
Yes, if env.Seed is also null, it will fall back to the HashingEstimator default seed:
machinelearning/src/Microsoft.ML.Data/Transforms/Hashing.cs
Lines 1129 to 1132 in 46e7dc6
There was a problem hiding this comment.
But you are casting it to uint here. If it's null it will throw an exception.
In reply to: 375521058 [](ancestors = 375521058)
There was a problem hiding this comment.
Ah yes, you're right. Thanks for catching that @yaeldekel. I've changed the handling back to how it was before I added (uint)(seed ?? host.Rand.Next()) and added a case to handle seed from the environment. Order of precedence for seed now is (1) user specified to TrainTestSplit/CrossValidata, (2) MLContext seed, (3) default seed for HashingEstimator #Resolved
Fixes #4752
Addresses leftover feedback from #4764