Conversation
| } | ||
|
|
||
| /// <summary> | ||
| /// |
There was a problem hiding this comment.
Are you going to fill these docs out? #Resolved
| public static LdSvmTrainer LdSvm(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog, LdSvmTrainer.Options options) | ||
| => new LdSvmTrainer(catalog.GetEnvironment(), options); | ||
|
|
||
| public static LdSvmTrainer LdSvm(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog, |
There was a problem hiding this comment.
This public method should have docs too. #Resolved
| => new LdSvmTrainer(catalog.GetEnvironment(), options); | ||
|
|
||
| public static LdSvmTrainer LdSvm(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog, | ||
| string labelColumnName = DefaultColumnNames.Label, |
There was a problem hiding this comment.
Why aren't you using labelColumnName, featureColumnName, and exampleWeightColumnName in the method body? #Resolved
| IValueMapper, | ||
| ICanSaveModel | ||
| { | ||
| public const string LoaderSignature = "LDSVMBinaryPredictor"; |
There was a problem hiding this comment.
Does this need to be public? #Resolved
| // badly written and the first thing is it just grabs all instances. If this | ||
| // ever changes, do review this return value to make sure it is still | ||
| // appropriate. | ||
| private static TrainerInfo _info = new TrainerInfo(calibration: true, caching: false); |
There was a problem hiding this comment.
should be private readonly static ? #Resolved
| } | ||
|
|
||
| // REVIEW: This does not need caching, but only because it's very | ||
| // badly written and the first thing is it just grabs all instances. If this |
There was a problem hiding this comment.
badly written and the first thing is it just grabs all instances
Does this mean this trainer can't stream data, but instead it needs to load all the data into memory before it can train on it? Is that acceptable?
There was a problem hiding this comment.
When looking thru the code, it seemed like a sizable amount of work to bring this to a streaming mode. I didn't see an obvious method to be performant and streaming -- seems like a lots of passes over the dataset. I very well may be missing the obvious though.
There was a problem hiding this comment.
In the most recent iteration I added two code paths: one does streaming and one does caching. From running some experiments, the streaming version is around 2x slower. Could you take a look and let me know whether it's worth keeping both?
In reply to: 313606773 [](ancestors = 313606773)
There was a problem hiding this comment.
Is the streaming version 2x slower even in the case of large data? If there is a real tradeoff for large data sizes, then it is worth keeping both.
In reply to: 368572412 [](ancestors = 368572412,313606773)
| [Argument(ArgumentType.AtMostOnce, HelpText = "No bias", ShortName = "noBias")] | ||
| [TlcModule.SweepableDiscreteParam("NoBias", null, isBool: true)] | ||
| public bool NoBias = Defaults.NoBias; | ||
|
|
There was a problem hiding this comment.
There seems to be a double negative in naming here. The NoBias term defaults to false. Would it be better to name this variable UseBias and default it to true? #Resolved
Codecov Report
@@ Coverage Diff @@
## master #4060 +/- ##
=========================================
Coverage ? 75.91%
=========================================
Files ? 953
Lines ? 172850
Branches ? 18663
=========================================
Hits ? 131221
Misses ? 36444
Partials ? 5185
|
Add an LDSVM trainer.
Fixes #4031 .
I will add documentation and samples in the next iterations.