Correct SVM Use #346
CLA Assistant Lite bot: I have read the CLA Document and I hereby sign the CLA. You can retrigger this bot by commenting recheck in this Pull Request.
Nice work @LouisAUTHIE, thanks!
I left a few comments and questions.
src/AnomalyDetectors/OneClassSVM.php
Outdated
$this->model = $this->svm->train($dataset->samples());
$data = [];

foreach ($dataset->samples() as $i => $sample) {
Looks like we don't need this $i variable.
You are completely right; this is now corrected!
src/AnomalyDetectors/OneClassSVM.php
Outdated
$data = [];

foreach ($dataset->samples() as $i => $sample) {
    $data[] = array_merge([1], $sample);
I wonder which is faster:
array_merge([1], $sample)
or
array_unshift($sample, 1)
From what I recall, array_unshift() is linear because it has to reindex the array. I think array_merge() is also linear, since it copies every element into a new array.
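A minimal sketch for checking this, assuming samples are plain list arrays: both approaches should produce the same labelled row, and a rough timing loop gives a feel for their relative cost. The helper names prependWithMerge, prependWithUnshift, and timeIt are hypothetical and not part of the library. Note that array_unshift() takes the array first, modifies it in place, and returns the new element count rather than the array.

```php
<?php

/**
 * Prepend a label by building a new array with array_merge().
 * (Hypothetical helper for illustration.)
 */
function prependWithMerge(array $sample, int $label = 1) : array
{
    return array_merge([$label], $sample);
}

/**
 * Prepend a label with array_unshift(). The parameter is passed by value,
 * so the caller's array is left untouched. (Hypothetical helper.)
 */
function prependWithUnshift(array $sample, int $label = 1) : array
{
    array_unshift($sample, $label);

    return $sample;
}

/**
 * Rough timing helper: elapsed nanoseconds for $iterations calls.
 * (Hypothetical helper; results vary by machine and PHP version.)
 */
function timeIt(callable $fn, array $sample, int $iterations) : int
{
    $start = hrtime(true);

    for ($i = 0; $i < $iterations; ++$i) {
        $fn($sample);
    }

    return (int) (hrtime(true) - $start);
}

$sample = [0.5, 1.2, 3.4];

$merged = prependWithMerge($sample);
$unshifted = prependWithUnshift($sample);

// Both strategies should yield an identical row for list-style samples.
$sameResult = $merged === $unshifted;

printf("merge:   %d ns\n", timeIt('prependWithMerge', $sample, 100000));
printf("unshift: %d ns\n", timeIt('prependWithUnshift', $sample, 100000));
```

A loop like this only measures one sample size; real samples may be much wider, so it is worth repeating the measurement with representative data before drawing a conclusion.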
Change is done ;)
    $sampleWithOffset[$key + 1] = $value;
}

return $this->model->predict($sampleWithOffset) == 1 ? 0 : 1;
I notice we are inverting the logic here, i.e. 1 is now 0 and 0 is now 1. Is that intentional?
Yes. In libsvm's one-class mode, the "normal" samples are labelled with class 1 and the anomalies are labelled with -1. That is why the result is mapped this way.
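The mapping can be sketched in isolation as follows, assuming the target convention is 0 for a normal sample and 1 for an anomaly, as the ternary in the diff implies. The function name toAnomalyLabel is hypothetical and used here for illustration only.

```php
<?php

/**
 * Convert a libsvm one-class prediction (1 = normal, -1 = anomaly)
 * into a 0/1 label (0 = normal, 1 = anomaly).
 * (Hypothetical helper mirroring the ternary in the diff.)
 */
function toAnomalyLabel(float $prediction) : int
{
    // Anything other than the "normal" class 1 is treated as an anomaly.
    return $prediction == 1 ? 0 : 1;
}

$labels = array_map('toAnomalyLabel', [1.0, -1.0, 1.0]);
```

Here the first and third predictions are treated as normal and the second as an anomaly.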
@@ -185,8 +185,10 @@ public function train(Dataset $dataset) : void

$data = [];

foreach ($dataset->samples() as $i => $sample) {
    $data[] = array_merge([1], $sample);
foreach ($dataset->samples() as $sample) {
So did array_unshift() turn out to be faster?
Is it necessary to assign the sample to an intermediate variable or would
foreach ($dataset->samples() as $sample) {
array_unshift($sample, 1);
$data[] = $sample;
}
work here as well?
Yes, it is faster. Your suggestion also works; please see the next commit.
Looks awesome, thank you @LouisAUTHIE
We'll get this deployed in a bugfix release ASAP
Correcting the use of SVM for one class anomaly detection