Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct SVM Use #346

Merged
merged 3 commits into from
Nov 2, 2024
Merged

Conversation

LouisAUTHIE
Copy link
Contributor

Correcting the use of SVM for one class anomaly detection

Copy link
Contributor

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request

Copy link
Member

@andrewdalpino andrewdalpino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice @LouisAUTHIE thanks great work

I left a few comment and questions

$this->model = $this->svm->train($dataset->samples());
$data = [];

foreach ($dataset->samples() as $i => $sample) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we don't need this $i variable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are completely right, this is now corrected !

$data = [];

foreach ($dataset->samples() as $i => $sample) {
$data[] = array_merge([1], $sample);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder which is faster ...

array_merge([1], $sample)

or

array_unshift(1, $sample)

From what I recall, unshift is linear because it needs to reindex the array or something. I think merge is also linear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change is done ;)

$sampleWithOffset[$key + 1] = $value;
}

return $this->model->predict($sampleWithOffset) == 1 ? 0 : 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice we are "inversing" the logic here i.e. 1 is now 0, 0 is now 1. Is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in fact in the one class mode of libsvm, the "normal" samples are to be labelled with the 1 class. And the anomalies, are to be labelled with -1. That's why !

@@ -185,8 +185,10 @@ public function train(Dataset $dataset) : void

$data = [];

foreach ($dataset->samples() as $i => $sample) {
$data[] = array_merge([1], $sample);
foreach ($dataset->samples() as $sample) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So did array_unshift() turn out to be faster?

Is it necessary to assign the sample to an intermediate variable or would

foreach ($dataset->samples() as $sample) {
    array_unshift($sample, 1);

    $data[] = $sample;
}

work here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is faster yes. And the rest works also, please see the next commit

Copy link
Member

@andrewdalpino andrewdalpino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome, thank you @LouisAUTHIE

We'll get this deployed in a bugfix release ASAP

@andrewdalpino andrewdalpino merged commit 715b0e7 into RubixML:master Nov 2, 2024
0 of 13 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Nov 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants