[BUG] PyODAdapter only returns decision_scores_ of train-set #1837

Closed
roadrunner-gs opened this issue Jul 22, 2024 · 2 comments · Fixed by #1932
Labels
anomaly detection (Anomaly detection package) · bug (Something isn't working) · interfacing algorithms (Interfacing existing algorithms/estimators for other packages)

Comments

@roadrunner-gs

Describe the bug

The PyODAdapter currently does not support predict() on test data; only the decision_scores_ for the data the detector was fitted on are available.
https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.lof
(...)
decision_scores_ : numpy array of shape (n_samples,)
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
(...)
I would expect to be able to fit on training data and predict on test data.
Furthermore, PyOD's fit_predict is deprecated, so an adapter should not rely on it; otherwise it behaves in ways that users familiar with the underlying PyOD would not expect.

Steps/Code to reproduce the bug

import numpy as np
import warnings
from pyod.models.lof import LOF  
from aeon.anomaly_detection import PyODAdapter
from aeon.utils.windowing import reverse_windowing

warnings.simplefilter('ignore')

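def sliding_window(a, window):
    # Return a strided view of all stride-1 windows of length `window`
    # along the last axis, without copying the data.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)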

X = np.asarray([0, 0, 0, 0, 0, 0, 1, 0, 0, 0])
Y = np.asarray([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])
X_win = sliding_window(X, 2)
Y_win = sliding_window(Y, 2)
print("train:", X)
print("test :", Y)

detector = PyODAdapter(LOF(), window_size=2)
print("LOF via PyODAdapter")
print(detector.fit_predict(X, axis=0))

detector.fit(X)
print("predicting on test via PyODAdapter")
print(detector.predict(Y))

print("LOF via PyOD")
clf = LOF()
clf.fit(X_win)
print(reverse_windowing(clf.decision_scores_, 2, np.nanmean, 1, 2))
print("decision_function on test via PyOD")
print(reverse_windowing(clf.decision_function(Y_win), 2, np.nanmean, 1, 2))
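
For readers comparing the arrays in the output, it may help to see how per-window scores become per-point scores. The following is a minimal NumPy-only sketch of that aggregation, assuming stride 1 and a plain mean over all windows that cover a point. `pointwise_mean_scores` is a hypothetical helper written for illustration; it is not aeon's `reverse_windowing`.

```python
import numpy as np


def pointwise_mean_scores(window_scores, window_size, n_points):
    """Average the scores of all stride-1 windows that cover each point.

    Hypothetical helper for illustration only (not aeon's reverse_windowing).
    Assumes n_points == len(window_scores) + window_size - 1.
    """
    window_scores = np.asarray(window_scores, dtype=float)
    totals = np.zeros(n_points)
    counts = np.zeros(n_points)
    for start, score in enumerate(window_scores):
        # Window `start` covers points start .. start + window_size - 1.
        totals[start:start + window_size] += score
        counts[start:start + window_size] += 1
    return totals / counts


# 4 points, window size 2 -> 3 windows covering [0, 1], [1, 2], [2, 3]
scores = pointwise_mean_scores([1.0, 3.0, 5.0], window_size=2, n_points=4)
print(scores)
```

Under these assumptions, a point covered by two windows scoring 1.01230696 and 0.95894661 receives roughly 0.98562678, which matches the transition values in the printed arrays.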

Expected results

The adapter should accept a separate test set and return the scores computed for that test set; see the direct PyOD calls in the output below for comparison.

Actual results

$ python pyod_test.py
<frozen importlib._bootstrap>:228: RuntimeWarning: scipy._lib.messagestream.MessageStream size changed, may indicate binary incompatibility. Expected 56 from C header, got 64 from PyObject
/home/roadrunner/miniconda3/envs/py3k/lib/python3.9/site-packages/aeon/base/__init__.py:24: FutureWarning: The aeon package will soon be releasing v1.0.0 with the removal of legacy modules and interfaces such as BaseTransformer and BaseForecaster. This will contain breaking changes. See aeon-toolkit.org for more information. Set aeon.AEON_DEPRECATION_WARNING or the AEON_DEPRECATION_WARNING environmental variable to 'False' to disable this warning.
  warnings.warn(
train: [0 0 0 0 0 0 1 0 0 0]
test : [0 0 0 0 0 1 0 0 0 0]
LOF via PyODAdapter
[1.01230696 1.01230696 1.01230696 1.01230696 1.01230696 0.98562678
 0.95894661 0.98562678 1.01230696 1.01230696]
predicting on test via PyODAdapter
[1.01230696 1.01230696 1.01230696 1.01230696 0.98562678 0.95894661
 0.98562678 1.01230696 1.01230696 1.01230696]
LOF via PyOD
[1.01230696 1.01230696 1.01230696 1.01230696 1.01230696 0.98562678
 0.95894661 0.98562678 1.01230696 1.01230696]
decision_function on test via PyOD
[0.95894661 0.95894661 0.95894661 0.95894661 0.95894661 0.95894661
 0.95894661 0.95894661 0.95894661 0.95894661]

Versions

No response

@roadrunner-gs roadrunner-gs added the bug Something isn't working label Jul 22, 2024
@SebastianSchmidl SebastianSchmidl added interfacing algorithms Interfacing existing algorithms/estimators for other packages anomaly detection Anomaly detection package labels Jul 24, 2024
@SebastianSchmidl SebastianSchmidl self-assigned this Jul 24, 2024
@SebastianSchmidl
Member

Thank you for your issue. We are currently discussing this in the team and will let you know the outcome.

@SebastianSchmidl
Member

We decided to change the PyODAdapter to be unsupervised and semi-supervised at the same time, meaning it supports both conventions:

  • unsupervised: fit_predict(X) on the same input data X
  • semi-supervised:
    • fit(X_train, y) to build the normal behavior model on some data X_train, ignoring y
    • predict(X_target) to get the anomaly scores on different data X_target

In the semi-supervised case, most PyOD models can actually cope with somewhat dirty (non-annotated) training data, so the adapter does not fully fit the strict definition of semi-supervised.
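
The two conventions can be sketched as follows. This is a rough illustration under stated assumptions, not the actual aeon implementation: `MeanDistanceDetector` is a toy stand-in that only mimics the PyOD-style `fit` / `decision_scores_` / `decision_function` interface, and `WindowedAdapterSketch` is a hypothetical class that assumes stride-1 windows and mean aggregation back to points.

```python
import numpy as np


class MeanDistanceDetector:
    """Toy stand-in with a PyOD-like interface (NOT a real PyOD model)."""

    def fit(self, X, y=None):
        self.center_ = X.mean(axis=0)
        # Like PyOD detectors, keep the scores of the training data.
        self.decision_scores_ = self.decision_function(X)
        return self

    def decision_function(self, X):
        # Score = Euclidean distance to the training mean; higher = more abnormal.
        return np.linalg.norm(X - self.center_, axis=1)


class WindowedAdapterSketch:
    """Hypothetical adapter illustrating both conventions (not aeon's code)."""

    def __init__(self, model, window_size):
        self.model = model
        self.window_size = window_size

    def _windows(self, X):
        X = np.asarray(X, dtype=float)
        return np.lib.stride_tricks.sliding_window_view(X, self.window_size)

    def _to_points(self, window_scores, n_points):
        # Average the scores of all stride-1 windows covering each point.
        totals = np.zeros(n_points)
        counts = np.zeros(n_points)
        for start, score in enumerate(window_scores):
            totals[start:start + self.window_size] += score
            counts[start:start + self.window_size] += 1
        return totals / counts

    def fit(self, X, y=None):
        # Semi-supervised, step 1: learn the model on X; y is ignored.
        self.model.fit(self._windows(X))
        return self

    def predict(self, X):
        # Semi-supervised, step 2: score new data with the already fitted model.
        scores = self.model.decision_function(self._windows(X))
        return self._to_points(scores, len(X))

    def fit_predict(self, X, y=None):
        # Unsupervised: fit on X and return the scores of that same X.
        self.fit(X)
        return self._to_points(self.model.decision_scores_, len(X))


X_train = np.array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0])
X_target = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])

adapter = WindowedAdapterSketch(MeanDistanceDetector(), window_size=2)
train_scores = adapter.fit_predict(X_train)
target_scores = adapter.fit(X_train).predict(X_target)
print(np.argmax(train_scores), np.argmax(target_scores))
```

With this toy detector, the highest score falls on index 6 for the training series and on index 5 for the target series, so `predict` reflects the data it is given rather than replaying the training scores.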
