[ENH] Issue 1641 - Matrix profile-based anomaly detectors: left STAMPi #2091

Merged

Conversation

ferewi
Contributor

@ferewi ferewi commented Sep 24, 2024

Reference Issues/PRs

Fixes #1641

What does this implement/fix? Explain your changes.

This PR implements the LeftSTAMPi Anomaly Detector based on the implementation in TimeEval (https://github.com/TimeEval/TimeEval-algorithms/blob/main/left_stampi/algorithm.py)

The algorithm can be run in two modes.

  1. Batch mode: The whole time series is passed in at once. Internally, the LeftSTAMPi algorithm is applied incrementally.
  2. Stream mode: The algorithm is initialized on a specified number of data points. The matrix profile is then calculated incrementally.

Remarks:
The batch mode is implemented in the fit_predict method.
The stream mode is implemented in the fit and predict methods, where fit is used to initialize the algorithm and predict is used for incremental updates. Because the predict method only accepts an np.ndarray as its argument, each new data point, which is a scalar, has to be wrapped in a one-element numpy array on every update. This wrapping is actually unnecessary, but avoiding it would require a change to the interface in BaseAnomalyDetector.
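
To make the two modes concrete, here is a minimal sketch of how batch mode can reuse the same incremental update that stream mode relies on. It is not the code in this PR: it is a brute-force numpy stand-in for a left matrix profile, and the names znorm, NaiveLeftProfile, update and batch_scores are hypothetical. It also assumes that only non-overlapping earlier windows count as neighbors, which is a simplification.

```python
import numpy as np


def znorm(window):
    """Z-normalize a 1-D window; a constant window maps to all zeros."""
    std = window.std()
    if std == 0:
        return np.zeros_like(window)
    return (window - window.mean()) / std


class NaiveLeftProfile:
    """Hypothetical brute-force left matrix profile, grown one point at a time."""

    def __init__(self, init_series, window_size):
        self.window_size = window_size
        self.series = np.asarray(init_series, dtype=float)
        n_windows = len(self.series) - window_size + 1
        self.left_profile = [self._left_distance(i) for i in range(n_windows)]

    def _left_distance(self, i):
        # Distance from window i to its closest non-overlapping window that
        # starts earlier in the series; inf if no such window exists yet.
        m = self.window_size
        query = znorm(self.series[i:i + m])
        best = np.inf
        for j in range(0, i - m + 1):
            candidate = znorm(self.series[j:j + m])
            best = min(best, float(np.linalg.norm(query - candidate)))
        return best

    def update(self, value):
        # Stream mode: append one scalar and score only the newest window.
        self.series = np.append(self.series, float(value))
        newest = len(self.series) - self.window_size
        self.left_profile.append(self._left_distance(newest))
        return self.left_profile[-1]


def batch_scores(series, window_size, n_init):
    # Batch mode: the same incremental updates, driven by a loop over the series.
    series = np.asarray(series, dtype=float)
    detector = NaiveLeftProfile(series[:n_init], window_size)
    for value in series[n_init:]:
        detector.update(value)
    scores = np.asarray(detector.left_profile)
    scores[np.isinf(scores)] = 0.0  # windows without a left neighbor get score 0
    return scores


# Periodic signal with an injected anomaly at indices 150..154.
t = np.arange(200)
series = np.sin(2 * np.pi * t / 20)
series[150:155] += 3.0
scores = batch_scores(series, window_size=20, n_init=40)
```

In this sketch update takes a plain scalar, so no one-element array wrapping is needed. The highest score falls on a window that overlaps the injected anomaly, because those windows have no close match earlier in the series.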

Does your contribution introduce a new dependency? If yes, which one?

No.

Any other comments?

See remarks in the implementation description.

PR checklist

For all contributions
  • I've added myself to the list of contributors. Alternatively, you can use the @all-contributors bot to do this for you.
    @all-contributors please add @ferewi for code, doc and test
  • The PR title starts with either [ENH], [MNT], [DOC], [BUG], [REF], [DEP] or [GOV] indicating whether the PR topic is related to enhancement, maintenance, documentation, bugs, refactoring, deprecation or governance.
For new estimators and functions
  • I've added the estimator to the online API documentation.
  • (OPTIONAL) I've added myself as a __maintainer__ at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.
For developers with write access
  • (OPTIONAL) I've updated aeon's CODEOWNERS to receive notifications about future changes to these files.

@aeon-actions-bot added the labels "anomaly detection" (Anomaly detection package) and "enhancement" (New feature, improvement request or other non-bug code enhancement) on Sep 24, 2024
@aeon-actions-bot
Contributor

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ enhancement ].
I have added the following labels to this PR based on the changes made: [ anomaly detection ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Push an empty commit to re-run CI checks

Member

@SebastianSchmidl SebastianSchmidl left a comment

Thank you for your contribution!

I'm in favor of supporting streaming use cases, and I also like the idea of repeatedly calling predict. However, this violates the convention that no internal representation is updated in predict. Nor can we use repeated calls to fit, because fit is assumed to reset the estimator at the beginning of each call.

I guess we need to design a new API for streaming. @MatthewMiddlehurst, what do you think about this?

Until we have decided on the new API, I would suggest adding leftSTAMPi only with its batch API.

aeon/anomaly_detection/_left_stampi.py Outdated Show resolved Hide resolved
aeon/anomaly_detection/_left_stampi.py Outdated Show resolved Hide resolved
aeon/anomaly_detection/_left_stampi.py Outdated Show resolved Hide resolved
aeon/anomaly_detection/_left_stampi.py Outdated Show resolved Hide resolved
aeon/anomaly_detection/_left_stampi.py Show resolved Hide resolved
aeon/anomaly_detection/_left_stampi.py Show resolved Hide resolved
aeon/anomaly_detection/tests/test_left_stampi.py Outdated Show resolved Hide resolved
@ferewi
Contributor Author

ferewi commented Sep 24, 2024

@CodeLionX Thanks for your comments and suggestions. The failing test suite made me aware that the approach I took for the streaming case violates the concept of fit, predict and fit_predict. I'll remove the streaming case for now. An idea for the streaming API might be to have an update method that is allowed to modify the internal representation after the initial fitting.
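
A rough sketch of how such an update method could sit next to fit and predict, assuming a toy detector that scores points by their distance from the mean of the most recent values. The class StreamingDetectorSketch and its update method are hypothetical and are not part of aeon's BaseAnomalyDetector interface.

```python
import numpy as np


class StreamingDetectorSketch:
    """Hypothetical estimator illustrating a fit / predict / update split."""

    def __init__(self, window_size=3):
        self.window_size = window_size  # hyperparameter, set only in __init__

    def fit(self, X):
        # fit always starts from scratch and discards any earlier state.
        self.buffer_ = np.asarray(X, dtype=float).copy()
        return self

    def predict(self, X):
        # predict is read-only: it scores X against the fitted state
        # without changing that state.
        X = np.asarray(X, dtype=float)
        reference = self.buffer_[-self.window_size:].mean()
        return np.abs(X - reference)

    def update(self, X):
        # update is the only method allowed to change fitted state after fit.
        self.buffer_ = np.concatenate([self.buffer_, np.asarray(X, dtype=float)])
        return self


detector = StreamingDetectorSketch(window_size=3).fit([1.0, 1.0, 1.0])
before = detector.predict([4.0])        # scored against the fitted values
repeated = detector.predict([4.0])      # same input, same result
detector.update([4.0, 4.0, 4.0])        # new points become the reference
after = detector.predict([4.0])
```

With this split, repeated predict calls stay deterministic, and a fresh fit still resets everything that update accumulated.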

Also, could you point me to the documentation of the sklearn and aeon naming conventions (self.mp_ vs self.mp), etc.? (If you have that at hand; otherwise I can certainly google it myself.)

…ter a decision about the streaming API has been made.
@ferewi
Contributor Author

ferewi commented Sep 25, 2024

Still making changes to fix the failing tests. I will request a new review when I am done.

@SebastianSchmidl
Member

SebastianSchmidl commented Sep 25, 2024

Also, could you point me to the documentation of the sklearn and aeon naming conventions (self.mp_ vs self.mp), etc.? (If you have that at hand; otherwise I can certainly google it myself.)

It is sprinkled in this guide: https://scikit-learn.org/dev/developers/develop.html

E.g.

Also it is expected that parameters with trailing _ are not to be set inside the __init__ method. All and only the public attributes set by fit have a trailing _. As a result the existence of parameters with trailing _ is used to check if the estimator has been fitted.

is in the Parameters and init section.
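
As an illustration of that convention, here is a small self-contained sketch. The class MatrixProfileSketch and the helper is_fitted are hypothetical, and the "profile" it stores is only a placeholder, not a real matrix profile.

```python
class MatrixProfileSketch:
    """Hypothetical estimator following the trailing-underscore convention."""

    def __init__(self, window_size=3):
        # Hyperparameters are stored under their own names, without underscore.
        self.window_size = window_size

    def fit(self, X):
        # State learned from data gets a trailing underscore and is only
        # ever set in fit (placeholder computation: absolute differences).
        self.mp_ = [abs(b - a) for a, b in zip(X, X[1:])]
        return self


def is_fitted(estimator):
    # An estimator counts as fitted once it has at least one attribute
    # that ends with an underscore (dunder names are ignored).
    return any(
        name.endswith("_") and not name.startswith("__")
        for name in vars(estimator)
    )


unfitted = MatrixProfileSketch()
fitted = MatrixProfileSketch().fit([1, 2, 4])
```

Here is_fitted(unfitted) is False and is_fitted(fitted) is True, which is why setting self.mp_ inside __init__ would make an estimator look fitted before fit was ever called.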

@MatthewMiddlehurst
Member

Thanks for the contribution. Feel free to ask if you have any questions regarding the failures, e.g. we have a tag for estimators which can't be pickled (usually due to dependencies outside of our control).

@MatthewMiddlehurst
Member

Think adding to the base API should be a separate PR yeah, would need to see a proposal but if update or a similar method would work that sounds fine.

@SebastianSchmidl
Member

Think adding to the base API should be a separate PR yeah, would need to see a proposal but if update or a similar method would work that sounds fine.

Another question is whether we actually want to introduce streaming algorithms in aeon. Are there already other streaming/online estimators? The current architecture is tailored to the batch-case. Maybe this is a topic for the next dev-meeting.

@ferewi
Contributor Author

ferewi commented Sep 25, 2024

Then maybe you can discuss this in your next dev meeting, and if you decide that you want to integrate a streaming API, I am happy to put in a proposal and another PR to implement this for the LeftSTAMPi algorithm.

@ferewi
Contributor Author

ferewi commented Sep 25, 2024

Thanks for the contribution. Feel free to ask if you have any questions regarding the failures, i.e. we have a tag for estimators which can't be pickled (usually due to dependencies outside of our control).

I guess I figured it out. I tagged the class with 'cant-pickle' and the tests are green now.

Member

@SebastianSchmidl SebastianSchmidl left a comment

Otherwise, looks good 👍🏼

aeon/anomaly_detection/_left_stampi.py Outdated Show resolved Hide resolved
@ferewi
Contributor Author

ferewi commented Sep 27, 2024

@CodeLionX What is the process after the PR is approved? Do I have to do something, or is this simply integrated at some point?

@MatthewMiddlehurst
Member

Don't have to do anything, can be merged at any point really. Usually give it some time for other comments though. Not really a massive rush until it's close to release time 🙂

@SebastianSchmidl
Member

SebastianSchmidl commented Sep 27, 2024

We will wait a bit, so that other maintainers / core devs get the chance to object; otherwise, I'll merge it in later.

EDIT: Matthew was faster 🤷🏼

@SebastianSchmidl
Member

SebastianSchmidl commented Sep 27, 2024

@ferewi regarding the online-API, we decided in our dev-meeting that aeon will not support this in the near future. If you have a use case for streaming/online anomaly detection, we can, of course, talk about this again.

Matthew will create an issue to track this.

@TonyBagnall
Contributor

Fantastic, thanks for this

@TonyBagnall TonyBagnall merged commit 5fadd1c into aeon-toolkit:main Sep 27, 2024
14 checks passed
@ferewi
Contributor Author

ferewi commented Sep 27, 2024

Cool - I was just interested in how the usual process is :)

Thank you for your help @CodeLionX @MatthewMiddlehurst :)

@MatthewMiddlehurst
Member

@all-contributors add @ferewi for code

This may be duplicated, don't remember if we did this 🙂

Contributor

@MatthewMiddlehurst

I've put up a pull request to add @ferewi! 🎉

Labels
anomaly detection (Anomaly detection package), enhancement (New feature, improvement request or other non-bug code enhancement)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] Matrix profile-based anomaly detectors: left STAMPi
4 participants