
[ENH] Vendor fracdiff library #6777

Merged (12 commits, Aug 7, 2024)

Conversation

@DinoBektesevic (Contributor) commented Jul 14, 2024

Reference Issues/PRs

Resolves the core issue of #6700: vendoring the fracdiff package.

What does this implement/fix? Explain your changes.

The fracdiff package has been archived, which reduced the usability of sktime for some users.

This PR is the initial step in vendoring the library as a libs component of sktime.
The intention at the time of this initial PR is to have the tests pass with minimal changes to the original library; however, some changes were necessary:

  • docs were removed from the original library
  • examples were moved into the top-level examples directory
  • fracdiff tensor interface was removed
  • the FracdiffStat class was removed (it depended on the StatTester class)
  • the StatTester class was removed, as it depended on statsmodels
    • FracdiffStat seems useful, but I was told StatTester should be re-implemented, or FracdiffStat changed to use some other stationarity tester that already exists in sktime (the code defaulted to ADF). I'm just not knowledgeable enough about what's happening here, but I can work it out if there's an example somewhere that shows me how to use that test and try a rewrite. YMMV.
  • tests related to StatTester and FracdiffStat were then also removed
  • The imports (and package structure) were altered
    • the deprecation wrappers were removed
    • the Fracdiff class is imported at top level from the sklearn subpackage, because the tensor wrappers were removed
    • the numpy-based fdiff and fdiff_coef functions are also now imported at top level, to avoid a triple fdiff.fdiff.fdiff (see the import sketch below)
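
A minimal sketch of the resulting import surface (the sktime.libs.fracdiff path and the exact call signatures are assumptions, based on the original fracdiff API):

```python
import numpy as np

# both functions are now importable from the package root,
# instead of the old triple-nested fdiff.fdiff.fdiff
from sktime.libs.fracdiff import fdiff, fdiff_coef

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])

# fdiff generalizes numpy.diff to fractional orders of differencing
print(fdiff(x, 0.5, window=4))

# fdiff_coef returns the first `window` coefficients of the
# fractional differencing filter of order d
print(fdiff_coef(0.5, window=4))
```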

Does your contribution introduce a new dependency? If yes, which one?

Not as-is; potentially tensorflow or statsmodels, depending on further advice I get.

What should a reviewer concentrate their feedback on?

[edit: I just saw some of the discussion in the Issue relating to the first two points, whoops]

  • Is FracdiffStat useful? Should it be added back in, or is this moot because a different tester exists that is already usable as-is?
  • How should I then wrap statsmodels back in?
  • Should examples be deleted?
  • I was told Fracdiff should be wrapped in a Differencer (or some such transformation), but I spent an hour driving home after the hack and now I can't remember exactly which base class I was pointed to. If someone can remind me which class I should wrap the sklearn Fracdiff in, that'd be swell.
  • This PR can be considered a work in progress: not the final state of the code, but a place for me to understand the requirements for merging

Did you add any tests for the change?

Tests are all as in the original library, minus the removed ones.

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.


Review thread on sktime/libs/README.md (outdated, resolved)
@fkiraly (Collaborator) left a comment

Thanks a lot!

Great, 1:1 vendor distribution minus the parts dependent on statsmodels, as discussed. I will comment on the latter in the original issue, i.e., how I would design it. My preference would be to design a full replacement in sktime instead of keeping the equivalent class inside the vendored package.

What I would say: let's merge this as-is, with only minor requests:

  • the fracdiff.__init__.py should include a license and contributions preamble, see pykalman
  • could you kindly decorate all the test classes with pytest.mark.skipif, depending on whether anything in the vendored lib has changed? Similar to, e.g., pykalman.tests.test_standard
  • I see the sklearn estimators' tests but am surprised not to see a check_estimator conformance test. More specifically, I would add a parametrize_with_checks decorated test to run the full suite of sklearn API conformance tests: https://scikit-learn.org/stable/modules/generated/sklearn.utils.estimator_checks.parametrize_with_checks.html#sklearn.utils.estimator_checks.parametrize_with_checks (see the sketch after this list)
    • I would expect this to simply run, but if it fails, let's add it and xfail, and report as an issue that we have API non-conformance
    • if it is too fiddly, we can also do this separately, then we should record this as an open issue that such tests need to be added
  • a minor typo in the README, see above
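
A minimal sketch of what such a gated conformance test could look like (the run_test_module_changed switch and the sktime.libs.fracdiff module path are assumptions, modeled on how the pykalman vendor tests are gated):

```python
import pytest
from sklearn.utils.estimator_checks import parametrize_with_checks

from sktime.libs.fracdiff import Fracdiff
from sktime.tests.test_switch import run_test_module_changed


# skip the whole suite unless the vendored module itself has changed
@pytest.mark.skipif(
    not run_test_module_changed("sktime.libs.fracdiff"),
    reason="run only if the vendored fracdiff module has changed",
)
# runs every applicable sklearn API conformance check against Fracdiff
@parametrize_with_checks([Fracdiff()])
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)
```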

@DinoBektesevic (Contributor, Author)

Ok, I think this addresses all the PR comments.

The preamble existed, I had just put it into the wrong __init__ (whoops). I also left an empty directory in tests (issues/issue32.py used to be a test for the stat estimator). I think I understood how to write the check_estimator test, but double-checking would be good, because I saw you do have some similarly named utils and I've not written tests like these before.

I'll leave the non-conformance failures here, to copy into the issue you're asking for, if this is allowed to merge:

FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_n_features_in] - AssertionError
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_methods_sample_order_invariance] - IndexError: index 19 is out of bounds for axis 0 with size 19
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_estimators_nan_inf] - AssertionError: Estimator Fracdiff doesn't check for NaN and inf in fit.
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_estimators_empty_data_messages] - AssertionError: The estimator Fracdiff does not raise a ValueError when an empty data is used to train. Perhaps use check_array in train.
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_dtype_object] - AssertionError: Did not raise: [<class 'TypeError'>]
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_methods_subset_invariance] - AssertionError: 
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_transformer_data_not_an_array] - TypeError: Don't want to call array_function diff!
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_complex_data] - AssertionError: Did not raise: [<class 'ValueError'>]
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_fit1d] - AssertionError: Did not raise: [<class 'ValueError'>]
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_transformer_general(readonly_memmap=True)] - AssertionError
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_transformer_general] - AssertionError

(Review thread on the vendored package preamble, quoting:)

Unofficial fork of the ``fracdiff`` package, maintained in ``sktime``.

sktime migration: 2024, July
Version 0.9.1 release: 2023, Feb 4 (DinoBektesevic)
@fkiraly (Collaborator):

This is not correct, is it? It should be the last release on PyPI and the account name.
Migration was done by DinoBektesevic

@DinoBektesevic (Contributor, Author):

Uh oh, I cloned the current tagged release on GitHub: fracdiff/fracdiff@93ce110, which is marked as 0.9.1 - is this a problem?

@fkiraly (Collaborator) left a comment

Great, thanks! Some minor things:

@DinoBektesevic (Contributor, Author)

Hey, sorry it took me a while for this simple request; my week suddenly got really interesting. I added the check_array (not sure if that's the proper usage, though; I tried it with checked_array = check_array(X); ... fit(checked_array) and it failed with the same errors as without it).

I'm not sure what that check does, so I didn't fix it; better to say I didn't understand what the expected returns of fracdiff are. There are fewer errors than before, though (a sketch of the usage I tried follows the failure list):

FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_transformer_general(readonly_memmap=True)] - AssertionError
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_transformer_general] - AssertionError
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_methods_sample_order_invariance] - IndexError: index 19 is out of bounds for axis 0 with size 19
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_methods_subset_invariance] - AssertionError: 
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_n_features_in] - AssertionError
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_transformer_data_not_an_array] - TypeError: Don't want to call array_function diff!
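
A minimal sketch of the kind of check_array usage under discussion (illustrative only; FracdiffSketch is a hypothetical class, and the real Fracdiff.fit applies the fractional differencing logic on top of this):

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_array


class FracdiffSketch(TransformerMixin, BaseEstimator):
    """Hypothetical skeleton showing where input validation would go."""

    def fit(self, X, y=None):
        # check_array raises ValueError on empty input and NaN/inf,
        # which is what several of the failing estimator checks expect
        X = check_array(X, estimator=self)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        X = check_array(X, estimator=self)
        # a real implementation would apply the fractional filter here
        return X
```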

@DinoBektesevic (Contributor, Author)

Hmm, strange; on my end all the tests get skipped by pre-commit. Let me see if the same thing happens when I run it manually with --all-files.

@DinoBektesevic (Contributor, Author) commented Jul 30, 2024

Ok, running pre-commit manually pointed out some files in the examples dir, which I did indeed copy over when I started but never updated. I had to remove one of them because it was still using the FracdiffStat class, and I added some extra docstrings. I also refreshed and rebased on main. The tests now seem to pass:

check for added large files..............................................Passed
check for case conflicts.................................................Passed
check for merge conflicts................................................Passed
check for broken symlinks............................(no files to check)Skipped
check yaml...........................................(no files to check)Skipped
debug statements (python)................................................Passed
fix end of files.........................................................Passed
fix requirements.txt.................................(no files to check)Skipped
trim trailing whitespace.................................................Passed
ruff-format..............................................................Passed
ruff.....................................................................Passed
shellcheck...........................................(no files to check)Skipped
3437 passed, 17033 skipped, 9 xfailed, 34 xpassed, 227 warnings in 112.06s (0:01:52)

Should be good to go this time.

@fkiraly (Collaborator) commented Jul 31, 2024

Great, thx!

Re the sklearn failures, I will have a look. They seem to be genuine, and have probably been present in the fracdiff library all along due to lack of testing.

@fkiraly added the module:transformations (time series transformation, feature extraction, pre-/post-processing) and enhancement (adding new functionality) labels on Aug 4, 2024
@fkiraly merged commit 70e3b80 into sktime:main on Aug 7, 2024
32 of 33 checks passed
@fkiraly added the lib:fracdiff (fracdiff onboard library) label on Aug 8, 2024
Merging this pull request may close: [ENH] Fractional differencing and ARFIMA