
[ENH] Vendor fracdiff library #6777

Merged (12 commits, Aug 7, 2024)

Conversation

@DinoBektesevic (Contributor) commented Jul 14, 2024

Reference Issues/PRs

Resolves the core issue of #6700: vendoring the fracdiff package.

What does this implement/fix? Explain your changes.

The fracdiff package has been archived, which reduced the usability of sktime for some users.

This PR is the initial step in vendoring the library as a libs component of sktime.
The intention at the time of this initial PR is to have the tests pass with minimal changes to the original library; however, some changes were necessary:

  • docs were removed from the original library
  • examples were moved into the top-level examples directory
  • fracdiff tensor interface was removed
  • the FracdiffStat class was removed (it depended on the StatTester class)
  • the StatTester class was removed, as it depended on statsmodels
    • FracdiffStat seems useful, but I was told StatTester should be re-implemented, or FracdiffStat changed to use some other stationarity tester that already exists in sktime (the code defaulted to ADF). I'm just not knowledgeable enough about what's happening here, but I can work it out if there's an example somewhere that shows me how to use that test and try a rewrite. YMMV.
  • tests related to StatTester and FracdiffStat were then also removed
  • The imports (and package structure) were altered
    • the deprecation wrappers were removed
    • the Fracdiff class is imported at top level from the sklearn subpackage, because the tensor wrappers were removed
    • the numpy-based fdiff and fdiff_coef functions are also now imported at top level, to avoid a triple fdiff.fdiff.fdiff (see the import sketch below)
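
A minimal sketch of the resulting import surface (the sktime.libs.fracdiff path and the exact call signatures are assumptions, based on the original fracdiff API):

```python
import numpy as np

# both functions are now importable from the package root,
# instead of the old triple-nested fdiff.fdiff.fdiff
from sktime.libs.fracdiff import fdiff, fdiff_coef

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])

# fdiff generalizes numpy.diff to fractional orders of differencing
print(fdiff(x, 0.5, window=4))

# fdiff_coef returns the first `window` coefficients of the
# fractional differencing filter of order d
print(fdiff_coef(0.5, window=4))
```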

Does your contribution introduce a new dependency? If yes, which one?

Not as-is; potentially tensorflow or statsmodels, depending on further advice I get.

What should a reviewer concentrate their feedback on?

[edit: I just saw some of the discussion in the Issue relating to the first two points, whoops]

  • Is FracdiffStat useful? Should it be added back in, or is this moot because a different tester exists that is already usable as-is?
  • How should I then wrap statsmodels back in?
  • Should examples be deleted?
  • I was told Fracdiff should be wrapped in a Differencer (or some such transformation), but I spent an hour driving home after the hack and now I can't remember exactly which base class I was pointed to. If someone can remind me which class I should wrap the sklearn Fracdiff in, that'd be swell.
  • This PR can be considered a work in progress: not the final state of the code, but a place for me to understand the requirements for merging

Did you add any tests for the change?

Tests are all as in the original library, minus the removed ones.

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.


Review thread on sktime/libs/README.md (outdated, resolved)
@fkiraly (Collaborator) left a comment

Thanks a lot!

Great, 1:1 vendor distribution minus the parts dependent on statsmodels, as discussed. I will comment on the latter in the original issue, i.e., how I would design it. My preference would be to design a full replacement in sktime instead of keeping the equivalent class inside the vendored package.

What I would say: let's merge this as-is, with only minor requests:

  • the fracdiff.__init__.py should include a license and contributions preamble, see pykalman
  • could you kindly decorate all the test classes with pytest.mark.skipif, depending on whether anything in the vendored lib has changed? Similar to, e.g., pykalman.tests.test_standard
  • I see the sklearn estimators' tests but am surprised not to see a check_estimator conformance test. More specifically, I would add a parametrize_with_checks decorated test to run the full suite of sklearn API conformance tests: https://scikit-learn.org/stable/modules/generated/sklearn.utils.estimator_checks.parametrize_with_checks.html#sklearn.utils.estimator_checks.parametrize_with_checks (see the sketch after this list)
    • I would expect this to simply run, but if it fails, let's add it and xfail, and report as an issue that we have API non-conformance
    • if it is too fiddly, we can also do this separately, then we should record this as an open issue that such tests need to be added
  • a minor typo in the README, see above
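
A minimal sketch of what such a gated conformance test could look like (the run_test_module_changed switch and the sktime.libs.fracdiff module path are assumptions, modeled on how the pykalman vendor tests are gated):

```python
import pytest
from sklearn.utils.estimator_checks import parametrize_with_checks

from sktime.libs.fracdiff import Fracdiff
from sktime.tests.test_switch import run_test_module_changed


# skip the whole suite unless the vendored module itself has changed
@pytest.mark.skipif(
    not run_test_module_changed("sktime.libs.fracdiff"),
    reason="run only if the vendored fracdiff module has changed",
)
# runs every applicable sklearn API conformance check against Fracdiff
@parametrize_with_checks([Fracdiff()])
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)
```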

@DinoBektesevic (Contributor, Author)

Ok, I think this addresses all the PR comments.

The preamble existed, I had just put it into the wrong __init__ (whoops). I also left an empty directory in tests (issues/issue32.py used to be a test for the stat estimator). I think I understood how to write the check_estimator test, but double-checking would be good, because I saw you do have some similarly named utils and I've not written tests like these before.

I'll leave the non-conformance failures here, to copy into the issue you're asking for, if this is allowed to merge:

FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_n_features_in] - AssertionError
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_methods_sample_order_invariance] - IndexError: index 19 is out of bounds for axis 0 with size 19
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_estimators_nan_inf] - AssertionError: Estimator Fracdiff doesn't check for NaN and inf in fit.
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_estimators_empty_data_messages] - AssertionError: The estimator Fracdiff does not raise a ValueError when an empty data is used to train. Perhaps use check_array in train.
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_dtype_object] - AssertionError: Did not raise: [<class 'TypeError'>]
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_methods_subset_invariance] - AssertionError: 
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_transformer_data_not_an_array] - TypeError: Don't want to call array_function diff!
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_complex_data] - AssertionError: Did not raise: [<class 'ValueError'>]
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_fit1d] - AssertionError: Did not raise: [<class 'ValueError'>]
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_transformer_general(readonly_memmap=True)] - AssertionError
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_transformer_general] - AssertionError

(Review thread on the vendored package preamble, quoting:)

Unofficial fork of the ``fracdiff`` package, maintained in ``sktime``.

sktime migration: 2024, July
Version 0.9.1 release: 2023, Feb 4 (DinoBektesevic)
@fkiraly (Collaborator):

This is not correct, is it? It should be the last release on PyPI and the account name.
Migration was done by DinoBektesevic

@DinoBektesevic (Contributor, Author):

Uh oh, I cloned the current tagged release on GitHub: fracdiff/fracdiff@93ce110, which is marked as 0.9.1 - is this a problem?

@fkiraly (Collaborator) left a comment

Great, thanks! Some minor things:

@DinoBektesevic (Contributor, Author)

Hey, sorry it took me a while for this simple request; my week suddenly got really interesting. I added the check_array (not sure if that's the proper usage, though; I tried it with checked_array = check_array(X); ... fit(checked_array) and it failed with the same errors as without it).

I'm not sure what that check does, so I didn't fix it; better to say I didn't understand what the expected returns of fracdiff are. There are fewer errors than before, though (a sketch of the usage I tried follows the failure list):

FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_transformer_general(readonly_memmap=True)] - AssertionError
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_transformer_general] - AssertionError
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_methods_sample_order_invariance] - IndexError: index 19 is out of bounds for axis 0 with size 19
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_methods_subset_invariance] - AssertionError: 
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_n_features_in] - AssertionError
FAILED tests/sklearn/test_fracdiff.py::TestFracdiff::test_sklearn_compatible_estimator[Fracdiff(d=1,window=10,mode=same,window_policy=fixed)-check_transformer_data_not_an_array] - TypeError: Don't want to call array_function diff!
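
A minimal sketch of the kind of check_array usage under discussion (illustrative only; FracdiffSketch is a hypothetical class, and the real Fracdiff.fit applies the fractional differencing logic on top of this):

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_array


class FracdiffSketch(TransformerMixin, BaseEstimator):
    """Hypothetical skeleton showing where input validation would go."""

    def fit(self, X, y=None):
        # check_array raises ValueError on empty input and NaN/inf,
        # which is what several of the failing estimator checks expect
        X = check_array(X, estimator=self)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        X = check_array(X, estimator=self)
        # a real implementation would apply the fractional filter here
        return X
```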

@DinoBektesevic (Contributor, Author)

Hmm, strange; on my end all the tests get skipped by pre-commit. Let me see if the same thing happens when I run it manually with --all-files.

@DinoBektesevic (Contributor, Author) commented Jul 30, 2024

Ok, running pre-commit manually pointed out some files in the examples dir, which I did indeed copy over when I started but never updated. I had to remove one of them because it was still using the FracdiffStat class, and I added some extra docstrings. I also refreshed and rebased on main. The tests now seem to pass:

check for added large files..............................................Passed
check for case conflicts.................................................Passed
check for merge conflicts................................................Passed
check for broken symlinks............................(no files to check)Skipped
check yaml...........................................(no files to check)Skipped
debug statements (python)................................................Passed
fix end of files.........................................................Passed
fix requirements.txt.................................(no files to check)Skipped
trim trailing whitespace.................................................Passed
ruff-format..............................................................Passed
ruff.....................................................................Passed
shellcheck...........................................(no files to check)Skipped
3437 passed, 17033 skipped, 9 xfailed, 34 xpassed, 227 warnings in 112.06s (0:01:52)

Should be good to go this time.

@fkiraly (Collaborator) commented Jul 31, 2024

Great, thx!

Re the sklearn failures, I will have a look. They seem to be genuine, and have probably been present in the fracdiff library all along due to lack of testing.

@fkiraly added the module:transformations (time series transformation, feature extraction, pre-/post-processing) and enhancement (adding new functionality) labels on Aug 4, 2024
@fkiraly merged commit 70e3b80 into sktime:main on Aug 7, 2024
32 of 33 checks passed
@fkiraly added the lib:fracdiff (fracdiff onboard library) label on Aug 8, 2024
Merging this pull request may close: [ENH] Fractional differencing and ARFIMA