Skip to content
This repository was archived by the owner on Jul 20, 2023. It is now read-only.

Prediction seems to erroneously use knowledge of the value being predicted #15

Open
andrew-edwards opened this issue Aug 29, 2019 · 2 comments

Comments

@andrew-edwards
Copy link

I'm using the notation from Deyle et al.'s (2013, PNAS, 110:6430-6435) Supporting Information. A time series has values X(t), and so for E=2 the lagged space consists of vectors
x(t) = (X(t), X(t-1)).

For target time t*, we are trying to predict the X(t* + 1) value.
By definition, X(t* + 1) is included in
x(t* + 2) = ( X(t* + 2), X(t* + 1) )
and so it seems that we should not be allowed to use x(t* + 2) to predict X(t* + 1).

However, I think that the current rEDM::block_lnlp() function does allow this.

I have forked rEDM and created a new test test_12_block_neighbor.R to demonstrate the issue, see: https://github.com/andrew-edwards/rEDM/tree/andydev (andydev is the branch). The test currently fails, but I think should pass if I am correct and block_lnlp() gets updated (I can't easily see how to do that, sorry). The test code can be stepped through to check the results (just don't run the testthat::test_that line).

I can provide more details if anything is not clear, and happy to discuss anything. I started with test_05_simplex_calculations.R that Hao wrote to test an earlier issue, and adapted that to make the new calculations and the new test. Thanks.

@andrew-edwards
Copy link
Author

Hi, anyone have any thoughts on this issue? Thanks.

@ha0ye
Copy link
Owner

ha0ye commented Nov 23, 2019

Hi @andrew-edwards,

Sorry, your original message came through while I was traveling, and I forgot to get back to it.

I'm a bit confused about your question:

For target time t*, we are trying to predict the X(t* + 1) value.
By definition, X(t* + 1) is included in
x(t* + 2) = ( X(t* + 2), X(t* + 1) )
and so it seems that we should not be allowed to use x(t* + 2) to predict X(t* + 1).

We don't currently have any checks for this sort of thing in block_lnlp() or simplex() and s_map(). For the latter, a reasonable fix might be to issue a warning when tp = 0, and so the unlagged time series is included as a coordinate in predicting itself.

In the case of block_lnlp(), adding the same warning (but there also needing to check whether the selected column as a predictor is also the column set as the target).

Is this summary about right?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants