Prediction seems to erroneously use knowledge of the value being predicted #15

andrew-edwards · 2019-08-29T22:30:03Z

I'm using the notation from Deyle et al.'s (2013, PNAS, 110:6430-6435) Supporting Information. A time series has values X(t), and so for E=2 the lagged space consists of vectors
x(t) = (X(t), X(t-1)).

For target time t*, we are trying to predict the X(t* + 1) value.
By definition, X(t* + 1) is included in
x(t* + 2) = ( X(t* + 2), X(t* + 1) )
and so it seems that we should not be allowed to use x(t* + 2) to predict X(t* + 1).
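
To make the indexing concrete, here is a minimal sketch in Python (not rEDM code) that lists which lagged vectors contain a given value as a coordinate. The helper names `lag_embed` and `vectors_containing` are hypothetical, the series is made-up illustration data, and time is taken to be the 0-based list index.

```python
def lag_embed(X, E):
    """Map each time t to the lagged vector (X[t], X[t-1], ..., X[t-E+1])."""
    return {t: tuple(X[t - k] for k in range(E)) for t in range(E - 1, len(X))}


def vectors_containing(t_target, E, n):
    """Return the times t whose lagged vector x(t) has X(t_target) as a coordinate."""
    return [t for t in range(E - 1, n) if t - E + 1 <= t_target <= t]


# Made-up series purely for illustration; time is the 0-based index.
X = [0.1, 0.5, 0.9, 0.3, 0.7, 0.2]
E = 2
t_star = 2

embedding = lag_embed(X, E)

# Lagged vectors that already contain the value being predicted, X(t* + 1).
leaky = vectors_containing(t_star + 1, E, len(X))
print(leaky)                  # times t* + 1 and t* + 2
print(embedding[t_star + 2])  # (X(t* + 2), X(t* + 1))
```

For E = 2 this flags both x(t* + 1) and x(t* + 2), since each has X(t* + 1) as one of its coordinates.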

However, I think that the current rEDM::block_lnlp() function does allow this.

I have forked rEDM and created a new test, test_12_block_neighbor.R, to demonstrate the issue; see https://github.com/andrew-edwards/rEDM/tree/andydev (andydev is the branch). The test currently fails, but I think it should pass if I am correct and block_lnlp() gets updated (I can't easily see how to do that, sorry). The test code can be stepped through to check the results (just don't run the testthat::test_that line).

I can provide more details if anything is not clear, and happy to discuss anything. I started with test_05_simplex_calculations.R that Hao wrote to test an earlier issue, and adapted that to make the new calculations and the new test. Thanks.


andrew-edwards · 2019-11-23T00:12:58Z

Hi, anyone have any thoughts on this issue? Thanks.

ha0ye · 2019-11-23T13:44:36Z

Hi @andrew-edwards,

Sorry, your original message came through while I was traveling, and I forgot to get back to it.

I'm a bit confused about your question:

> For target time t*, we are trying to predict the X(t* + 1) value.
> By definition, X(t* + 1) is included in
> x(t* + 2) = ( X(t* + 2), X(t* + 1) )
> and so it seems that we should not be allowed to use x(t* + 2) to predict X(t* + 1).

We don't currently have any checks for this sort of thing in block_lnlp(), simplex(), or s_map(). For the latter two, a reasonable fix might be to issue a warning when tp = 0, since the unlagged time series is then included as a coordinate used to predict itself.

In the case of block_lnlp(), we could add the same warning, but there we would also need to check whether the column selected as a predictor is also the column set as the target.

Is this summary about right?
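
As a rough illustration of the kind of check described above, here is a minimal sketch in Python. It is not rEDM's implementation: the function name `check_self_prediction` and its arguments are hypothetical, and it only assumes that predictors are given as a list of column identifiers alongside a target column and a prediction horizon tp.

```python
import warnings


def check_self_prediction(columns, target_column, tp):
    """Warn and return True when tp == 0 and the target column is also a predictor.

    Hypothetical helper for illustration only; not part of rEDM.
    """
    leak = tp == 0 and target_column in columns
    if leak:
        warnings.warn(
            "tp = 0 and the target column is also used as a predictor: "
            "the unlagged series would be used to predict itself."
        )
    return leak


# Target column 1 is among the predictors and tp = 0, so a warning is issued.
flag_same = check_self_prediction(columns=[1, 2], target_column=1, tp=0)

# Same columns, but predicting one step ahead: no warning.
flag_ahead = check_self_prediction(columns=[1, 2], target_column=1, tp=1)
```

Returning a flag as well as warning keeps the check easy to test; whether a warning or an error is the right response is a design choice left open in the discussion above.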
