max_gap_fix #148

daniel-caichac-DHI · 2022-12-05T13:34:48Z

Hi @jsmariegaard / @ecomodeller , I need help :(
I am trying to implement a max_gap feature when interpolating Model dt to Measurement dt to avoid unrealistic interpolations as the following

It works when I hard-code a max dt with a given value (eg, if my dt is 10min and I want a max interpolation of gap of 1 hour, then when I set limit=6 in here it works ok)
https://github.com/DHI/fmskill/blob/74f15e9e5b0ad1116f4a842297e0010c635bbe7d/fmskill/comparison.py#L1173-L1177
but then I tried to implement this max_gap as a user input and failed miserably. I got lost with all the child classes and inheritances.

Could you please help me getting this one working? I know it is not far away from being there.

Hendrik1987 · 2022-12-05T14:00:29Z

#147

ecomodeller · 2022-12-07T08:28:19Z

I have changed the code to put the max_gap parameter on the Comparers. The user will supply this parameter in the call to .extract(max_gap=...)

Please go ahead and include a test file where using this parameter makes sense.

daniel-caichac-DHI · 2022-12-07T10:50:07Z

@ecomodeller I added a new line in the case max_gap < measurement dt (in which case a warning is prompted and no model interpolation is performed), and I also added a connection test with a new test_file.dfsu with gaps and and a new test that connects it with and without a specified max_gap
I think we are good to merge now.

Hendrik1987 · 2022-12-07T10:51:51Z

@ecomodeller and @daniel-caichac-DHI , I have an example and test on this PR #147. Should I bring it along to here? It will still fail, as it uses .compare(), which does not (yet) pass max_gap to .extract().
Do we need to add this or are we happy as it is?

Hendrik1987 · 2022-12-07T10:53:32Z

@daniel-caichac-DHI : I see the gap in your dfsu is with nan's. Which is the only option, as I have learned .dfsu files cannot be non-equidistant. However, is that really covering our use case? Or do we need to test also the gap in a non-equidistant model result, i.e., limited to .dfs0, df, .grib and .nc model results.

daniel-caichac-DHI · 2022-12-07T11:50:58Z

@Hendrik1987 I tested with the dfsu with nans for the same reasons you mentioned, but I also tested (maybe I should add another file to the test files? it just seemed redundant) a dfs0 with gaps (non-eq dfs0) and also worked.
Maybe I will do it now to clear doubts

Hendrik1987 · 2022-12-07T11:52:57Z

@daniel-caichac-DHI : give me a moment, just bringing the other tests along...

jsmariegaard · 2022-12-07T11:54:12Z

Could someone please explain the use case to me? I guess >99% of the time, model data will be equidistant and without gaps. If it contains gaps, then I guess you could just convert it to equidistant (and add deletevalues/nan in the gaps)? Am I missing something?

Furthermore, if max_gap is a property of the modeldata, then I think it should be specified when defining the ModelResult, as it is not clear in the proposed implementation if it a gap in the measurement or in the model...

Hendrik1987 · 2022-12-07T12:02:00Z

@jsmariegaard

Calibrating a model on selected events only, say, 50 storm periods of 1 week each distributed over 20 years. (For most user groups this might be <1%, for hindcast models we seem to have 99% of the time the 1% situation :-))
I agree, with the current solution I am missing the option to define max_gap in .compare and mr.extract_observation() and maybe more places.

Also, I think it would be nice if max_gap was a timedelta and by default the model timestep.

jsmariegaard · 2022-12-07T12:03:59Z

Thanks, @Hendrik1987, your example convinced me of its usefulness :-)

jsmariegaard · 2022-12-07T12:05:14Z

I still think it should be defined on the ModelResult though :-)

mr = fmskill.ModelResult(fn, max_gap=1800)

daniel-caichac-DHI · 2022-12-07T12:08:46Z

I submited a new PR with a new test when connecting a non-eq measurement with a non-eq model results, and the test is successful with the current implementation

Hendrik1987 · 2022-12-07T12:12:34Z

I still think it should be defined on the ModelResult though :-)
mr = fmskill.ModelResult(fn, max_gap=1800) 

I tried to agree in my point 2 - which was maybe not clear :-)
I would further make it optional and by default it is the model time step.

Hendrik1987 · 2022-12-07T12:31:43Z

Added test for a df model result and .compare(). I agree with @jsmariegaard 's argument that max_gap is a model property, however, .compare() does not require to define a ModelResult...

daniel-caichac-DHI · 2022-12-07T14:08:54Z

with the new test the current implementations fails though , as it is right now implemented only in extract not in compare

daniel-caichac-DHI · 2022-12-16T14:22:37Z

@jsmariegaard I moved the max_gap from the extraction to model result and I tested it and it works eg with

mr=ModelResult(r'file.dfs0')
mr=ModelResult(r'file.dfs0',max_gap=3600)

and then the usual

con = fmskill.Connector([obs], [mr])
c = con.extract()
c.scatter()

it also works with Dataframes as model results and I think Xarrays.
The problem I now have is that by moving this max_gap property to model result now makes me fail all of the multi-model comparison compare tests (so I cannot even do cc.skill() if I have more than 1 model in my connector, eg,
con = fmskill.Connector([obs], [mr1,mr2]) would not work anymore.
I honestly do not know how to fix this, I kind of need help with this.

…y; include where missing in model results

daniel-caichac-DHI · 2022-12-20T14:23:03Z

@jsmariegaard
I changed the max_gap_tests because they were made by Hendrik before we built the max_gap as a property of a ModelResult

Now the tests are successful and the max_gap is inside
mr=ModelResult( file.dfs0, max_gap=3600 )

I think we are ready to merge as soon as your review is finished. Thanks for fixing it !

jsmariegaard · 2022-12-22T13:35:24Z

Thanks for the notebook, @Hendrik1987, would be great to reformulate it as a number of unit tests for the interp_df method 👍

Hendrik1987 · 2022-12-22T14:44:32Z

Sure @jsmariegaard , where should the tests be placed?
And what are your thoughts on limit (v2) vs. valid_query_times (v3). If you approve I can work on the latter, too.

jsmariegaard · 2022-12-22T15:04:58Z

Sure @jsmariegaard , where should the tests be placed? And what are your thoughts on limit (v2) vs. valid_query_times (v3). If you approve I can work on the latter, too.

Place the tests in of the comparison test files. For this it would be nice to have a test_comparison.py but we don't - but it's not super important we can always move them around if we like.

I will get back to you on the other questions after the break - I am leaving now :)

daniel-caichac-DHI · 2023-01-04T13:01:12Z

I updated this branch but now I can't seem to be able to add neither max_gap nor max_model_gap anywhere, I tried in the PointObservation, ModelResult, Connector and connector.extract() but nowhere seems to be working.
Is this ok or are we missing something?

Hendrik1987 · 2023-01-04T13:12:10Z

fmskill/connection.py

@@ -295,7 +296,7 @@ def _parse_observation(self, obs) -> PointObservation:
        else:
            raise ValueError(f"Unknown observation type {type(obs)}")

-    def extract(self) -> PointComparer:
+    def extract(self, max_model_gap: float=None) -> PointComparer:


@daniel-caichac-DHI : .extract() should take max_model_gap, no?

I don't know, I had a script and tested with this commit and max_gap=3600 in extract() and it works,
95cd56b
then I tested with the most recent version of this branch and max_model_gap=3600 in extract() and it does not connect anything :(

Could you maybe add a test that illustrates the problem? expected vs actual...

When I don't use max gap I get this, 278 points

When use the old commit (95cd56b ) I get this, 28 points

When I use the lastest commit of this branch , I get an error

Just for background, I am using the data from the test folder

It seems that the Connector misses this argument - only the specialized connectors, like PointConnector and TrackConnector has it.

You can add a failing test

I just added a failing test where connector.extract(max_model_gap=3600) does not work.

Hendrik1987 · 2023-01-04T15:46:15Z

@jsmariegaard : You asked me to reformulate notebook as couple of unit tests. I am generating test data within the notebook with a random component. Is it OK to save those as a few extra .dfs0 as additional test data? Maybe it is getting a bit messy with the test data, but I don't think we have an equivalent yet.

jsmariegaard · 2023-01-04T15:50:43Z

@jsmariegaard : You asked me to reformulate notebook as couple of unit tests. I am generating test data within the notebook with a random component. Is it OK to save those as a few extra .dfs0 as additional test data? Maybe it is getting a bit messy with the test data, but I don't think we have an equivalent yet.

Please do add more test data preferably with descriptive names 👍

Hendrik1987 · 2023-01-04T15:53:38Z

Please do add more test data preferably with descriptive names 👍

Cool, will do! You mean something like model_final_new_v2_latest.dfs0, right? ;-)

tests/test_connector.py

Hendrik1987 · 2023-01-27T07:55:19Z

Thanks @jsmariegaard for picking this up and sorry for not contributing more actively here!

jsmariegaard · 2023-01-27T07:57:08Z

Thanks @jsmariegaard for picking this up and sorry for not contributing more actively here!

No worries, @Hendrik1987, I really appreciated the work you did on the notebook! It was a great help finishing this PR.

jsmariegaard

Reviewed together with @ecomodeller IRL

max_gap_fix

74f15e9

Hendrik1987 mentioned this pull request Dec 5, 2022

handle model results with gaps #147

Closed

max_gap belongs to the Comparers

b33bb50

New test for max_gap extraction

0489f6d

new non-eq test

95cd56b

daniel-caichac-DHI marked this pull request as ready for review December 7, 2022 12:11

add nan and gap tests

59c3507

max gap in model

7210b7b

daniel-caichac-DHI and others added 5 commits December 20, 2022 10:47

fixed max_gap with dataframe as input

cc9b7f7

max_gap default None, pass max_gap to Comparer, remove public propert…

3786f7d

…y; include where missing in model results

mr._max_gap if hasattr(mr, "_max_gap") else None

1310376

_parse_max_gap to create list of correct length

a2fa6c1

Changed_tests

cfe0a2a

Hendrik1987 reviewed Jan 4, 2023

View reviewed changes

daniel-caichac-DHI and others added 9 commits January 12, 2023 15:45

Failing test with current implementation

93a044f

Failing test with current implementation

9af1d99

Merge branch 'main' into max_gap_interp

949e0c1

Fix Merge Error

5e0e4c9

Merge branch 'main' into max_gap_interp

944901d

remove unused fixtures

d0ed025

add new tests

7672d8e

pass arguments to specific Connectors

20a6bb3

fix and extend test_extract_gaps1 (and make input less random)

898892f

ecomodeller reviewed Jan 26, 2023

View reviewed changes

tests/test_connector.py Outdated Show resolved Hide resolved

jsmariegaard added 2 commits January 27, 2023 08:43

Use @Hendrik1987 implementation to remove index in model_gaps

f761d91

clean up

d17142f

jsmariegaard added 2 commits January 27, 2023 11:04

test other data types

0239af0

TimeDeltaTypes, _time_delta_to_pd_timedelta

c623c36

jsmariegaard self-requested a review January 27, 2023 10:06

jsmariegaard approved these changes Jan 27, 2023

View reviewed changes

jsmariegaard added 2 commits January 27, 2023 11:12

black formatting

f44c184

remove random component

630d77d

jsmariegaard merged commit a575d96 into main Jan 27, 2023

jsmariegaard deleted the max_gap_interp branch January 27, 2023 11:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

max_gap_fix #148

max_gap_fix #148

daniel-caichac-DHI commented Dec 5, 2022 •

edited

Loading

Hendrik1987 commented Dec 5, 2022

ecomodeller commented Dec 7, 2022

daniel-caichac-DHI commented Dec 7, 2022

Hendrik1987 commented Dec 7, 2022

Hendrik1987 commented Dec 7, 2022 •

edited

Loading

daniel-caichac-DHI commented Dec 7, 2022

Hendrik1987 commented Dec 7, 2022

jsmariegaard commented Dec 7, 2022

Hendrik1987 commented Dec 7, 2022 •

edited

Loading

jsmariegaard commented Dec 7, 2022

jsmariegaard commented Dec 7, 2022

daniel-caichac-DHI commented Dec 7, 2022 •

edited

Loading

Hendrik1987 commented Dec 7, 2022 •

edited

Loading

Hendrik1987 commented Dec 7, 2022

daniel-caichac-DHI commented Dec 7, 2022

daniel-caichac-DHI commented Dec 16, 2022 •

edited

Loading

daniel-caichac-DHI commented Dec 20, 2022 •

edited

Loading

jsmariegaard commented Dec 22, 2022

Hendrik1987 commented Dec 22, 2022

jsmariegaard commented Dec 22, 2022

daniel-caichac-DHI commented Jan 4, 2023

Hendrik1987 Jan 4, 2023

daniel-caichac-DHI Jan 4, 2023 •

edited

Loading

jsmariegaard Jan 4, 2023

daniel-caichac-DHI Jan 5, 2023 •

edited

Loading

jsmariegaard Jan 5, 2023

jsmariegaard Jan 5, 2023

daniel-caichac-DHI Jan 12, 2023 •

edited

Loading

Hendrik1987 commented Jan 4, 2023

jsmariegaard commented Jan 4, 2023

Hendrik1987 commented Jan 4, 2023

Hendrik1987 commented Jan 27, 2023

jsmariegaard commented Jan 27, 2023

jsmariegaard left a comment

max_gap_fix #148

max_gap_fix #148

Conversation

daniel-caichac-DHI commented Dec 5, 2022 • edited Loading

Hendrik1987 commented Dec 5, 2022

ecomodeller commented Dec 7, 2022

daniel-caichac-DHI commented Dec 7, 2022

Hendrik1987 commented Dec 7, 2022

Hendrik1987 commented Dec 7, 2022 • edited Loading

daniel-caichac-DHI commented Dec 7, 2022

Hendrik1987 commented Dec 7, 2022

jsmariegaard commented Dec 7, 2022

Hendrik1987 commented Dec 7, 2022 • edited Loading

jsmariegaard commented Dec 7, 2022

jsmariegaard commented Dec 7, 2022

daniel-caichac-DHI commented Dec 7, 2022 • edited Loading

Hendrik1987 commented Dec 7, 2022 • edited Loading

Hendrik1987 commented Dec 7, 2022

daniel-caichac-DHI commented Dec 7, 2022

daniel-caichac-DHI commented Dec 16, 2022 • edited Loading

daniel-caichac-DHI commented Dec 20, 2022 • edited Loading

jsmariegaard commented Dec 22, 2022

Hendrik1987 commented Dec 22, 2022

jsmariegaard commented Dec 22, 2022

daniel-caichac-DHI commented Jan 4, 2023

Hendrik1987 Jan 4, 2023

Choose a reason for hiding this comment

daniel-caichac-DHI Jan 4, 2023 • edited Loading

Choose a reason for hiding this comment

jsmariegaard Jan 4, 2023

Choose a reason for hiding this comment

daniel-caichac-DHI Jan 5, 2023 • edited Loading

Choose a reason for hiding this comment

jsmariegaard Jan 5, 2023

Choose a reason for hiding this comment

jsmariegaard Jan 5, 2023

Choose a reason for hiding this comment

daniel-caichac-DHI Jan 12, 2023 • edited Loading

Choose a reason for hiding this comment

Hendrik1987 commented Jan 4, 2023

jsmariegaard commented Jan 4, 2023

Hendrik1987 commented Jan 4, 2023

Hendrik1987 commented Jan 27, 2023

jsmariegaard commented Jan 27, 2023

jsmariegaard left a comment

Choose a reason for hiding this comment

daniel-caichac-DHI commented Dec 5, 2022 •

edited

Loading

Hendrik1987 commented Dec 7, 2022 •

edited

Loading

Hendrik1987 commented Dec 7, 2022 •

edited

Loading

daniel-caichac-DHI commented Dec 7, 2022 •

edited

Loading

Hendrik1987 commented Dec 7, 2022 •

edited

Loading

daniel-caichac-DHI commented Dec 16, 2022 •

edited

Loading

daniel-caichac-DHI commented Dec 20, 2022 •

edited

Loading

daniel-caichac-DHI Jan 4, 2023 •

edited

Loading

daniel-caichac-DHI Jan 5, 2023 •

edited

Loading

daniel-caichac-DHI Jan 12, 2023 •

edited

Loading