Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

max_gap_fix #148

Merged
merged 35 commits into from
Jan 27, 2023
Merged

max_gap_fix #148

merged 35 commits into from
Jan 27, 2023

Conversation

daniel-caichac-DHI
Copy link
Collaborator

@daniel-caichac-DHI daniel-caichac-DHI commented Dec 5, 2022

Hi @jsmariegaard / @ecomodeller , I need help :(
I am trying to implement a max_gap feature when interpolating Model dt to Measurement dt to avoid unrealistic interpolations as the following
image
image

It works when I hard-code a max dt with a given value (eg, if my dt is 10min and I want a max interpolation of gap of 1 hour, then when I set limit=6 in here it works ok)
https://github.com/DHI/fmskill/blob/74f15e9e5b0ad1116f4a842297e0010c635bbe7d/fmskill/comparison.py#L1173-L1177
but then I tried to implement this max_gap as a user input and failed miserably. I got lost with all the child classes and inheritances.

Could you please help me getting this one working? I know it is not far away from being there.

@Hendrik1987
Copy link
Contributor

#147

@ecomodeller
Copy link
Member

I have changed the code to put the max_gap parameter on the Comparers. The user will supply this parameter in the call to .extract(max_gap=...)

Please go ahead and include a test file where using this parameter makes sense.

@daniel-caichac-DHI
Copy link
Collaborator Author

@ecomodeller I added a new line in the case max_gap < measurement dt (in which case a warning is prompted and no model interpolation is performed), and I also added a connection test with a new test_file.dfsu with gaps and and a new test that connects it with and without a specified max_gap
I think we are good to merge now.

@Hendrik1987
Copy link
Contributor

@ecomodeller and @daniel-caichac-DHI , I have an example and test on this PR #147. Should I bring it along to here? It will still fail, as it uses .compare(), which does not (yet) pass max_gap to .extract().
Do we need to add this or are we happy as it is?

@Hendrik1987
Copy link
Contributor

Hendrik1987 commented Dec 7, 2022

@daniel-caichac-DHI : I see the gap in your dfsu is with nan's. Which is the only option, as I have learned .dfsu files cannot be non-equidistant. However, is that really covering our use case? Or do we need to test also the gap in a non-equidistant model result, i.e., limited to .dfs0, df, .grib and .nc model results.

@daniel-caichac-DHI
Copy link
Collaborator Author

@Hendrik1987 I tested with the dfsu with nans for the same reasons you mentioned, but I also tested (maybe I should add another file to the test files? it just seemed redundant) a dfs0 with gaps (non-eq dfs0) and also worked.
Maybe I will do it now to clear doubts

@Hendrik1987
Copy link
Contributor

@daniel-caichac-DHI : give me a moment, just bringing the other tests along...

@jsmariegaard
Copy link
Member

Could someone please explain the use case to me? I guess >99% of the time, model data will be equidistant and without gaps. If it contains gaps, then I guess you could just convert it to equidistant (and add deletevalues/nan in the gaps)? Am I missing something?

Furthermore, if max_gap is a property of the modeldata, then I think it should be specified when defining the ModelResult, as it is not clear in the proposed implementation if it a gap in the measurement or in the model...

@Hendrik1987
Copy link
Contributor

Hendrik1987 commented Dec 7, 2022

@jsmariegaard

  1. Calibrating a model on selected events only, say, 50 storm periods of 1 week each distributed over 20 years. (For most user groups this might be <1%, for hindcast models we seem to have 99% of the time the 1% situation :-))
  2. I agree, with the current solution I am missing the option to define max_gap in .compare and mr.extract_observation() and maybe more places.

Also, I think it would be nice if max_gap was a timedelta and by default the model timestep.

@jsmariegaard
Copy link
Member

Thanks, @Hendrik1987, your example convinced me of its usefulness :-)

@jsmariegaard
Copy link
Member

I still think it should be defined on the ModelResult though :-)

mr = fmskill.ModelResult(fn, max_gap=1800) 

@daniel-caichac-DHI
Copy link
Collaborator Author

daniel-caichac-DHI commented Dec 7, 2022

I submited a new PR with a new test when connecting a non-eq measurement with a non-eq model results, and the test is successful with the current implementation

@daniel-caichac-DHI daniel-caichac-DHI marked this pull request as ready for review December 7, 2022 12:11
@Hendrik1987
Copy link
Contributor

Hendrik1987 commented Dec 7, 2022

I still think it should be defined on the ModelResult though :-)

mr = fmskill.ModelResult(fn, max_gap=1800) 

I tried to agree in my point 2 - which was maybe not clear :-)
I would further make it optional and by default it is the model time step.

@Hendrik1987
Copy link
Contributor

Added test for a df model result and .compare(). I agree with @jsmariegaard 's argument that max_gap is a model property, however, .compare() does not require to define a ModelResult...

@daniel-caichac-DHI
Copy link
Collaborator Author

with the new test the current implementations fails though , as it is right now implemented only in extract not in compare

@daniel-caichac-DHI
Copy link
Collaborator Author

daniel-caichac-DHI commented Dec 16, 2022

@jsmariegaard I moved the max_gap from the extraction to model result and I tested it and it works eg with

mr=ModelResult(r'file.dfs0')
mr=ModelResult(r'file.dfs0',max_gap=3600)

and then the usual

con = fmskill.Connector([obs], [mr])
c = con.extract()
c.scatter()

it also works with Dataframes as model results and I think Xarrays.
The problem I now have is that by moving this max_gap property to model result now makes me fail all of the multi-model comparison compare tests (so I cannot even do cc.skill() if I have more than 1 model in my connector, eg,
con = fmskill.Connector([obs], [mr1,mr2]) would not work anymore.
I honestly do not know how to fix this, I kind of need help with this.

@daniel-caichac-DHI
Copy link
Collaborator Author

daniel-caichac-DHI commented Dec 20, 2022

@jsmariegaard
I changed the max_gap_tests because they were made by Hendrik before we built the max_gap as a property of a ModelResult

Now the tests are successful and the max_gap is inside
mr=ModelResult( file.dfs0, max_gap=3600 )

I think we are ready to merge as soon as your review is finished. Thanks for fixing it !

@jsmariegaard
Copy link
Member

Thanks for the notebook, @Hendrik1987, would be great to reformulate it as a number of unit tests for the interp_df method 👍

@Hendrik1987
Copy link
Contributor

Sure @jsmariegaard , where should the tests be placed?
And what are your thoughts on limit (v2) vs. valid_query_times (v3). If you approve I can work on the latter, too.

@jsmariegaard
Copy link
Member

Sure @jsmariegaard , where should the tests be placed? And what are your thoughts on limit (v2) vs. valid_query_times (v3). If you approve I can work on the latter, too.

Place the tests in of the comparison test files. For this it would be nice to have a test_comparison.py but we don't - but it's not super important we can always move them around if we like.

I will get back to you on the other questions after the break - I am leaving now :)

@daniel-caichac-DHI
Copy link
Collaborator Author

I updated this branch but now I can't seem to be able to add neither max_gap nor max_model_gap anywhere, I tried in the PointObservation, ModelResult, Connector and connector.extract() but nowhere seems to be working.
Is this ok or are we missing something?

@@ -295,7 +296,7 @@ def _parse_observation(self, obs) -> PointObservation:
else:
raise ValueError(f"Unknown observation type {type(obs)}")

def extract(self) -> PointComparer:
def extract(self, max_model_gap: float=None) -> PointComparer:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daniel-caichac-DHI : .extract() should take max_model_gap, no?

Copy link
Collaborator Author

@daniel-caichac-DHI daniel-caichac-DHI Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I had a script and tested with this commit and max_gap=3600 in extract() and it works,
95cd56b
then I tested with the most recent version of this branch and max_model_gap=3600 in extract() and it does not connect anything :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you maybe add a test that illustrates the problem? expected vs actual...

Copy link
Collaborator Author

@daniel-caichac-DHI daniel-caichac-DHI Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • When I don't use max gap I get this, 278 points
    image

  • When use the old commit (95cd56b ) I get this, 28 points
    image

  • When I use the lastest commit of this branch , I get an error
    image

  • Just for background, I am using the data from the test folder

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the Connector misses this argument - only the specialized connectors, like PointConnector and TrackConnector has it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add a failing test

Copy link
Collaborator Author

@daniel-caichac-DHI daniel-caichac-DHI Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added a failing test where connector.extract(max_model_gap=3600) does not work.

@Hendrik1987
Copy link
Contributor

@jsmariegaard : You asked me to reformulate notebook as couple of unit tests. I am generating test data within the notebook with a random component. Is it OK to save those as a few extra .dfs0 as additional test data? Maybe it is getting a bit messy with the test data, but I don't think we have an equivalent yet.

@jsmariegaard
Copy link
Member

@jsmariegaard : You asked me to reformulate notebook as couple of unit tests. I am generating test data within the notebook with a random component. Is it OK to save those as a few extra .dfs0 as additional test data? Maybe it is getting a bit messy with the test data, but I don't think we have an equivalent yet.

Please do add more test data preferably with descriptive names 👍

@Hendrik1987
Copy link
Contributor

Please do add more test data preferably with descriptive names 👍

Cool, will do! You mean something like model_final_new_v2_latest.dfs0, right? ;-)

@Hendrik1987
Copy link
Contributor

Thanks @jsmariegaard for picking this up and sorry for not contributing more actively here!

@jsmariegaard
Copy link
Member

Thanks @jsmariegaard for picking this up and sorry for not contributing more actively here!

No worries, @Hendrik1987, I really appreciated the work you did on the notebook! It was a great help finishing this PR.

@jsmariegaard jsmariegaard self-requested a review January 27, 2023 10:06
Copy link
Member

@jsmariegaard jsmariegaard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed together with @ecomodeller IRL

@jsmariegaard jsmariegaard merged commit a575d96 into main Jan 27, 2023
@jsmariegaard jsmariegaard deleted the max_gap_interp branch January 27, 2023 11:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants