Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Querying persistence services with date filter gives unexpected results #17028

Open
jlaur opened this issue Jul 9, 2024 · 8 comments
Open

Querying persistence services with date filter gives unexpected results #17028

jlaur opened this issue Jul 9, 2024 · 8 comments

Comments

@jlaur
Copy link
Contributor

jlaur commented Jul 9, 2024

I experience some inconsistent results when trying to create the same chart using different persistence services.

I think that the issue might be that FilterCriteria is only sparsely documented, so implementations could have different interpretations.

Specifically, how should beginDate and endDate be applied?

  • Should only state changes within the boundaries of [beginDate : endDate[ be returned?
  • Or should all states valid within the boundaries of [beginDate : endDate[ be returned, including the state still valid when entering the time interval?
  • Please note the [ above. Normally end date in date intervals are considered inclusive, while end datetime in datetime intervals are considered exclusive. I assume the same logic should be applied here? Changing this would require changes in both Javadoc and implementations though. See [persistence] Clarify intention of FilterCriteria beginDate/endDate openhab-core#3164. This is a minor side-comment, not the main issue.

The problem with the first interpretation is: How would it then be possible to query the state valid at any given moment? To be sure, you would have to provide a very old beginDate, and as a result you would get way too much data returned.

The JDBC persistence services applies this interpretation:

protected String resolveTimeFilter(FilterCriteria filter, ZoneId timeZone) {
String filterString = "";
ZonedDateTime beginDate = filter.getBeginDate();
if (beginDate != null) {
filterString += filterString.isEmpty() ? " WHERE" : " AND";
filterString += " TIME>='" + JDBC_DATE_FORMAT.format(beginDate.withZoneSameInstant(timeZone)) + "'";
}
ZonedDateTime endDate = filter.getEndDate();
if (endDate != null) {
filterString += filterString.isEmpty() ? " WHERE" : " AND";
filterString += " TIME<='" + JDBC_DATE_FORMAT.format(endDate.withZoneSameInstant(timeZone)) + "'";
}
return filterString;
}

And as a result when trying to create a chart:

2024-07-09 16:18:17.642 [DEBUG] [istence.jdbc.internal.db.JdbcBaseDAO] - JDBC::query queryString = SELECT time, value FROM EnergiDataService_GridTariff WHERE TIME>='2024-07-09 16:17:40' AND TIME<='2024-07-10 16:17:40' ORDER BY time ASC
2024-07-09 16:18:17.643 [DEBUG] [istence.jdbc.internal.db.JdbcBaseDAO] - JDBC::doGetHistItemFilterQuery sql=SELECT time, value FROM EnergiDataService_GridTariff WHERE TIME>='2024-07-09 16:17:40' AND TIME<='2024-07-10 16:17:40' ORDER BY time ASC

the first value here is missed:

time value
2024-07-09 06:00:00.000 0.20646875
2024-07-09 17:00:00.000 0.5368175
2024-07-09 21:00:00.000 0.20646875
2024-07-10 00:00:00.000 0.137645
2024-07-10 06:00:00.000 0.20646875

Expected Behavior

I would expect the first row above to be included in the result set as well, since that value is valid until the new value at 2024-07-09 17:00:00.000 replaces it.

Current Behavior

Only state changes that took place within the queried period are returned, not the last value before the period.

Possible Solution

The query could be adapted to take this into consideration, but the Javadoc should be updated as well, and all other implementations should be checked.

Steps to Reproduce

Create a chart like in the example below.

Context

I'm trying to create a stacked bar chart representing electricity prices with different components on top of each other. With rrd4j it looks like this:

image

Given this code:

config:
  future: true
  label: Elpriser
  period: D
  sidebar: true
slots:
  grid:
    - component: oh-chart-grid
      config: {}
  series:
    - component: oh-time-series
      config:
        gridIndex: 0
        item: EnergiDataService_SpotPrice
        name: Spotpris
        type: bar
        xAxisIndex: 0
        yAxisIndex: 0
        stack: one
        service: rrd4j
    - component: oh-time-series
      config:
        gridIndex: 0
        item: EnergiDataService_GridTariff
        name: Nettarif
        type: bar
        xAxisIndex: 0
        yAxisIndex: 0
        stack: one
        service: rrd4j
  xAxis:
    - component: oh-time-axis
      config:
        gridIndex: 0
  yAxis:
    - component: oh-value-axis
      config:
        gridIndex: 0

With JDBC (MySQL) it looks like this:

image

The issue is very likely related to the way the persistence service is queried, since there are no redundant updates, but only actual state changes (item EnergiDataService_GridTariff):

time value
2024-07-08 13:00:00.000 0.20646875
2024-07-08 17:00:00.000 0.5368175
2024-07-08 21:00:00.000 0.20646875
2024-07-09 00:00:00.000 0.137645
2024-07-09 06:00:00.000 0.20646875
2024-07-09 17:00:00.000 0.5368175
2024-07-09 21:00:00.000 0.20646875
2024-07-10 00:00:00.000 0.137645
2024-07-10 06:00:00.000 0.20646875

Please note that this is not the only issue, since even values within the period are not repeated. So either I have the wrong understanding of the chart functionality (quite likely) or my example would still not work even if FilterDate beginDate is clarified and fixed.

Your Environment

  • Version used: 4.2
@jlaur jlaur added bug An unexpected problem or unintended behavior of an add-on discussion labels Jul 9, 2024
@jlaur
Copy link
Contributor Author

jlaur commented Jul 9, 2024

@florian-h05 - I saw that you recently created some PR's related to charts. Sorry to tag you directly, but do you know if the chart I'm trying to create is even possible to create given the way data is persisted (and queried)? The item represents the current price per kWh, so as long as it doesn't change, it's still valid, and thus I'd like to include it in an hourly bar chart to represent to energy price in that hour.

@florian-h05
Copy link
Contributor

So when the UI requests persisted data from REST, it provides the starttime and endtime query params, e.g. /rest/persistence/items/OneCall_Forecast_Hourly_Temperature?serviceId=inmemory&starttime=2024-07-09T15%3A08%3A43.903Z&endtime=2024-07-11T15%3A08%3A43.902Z.

InMemory returns any state persisted within that time, no matter if it changed or not, and it does not return the last state before that timerange. I however cannot see if it does include the endtime.

Returning everything between starttime and endtime, both are inclusive bounds, is IMO the correct behaviour and what's best for the UI.
The only thing that is debatable, is whether to include the last state before the starttime. I think this should not be done by default, as it wouldn't make sense given starttime is called starttime, but it would be a nice feature, because then charts would be able to show the "current" value.

The stacked bar chart you want to create is possible, the RRD4J result looks really good, apart from the fact its bars are very thin. Looking at your JDBC example, it would work as intended if you would persist the data every hour.

ECharts does not support interpolation and interpolation in general is a tricky thing because there are different strategies and it heavily depends on the use case which one makes sense - I therefore wouldn't support a interpolation by default.

@jlaur
Copy link
Contributor Author

jlaur commented Jul 9, 2024

@florian-h05 - many thanks for your feedback. So I think that, after all, it is not possible to achieve what I want, but I may turn to the forum for additional advise. I don't want to persist redundant data just to affect how charts are created - in the binding providing this data, I even normalize it:

TimeSeries timeSeries = new TimeSeries(REPLACE);
BigDecimal previousTariff = null;
for (Entry<Instant, BigDecimal> price : prices) {
Instant hourStart = price.getKey();
BigDecimal priceValue = price.getValue();
if (deduplicate && priceValue.equals(previousTariff)) {
// Skip redundant states.
continue;
}
timeSeries.add(hourStart, getEnergyPrice(priceValue, currency));
previousTariff = priceValue;
}

Interpolation is not needed, just repetition. I probably cannot tell the chart that I'd like those bars to represent hours of the day with whatever value is valid in those hours. As an example, this should give the exact same result:

Fully normalized:

time value
2024-07-08 13:00:00.000 0.20646875
2024-07-08 17:00:00.000 0.5368175
2024-07-08 21:00:00.000 0.20646875
2024-07-09 00:00:00.000 0.137645

Redundantly persisted by the hour:

time value
2024-07-08 13:00:00.000 0.20646875
2024-07-08 14:00:00.000 0.20646875
2024-07-08 15:00:00.000 0.20646875
2024-07-08 16:00:00.000 0.20646875
2024-07-08 17:00:00.000 0.5368175
2024-07-08 18:00:00.000 0.5368175
2024-07-08 19:00:00.000 0.5368175
2024-07-08 20:00:00.000 0.5368175
2024-07-08 21:00:00.000 0.20646875
2024-07-08 22:00:00.000 0.20646875
2024-07-08 23:00:00.000 0.20646875
2024-07-09 00:00:00.000 0.137645

Redundantly persisted by the hour with some additional updates:

time value
2024-07-08 13:00:00.000 0.20646875
2024-07-08 13:01:00.000 0.20646875
2024-07-08 14:00:00.000 0.20646875
2024-07-08 14:30:00.000 0.20646875
2024-07-08 15:00:00.000 0.20646875
2024-07-08 15:50:00.000 0.20646875
2024-07-08 16:00:00.000 0.20646875
2024-07-08 17:00:00.000 0.5368175
2024-07-08 18:00:00.000 0.5368175
2024-07-08 19:00:00.000 0.5368175
2024-07-08 20:00:00.000 0.5368175
2024-07-08 21:00:00.000 0.20646875
2024-07-08 22:00:00.000 0.20646875
2024-07-08 23:00:00.000 0.20646875
2024-07-09 00:00:00.000 0.137645

(rrd4j cannot be used since it doesn's support future states, and also, like you mentioned, it looks strange with multiple ultrathin bars per hour)

What is left for this issue is still to decide how to solve the corner case that the initial state is currently impossible to obtain through a query.

Returning everything between starttime and endtime, both are inclusive bounds, is IMO the correct behaviour and what's best for the UI.

We should probably try to think what it most logical in general rather than coupling it too much with the UI, but still having it in mind as a very important use case. It's also a corner-case, but why do you think that endtime should be inclusive? Given this:

time value
2024-07-08 21:00:00.000 0.20646875
2024-07-09 00:00:00.000 0.137645

If I want to query the period 2024-07-08 21:00:00.000 - 2024-07-09 00:00:00.000, the value 0.137645 is not really relevant, because it only becomes relevant starting exactly 2024-07-09 00:00:00.000. In other words: These two periods do not overlap:

  • 2024-07-08 00:00:00.000 - 2024-07-09 00:00:00.000
  • 2024-07-09 00:00:00.000 - 2024-07-10 00:00:00.000

And both periods are exactly 24 hours, nothing more. If both 2024-07-08 00:00:00.000 and 2024-07-09 00:00:00.000 would be in that first period, it would be longer than 24 hours, and it would overlap with the last period.

The only thing that is debatable, is whether to include the last state before the starttime. I think this should not be done by default, as it wouldn't make sense given starttime is called starttime, but it would be a nice feature, because then charts would be able to show the "current" value.

I agree it could be unexpected if the query would return a timestamp outside the boundary, i.e. the timestamp of the state already valid when entering the period. We could truncate the timestamp though, WDYT? So that if starttime is 2024-07-09T15%3A08%3A43.903Z, we would include the state valid from 2024-07-08 13:00:00.000, but set timestamp = starttime.

@florian-h05
Copy link
Contributor

I understand that you don't want to persist redundant data, it would make life much easier if you would provide hourly values from the binding and persist them. This way, charts would look the way you want them to look.
Probably ECharts supports something to create the chart you want, I have to admit my ECharts knowledge is fairly limited and ECharts is extremely powerful, so it might have that feature in some way.

BTW, I can imagine two ways to repeat or interpolate data: Either in the UI, or inside openHAB core, and then let the user choose if he wants to use an interpolation or repetition algorithm. But this would be a good amount of work to implement ...

And I personally don't see a problem with having that redundant data ... I have just checked my InfluxDB, which is holding storing several measurements since November 2019 with everyChange and every 5 minute strategies (in the past I even had a every minute strategy) and the data directory of this InfluxDB instance is just 1.4 GB. On a NAS with a few TBs of storage this just doesn't matter. And apart from that, for forecasts I am using InMemory persistence anyway, as I don't need to the weather forecast from last week - if that weather is interesting, I rather look at my own measurements.

What is left for this issue is still to decide how to solve the corner case that the initial state is currently impossible to obtain through a query.

Agreed tho, what I wrote above is mostly just my personal opinion.

It's also a corner-case, but why do you think that endtime should be inclusive?

There are pros and cons for this. I agree that it logically doesn't make sense to have an inclusive end, e.g. if I request the energy forecast for tomorrow I will choose midnight to midnight, where the end time is irrelevent for that day.
For some chart types , e.g. line chart, it is however required to know the data from the endtime to draw the line for the last hour.
So having endtime explicit by default would make sense, but provide an option to make it implicit.

I agree it could be unexpected if the query would return a timestamp outside the boundary, i.e. the timestamp of the state already valid when entering the period. We could truncate the timestamp though, WDYT?

That feels a bit hacky but IMO makes a lot of sense and the result is "natural", I don't think this could cause a problem.

@jlaur
Copy link
Contributor Author

jlaur commented Jul 9, 2024

It's also a corner-case, but why do you think that endtime should be inclusive?

There are pros and cons for this. I agree that it logically doesn't make sense to have an inclusive end, e.g. if I request the energy forecast for tomorrow I will choose midnight to midnight, where the end time is irrelevent for that day. For some chart types , e.g. line chart, it is however required to know the data from the endtime to draw the line for the last hour. So having endtime explicit by default would make sense, but provide an option to make it implicit.

Do you need a persisted state with timestamp exactly at your end time, or do you need the next state after the requested period ends in order to interpolate (i.e. kind of the opposite problem of not getting the initial state)?

If you only need to include any given state becoming valid exactly at the end time, perhaps the chart should add a nanosecond to the requested end time to include that explicitly? Other queries might not need it, and they could avoid doing that.

If this would be a way forward, this could be the first step to take before changing anything else, because getting possible one entry too much can be handled. But... also it might not be that important. 😉

@jlaur jlaur removed the bug An unexpected problem or unintended behavior of an add-on label Jul 9, 2024
@florian-h05
Copy link
Contributor

If you only need to include any given state becoming valid exactly at the end time, perhaps the chart should add a nanosecond to the requested end time to include that explicitly? Other queries might not need it, and they could avoid doing that.

I don't think that would be enough, as there is probably no state becoming valid at that time. What if the next state is one hour forward?

But... also it might not be that important. 😉

Yes, I think so. Just one thing: If endTime currently is interpreted inclusively, I would like to be able to keep this behaviour when changing this to exclusively - an extra parameter to enable that would be great then.

@jlaur
Copy link
Contributor Author

jlaur commented Jul 13, 2024

If you only need to include any given state becoming valid exactly at the end time, perhaps the chart should add a nanosecond to the requested end time to include that explicitly? Other queries might not need it, and they could avoid doing that.

I don't think that would be enough, as there is probably no state becoming valid at that time. What if the next state is one hour forward?

But... also it might not be that important. 😉

Yes, I think so. Just one thing: If endTime currently is interpreted inclusively, I would like to be able to keep this behaviour when changing this to exclusively - an extra parameter to enable that would be great then.

If we change the logic from:

time >= startTime && time <= endTime

to:

time >= startTime && time < endTime

then it should be enough to add one nanosecond to endTime to get the same result as before, unless I'm missing something?

@florian-h05
Copy link
Contributor

then it should be enough to add one nanosecond to endTime to get the same result as before, unless I'm missing something?

Sure - this way the current behaviour could be „reproduced“.

However it still would be nice to be able to get the next state after the endTime with the timestamp set to the end time, this is basically the same like getting the state before beginTime and putting it to the beginTime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants