Skip to content
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
55 changes: 53 additions & 2 deletions homeassistant/components/sensor/filter.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
import statistics
from collections import deque, Counter
from numbers import Number
from datetime import datetime

import voluptuous as vol

Expand All @@ -26,6 +27,7 @@
FILTER_NAME_LOWPASS = 'lowpass'
FILTER_NAME_OUTLIER = 'outlier'
FILTER_NAME_THROTTLE = 'throttle'
FILTER_NAME_TIME_SMA = 'time_sma'

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this name, it's not very descriptive. Maybe time_weighted_average instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That name is too long :(

The actual name would be SMA_last as per the paper the filter is based upon:
http://www.eckner.com/papers/Algorithms%20for%20Unevenly%20Spaced%20Time%20Series.pdf

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really think it's too long. I mean yes it's longer than most other filter names but time_sma is just incredibly hard to understand and the more descriptive name outweighs how long it is for me. And we shouldn't force our users to read scientific papers, ideally the filter would just work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore it's not a weighted average. Just refer to wikipedia for the differences:

https://en.wikipedia.org/wiki/Moving_average

I agree users should not need to read scientific papers to profit from filters, but that was another PR ;)

@balloob balloob Mar 15, 2018

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From another comment on this PR I understand that the "last" part of the official name is actually an important implementation detail of this algorithm. I would expect that to be part of the name unless we plan on making it first/last etc configurable for this filter later on.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think putting the last/first/average part into a type: option very clean. If we add separate filters for each filter type we would eventually just pollute the filter list.

And since these filters are so similar it would also make sense semantically. Can of course also go into a follow-up PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As described in the other comment, a second PR with the "linear" implementation is possible, although I have other filters in the priority list first. Maybe someone else can have a go 😄

I propose to link the paper in the documentation and code "type: last" as the default for the time being.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like adding a type: last to the options for this filter 👍

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type is reserved word, can I use variant :) ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with reserved? We're talking about the config format right? I think that this makes sense:

  - platform: filter
    name: "Filtered power"
    entity_id: sensor.power_smart_meter
    filters:
      - filter: time_simple_moving_average
        window_size: 00:01
        precision: 2
        type: last

I am a bit confused why a name can be too long? Assume your users, like me, have no clue about filters. I would prefer less abbreviations or having to read papers to get what I want. time_simple_moving_average sounds about right?

FILTERS = Registry()

CONF_FILTERS = 'filters'
Expand All @@ -44,24 +46,32 @@
ICON = 'mdi:chart-line-variant'

FILTER_SCHEMA = vol.Schema({
vol.Optional(CONF_FILTER_WINDOW_SIZE,
default=DEFAULT_WINDOW_SIZE): vol.Coerce(int),
vol.Optional(CONF_FILTER_PRECISION,
default=DEFAULT_PRECISION): vol.Coerce(int),
})

FILTER_OUTLIER_SCHEMA = FILTER_SCHEMA.extend({
vol.Required(CONF_FILTER_NAME): FILTER_NAME_OUTLIER,
vol.Optional(CONF_FILTER_WINDOW_SIZE,
default=DEFAULT_WINDOW_SIZE): vol.Coerce(int),
vol.Optional(CONF_FILTER_RADIUS,
default=DEFAULT_FILTER_RADIUS): vol.Coerce(float),
})

FILTER_LOWPASS_SCHEMA = FILTER_SCHEMA.extend({
vol.Required(CONF_FILTER_NAME): FILTER_NAME_LOWPASS,
vol.Optional(CONF_FILTER_WINDOW_SIZE,
default=DEFAULT_WINDOW_SIZE): vol.Coerce(int),
vol.Optional(CONF_FILTER_TIME_CONSTANT,
default=DEFAULT_FILTER_TIME_CONSTANT): vol.Coerce(int),
})

FILTER_TIME_SMA_SCHEMA = FILTER_SCHEMA.extend({
vol.Required(CONF_FILTER_NAME): FILTER_NAME_TIME_SMA,
vol.Required(CONF_FILTER_WINDOW_SIZE): vol.All(cv.time_period,
cv.positive_timedelta)
})

FILTER_THROTTLE_SCHEMA = FILTER_SCHEMA.extend({
vol.Required(CONF_FILTER_NAME): FILTER_NAME_THROTTLE,
})
Expand All @@ -72,6 +82,7 @@
vol.Required(CONF_FILTERS): vol.All(cv.ensure_list,
[vol.Any(FILTER_OUTLIER_SCHEMA,
FILTER_LOWPASS_SCHEMA,
FILTER_TIME_SMA_SCHEMA,
FILTER_THROTTLE_SCHEMA)])
})

Expand Down Expand Up @@ -277,6 +288,46 @@ def _filter_state(self, new_state):
return filtered


@FILTERS.register(FILTER_NAME_TIME_SMA)
class TimeSMAFilter(Filter):
"""Simple Moving Average (SMA) Filter.

The window_size is determined by time, and SMA is time weighted.
"""

def __init__(self, window_size, precision, entity):
"""Initialize Filter."""
super().__init__(FILTER_NAME_TIME_SMA, 0, precision, entity)
self._time_window = int(window_size.total_seconds())
self.last_leak = None
self.queue = deque()

def _leak(self, now):
"""Remove timeouted elements."""
while self.queue:
timestamp, _ = self.queue[0]
if timestamp + self._time_window <= now:
self.last_leak = self.queue.popleft()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason this is assigned to an instance variable instead of a local variable and returned at the end of the while loop ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look at the previous comment #13104 (comment)

Keeping this value assures me I always have some value to work with in the _filter_state()

The filter always lags 1 state behind, so using the new state in these cases (when there are no previous states in the windows) would make the behaviour erratic.

else:
return

def _filter_state(self, new_state):
now = int(datetime.now().timestamp())

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use our date util package and UTC timestamps:

from homeassistant.util import date as dt_util

dt_util.utcnow()

That way you won't get hit by daylight savings etc.

Any reason you're using timestamps instead of datetime objects?

window size is already given to you in a timedelta. So you can do things like dt_util.utcnow() - self._time_window to calculate points in time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time needs to be multiplied with an unknown type value, so I preferred to keep safe and make sure time is an int.


self._leak(now)
self.queue.append((now, float(new_state)))
moving_sum = 0
start = now - self._time_window
_, prev_val = self.last_leak or (0, float(new_state))

for timestamp, val in self.queue:
moving_sum += (timestamp-start)*prev_val

@OttoWinter OttoWinter Mar 12, 2018

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you're doing here is a lower bound of the moving average:

screen shot 2018-03-12 at 16 57 50

(Edit: This picture is not entirely true; Your code always takes the previous values for the "integral" approximation, not the lower of the two)

Ideally we would instead be using the mean of prev and val:

screen shot 2018-03-12 at 16 58 41

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I based the implementation in this paper:

http://www.eckner.com/papers/Algorithms%20for%20Unevenly%20Spaced%20Time%20Series.pdf

My implementation is that of SMA_last, what you propose is SMA_linear. Both are acceptable filters, yet different ones.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that is true; I don't see much value in this SMA_last method over the trapezoids though. The latter just seems quite a bit more accurate in most cases.

@dgomes dgomes Mar 12, 2018

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will probably PR a second filter based on SMA_linear, but the use cases are different:

SMA_last: "discrete observation values"

  • sensors that are event based
  • binary_sensors in HA

SMA_linear: "discretely-observed continuous-time stochastic processes"

  • sensors which are polled

start, prev_val = timestamp, val
moving_sum += (now-start)*prev_val

return moving_sum/self._time_window

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we don't receive any value within self._time_window? I think we should be reporting an unknown state then. For example if my time window is 1 minute but I haven't received any state updates for 2 minutes, the output of this sensor should automatically be set to unknown (and not just repeat the same old state indefinitely).

I know this might be a bit of work to do but it's what I'd expect as a user. See usages of async_track_point_in_utc_time for ways how to do this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree, If I don't get any values it just means that it hasn't changed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that the time has changed and time is also an input into this filter. Imagine that our current value is based off 2 states that happened during the window. If we wait long enough that 1 state drops out of our window, wouldn't you expect to base the value off just the other state?

@dgomes dgomes Mar 15, 2018

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@balloob that is exactly what it will happen.

Even if no points are left in the time window, the filter will report based on the last value that exited the windows (_, prev_val = self.last_leak )

@OttoWinter OttoWinter Mar 15, 2018

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgomes I think you're misunderstanding Paulus' comment (or I'm not understanding it...)

If there's only one point within the time window, as a user I wouldn't expect it to use the last value outside the time window for any computation. It should just discard anything that's outside the time window.

In other words: only every consider the data points that are inside the time frame for computation. See this gif for an example (the blue points are the data points, the blue area is our time window and red line is the output):

out

Couple of things going on here:

  • First we only "see" the first value, so of course we should only report that value.
  • Then we start to see the next value. We should then take the weighted average (or whatever it's called in your paper) of the two.
  • If the first value exits the window, we should not use it for any computation anymore. In your current code that is however happening, not what I would expect.
  • Optional: In my opinion if there's no data points inside our window left, as I said we should report an unknown state. This might differ from the definition in the paper you mentioned, but IMHO our user base is not made up of scientists nor developers.

(The computation of the values while both data points are inside the time window is probably not right in my gif, but that's not the focus of this comment.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OttoWinter that is not the filter I'm implementing here.

Sure, that filter can be implemented, but it's not the one in this PR.

Please share how you are creating those graphs 😄

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgomes I see that you report only on the last value but you only update the filter value when a new state comes in right?

@OttoWinter OttoWinter Mar 15, 2018

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgomes Ok, sorry I will shut up about other filters 😅

Yeah those graphics were done with (ab)using GeoGebra, but I would describe it as the single most user-unfriendly maths tools ever written IMHO (but it's sadly what's taught here in Austria 🇦🇹). Seriously, it's incredibly hard to use if you're starting out. (incidentally, because of its poor usability the rest of my class will probably also have to re-take a recent maths exam 😥)

btw, if for some reason you're looking for a nice tool (in python, yay) for creating nice presentations, have a look at manim. For example, it has allowed me to create these fancy visualizations about graph theory: https://youtu.be/t2Gb1YA-HcU 😺

@dgomes dgomes Mar 15, 2018

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@balloob right, this filter always lags 1 state behind. The concept is that the raw sensor has maintained the previous status until the new state has arrived.

E.g.: Think about a water pump that publishes its rate (1 to 5). If the pump is at speed 5 and comes down to speed 4 (new state) the SMA is calculated with the period of time in which the pump was at 5 (we weight in the time the pump was at 5), 4 is of no use in the SMA since the period of time is 0 (it just got to 4).

@OttoWinter please contribute more filters :) It's just that I'm prioritising filters I can use in my setup. There are plenty of important filters missing which I hope some else can contribute 😄

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I'm going to shut up talking about filters as I have clearly not a clue what I am talking about 👍



@FILTERS.register(FILTER_NAME_THROTTLE)
class ThrottleFilter(Filter):
"""Throttle Filter.
Expand Down