Added a default timeout to Tradfri observations by alex3305 · Pull Request #18497 · home-assistant/core

alex3305 · 2018-11-15T21:45:51Z

Description:

By default, the IKEA Tradfri devices were configured so that the observation of
each individual device never timed out. These observations are used to check the
current state of the device.

However, I have found that with an infinite observation there is a fair chance
that devices stop responding to the UI. This seems to be caused by a race
condition somewhere in the async code of Home Assistant or pytradfri, or
possibly even in the underlying aiocoap library.

Setting states through automations still seems to work. After debugging for a
couple of hours without finding the root cause, I figured a workaround was the
least I could contribute. This at least (partially) solves #9822 and #14386 and
is almost identical to the proposal of @max-te, but with a less frequent
observation timeout.

Related issue (if applicable): partially fixes #9822 and #14386

Checklist:

The code change is tested and works locally.
Local tests pass with tox. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.

Other checks were not applicable.

homeassistant · 2018-11-15T21:46:47Z

Hi @alex3305,

It seems you haven't yet signed a CLA. Please do so here.

Once you do that we will be able to review and accept this pull request.

Thanks!

alex3305 · 2018-11-16T09:06:48Z

Apparently Travis failed because I forgot to commit a variable. I fixed that with an amended commit, but unfortunately Travis did not pick it up. Is there any way to trigger Travis again?

Also, it seems that the code that Travis fails on is unrelated to this change.

lwis · 2018-11-16T10:10:51Z

It's a shame there are some oddities with the gateway and CoAP; when the connection dies, the observation should end and be restarted automatically. While I don't experience (and have never experienced) this issue on my network, with my gateway, I understand that others do have issues holding persistent connections to the gateway.

There has been talk of adding heartbeat support to aiocoap in the future, but I'm not sure where that ended up.

max-te · 2018-11-16T11:30:35Z

This differs from my workaround in that you don't call something like self.hass.loop.call_later(TIMEOUT, self._async_start_observe). I don't see how, in your code, a new observation is started after the timeout is reached.
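
The re-arming pattern described in this comment can be sketched in isolation. This is a minimal illustration using plain asyncio, not the Home Assistant or pytradfri code: `ObservedDevice`, `start_observe`, `max_restarts`, and the short `OBSERVE_TIMEOUT` value are hypothetical names chosen for the example, and the restart cap exists only so the demo terminates.

```python
import asyncio

OBSERVE_TIMEOUT = 0.01  # seconds; hypothetical short value so the demo finishes quickly


class ObservedDevice:
    """Hypothetical stand-in for a Tradfri entity (not Home Assistant code)."""

    def __init__(self, loop, max_restarts):
        self._loop = loop
        self._max_restarts = max_restarts  # demo-only cap on re-observations
        self.observe_calls = 0

    def start_observe(self):
        # Stand-in for sending the observe command with duration=OBSERVE_TIMEOUT.
        self.observe_calls += 1
        if self.observe_calls < self._max_restarts:
            # Re-arm: schedule a fresh observation for when this one expires.
            self._loop.call_later(OBSERVE_TIMEOUT, self.start_observe)


async def demo():
    device = ObservedDevice(asyncio.get_running_loop(), max_restarts=3)
    device.start_observe()
    # Wait long enough for all scheduled restarts to run.
    await asyncio.sleep(OBSERVE_TIMEOUT * 20)
    return device.observe_calls


calls = asyncio.run(demo())
print(calls)
```

Without the `call_later` line, `start_observe` would run once and nothing would start a new observation after the duration elapsed, which is the gap pointed out above.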

lwis · 2018-11-16T11:37:51Z

@max-te good spot, I'm not sure aiocoap will execute a callback when the observation completes.

@alex3305 have you tested this?

alex3305 · 2018-11-16T13:50:43Z

@lwis I've not tested this extensively yet. I was hoping to see multiple people testing this. The aiocoap project hasn't had a release in more than a year. Since the changes are quite extensive, I suppose it would take quite a long time for them to trickle through to HASS. This is just a workaround to ensure that I don't have to reboot my HASS install every day :).

@max-te In libcoap, when the duration timeout is reached, the err_callback function is called and the timeout is restarted. I don't know if this is the case with the aiocoap library. This can be amended either in Home Assistant or in pytradfri. What do you think?

Small edit: @lwis, I saw that you wrote most of the aiocoap event loop. Can you point me to where the duration is being used? I cannot seem to find any reference to it. Otherwise I can run a test with a very short timeout to check how it currently works...

max-te · 2018-11-17T07:07:19Z

At the time I was under the impression that err_callback isn't called when the timeout is reached. I don't remember how I came to that conclusion, though, so you might want to test this.

@max-te

Amends #18497 with an additional call to `loop.call_later`. After a little more research on this PR, @max-te and I saw that the `err_callback` wasn't called when the set duration timed out. Maybe this can be fixed in pytradfri or, as @lwis suggested, there should probably be a heartbeat in pytradfri to prevent this kind of behaviour. Although this workaround works, the change can also cause a bit of a stack leak with Tradfri devices. But since the timeout is set to 1 hour by default, this shouldn't be much of an issue.
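
The commit message does not say what exactly accumulates. Assuming the concern is several pending `call_later` timers piling up for one device (for example when both an error path and the timeout path restart the observation), one possible mitigation is to keep the `TimerHandle` that `loop.call_later` returns and cancel it before scheduling a new one. The sketch below is standalone asyncio; `ReobserveTimer` and its method names are hypothetical and are not part of this PR.

```python
import asyncio


class ReobserveTimer:
    """Hypothetical helper that keeps at most one pending re-observe timer."""

    def __init__(self, loop, delay, callback):
        self._loop = loop
        self._delay = delay
        self._callback = callback
        self._handle = None  # the currently pending asyncio.TimerHandle, if any

    def arm(self):
        # Cancel a timer that is still pending so timers cannot pile up.
        if self._handle is not None:
            self._handle.cancel()
        self._handle = self._loop.call_later(self._delay, self._fire)

    def _fire(self):
        self._handle = None
        self._callback()


async def demo():
    restarts = []
    timer = ReobserveTimer(
        asyncio.get_running_loop(), 0.01, lambda: restarts.append("observe")
    )
    # Arming three times in a row leaves only the last timer pending.
    timer.arm()
    timer.arm()
    timer.arm()
    await asyncio.sleep(0.2)
    return restarts


restarts = asyncio.run(demo())
print(restarts)
```

With a one-hour timeout the number of extra timers would be small either way, which matches the author's assessment that this should not be much of an issue.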

houndci-bot · 2018-11-17T14:11:34Z

-                                       duration=0)
+                                       duration=DEFAULT_OBSERVE_TIMEOUT)
            self.hass.async_create_task(self._api(cmd))
+            self.hass.loop.call_later(DEFAULT_OBSERVE_TIMEOUT - 1, 


trailing whitespace

houndci-bot · 2018-11-17T14:11:34Z

-                                      duration=0)
+                                      duration=DEFAULT_OBSERVE_TIMEOUT)
            self.hass.async_create_task(self._api(cmd))
+            self.hass.loop.call_later(DEFAULT_OBSERVE_TIMEOUT - 1, 


trailing whitespace

houndci-bot · 2018-11-17T14:11:34Z

-                                      duration=0)
+                                      duration=DEFAULT_OBSERVE_TIMEOUT)
            self.hass.async_create_task(self._api(cmd))
+            self.hass.loop.call_later(DEFAULT_OBSERVE_TIMEOUT - 1, 


trailing whitespace

lwis · 2018-11-17T14:13:19Z

Can you make this configurable with the default set to 0?
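
A rough sketch of what such an option could look like, written as plain Python rather than against the real Home Assistant config schema. The option name `observe_timeout` and the helper `observe_settings` are assumptions made for illustration; the only detail taken from this PR is that `duration=0` is the existing "never time out" behaviour.

```python
CONF_OBSERVE_TIMEOUT = "observe_timeout"  # hypothetical option name
DEFAULT_OBSERVE_TIMEOUT = 0  # 0 keeps the existing behaviour: observe without a timeout


def observe_settings(config):
    """Return the observe duration and whether a re-observe must be scheduled."""
    timeout = int(config.get(CONF_OBSERVE_TIMEOUT, DEFAULT_OBSERVE_TIMEOUT))
    if timeout < 0:
        raise ValueError("observe timeout must be zero or positive")
    # Only a finite duration needs a follow-up observation to be scheduled.
    return {"duration": timeout, "reschedule": timeout > 0}


default_settings = observe_settings({})
hourly_settings = observe_settings({CONF_OBSERVE_TIMEOUT: 3600})
print(default_settings, hourly_settings)
```

Defaulting to 0 means installations that do not set the option keep their current behaviour, and only users who hit the unresponsive-device problem opt in to a finite timeout.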

alex3305 · 2018-11-17T15:49:45Z

@lwis I cannot figure out how to get the configurable value working. With all the async threads being passed around and separate classes, it's quite hard to wrap my head around it.

Can you give me any pointers?

pvizeli · 2018-11-19T13:36:56Z

 KEY_API = 'tradfri_api'
 CONF_ALLOW_TRADFRI_GROUPS = 'allow_tradfri_groups'
 DEFAULT_ALLOW_TRADFRI_GROUPS = False
+DEFAULT_OBSERVE_TIMEOUT = 3600  # Set default timeout to 1 hour in seconds


The name suggests that this is a default for an option. Call it TIMEOUT_OBSERVE.

alex3305 · 2018-11-19

I'm looking into it, along with @lwis's suggestion to make it configurable.

But thanks for the suggestion.

alex3305 · 2018-11-25T15:23:39Z

I will close this PR because I want to wait for home-assistant-libs/pytradfri#208 to be merged and possibly released. That will at least make Tradfri more stable regarding updates and observations.

homeassistant added the cla-error, integration: tradfri, platform: light.tradfri, and small-pr (PRs with less than 30 lines) labels Nov 15, 2018

homeassistant added the cla-needed label Nov 15, 2018

ghost added the in progress label Nov 15, 2018

houndci-bot reviewed Nov 15, 2018

Comment thread homeassistant/components/switch/tradfri.py Outdated

homeassistant added cla-signed and removed cla-needed labels Nov 15, 2018

lwis closed this Nov 16, 2018

lwis reopened this Nov 16, 2018

ghost removed the in progress label Nov 16, 2018

ghost assigned lwis Nov 16, 2018

ghost added the in progress label Nov 16, 2018

MartinHjelmare added the cla-recheck label Nov 16, 2018

homeassistant removed the cla-recheck label Nov 16, 2018

MartinHjelmare removed the cla-error label Nov 16, 2018

home-assistant deleted a comment from homeassistant Nov 16, 2018

houndci-bot reviewed Nov 17, 2018

Comment thread homeassistant/components/switch/tradfri.py Outdated

Comment thread homeassistant/components/light/tradfri.py Outdated

Comment thread homeassistant/components/light/tradfri.py Outdated

houndci-bot reviewed Nov 17, 2018

pvizeli reviewed Nov 19, 2018

pvizeli added the Testing required label Nov 19, 2018

alex3305 mentioned this pull request Nov 21, 2018

A bit more stable observations home-assistant-libs/pytradfri#208

Closed

alex3305 closed this Nov 25, 2018

ghost removed the in progress label Nov 25, 2018

alex3305 mentioned this pull request Nov 25, 2018

Added async locks to Tradfri components #18708

Closed

3 tasks

ghost removed the platform: light.tradfri label Mar 21, 2019

7 participants
