Skip to content

Added a default timeout to Tradfri observations#18497

Closed
alex3305 wants to merge 2 commits intohome-assistant:devfrom
alex3305:tradfri-observe-reset
Closed

Added a default timeout to Tradfri observations#18497
alex3305 wants to merge 2 commits intohome-assistant:devfrom
alex3305:tradfri-observe-reset

Conversation

@alex3305
Copy link
Copy Markdown
Contributor

Description:

The IKEA Tradfri devices were configured by default that the observation of the
individual devices never timed out. These observations are used to check the
current state of the device.

However, I have experienced that having an infinite observation, there is a
fair possibility that devices aren't responsive any more from the UI. This
seem to be caused by a race condition somewhere in the async code of Home
Assistant or pytradfri. Or possibly even the underlying apicoap library.

Since setting states through automation still seems to work and after
debugging for a couple of hours, I figured a workaround to this issue was the
least I could contribute. As I was unable to find the root cause. This at least
(partially) solves #9822 and #14386 and is almost equal to the proposal of
@max-te, but with a less frequent observation timeout.

Related issue (if applicable): partially fixes #9822 and #14386

Checklist:

  • The code change is tested and works locally.
  • Local tests pass with tox. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.

Other checks were not applicable.

@homeassistant
Copy link
Copy Markdown
Contributor

Hi @alex3305,

It seems you haven't yet signed a CLA. Please do so here.

Once you do that we will be able to review and accept this pull request.

Thanks!

Comment thread homeassistant/components/switch/tradfri.py Outdated
The IKEA Tradfri devices were configured by default that the observation of the
individual devices never timed out. These observations are used to check the
current state of the device.

However, I have experienced that having an infinite observation, there is a
fair possibility that devices aren't responsive any more from the UI. This
seem to be caused by a race condition somewhere in the async code of Home
Assistant or pytradfri. Or possibly even the underlying apicoap library.

Since setting states through automation still seems to work and after
debugging for a couple of hours, I figured a workaround to this issue was the
least I could contribute. As I was unable to find the root cause. This at least
(partially) solves #9822 and #14386 and is almost equal to the proposal of
@max-te, but with a less frequent observation timeout.
@alex3305
Copy link
Copy Markdown
Contributor Author

Appearently Travis failed because I forgot to commit a variable. I amended that with an amended commit, but was not picked up by Travis unfortunately. Is there anyway to trigger Travis again?

Also it seems that the code that Travis fails on is unrelated.

@lwis lwis closed this Nov 16, 2018
@lwis lwis reopened this Nov 16, 2018
@ghost ghost removed the in progress label Nov 16, 2018
@ghost ghost assigned lwis Nov 16, 2018
@ghost ghost added the in progress label Nov 16, 2018
@lwis
Copy link
Copy Markdown
Member

lwis commented Nov 16, 2018

It's a shame there are some oddities with the gateway and CoAP, when the connection dies the observation should end and be restarted automatically. While I don't (and have never) experienced this issue on my network, with my gateway, I understand that others do have issues with holding persistent connections to the gateway.

There have been talks about adding support for a heartbeat in aiocoap in the future , but I'm not sure where that ended up.

@max-te
Copy link
Copy Markdown
Contributor

max-te commented Nov 16, 2018

This differs from my workaround in that you don't call something like self.hass.loop.call_later(TIMEOUT, self._async_start_observe). I don't see how, in your code, a new observation is started after the timeout is reached.

@lwis
Copy link
Copy Markdown
Member

lwis commented Nov 16, 2018

@max-te good spot, I'm not sure aiocoap will execute a callback when the observation completes.

@alex3305 have you tested this?

@alex3305
Copy link
Copy Markdown
Contributor Author

alex3305 commented Nov 16, 2018

@lwis I've not tested this extensively yet. I was hoping to see multiple people testing this. The aiocoap project didn't have a release in more than a year. Since the changes are quite extensive, the time it takes to trickle to through HASS would be quite long I suppose. This is just a workaround to ensure that I don't have to reboot my HASS install everyday :).

@max-te In libcoap when the duration timeout is reached, the err_callback function will be called and the timeout will be restarted. I don't know if this is the case with the aiocoap library. This can either be amended in Home Assistant or in pytradfri. What do you think?

Small edit @lwis I saw that you wrote most of the aiocoap event loop. Can you point to me where the duration is being used? I cannot seem to find any reference to it. Otherwise I can run a test with a very short timeout to check out how it is currently working...

@max-te
Copy link
Copy Markdown
Contributor

max-te commented Nov 17, 2018

At the time I was under the impression that err_callback isn't called when timeout is reached, I don't remember how I came to that conclusion though, so you might want to test this.

Comment thread homeassistant/components/switch/tradfri.py Outdated
Comment thread homeassistant/components/light/tradfri.py Outdated
Comment thread homeassistant/components/light/tradfri.py Outdated
Amends #18497 with an additional call to `loop_call_later`. According to
a little more reasearch regarding this PR, me and @max-te saw that the
`err_callback` wasn't called when the set duration timed out.

Maybe this can be fixed in pytradfri, or as @lwis suggested there probably
should be a heartbeat in pytradfri to prevent this kind of behaviour.

Although this workaround works, this change can also cause a bit of a stack
leak with Tradfri devices. But since the timeout is set at 1 hour at default,
this shouldn't be much of an issue.
duration=0)
duration=DEFAULT_OBSERVE_TIMEOUT)
self.hass.async_create_task(self._api(cmd))
self.hass.loop.call_later(DEFAULT_OBSERVE_TIMEOUT - 1,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trailing whitespace

duration=0)
duration=DEFAULT_OBSERVE_TIMEOUT)
self.hass.async_create_task(self._api(cmd))
self.hass.loop.call_later(DEFAULT_OBSERVE_TIMEOUT - 1,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trailing whitespace

duration=0)
duration=DEFAULT_OBSERVE_TIMEOUT)
self.hass.async_create_task(self._api(cmd))
self.hass.loop.call_later(DEFAULT_OBSERVE_TIMEOUT - 1,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trailing whitespace

@lwis
Copy link
Copy Markdown
Member

lwis commented Nov 17, 2018

Can you make this configurable with the default set to 0?

@alex3305
Copy link
Copy Markdown
Contributor Author

@lwis I cannot figure out how to get the configurable value working. With all the async threads being passed around and separate classes, it's quite hard to wrap my head around it.

Can you give me any pointers?

KEY_API = 'tradfri_api'
CONF_ALLOW_TRADFRI_GROUPS = 'allow_tradfri_groups'
DEFAULT_ALLOW_TRADFRI_GROUPS = False
DEFAULT_OBSERVE_TIMEOUT = 3600 # Set default timeout to 1 hour in seconds
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name suggests that is a default for options. Call it TIMEOUT_OBSERVE

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking into it, also with @lwis suggestion to make it configurable.

But thanks for the suggestion.

@alex3305
Copy link
Copy Markdown
Contributor Author

I will close this PR, because I want to wait out for home-assistant-libs/pytradfri#208 to be merged and possibly released. That will at least make Tradfri more stable regarding updates and observations.

@alex3305 alex3305 closed this Nov 25, 2018
@ghost ghost removed the in progress label Nov 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Trådfri ValueError on init with async code

7 participants