
Refactor cert_expiry for 3.7+, retry transient errors, wait for HA HTTP #32001

Closed
jjlawren wants to merge 4 commits into home-assistant:dev from jjlawren:cert_3.7_refactor_and_retries

Conversation

@jjlawren
Contributor

@jjlawren jjlawren commented Feb 19, 2020

Proposed change

With Python 3.7+, the ssl module raises ssl.SSLCertVerificationError, whose verify_code and verify_message attributes provide more detailed information about certificate verification failures. This PR refactors the error handling around those messages.
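A minimal sketch of that approach, assuming the check connects with `ssl.create_default_context()` (which enables hostname and chain verification) and sorts failures by exception type and `verify_message`. The function names `fetch_cert` and `classify_cert_error`, and every error key except `wrong_host`, are hypothetical and not the PR's actual code.

```python
import socket
import ssl


def fetch_cert(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Connect with verification enabled and return the peer certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def classify_cert_error(err: Exception) -> str:
    """Map a connection or TLS exception to a short (hypothetical) error key."""
    if isinstance(err, ssl.SSLCertVerificationError):
        # Python 3.7+ sets verify_message to OpenSSL's reason text; it may be
        # missing or None, so fall back to an empty string.
        message = (getattr(err, "verify_message", "") or "").lower()
        if "hostname mismatch" in message:
            return "wrong_host"
        if "certificate has expired" in message:
            return "certificate_expired"
        return "certificate_error"
    if isinstance(err, socket.gaierror):
        return "resolve_failed"  # DNS lookup failed
    if isinstance(err, ConnectionRefusedError):
        return "connection_refused"
    if isinstance(err, (socket.timeout, TimeoutError)):
        return "connection_timeout"
    return "unknown"
```

All of these exceptions derive from `OSError`, so a caller can wrap `fetch_cert(host)` in `except OSError as err:` and pass `err` to the classifier. Matching on message text mirrors the `verify_message` checks quoted in the review below; matching on `verify_code` instead would avoid depending on OpenSSL's wording.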

Additionally, some of these errors can be considered transient, so an async retry mechanism with exponential backoff has been added. With the previous default update interval of 12h, a site that was only temporarily unavailable at startup would not be checked again for a long time.
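One way such a retry could look, sketched with `asyncio` under the assumption that only connection-level errors (refused, timed out, DNS failure) count as transient while certificate errors propagate on the first attempt. `retry_with_backoff` and its parameters are hypothetical; the 30 s base and one-hour cap echo the `retry_delay` formula quoted later in the review.

```python
import asyncio
import socket
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Assumption: only connection-level failures are retried; anything else,
# such as ssl.SSLCertVerificationError, is raised on the first attempt.
TRANSIENT_ERRORS = (ConnectionRefusedError, socket.timeout, TimeoutError, socket.gaierror)


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 30.0,
    max_delay: float = 3600.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await operation(), retrying transient errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except TRANSIENT_ERRORS:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last transient error
            # Wait base_delay, doubled after each failure, capped at max_delay.
            await sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Injecting `sleep` keeps the function testable without real waiting; in production the default `asyncio.sleep` yields to the event loop instead of blocking it.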

Finally, the http integration has been marked as a dependency to hopefully resolve issues like #31964.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Black (black --fast homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • Untested files have been added to .coveragerc.

The integration reached or maintains the following Integration Quality Scale:

  • No score or internal
  • 🥈 Silver
  • 🥇 Gold
  • 🏆 Platinum

@probot-home-assistant

Hey there @cereal2nd, mind taking a look at this pull request as it has been labeled with an integration (cert_expiry) you are listed as a codeowner for? Thanks!

@codecov

codecov Bot commented Feb 19, 2020

Codecov Report

Merging #32001 into dev will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##              dev   #32001      +/-   ##
==========================================
+ Coverage   94.69%    94.7%   +<.01%     
==========================================
  Files         766      766              
  Lines       55567    55590      +23     
==========================================
+ Hits        52619    52645      +26     
+ Misses       2948     2945       -3
Impacted Files Coverage Δ
...omeassistant/components/cert_expiry/config_flow.py 98.63% <100%> (+0.26%) ⬆️
homeassistant/bootstrap.py 75.4% <0%> (ø) ⬆️
...meassistant/components/homematicip_cloud/sensor.py 100% <0%> (ø) ⬆️
...omeassistant/components/homematicip_cloud/cover.py 100% <0%> (ø) ⬆️
setup.py 88.02% <0%> (+1.79%) ⬆️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 32cd58e...aee7e42.

    _LOGGER.error("Certificate does not match host: %s", host)
    self._errors[CONF_HOST] = "wrong_host"
elif "certificate has expired" in err.verify_message:
    _LOGGER.error("Certificate has expired: %s", host)
Member

Why is this an error for setting up the config flow? Isn't the whole reason this sensor exists to check this?

Contributor Author

The error is meant to interactively help fix an existing broken setup. We could create a new sensor and assume the user will check the logs for more info on why the sensor is reporting a failed cert, but presenting this info in the UI feels like it would help resolve the problem faster.

Member

But what if the user wants to set up the sensor, see that it's invalid, then fix the sensor and see it jump to valid?

Member

You wouldn't prevent a leak detector from being paired if it was currently detecting water.


@balloob: I'm not so sure... this component gives us the number of days before expiration. If the certificate has already expired, the state will be negative, and I'm not sure that will really work.

I would agree with you if the state were "IsExpired: yes/no" and one of the attributes were the number of days until expiration.
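To make the negative-state concern concrete, here is a small sketch that computes whole days until expiry from the `notAfter` string that `SSLSocket.getpeercert()` returns, plus the boolean-state shape suggested above. `days_until_expiry` and `expiry_state` are hypothetical names, not the integration's API.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days from `now` (timezone-aware) until the certificate expires.

    `not_after` is the string from SSLSocket.getpeercert()["notAfter"],
    e.g. "Feb 10 00:00:00 2020 GMT".
    """
    expiry = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    # timedelta.days floors, so this is negative as soon as expiry is in the past.
    return (expiry - now).days


def expiry_state(not_after: str, now: datetime) -> dict:
    """Hypothetical alternative: a yes/no state with days kept as an attribute."""
    days = days_until_expiry(not_after, now)
    return {"is_expired": days < 0, "days_remaining": days}
```

For a certificate that expired at midnight on Feb 10, 2020, checked at midnight on Feb 19, 2020 (UTC), `days_until_expiry` returns -9, which is exactly the negative state being discussed.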

Contributor Author

After some thought I agree with @balloob and this will be allowed in a new PR.

def retry_delay(self):
    """Return the retry delay in seconds."""
    return int(min(2 ** (self._retry_attempts - 1) * 30, 3600))
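The schedule this formula produces is easy to tabulate. The sketch below restates the method as a free function that takes the attempt number as an argument (an assumption, since the original reads `self._retry_attempts`) and lists the first eight delays.

```python
def retry_delay(retry_attempts: int) -> int:
    """Return the delay in seconds before the given (1-based) retry attempt."""
    # Start at 30 s, double on every attempt, never exceed one hour.
    return int(min(2 ** (retry_attempts - 1) * 30, 3600))


# Delays for the first eight attempts.
schedule = [retry_delay(n) for n in range(1, 9)]
print(schedule)  # → [30, 60, 120, 240, 480, 960, 1920, 3600]
```

The first seven retries wait 3810 s in total, just over an hour; from the eighth attempt on, every delay stays at the 3600 s cap.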

Member

I'm personally not a fan of this kind of retry logic when there is a simple fix: set SCAN_INTERVAL = timedelta(minutes=30) and throw a retry error during config entry setup if the server is not available. That's just 2 lines, which reduces the complexity of this PR and removes 60 lines.

Contributor Author

Simpler is definitely better. However, some (but not all) users monitoring the certificate of the http interface still encounter this on every startup. Having the sensor unavailable for even 30 minutes after startup is a long time, and a 30-minute interval is far more often than needed to check a certificate. I still like the exception-based retry mechanism for this use case, with a 12h interval otherwise.

With this retry in place, the startup delays based on EVENT_HOMEASSISTANT_START can probably be removed to make things a bit simpler again.

What do you think?

Contributor Author

Okay, I now see the ConfigEntryNotReady exception you're referencing. I think that should handle the retry logic on startup. 👍
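For comparison, a sketch of the `ConfigEntryNotReady` approach: raising that exception from an integration's `async_setup_entry` tells Home Assistant the entry is not ready yet and that it should retry setup later. To keep the example self-contained it defines a local stand-in for `homeassistant.exceptions.ConfigEntryNotReady`, and the simplified `async_setup_entry(fetch_expiry)` signature is hypothetical (the real one receives `hass` and the config entry). The 12-hour `SCAN_INTERVAL` reflects the interval discussed above.

```python
import socket
from datetime import timedelta
from typing import Awaitable, Callable

# Regular polling interval discussed in the thread.
SCAN_INTERVAL = timedelta(hours=12)

# Assumption: failures worth retrying at setup time.
TRANSIENT_ERRORS = (ConnectionRefusedError, socket.timeout, TimeoutError, socket.gaierror)


class ConfigEntryNotReady(Exception):
    """Local stand-in for homeassistant.exceptions.ConfigEntryNotReady."""


async def async_setup_entry(fetch_expiry: Callable[[], Awaitable[int]]) -> int:
    """Hypothetical, simplified setup step for a certificate expiry sensor."""
    try:
        return await fetch_expiry()
    except TRANSIENT_ERRORS as err:
        # In a real integration, raising ConfigEntryNotReady from
        # async_setup_entry makes Home Assistant retry the setup later.
        raise ConfigEntryNotReady(f"host unreachable: {err!r}") from err
```

This moves the retry scheduling out of the integration entirely, which is why it removes the custom backoff code.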

jjlawren mentioned this pull request Feb 21, 2020
@jjlawren
Contributor Author

Closing this in favor of #32066.

jjlawren closed this Feb 21, 2020
lock bot locked and limited conversation to collaborators Feb 22, 2020
jjlawren deleted the cert_3.7_refactor_and_retries branch April 24, 2020 03:42

Development

Successfully merging this pull request may close these issues.

Certificate Expiry sensor comes up as 'Unavailable' on every restart

5 participants