Add get_url helper, deprecate base_url by frenck · Pull Request #35224 · home-assistant/core

frenck · 2020-05-05T01:35:48Z

⚠️ Putting this up as an early draft now for viewing, work in progress.

Breaking change

The HTTP base_url is deprecated and replaced by an internal_url and external_url core configuration setting.

Proposed change

This PR is aimed at resolving the instance URL juggling/issues across the board. It introduces a new, single point, helper:

def async_get_url(
    hass: HomeAssistant,
    *,
    require_ssl: bool = False,
    require_standard_port: bool = False,
    allow_internal: bool = True,
    allow_external: bool = True,
    allow_cloud: bool = True,
    allow_ip: bool = True,
    prefer_external: bool = False,
    prefer_cloud: bool = False,
) -> str:

Default behavior

By default, without parameters on calling it, it will try to:

Get an internal URL set by the user, or if not available, try to detect one from the network interface (based on http integration settings).
If an internal URL fails, it will try to get an external URL. It prefers the external_url set by the user, in case that fails; Get a Home Assistant Cloud URL if that is available.

By default, nothing is required and anything is allowed.

`base_url` fallback during deprecation period

During the deprecation period, base_url will serve as a fallback for the internal & external URLs.

For the internal URL the base_url is only used if we are sure it is an internal URL (based on a local IP address in the host or .local in the domain of the host).

For the external URL the base_url is only used if we are sure it is NOT an internal URL (based on a local IP address in the host or .local in the domain of the host).

Tuning parameters

The method provides a list of parameters to get the URL one need for the use case. Explanation of the parameters:

require_ssl:
Require the returned URL to use the https scheme.
require_standard_port:
Require the returned URL to have port 80 for the http scheme, and port 443 on the https scheme.
allow_internal:
Allow the URL to be an internal set URL by the user or a detected URL on the internal network. Set this one to False if one requires an external URL exclusively.
allow_external:
Allow the URL to be an external set URL by the user or a Home Assistant Cloud URL. Set this one to False if one requires an internal URL exclusively.
allow_cloud:
Allow a Home Assistant Cloud URL to be returned, set to False in case one requires anything but a Cloud URL.
allow_ip:
Allow the host part of an URL to be an IP address, set to False in case that is not usable for the use case.
prefer_external:
By default, we prefer internal URLs over external ones. Set this option to True to turn that logic around and prefer an external URL over an internal one.
prefer_cloud:
By default an external URL set by the user is preferred, however, in rare cases a cloud URL might be more reliable. Setting this option to True prefers the Home Assistant Cloud URL above the user defined external URL.

Using this helper, most parts of Home Assistant should be able to get a working URL for their use case.

Context

Originally the PR started out with the following design document:

Frenck's URL helper variant.pdf

Original FlowChart:

Adapting to the real world

During initially implementation quickly was discovered we have many more cases, which have been incorporated. The results of this can be found in the proposal chapter above.

For example, Webhooks have 2 implementations, one from the Home Assistant Cloud, the other from the webhook integration itself. The built-in one, should not use cloud URLs.

The (now obsolete) get_external_url method, actually preferred the Home Assistant cloud over the base_url.

The general motivation and design of the document and flowchart still apply.

Request URL

In the discussion in this PR, it appears there is a need for having access to the request URL the user uses. This is currently out of scope for this PR as it aims towards resolving backend issues.

Getting that request URL is a little more complex as it seems. We need to add a ContextVar to the http integration, inject a bit of middleware to set values to that context. Also, while basically that is not hard, the hard part is handling proxied requests correctly. This involves processing the X-Forwarded-Proto header, which might have security implications.

Some other strategy could be: Only retrieve the host from the http context and try to map it to either the internal or external URL and set preference to either of those based on the current request.

However, for the future that is something to consider (although, I'm not completely sure if that should be part of this helper).

Migrations

The following integrations need to be migrated onto this new helper:

Those that rely on get_external_url helper:

alexa
almond

Those that rely on hass.config.api.base_url directly:

Other things to migrate:

browser.open in __main__.py
zeroconf serviceinfo
oauth2 flow

Other tasks:

Adjust user documentation
Add developer documentation
Write small developer blog article
Add frontend for core configuration
Adjust Home Assistant Javascript Websocket library
~~Add back get_external_url (deprecated) with a warning in the logs.~~
GitHub search revealed no usage, our codebase doesn't use it either.
https://github.com/search?l=Python&p=1&q=async_get_external_url&type=Code
⚠️ Detect direct access to the base_url property and drop a warning in the logs.
Will make a separate PR for this, after this has been merged. It is ready and written, but is rather large. This PR is already huge.

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Example entry for `configuration.yaml`:

# Example configuration.yaml

Additional information

This PR fixes or closes issue: fixes #
This PR is related to issue:
Link to documentation pull request: Remove base_url, add external_url & internal_url home-assistant.io#13307
Link to frontend pull request: Add new core configuration UI for external_url & internal_url frontend#5755
Link to JavaScript WebSocket pull request: Add new external & internal URL core configs to HassConfig home-assistant-js-websocket#115
Link to Developer documentation pull request: Add get_url helper documentation developers.home-assistant#497
Link to Developer blog pull request: Add blogpost on new instance url helper developers.home-assistant#498

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
The code has been formatted using Black (black --fast homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
Untested files have been added to .coveragerc.

The integration reached or maintains the following Integration Quality Scale:

No score or internal
🥈 Silver
🥇 Gold
🏆 Platinum

probot-home-assistant · 2020-05-05T01:39:42Z

Hey there @home-assistant/core, mind taking a look at this pull request as its been labeled with a integration (http) you are listed as a codeowner for? Thanks!
_{^{(message by CodeOwnersMention)}}

elupus · 2020-05-05T04:59:02Z

Can't we get it from http context variable if available?

elupus · 2020-05-05T05:02:36Z

Alternatively one could make use of https://docs.python.org/3/library/contextvars.html to store incoming request info, so we don't need to pass it along as a variable all the time.

frenck · 2020-05-05T09:01:05Z

Can't we get it from http context variable if available?

Get what exactly? I think I miss your point?

elupus · 2020-05-05T09:48:48Z

I may be off base, but on each http incoming request, it should be possible to deduce which hostname was used by the client to do the request. So any response that is the result of a request can provide the same host/ip as was used in the request rather than a static configuration.

That said, this will be hard/impossible to use for back end initiated things.

frenck · 2020-05-05T09:59:44Z

That definitely is a viable option to explore. This currently is mainly aimed backend wise. Where things like a Webhook, telegram bot or Samsung smart things are hard to set up / get right with the current configuration possibilities.

So let cover the backend first, and get everything migrated (which will be part of this PR). After that, we have a centralized place to improve on.

jjlawren · 2020-05-05T14:35:13Z

I may be off base, but on each http incoming request, it should be possible to deduce which hostname was used by the client to do the request. So any response that is the result of a request can provide the same host/ip as was used in the request rather than a static configuration.

This is my use case. The config flow for Plex sends the user to an external site and passes along an HA-generated callback URL to redirect the user back to HA to complete setup. If the base_url is incorrect or unset it fails. Some examples where this has occurred: link.

If a browser can successfully open the UI, the backend implicitly knows that's a valid URL for that specific session. It's possible for different users to reach the server using different paths, some of which can not be known by the backend ahead of time. For example, if a user is connecting to HA over a proxy or SSH tunnel. Since we have the built-in webserver to leverage, this should be doable automatically without asking the user to configure any hardcoded base URLs. I'd see this part as very useful for certain interactive config flows.

frenck · 2020-05-05T14:39:29Z

Since we have the built-in webserver to leverage, this should be doable automatically without asking the user to configure any hardcoded base URLs.

This goes south with reverse proxies in a lot of cases. While I hear you guys, we now have a single base_url, which is capable of defaulting to 127.0.0.1 in the worst case.

Right now, the focus is the backend case, not the frontend case.

That said, an improvement to your use case can already be made by using this helper, as it will catch way more cases.

Edit: so your linked case is caused by a nat loopback / hair pinning problem. That would be resolved by this PR, as it can differentiate between an internal and external URL, which right now, isn't possible.

jjlawren · 2020-05-05T14:58:47Z

Since we have the built-in webserver to leverage, this should be doable automatically without asking the user to configure any hardcoded base URLs.

This goes south with reverse proxies in a lot of cases.

Does it, though? If the URL the browser used to hit HA is passed all the way down the chain, then it should be reusable for other calls. I tested quickly behind some reverse proxies and with an SSH tunnel and the same URL I see in my browser was always held in request.url in http's handler factory. I'd be happy to test other scenarios to validate this theory.

frenck · 2020-05-05T15:03:21Z

Does it, though? If the URL the browser used to hit HA is passed all the way down the chain, then it should be reusable for other calls

It is in many cases, not all setups do that. But again, for now this is out of scope.

frenck · 2020-05-05T19:39:43Z

@elupus @jjlawren I've explored the option for adding a context var as a middleware to our aoihttp web application. While it works fine (in most cases), it goes south on a reverse proxy that does SSL offloading. The scheme will not be correct in those cases, which is intended behavior by aiohttp to avoid security issues.

elupus · 2020-05-05T19:50:33Z

SSL offloading is no problem. There is a header that should be set to indicate the originating protocol. Can't remember it now.

elupus · 2020-05-05T19:51:45Z

Found it: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Forwarded-Proto

frenck · 2020-05-05T19:58:42Z

Yes, we have that header, but it relies on being set by the reverse proxy. So it is not 100% reliable. I guess we could add some logic like: When a reverse proxy is detected, and the forwarded proto not set: No URL available.

There are some concerns around using that with security, but we can ride along with the real IP detection middleware we already have.

I'm going to do some restructuring to this PR, but leaving this part out. However, will make it easier to add as a next step. I have a POC working and done for the request URL handling, including the X-Forwarded-Proto.

elupus · 2020-05-05T20:03:05Z

Imho, reverse proxy must be setup correctly. Also those headers should only be respected if source address is from an expected reverse proxy ip (or some secret key in header). Not checked code if that is verified.

frenck · 2020-05-05T20:25:45Z

@elupus It is verified by the trusted proxies list (real_ip middleware).

Co-authored-by: Paulus Schoutsen <balloob@gmail.com>

MartinHjelmare

The helper doesn't seem to need async context but we do mark it as such.

MartinHjelmare · 2020-05-08T13:14:25Z

            setup_platform(hass, config, add_entities, discovery_info)

-    start_url = f"{hass.config.api.base_url}{FITBIT_AUTH_CALLBACK_PATH}"
+    start_url = f"{async_get_url(hass)}{FITBIT_AUTH_CALLBACK_PATH}"


This is not async context here. Looks like the whole platform is sync.

MartinHjelmare · 2020-05-08T13:18:21Z

        configurator.notify_errors(_configurator, error_msg)

-    start_url = f"{hass.config.api.base_url}{WINK_AUTH_CALLBACK_PATH}"
+    start_url = f"{async_get_url(hass)}{WINK_AUTH_CALLBACK_PATH}"


This integration is sync.

MartinHjelmare · 2020-05-08T13:19:03Z

    }

+    try:
+        params["external_url"] = async_get_url(hass, allow_internal=False)


Sync context.

homeassistant added the cla-signed label May 5, 2020

probot-home-assistant Bot added breaking-change core has-tests integration: config integration: http new-feature labels May 5, 2020

balloob reviewed May 5, 2020

View reviewed changes

Comment thread homeassistant/helpers/network.py Outdated

balloob reviewed May 5, 2020

View reviewed changes

Comment thread homeassistant/helpers/network.py Outdated

balloob reviewed May 5, 2020

View reviewed changes

Comment thread homeassistant/helpers/network.py Outdated

balloob reviewed May 5, 2020

View reviewed changes

Comment thread homeassistant/helpers/network.py Outdated

frenck added 13 commits May 7, 2020 12:29

Migrate camera to get_url helper

4b87f78

Adjust alexa get_url params based on feedback

4f3a937

Migrate google_assistant to get_url helper

63520a5

Migrate tts to get_url helper

b7dd938

Fix plex tests after rebase

0549d66

Fix google_translate TTS tests

0dec544

Fix marytts TTS tests

8f0ec6b

Fix voicerrs TTS tests

5a63f6c

Fix yandextts TTS tests

be86e30

Fix tts tests

447e0c4

Remove SSL requirement from camera integration

af91fb2

Remove camera stream SSL requirement from google_assistant integration

8cd58b3

Always use 127.0.0.1 for main browser.open

d948b53

frenck force-pushed the frenck-2020-0469 branch from bf9200e to d948b53 Compare May 7, 2020 12:41

frenck and others added 4 commits May 7, 2020 18:28

Add local_ip to http ApiConfig

4061968

Raise exception in Alexa if no suitable URL found

8cbe55c

Co-authored-by: Paulus Schoutsen <balloob@gmail.com>

Remove Optional parameter from is_ip_address

5227ef9

Migrate base_url to core on first load

6d1ffe8

This was referenced May 7, 2020

Add get_url helper documentation home-assistant/developers.home-assistant#497

Merged

Add blogpost on new instance url helper home-assistant/developers.home-assistant#498

Merged

balloob added 2 commits May 7, 2020 15:22

Test migration base_url -> internal/external

122378c

Better test name

e991cc7

balloob approved these changes May 7, 2020

View reviewed changes

frenck marked this pull request as ready for review May 7, 2020 22:53

frenck merged commit 2223592 into dev May 8, 2020

frenck deleted the frenck-2020-0469 branch May 8, 2020 00:29

frenck mentioned this pull request May 8, 2020

Detect use of deprecated base_url #35353

Merged

20 tasks

MartinHjelmare reviewed May 8, 2020

View reviewed changes

frenck mentioned this pull request May 8, 2020

async_get_url -> get_url #35382

Merged

20 tasks

lock Bot locked and limited conversation to collaborators May 21, 2020

Uh oh!

Conversation

frenck commented May 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Putting this up as an early draft now for viewing, work in progress.

Breaking change

Proposed change

Default behavior

base_url fallback during deprecation period

Tuning parameters

Context

Adapting to the real world

Request URL

Migrations

Type of change

Example entry for configuration.yaml:

Additional information

Checklist

Uh oh!

probot-home-assistant Bot commented May 5, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

elupus commented May 5, 2020

Uh oh!

elupus commented May 5, 2020

Uh oh!

frenck commented May 5, 2020

Uh oh!

elupus commented May 5, 2020

Uh oh!

frenck commented May 5, 2020

Uh oh!

jjlawren commented May 5, 2020

Uh oh!

frenck commented May 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jjlawren commented May 5, 2020

Uh oh!

frenck commented May 5, 2020

Uh oh!

frenck commented May 5, 2020

Uh oh!

elupus commented May 5, 2020

Uh oh!

elupus commented May 5, 2020

Uh oh!

frenck commented May 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elupus commented May 5, 2020

Uh oh!

frenck commented May 5, 2020

Uh oh!

MartinHjelmare left a comment

Choose a reason for hiding this comment

Uh oh!

MartinHjelmare May 8, 2020

Choose a reason for hiding this comment

Uh oh!

MartinHjelmare May 8, 2020

Choose a reason for hiding this comment

Uh oh!

MartinHjelmare May 8, 2020

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

frenck commented May 5, 2020 •

edited

Loading

`base_url` fallback during deprecation period

Example entry for `configuration.yaml`:

frenck commented May 5, 2020 •

edited

Loading

frenck commented May 5, 2020 •

edited

Loading