Skip to content

Add get_url helper, deprecate base_url#35224

Merged
frenck merged 44 commits into
devfrom
frenck-2020-0469
May 8, 2020
Merged

Add get_url helper, deprecate base_url#35224
frenck merged 44 commits into
devfrom
frenck-2020-0469

Conversation

@frenck
Copy link
Copy Markdown
Member

@frenck frenck commented May 5, 2020

⚠️ Putting this up as an early draft now for viewing, work in progress.

Breaking change

The HTTP base_url is deprecated and replaced by an internal_url and external_url core configuration setting.

Proposed change

This PR is aimed at resolving the instance URL juggling/issues across the board. It introduces a new, single point, helper:

def async_get_url(
    hass: HomeAssistant,
    *,
    require_ssl: bool = False,
    require_standard_port: bool = False,
    allow_internal: bool = True,
    allow_external: bool = True,
    allow_cloud: bool = True,
    allow_ip: bool = True,
    prefer_external: bool = False,
    prefer_cloud: bool = False,
) -> str:

Default behavior

By default, without parameters on calling it, it will try to:

  • Get an internal URL set by the user, or if not available, try to detect one from the network interface (based on http integration settings).
  • If an internal URL fails, it will try to get an external URL. It prefers the external_url set by the user, in case that fails; Get a Home Assistant Cloud URL if that is available.

By default, nothing is required and anything is allowed.

base_url fallback during deprecation period

During the deprecation period, base_url will serve as a fallback for the internal & external URLs.

For the internal URL the base_url is only used if we are sure it is an internal URL (based on a local IP address in the host or .local in the domain of the host).

For the external URL the base_url is only used if we are sure it is NOT an internal URL (based on a local IP address in the host or .local in the domain of the host).

Tuning parameters

The method provides a list of parameters to get the URL one need for the use case. Explanation of the parameters:

  • require_ssl:
    Require the returned URL to use the https scheme.
  • require_standard_port:
    Require the returned URL to have port 80 for the http scheme, and port 443 on the https scheme.
  • allow_internal:
    Allow the URL to be an internal set URL by the user or a detected URL on the internal network. Set this one to False if one requires an external URL exclusively.
  • allow_external:
    Allow the URL to be an external set URL by the user or a Home Assistant Cloud URL. Set this one to False if one requires an internal URL exclusively.
  • allow_cloud:
    Allow a Home Assistant Cloud URL to be returned, set to False in case one requires anything but a Cloud URL.
  • allow_ip:
    Allow the host part of an URL to be an IP address, set to False in case that is not usable for the use case.
  • prefer_external:
    By default, we prefer internal URLs over external ones. Set this option to True to turn that logic around and prefer an external URL over an internal one.
  • prefer_cloud:
    By default an external URL set by the user is preferred, however, in rare cases a cloud URL might be more reliable. Setting this option to True prefers the Home Assistant Cloud URL above the user defined external URL.

Using this helper, most parts of Home Assistant should be able to get a working URL for their use case.

Context

Originally the PR started out with the following design document:

Frenck's URL helper variant.pdf

Original FlowChart:

FlowChart

Adapting to the real world

During initially implementation quickly was discovered we have many more cases, which have been incorporated. The results of this can be found in the proposal chapter above.

For example, Webhooks have 2 implementations, one from the Home Assistant Cloud, the other from the webhook integration itself. The built-in one, should not use cloud URLs.

The (now obsolete) get_external_url method, actually preferred the Home Assistant cloud over the base_url.

The general motivation and design of the document and flowchart still apply.

Request URL

In the discussion in this PR, it appears there is a need for having access to the request URL the user uses. This is currently out of scope for this PR as it aims towards resolving backend issues.

Getting that request URL is a little more complex as it seems. We need to add a ContextVar to the http integration, inject a bit of middleware to set values to that context. Also, while basically that is not hard, the hard part is handling proxied requests correctly. This involves processing the X-Forwarded-Proto header, which might have security implications.

Some other strategy could be: Only retrieve the host from the http context and try to map it to either the internal or external URL and set preference to either of those based on the current request.

However, for the future that is something to consider (although, I'm not completely sure if that should be part of this helper).

Migrations

The following integrations need to be migrated onto this new helper:

Those that rely on get_external_url helper:

  • alexa
  • almond

Those that rely on hass.config.api.base_url directly:

  • almond
  • ambiclimate
  • camera
  • cast
  • doorbird
  • fitbit
  • google_assistant
  • konnected
  • media_player
  • plex
  • smartthings
  • telegram_bot
  • tts
  • webhook
  • wink

Other things to migrate:

  • browser.open in __main__.py
  • zeroconf serviceinfo
  • oauth2 flow

Other tasks:

  • Adjust user documentation
  • Add developer documentation
  • Write small developer blog article
  • Add frontend for core configuration
  • Adjust Home Assistant Javascript Websocket library
  • Add back get_external_url (deprecated) with a warning in the logs.
    GitHub search revealed no usage, our codebase doesn't use it either.
    https://github.com/search?l=Python&p=1&q=async_get_external_url&type=Code
  • ⚠️ Detect direct access to the base_url property and drop a warning in the logs.
    Will make a separate PR for this, after this has been merged. It is ready and written, but is rather large. This PR is already huge.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Example entry for configuration.yaml:

# Example configuration.yaml

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Black (black --fast homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • Untested files have been added to .coveragerc.

The integration reached or maintains the following Integration Quality Scale:

  • No score or internal
  • 🥈 Silver
  • 🥇 Gold
  • 🏆 Platinum

@probot-home-assistant
Copy link
Copy Markdown

Hey there @home-assistant/core, mind taking a look at this pull request as its been labeled with a integration (http) you are listed as a codeowner for? Thanks!
(message by CodeOwnersMention)

Comment thread homeassistant/helpers/network.py Outdated
Comment thread homeassistant/helpers/network.py Outdated
Comment thread homeassistant/helpers/network.py Outdated
Comment thread homeassistant/helpers/network.py Outdated
@elupus
Copy link
Copy Markdown
Contributor

elupus commented May 5, 2020

Can't we get it from http context variable if available?

@elupus
Copy link
Copy Markdown
Contributor

elupus commented May 5, 2020

Alternatively one could make use of https://docs.python.org/3/library/contextvars.html to store incoming request info, so we don't need to pass it along as a variable all the time.

@frenck
Copy link
Copy Markdown
Member Author

frenck commented May 5, 2020

Can't we get it from http context variable if available?

Get what exactly? I think I miss your point?

@elupus
Copy link
Copy Markdown
Contributor

elupus commented May 5, 2020

I may be off base, but on each http incoming request, it should be possible to deduce which hostname was used by the client to do the request. So any response that is the result of a request can provide the same host/ip as was used in the request rather than a static configuration.

That said, this will be hard/impossible to use for back end initiated things.

@frenck
Copy link
Copy Markdown
Member Author

frenck commented May 5, 2020

That definitely is a viable option to explore. This currently is mainly aimed backend wise. Where things like a Webhook, telegram bot or Samsung smart things are hard to set up / get right with the current configuration possibilities.

So let cover the backend first, and get everything migrated (which will be part of this PR). After that, we have a centralized place to improve on.

@jjlawren
Copy link
Copy Markdown
Contributor

jjlawren commented May 5, 2020

I may be off base, but on each http incoming request, it should be possible to deduce which hostname was used by the client to do the request. So any response that is the result of a request can provide the same host/ip as was used in the request rather than a static configuration.

This is my use case. The config flow for Plex sends the user to an external site and passes along an HA-generated callback URL to redirect the user back to HA to complete setup. If the base_url is incorrect or unset it fails. Some examples where this has occurred: link.

If a browser can successfully open the UI, the backend implicitly knows that's a valid URL for that specific session. It's possible for different users to reach the server using different paths, some of which can not be known by the backend ahead of time. For example, if a user is connecting to HA over a proxy or SSH tunnel. Since we have the built-in webserver to leverage, this should be doable automatically without asking the user to configure any hardcoded base URLs. I'd see this part as very useful for certain interactive config flows.

@frenck
Copy link
Copy Markdown
Member Author

frenck commented May 5, 2020

Since we have the built-in webserver to leverage, this should be doable automatically without asking the user to configure any hardcoded base URLs.

This goes south with reverse proxies in a lot of cases. While I hear you guys, we now have a single base_url, which is capable of defaulting to 127.0.0.1 in the worst case.

Right now, the focus is the backend case, not the frontend case.

That said, an improvement to your use case can already be made by using this helper, as it will catch way more cases.

Edit: so your linked case is caused by a nat loopback / hair pinning problem. That would be resolved by this PR, as it can differentiate between an internal and external URL, which right now, isn't possible.

@jjlawren
Copy link
Copy Markdown
Contributor

jjlawren commented May 5, 2020

Since we have the built-in webserver to leverage, this should be doable automatically without asking the user to configure any hardcoded base URLs.

This goes south with reverse proxies in a lot of cases.

Does it, though? If the URL the browser used to hit HA is passed all the way down the chain, then it should be reusable for other calls. I tested quickly behind some reverse proxies and with an SSH tunnel and the same URL I see in my browser was always held in request.url in http's handler factory. I'd be happy to test other scenarios to validate this theory.

@frenck
Copy link
Copy Markdown
Member Author

frenck commented May 5, 2020

Does it, though? If the URL the browser used to hit HA is passed all the way down the chain, then it should be reusable for other calls

It is in many cases, not all setups do that. But again, for now this is out of scope.

@frenck
Copy link
Copy Markdown
Member Author

frenck commented May 5, 2020

@elupus @jjlawren I've explored the option for adding a context var as a middleware to our aoihttp web application. While it works fine (in most cases), it goes south on a reverse proxy that does SSL offloading. The scheme will not be correct in those cases, which is intended behavior by aiohttp to avoid security issues.

@elupus
Copy link
Copy Markdown
Contributor

elupus commented May 5, 2020

SSL offloading is no problem. There is a header that should be set to indicate the originating protocol. Can't remember it now.

@elupus
Copy link
Copy Markdown
Contributor

elupus commented May 5, 2020

@frenck
Copy link
Copy Markdown
Member Author

frenck commented May 5, 2020

Yes, we have that header, but it relies on being set by the reverse proxy. So it is not 100% reliable. I guess we could add some logic like: When a reverse proxy is detected, and the forwarded proto not set: No URL available.

There are some concerns around using that with security, but we can ride along with the real IP detection middleware we already have.

I'm going to do some restructuring to this PR, but leaving this part out. However, will make it easier to add as a next step. I have a POC working and done for the request URL handling, including the X-Forwarded-Proto.

@elupus
Copy link
Copy Markdown
Contributor

elupus commented May 5, 2020

Imho, reverse proxy must be setup correctly. Also those headers should only be respected if source address is from an expected reverse proxy ip (or some secret key in header). Not checked code if that is verified.

@frenck
Copy link
Copy Markdown
Member Author

frenck commented May 5, 2020

@elupus It is verified by the trusted proxies list (real_ip middleware).

@frenck frenck force-pushed the frenck-2020-0469 branch from bf9200e to d948b53 Compare May 7, 2020 12:41
@frenck frenck marked this pull request as ready for review May 7, 2020 22:53
@frenck frenck merged commit 2223592 into dev May 8, 2020
@frenck frenck deleted the frenck-2020-0469 branch May 8, 2020 00:29
@frenck frenck mentioned this pull request May 8, 2020
20 tasks
Copy link
Copy Markdown
Member

@MartinHjelmare MartinHjelmare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The helper doesn't seem to need async context but we do mark it as such.

setup_platform(hass, config, add_entities, discovery_info)

start_url = f"{hass.config.api.base_url}{FITBIT_AUTH_CALLBACK_PATH}"
start_url = f"{async_get_url(hass)}{FITBIT_AUTH_CALLBACK_PATH}"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not async context here. Looks like the whole platform is sync.

configurator.notify_errors(_configurator, error_msg)

start_url = f"{hass.config.api.base_url}{WINK_AUTH_CALLBACK_PATH}"
start_url = f"{async_get_url(hass)}{WINK_AUTH_CALLBACK_PATH}"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This integration is sync.

}

try:
params["external_url"] = async_get_url(hass, allow_internal=False)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sync context.

@frenck frenck mentioned this pull request May 8, 2020
20 tasks
@lock lock Bot locked and limited conversation to collaborators May 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants