This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Error while fetching OpenID metadata stops Synapse from initializing #8088

Open
Rafaeltheraven opened this issue Aug 14, 2020 · 5 comments
Labels
A-SSO Single Sign-On (maybe OIDC) T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. z-p3 (Deprecated Label)

Comments

@Rafaeltheraven

Rafaeltheraven commented Aug 14, 2020

Description

As the title says, my Synapse instance started crashing once the linked OpenID provider (a self-hosted Keycloak) crashed and began returning 502 errors, as the following log shows:

2020-08-14 08:35:16,129 - synapse.http.client - 301 - INFO -  - Sending request GET https://domain/auth/realms/realm/.well-known/openid-configuration
2020-08-14 08:35:16,184 - synapse.http.client - 340 - INFO -  - Received response to GET https://domain/auth/realms/realm/.well-known/openid-configuration: 502
2020-08-14 08:35:16,184 - twisted - 192 - ERROR -  - Error during startup:
2020-08-14 08:35:16,185 - twisted - 192 - ERROR -  - Traceback (most recent call last):
2020-08-14 08:35:16,185 - twisted - 192 - ERROR -  -   File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
2020-08-14 08:35:16,186 - twisted - 192 - ERROR -  -     current.result = callback(current.result, *args, **kw)
2020-08-14 08:35:16,186 - twisted - 192 - ERROR -  -   File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/internet/defer.py", line 1475, in gotResult
2020-08-14 08:35:16,186 - twisted - 192 - ERROR -  -     _inlineCallbacks(r, g, status)
2020-08-14 08:35:16,186 - twisted - 192 - ERROR -  -   File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
2020-08-14 08:35:16,186 - twisted - 192 - ERROR -  -     result = result.throwExceptionIntoGenerator(g)
2020-08-14 08:35:16,187 - twisted - 192 - ERROR -  -   File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
2020-08-14 08:35:16,187 - twisted - 192 - ERROR -  -     return g.throw(self.type, self.value, self.tb)
2020-08-14 08:35:16,187 - twisted - 192 - ERROR -  - --- <exception caught here> ---
2020-08-14 08:35:16,188 - twisted - 192 - ERROR -  -   File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/app/homeserver.py", line 440, in start
2020-08-14 08:35:16,188 - twisted - 192 - ERROR -  -     yield defer.ensureDeferred(oidc.load_metadata())
2020-08-14 08:35:16,188 - twisted - 192 - ERROR -  -   File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
2020-08-14 08:35:16,189 - twisted - 192 - ERROR -  -     result = result.throwExceptionIntoGenerator(g)
2020-08-14 08:35:16,189 - twisted - 192 - ERROR -  -   File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
2020-08-14 08:35:16,189 - twisted - 192 - ERROR -  -     return g.throw(self.type, self.value, self.tb)
2020-08-14 08:35:16,189 - twisted - 192 - ERROR -  -   File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/handlers/oidc_handler.py", line 241, in load_metadata
2020-08-14 08:35:16,190 - twisted - 192 - ERROR -  -     metadata_response = await self._http_client.get_json(url)
2020-08-14 08:35:16,190 - twisted - 192 - ERROR -  -   File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
2020-08-14 08:35:16,190 - twisted - 192 - ERROR -  -     result = result.throwExceptionIntoGenerator(g)
2020-08-14 08:35:16,190 - twisted - 192 - ERROR -  -   File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
2020-08-14 08:35:16,191 - twisted - 192 - ERROR -  -     return g.throw(self.type, self.value, self.tb)
2020-08-14 08:35:16,191 - twisted - 192 - ERROR -  -   File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/http/client.py", line 465, in get_json
2020-08-14 08:35:16,191 - twisted - 192 - ERROR -  -     body = yield self.get_raw(uri, args, headers=headers)
2020-08-14 08:35:16,191 - twisted - 192 - ERROR -  -   File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
2020-08-14 08:35:16,192 - twisted - 192 - ERROR -  -     result = g.send(result)
2020-08-14 08:35:16,192 - twisted - 192 - ERROR -  -   File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/http/client.py", line 547, in get_raw
2020-08-14 08:35:16,192 - twisted - 192 - ERROR -  -     raise HttpResponseException(response.code, response.phrase, body)
2020-08-14 08:35:16,192 - twisted - 192 - ERROR -  - synapse.api.errors.HttpResponseException: 502: b'Bad Gateway'
2020-08-14 08:35:16,193 - synapse.handlers.presence - 327 - INFO - presence.on_shutdown-0 - Performing _on_shutdown. Persisting 158 unpersisted changes
2020-08-14 08:35:16,194 - synapse.handlers.presence - 338 - INFO - presence.on_shutdown-0 - Finished _on_shutdown
2020-08-14 08:35:16,196 - twisted - 192 - INFO -  - Main loop terminated.

I'm not sure how best to handle this, since I'm no expert on either Synapse or OpenID, but it might be a good idea to at least catch 502 errors.
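One way to read "catch 502 errors" is to separate failures that are probably temporary from ones that are not. A minimal sketch, assuming a hypothetical stand-in for the `HttpResponseException` seen in the traceback above (the real class in `synapse.api.errors` also carries the status code): 5xx responses and connection failures count as transient, anything else as a genuine configuration error.

```python
class HttpResponseException(Exception):
    """Hypothetical stand-in for synapse.api.errors.HttpResponseException:
    an HTTP failure that remembers its status code."""

    def __init__(self, code: int, msg: str) -> None:
        super().__init__(f"{code}: {msg}")
        self.code = code


def is_transient(exc: BaseException) -> bool:
    """Return True if a metadata fetch failure is worth retrying."""
    if isinstance(exc, HttpResponseException):
        # 5xx (e.g. 502 Bad Gateway while the provider is still booting)
        # is likely temporary; 4xx usually means a wrong URL or config.
        return 500 <= exc.code < 600
    # Connection refused or timed out: the provider is probably not up yet.
    return isinstance(exc, (ConnectionError, TimeoutError))
```

With this split, `is_transient(HttpResponseException(502, "Bad Gateway"))` is true while a 404 is not, so a misconfigured issuer URL would still fail fast.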

Steps to reproduce

  • Point Synapse at an OpenID provider
  • Make the provider return 502
  • Start Synapse: it crashes during startup
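To reproduce this without breaking a real Keycloak, a throwaway HTTP server that answers every request with 502 can stand in for the provider. This is a standard-library sketch; `BadGatewayHandler` and `start_stub_provider` are hypothetical names, and wiring a homeserver's OIDC settings to the stub is left to your own config.

```python
import http.server
import threading


class BadGatewayHandler(http.server.BaseHTTPRequestHandler):
    """Answers every GET with 502, like a reverse proxy whose backend is down."""

    def do_GET(self) -> None:
        self.send_response(502, "Bad Gateway")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args) -> None:
        pass  # silence the default per-request logging to stderr


def start_stub_provider(port: int = 0) -> http.server.HTTPServer:
    """Serve the stub on 127.0.0.1 from a background thread and return it.

    Port 0 lets the OS pick a free port; read it from server.server_address.
    Call .shutdown() on the returned server to stop it.
    """
    server = http.server.HTTPServer(("127.0.0.1", port), BadGatewayHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_stub_provider(8080)` from an interactive session and requesting `http://127.0.0.1:8080/.well-known/openid-configuration` should return the same 502 seen in the log above.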

Version information

  • Personal homeserver
  • Version: 1.18.0
  • Install method: Debian package
  • Platform: Debian 10

Quick edit: This seems to have happened after my server unexpectedly restarted, so others could easily run into the same issue. If Synapse starts before the OpenID provider is up, it crashes on startup.

@clokep
Member

clokep commented Aug 14, 2020

To add a bit more info here:

  • During start-up the OpenID code needs to fetch metadata from the OpenID provider.
  • Since the system restarted, the OpenID provider was likely not running yet, so the request failed and was not retried.

We probably want to retry this a few times with a backoff? Not sure what else we could do. We could also not attempt to load the metadata until someone tries to log in, but that seems like it would be much harder to debug.
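A minimal sketch of the retry-with-backoff idea, assuming an async `fetch` callable that stands in for the `get_json(url)` call in the traceback and an `is_transient` predicate (for example "5xx or connection error") that decides what is worth retrying. Synapse itself runs on Twisted (the traceback goes through `defer.ensureDeferred`), so a real change would sleep via something like `twisted.internet.task.deferLater`; this sketch uses plain `asyncio` to stay self-contained, and `retry_with_backoff` is a hypothetical helper, not a Synapse API.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    fetch: Callable[[], Awaitable[T]],
    is_transient: Callable[[BaseException], bool],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Await fetch() until it succeeds or the attempts run out.

    Between transient failures, sleep a random time in
    [0, min(max_delay, base_delay * 2**n)] ("full jitter" exponential
    backoff). Non-transient errors, and the final transient one, are
    re-raised so startup still fails loudly when something is really wrong.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await fetch()
        except Exception as exc:
            if not is_transient(exc) or attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, delay))
    raise AssertionError("unreachable: the loop always returns or raises")
```

With the defaults, the four waits are capped at 1, 2, 4 and 8 seconds, so at most 15 seconds are spent waiting before the last error is re-raised; a provider that takes longer to boot needs larger `attempts` or `max_delay` values.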

@clokep clokep added z-p3 (Deprecated Label) A-SSO Single Sign-On (maybe OIDC) labels Aug 14, 2020
@clokep clokep changed the title OpenID 502 causes crash Error while fetching OpenID metadata stops Synapse from initializing Aug 14, 2020
@Rafaeltheraven
Author

Rafaeltheraven commented Aug 14, 2020

While not fetching the metadata until it's needed would probably be the cleanest solution, I can understand it being hard to debug. For now, maybe it would be an idea to add a warning somewhere telling admins to make sure their OpenID provider starts before Synapse? (I've set this up with systemd's Before= ordering, which I think should work, but I'm not going to shut down my server just to test it :P)
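The "load on first use" option could look roughly like a small cache around the fetch. A minimal sketch, assuming an async `fetch` callable for the discovery document; `LazyMetadata` is a hypothetical name. A failed fetch is not cached, so the next login attempt simply tries again, and a lock makes concurrent logins share one in-flight request.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional


class LazyMetadata:
    """Fetch provider metadata on first use instead of at startup."""

    def __init__(self, fetch: Callable[[], Awaitable[Dict[str, Any]]]) -> None:
        self._fetch = fetch
        self._metadata: Optional[Dict[str, Any]] = None
        # Created lazily so the lock belongs to the event loop that uses it.
        self._lock: Optional[asyncio.Lock] = None

    async def get(self) -> Dict[str, Any]:
        if self._metadata is None:
            if self._lock is None:
                self._lock = asyncio.Lock()
            async with self._lock:
                # Re-check: another task may have filled the cache while we waited.
                if self._metadata is None:
                    # If this raises, nothing is cached and the next call retries.
                    self._metadata = await self._fetch()
        return self._metadata
```

The debuggability concern from the previous comment still applies: a wrong issuer URL would only surface at the first login. A possible middle ground (an assumption, not a tested design) is to try the fetch at startup, log a warning on failure, and fall back to this lazy path.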

@clokep
Member

clokep commented Aug 14, 2020

For now, maybe it would be an idea to leave a warning about this somewhere and telling admins to make sure their openid provider is started before synapse? (I've set this up through systemd Before= which I think should work but I'm not going to just shut down my server to test :P)

This makes sense if you're running them both on the same system! Might make sense to add a note to https://github.com/matrix-org/synapse/blob/develop/docs/openid.md

@luilegeant

This happened to me this morning with Synapse 1.28.0.
My setup runs in Docker containers. After I applied updates to the host (including the Docker binaries), everything restarted as expected, but Keycloak came up too slowly for Synapse. After a few Synapse crashes (the container restarts automatically after a crash), Keycloak became available again and Synapse returned to a working state.
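Instead of relying on the container crashing and restarting until Keycloak is ready, the container's entrypoint could wait for the provider's discovery document before launching Synapse. A standard-library sketch; `wait_for_provider` is a hypothetical helper, and the URL you would pass is your own provider's `.well-known/openid-configuration` address (like the one in the original log).

```python
import http.client
import time
import urllib.request


def wait_for_provider(
    url: str,
    timeout: float = 120.0,
    interval: float = 2.0,
    request_timeout: float = 5.0,
) -> bool:
    """Poll url until it answers 200, or give up after `timeout` seconds.

    Returns True once the provider responds with 200, False on timeout.
    HTTP errors (e.g. a 502 from a proxy in front of a booting Keycloak)
    and connection errors both count as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # URLError/HTTPError are OSError subclasses: not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

An entrypoint could exit with a non-zero status when this returns False, so the container's restart policy still applies if the provider never comes up.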

@acuteaura

One workaround when running Keycloak with docker-compose is to add a health check.

It's a little more involved, since you need to add a curl binary to the Keycloak container.

Then add something like this to your Keycloak compose service:

    healthcheck:
      test: curl --head -fsS http://localhost:8080/realms/master
      interval: 10s
      timeout: 10s
      retries: 5
      start_period: 10s

And add a dependency on it to the Synapse service:

    depends_on:
      keycloak:
        condition: service_healthy
        restart: true

@squahtx squahtx added the T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. label May 3, 2023
squahtx added a commit that referenced this issue May 3, 2023
…15530)

#15514 introduced a regression where Synapse would encounter
`PartialDownloadError`s when fetching OpenID metadata for certain
providers on startup. Due to #8088, this prevents Synapse from starting
entirely.

Revert the change while we decide what to do about the regression.
5 participants