Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Can't log in (initial sync fails HTTP 500) #7349

Closed
HarHarLinks opened this issue Apr 26, 2020 · 7 comments
Closed

Can't log in (initial sync fails HTTP 500) #7349

HarHarLinks opened this issue Apr 26, 2020 · 7 comments
Labels
z-bug (Deprecated Label)

Comments

@HarHarLinks
Copy link
Contributor

HarHarLinks commented Apr 26, 2020

Description

A user of my homeserver has this issue:
after logging in from a new device, riot-desktop and riot-web are "Unable to connect to Homeserver. Retrying..." , and if cancelled "Unexpected error resolving homeserver configuration".

From my understanding, the initial sync of this user may contain malformed json, which leads to the initial sync failing. The user can still access the account from another device that have been logged in for a long time.

Steps to reproduce

  • have some existing chat history, incl encrypted and redacted
  • log in from new device

Specifically, synapse runs behind an nginx reverse proxy.
Error as seen from nginx:

"GET /_matrix/client/r0/sync?filter=12&timeout=0&_cacheBuster=1587900538750 HTTP/1.1" 500 78 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.123 Safari/537.36"

And from synapse homesever.log:

2020-04-26 11:29:00,034 - synapse.access.http.8008 - 282 - INFO - GET-453461- IP - 8008 - {@user:homeserver.tld} Processed request: 0.809sec/0.247sec (0.420sec, 0.040sec) (0.532sec/1.064sec/60) 67B 500 "GET /_matrix/client/r0/sync?filter=12&timeout=0&_cacheBuster=1587900538750 HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.123 Safari/537.36" [21 dbevts]

Logs

2020-04-26 11:28:59,724 - synapse.http.server - 106 - ERROR - GET-453461- Failed handle request via 'SyncRestServlet': <XForwardedForRequest at 0x7f8287e5ce20 method='GET' uri='/_matrix/client/r0/sync?filter=12&timeout=0&_cacheBuster=1587900538750' clientproto='HTTP/1.0' site=8008>
Traceback (most recent call last):
  File "/[...]/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)

StopIteration: [dict dump removed for privacy]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/[...]/python3.8/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/[...]/python3.8/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/[...]/python3.8/site-packages/synapse/handlers/sync.py", line 311, in _wait_for_sync_for_user
    result = await self.current_sync_for_user(
  File "/[...]/python3.8/site-packages/synapse/handlers/sync.py", line 343, in current_sync_for_user
    return await self.generate_sync_result(sync_config, since_token, full_state)
  File "/[...]/python3.8/site-packages/synapse/handlers/sync.py", line 999, in generate_sync_result
    res = await self._generate_sync_entry_for_rooms(
  File "/[...]/python3.8/site-packages/synapse/handlers/sync.py", line 1419, in _generate_sync_entry_for_rooms
    await concurrently_execute(handle_room_entries, room_entries, 10)
  File "/[...]/python3.8/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/[...]/python3.8/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/[...]/python3.8/site-packages/synapse/util/async_helpers.py", line 159, in _concurrently_execute_inner
    await maybe_awaitable(func(next(it)))
  File "/[...]/python3.8/site-packages/synapse/handlers/sync.py", line 1797, in _generate_room_entry
    batch = await self._load_filtered_recents(
  File "/[...]/python3.8/site-packages/synapse/handlers/sync.py", line 497, in _load_filtered_recents
    events, end_key = await self.store.get_recent_events_for_room(
  File "/[...]/python3.8/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/[...]/python3.8/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/[...]/python3.8/site-packages/synapse/storage/data_stores/main/stream.py", line 485, in get_recent_events_for_room
    events = yield self.get_events_as_list(
  File "/[...]/python3.8/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/[...]/python3.8/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/[...]/python3.8/site-packages/synapse/storage/data_stores/main/events_worker.py", line 273, in get_events_as_list
    event_entry_map = yield self._get_events_from_cache_or_db(
  File "/[...]/python3.8/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/[...]/python3.8/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/[...]/python3.8/site-packages/synapse/storage/data_stores/main/events_worker.py", line 420, in _get_events_from_cache_or_db
    missing_events = yield self._get_events_from_db(
  File "/[...]/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/[...]/python3.8/site-packages/synapse/storage/data_stores/main/events_worker.py", line 577, in _get_events_from_db
    d = json.loads(row["json"])
  File "/[...]/python3.8/site-packages/simplejson/__init__.py", line 525, in loads
    return _default_decoder.decode(s)
  File "/[...]/python3.8/site-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/[...]/python3.8/site-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
2020-04-26 11:29:00,034 - synapse.access.http.8008 - 282 - INFO - GET-453461- IP - 8008 - {@user:homeserver.tld} Processed request: 0.809sec/0.247sec (0.420sec, 0.040sec) (0.532sec/1.064sec/60) 67B 500 "GET /_matrix/client/r0/sync?filter=12&timeout=0&_cacheBuster=1587900538750 HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.123 Safari/537.36" [21 dbevts]

Specifically, the dict dumped after StopIteration contains several events with a 'json' field each.
I manually confirmed all these json strings as valid using a validator, except one which is not a "clear text" json string, but instead a hex-encoded string in the manner of '\\xMANYHEXCHARS'. Trying to load this hex with simplejson crashes it, however decoding it returns what seems to be valid json.

Related line from the source:
https://github.com/matrix-org/synapse/blob/master/synapse/storage/data_stores/main/events_worker.py#L577

Version information

  • Version: 1.12.4 (accessed with riot-web/desktop 1.5.15)
  • Install method: pip in pyenv
  • Platform: Debian GNU/Linux 9.12 (stretch)
  • additional: using synapse-s3-storage provider
@HarHarLinks
Copy link
Contributor Author

HarHarLinks commented Apr 26, 2020

Follow-up:
using postgresql 9.6
manually updating the hex encoded json to normal json in event_json seems to have resolved the issue.

@babolivier
Copy link
Contributor

Hey there, could you give me some info to figure out what happened here? More specifically:

  • What's the type of the json column in event_json? You can see that by looking at the table's schema by running \d event_json in PostgreSQL or .schema event_json in SQLite.
  • The output of SELECT * FROM events WHERE event_id = '[EVENT_ID]', replacing [EVENT_ID] with the value for event_id in that problematic row.
  • The output of SELECT * FROM redactions WHERE event_id = '[EVENT_ID]'; (same thing as above about [EVENT_ID]).

@babolivier babolivier added z-bug (Deprecated Label) info-needed labels Apr 27, 2020
@HarHarLinks
Copy link
Contributor Author

Note: the other json in the table (checked some random ones) seems alright (normal, human readable json).

  • schema:
synapse=# \d event_json
        Table "public.event_json"
      Column       |  Type   | Modifiers
-------------------+---------+-----------
 event_id          | text    | not null
 room_id           | text    | not null
 internal_metadata | text    | not null
 json              | text    | not null
 format_version    | integer |
Indexes:
    "event_json_event_id_key" UNIQUE CONSTRAINT, btree (event_id)
    "event_json_room_id" btree (room_id)
  • event:
 stream_ordering | topological_ordering |              event_id              |       type       |              room_id               | content | unrecognized_keys | processed | outlier | depth | origin_server_ts |  received_ts  |         sender         | contains_url
-----------------+----------------------+------------------------------------+------------------+------------------------------------+---------+-------------------+-----------+---------+-------+------------------+---------------+------------------------+--------------
          395100 |                 2641 | $myevent:homeserver.tld            | m.room.encrypted | !myroom:homeserver.tld             |         |                   | t         | f       |  2641 |    1570385506832 | 1570385506895 | @user:homeserver.tld   | f
  • redactions:
synapse=# select * from redactions where event_id = 'mybrokenevent';
 event_id | redacts | have_censored | received_ts
----------+---------+---------------+-------------
(0 rows)

however, the broken event itself was redacted

synapse=# select * from redactions where redacts = 'mybrokenevent';
              event_id              |              redacts               | have_censored |  received_ts
------------------------------------+------------------------------------+---------------+---------------
     $redactevent:homeserver.tld    | $mybrokenevent:homeserver.tld      | t             | 1570385575166

@babolivier
Copy link
Contributor

Duh, yes, I mistyped the last request, sorry about that and thanks for correcting it.

So it looks like the column has the right type, and that it's #4475, though this bug should have been fixed #6185 (which included a database update that should have applied to this event), so I'm a bit puzzled here.

Trying to figure out what happened, do you remember migrating your homeserver's database from SQLite to PostgreSQL after this redaction was sent (Oct 6)?
Also, can you please run SELECT * FROM applied_schema_deltas WHERE file = '56/redaction_censor3_fix_update.sql.postgres'; to make sure that update applied correctly?

@HarHarLinks
Copy link
Contributor Author

HarHarLinks commented Apr 28, 2020

Indeed, this database was migrated this year. I had to purge the psql 1-2 times after running into space issues, and ended up mounting the sqlite through network. In the end it seemingly succeeded.

synapse=# SELECT * FROM applied_schema_deltas WHERE file = '56/redaction_censor3_fix_update.sql.postgres';
 version |                     file
---------+----------------------------------------------
      56 | 56/redaction_censor3_fix_update.sql.postgres

@babolivier
Copy link
Contributor

Indeed, this database was migrated this year

Right, I think there was an issue with the migration script a few months ago, that might have explained it?

Anyway, I believe this might just be a one-off issue which might not require a fix on Synapse's codebase. You can make Synapse look for other events that might have that issue by running:

INSERT INTO background_updates (update_name, progress_json)
   VALUES ('event_fix_redactions_bytes', '{}');

in psql and restarting Synapse.

@HarHarLinks
Copy link
Contributor Author

Thanks for looking into this, running the background update now.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
z-bug (Deprecated Label)
Projects
None yet
Development

No branches or pull requests

2 participants