Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Message push tests flaky? #814

Closed
clokep opened this issue Feb 19, 2020 · 6 comments
Closed

Message push tests flaky? #814

clokep opened this issue Feb 19, 2020 · 6 comments

Comments

@clokep
Copy link
Member

clokep commented Feb 19, 2020

Saw this failure for Synapse: https://buildkite.com/matrix-dot-org/synapse/builds/7244#7e0371f0-d1e0-4af7-9a94-68479007a4dc

Some relevant logs:

tests/61push/01message-pushed.pl:
    Test 747 Test that a message is pushed... OK
    Test 748 Invites are pushed... OK
    Test 749 Rooms with names are correctly named in pushed... OK
    Test 750 Rooms with canonical alias are correctly named in pushed... FAIL
    Test 751 Rooms with many users are correctly pushed... FAIL
    Test 752 Don't get pushed for rooms you've muted... 2020-02-19 18:07:31.730 UTC [388] ERROR:  could not serialize access due to concurrent update
2020-02-19 18:07:31.730 UTC [388] STATEMENT:  DELETE FROM presence_stream WHERE stream_id < 842 AND user_id = ANY(ARRAY['@anon-20200219_175626-870:localhost:8831','@anon-20200219_175626-905:localhost:8831','@anon-20200219_175626-872:localhost:8831','@anon-20200219_175626-871:localhost:8831','@anon-20200219_175626-892:localhost:8831','@anon-20200219_175626-904:localhost:8831'])
FAIL
    Test 753 Rejected events are not pushed... FAIL

results.tap

ok 747 Test that a message is pushed (6 subtests)
ok 748 Invites are pushed
ok 749 Rooms with names are correctly named in pushed
not ok 750 Rooms with canonical alias are correctly named in pushed
# Started: 2020-02-19 18:07:10.734
# Ended: 2020-02-19 18:07:20.870
# Timed out waiting for test at /sytest/run-tests.pl line 745.
# 0.128664: Registered new user @anon-20200219_175626-911:localhost:8831
# 0.133333: Registered new user @anon-20200219_175626-910:localhost:8831
# 0.739873: User @anon-20200219_175626-910:localhost:8831 joined room
# { room_id => "!VrusazlxwxeQvatumA:localhost:8831" }
# 0.956120: Sent 'm.room.message' event in !VrusazlxwxeQvatumA:localhost:8831: $zf6XIXFNkevvVMLFcFEUwV-IZEpMciUpluGDraZSJDU
not ok 751 Rooms with many users are correctly pushed
# Started: 2020-02-19 18:07:20.870
# Ended: 2020-02-19 18:07:31.050
# Timed out waiting for test at /sytest/run-tests.pl line 745.
# 0.171445: Registered new user @anon-20200219_175626-912:localhost:8831
# 0.172933: Registered new user @anon-20200219_175626-913:localhost:8831
# 0.177487: Registered new user @anon-20200219_175626-914:localhost:8831
# 0.772685: User @anon-20200219_175626-912:localhost:8831 joined room
# { room_id => "!voZsmtjFAUNLnobvCC:localhost:8831" }
# 1.010684: User @anon-20200219_175626-914:localhost:8831 joined room
# { room_id => "!voZsmtjFAUNLnobvCC:localhost:8831" }
# 1.131119: Sent 'm.room.message' event in !voZsmtjFAUNLnobvCC:localhost:8831: $buVkfZUQjT7Bq2tRdg9K7CT2guWRLP3ksZCny1QELXI
not ok 752 Don't get pushed for rooms you've muted
# Started: 2020-02-19 18:07:31.050
# Ended: 2020-02-19 18:07:41.177
# Timed out waiting for test at /sytest/run-tests.pl line 745.
# 0.122006: Registered new user @anon-20200219_175626-915:localhost:8831
# 0.123858: Registered new user @anon-20200219_175626-916:localhost:8831
# 0.781483: User @anon-20200219_175626-915:localhost:8831 joined room
# { room_id => "!lDBUtWtUCQSLhKIBIA:localhost:8831" }
# 0.781779: room_id
# "!lDBUtWtUCQSLhKIBIA:localhost:8831"
# 0.904870: Sent 'm.room.message' event in !lDBUtWtUCQSLhKIBIA:localhost:8831: $nZN-ru7tAVVOOpxHbm9UdOy8skelgsTm6jY4WSpU6FU
not ok 753 Rejected events are not pushed
# Started: 2020-02-19 18:07:41.177
# Ended: 2020-02-19 18:07:51.843
# Timed out waiting for test at /sytest/run-tests.pl line 745.
# 0.075101: Registered new user @anon-20200219_175626-917:localhost:8831
# 0.658629: Joined room 1: !qYjxesziFCeUEyhAhf:localhost:8831
# 0.661887: Rejected event $dubeOHQQ9I2dU-RtMv1jSFdGLPf8LVo9Dtu-wvs_4Ig
# 0.661958: Regular event $YqAi7r3AlAvbzDQQV9Jx_40SqL4Yk7imNW1aLxAM0l8

@richvdh had some thoughts:

another nice thing to do if you're lost is just to throw more diagnostics into the test which might help someone doing future investigation
from the logs there, those tests look like they might be following the antipattern of "wait indefinitely for X to happen"
the problem being that if X never happens, you don't get to find out what X was

@clokep
Copy link
Member Author

clokep commented Feb 19, 2020

Looking through some of the other logs, this looks relevant:

server-0/pusher.log

2020-02-19 18:07:10,925 - synapse.app.pusher - 177 - INFO - replication-RDATA-pushers-10 - Starting pusher '@anon-20200219_175626-910:localhost:8831' / 'sytest:a_push_key'
2020-02-19 18:07:10,928 - synapse.storage.data_stores.main.pusher - 45 - WARNING - replication-RDATA-pushers-10 - Invalid JSON in data for pusher 13: \x7b2275726c223a2268747470733a2f2f6c6f63616c686f73743a34363231312f616c6963655f70757368227d, Expecting value: line 1 column 1 (char 0)

@richvdh
Copy link
Member

richvdh commented Feb 19, 2020

2020-02-19 18:07:10,928 - synapse.storage.data_stores.main.pusher - 45 - WARNING - replication-RDATA-pushers-10 - Invalid JSON in data for pusher 13: \x7b2275726c223a2268747470733a2f2f6c6f63616c686f73743a34363231312f616c6963655f70757368227d, Expecting value: line 1 column 1 (char 0)

iiiiinteresting. This sort of thing comes up every so often thanks to matrix-org/synapse#4475. We have suspicions that occasionally bytea_output isn't being set on some database connections, though can't figure out why and have never managed to reproduce it.

For reference, here's what we get when we decode that:

echo "7b2275726c223a2268747470733a2f2f6c6f63616c686f73743a34363231312f616c6963655f70757368227d" | perl -pe 's/(..)/chr hex $1/eg'
{"url":"https://localhost:46211/alice_push"}

@clokep
Copy link
Member Author

clokep commented Feb 20, 2020

That sounds related. Is your guess that this is flaky due to sometimes ending up with a "bad" character that can't be stored into the table properly?

@richvdh
Copy link
Member

richvdh commented Feb 20, 2020

no, as per the above there is no bad character in the stored sequence. My guess is that this is flaky because we are dropping the bytea_output setting on the postgres connection somehow.

@richvdh
Copy link
Member

richvdh commented Feb 20, 2020

(one solution to this would be to fix things so that we don't need to care about bytea_output, but given we set the isolation level at the same time as bytea_output, I'd really like to get to the bottom of what's causing it to not get set correctly.)

@richvdh
Copy link
Member

richvdh commented Jan 14, 2021

hopefully this particular failure will be fixed by matrix-org/synapse#9117.

@richvdh richvdh closed this as completed Jan 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants