This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Complement test TestPartialStateJoin/CanReceiveEventsWithMissingParentsDuringPartialStateJoin is flaky #13564

Closed
richvdh opened this issue Aug 19, 2022 · 6 comments · Fixed by matrix-org/complement#456 or matrix-org/complement#570
Labels
A-Federated-Join: joins over federation generally suck
A-Testing: Issues related to testing in complement, synapse, etc
O-Uncommon: Most users are unlikely to come across this or unexpected workflow
S-Tolerable: Minor significance, cosmetic issues, low or no impact to users.
T-Task: Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Z-Dev-Wishlist: Makes developers' lives better, but doesn't have direct user impact
Z-Flake: Tests that give intermittent failures

Comments

richvdh commented Aug 19, 2022

https://github.com/matrix-org/synapse/runs/7915629022?check_suite_focus=true#step:5:6663

richvdh commented Aug 19, 2022

sticking this on the faster-joins backlog, since it's a faster-joins test

anoadragon453 added the T-Task and Z-Flake labels Aug 19, 2022

squahtx commented Aug 23, 2022

CanReceiveEventsDuringPartialStateJoin and CanReceiveEventsWithHalfMissingParentsDuringPartialStateJoin are also flaky, presumably for the same reason.
https://github.com/matrix-org/synapse/runs/7978919099?check_suite_focus=true
https://github.com/matrix-org/synapse/runs/7978705963?check_suite_focus=true

clokep added the O-Uncommon and S-Tolerable labels Aug 24, 2022
richvdh added the A-Federated-Join label Aug 25, 2022
richvdh self-assigned this Aug 25, 2022

richvdh commented Aug 26, 2022

The problem here appears to be that the /state_ids request is racing against the current-state calculation:

  synapse_main | 2022-08-25 16:45:29,290 - synapse.handlers.federation - 1653 - INFO - sync_partial_state_room-0 - Updating current state for !0:host.docker.internal:46871
  synapse_main | 2022-08-25 16:45:29,312 - synapse.http.server - 169 - INFO - GET-13 - <SynapseRequest at 0x7f994cefe370 method='GET' uri='/_matrix/federation/v1/state_ids/%210:host.docker.internal:46871?event_id=%24bO0OjeuzEcRmdWzNWTJLIKtM2MQy_VET2J1-4bayLBg' clientproto='HTTP/1.0' site='8080'> SynapseError: 403 - Host not in room.
  synapse_main | 2022-08-25 16:45:29,314 - synapse.access.http.8080 - 450 - INFO - GET-13 - ::ffff:127.0.0.1 - 8080 - {host.docker.internal:46871} Processed request: 0.003sec/0.001sec (0.002sec, 0.000sec) (0.000sec/0.000sec/1) 53B 403 "GET /_matrix/federation/v1/state_ids/%210:host.docker.internal:46871?event_id=%24bO0OjeuzEcRmdWzNWTJLIKtM2MQy_VET2J1-4bayLBg HTTP/1.0" "Go-http-client/1.1" [0 dbevts]
  synapse_main | 2022-08-25 16:45:29,324 - synapse.handlers.federation - 1658 - INFO - sync_partial_state_room-0 - Clearing partial-state flag for !0:host.docker.internal:46871

We can see that the "updating current state" operation is ongoing as the state_ids request arrives. The same is true for the other failures linked above.
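
To make the ordering in the excerpt explicit, here is a minimal sketch that checks whether a "403 - Host not in room" response falls between the "Updating current state" and "Clearing partial-state flag" log lines. The helper names (`timestamp`, `rejected_during_resync`) are hypothetical and not part of Synapse or Complement, and the log lines are shortened copies of the ones quoted above.

```python
from datetime import datetime

# Shortened copies of the log lines quoted above (request details trimmed).
LOG_LINES = [
    "2022-08-25 16:45:29,290 - synapse.handlers.federation - 1653 - INFO - "
    "sync_partial_state_room-0 - Updating current state for !0:host.docker.internal:46871",
    "2022-08-25 16:45:29,312 - synapse.http.server - 169 - INFO - GET-13 - "
    "SynapseError: 403 - Host not in room.",
    "2022-08-25 16:45:29,324 - synapse.handlers.federation - 1658 - INFO - "
    "sync_partial_state_room-0 - Clearing partial-state flag for !0:host.docker.internal:46871",
]


def timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def rejected_during_resync(lines):
    """Return timestamps of 403s logged between the two resync markers."""
    start = end = None
    rejected = []
    for line in lines:
        if "Updating current state" in line:
            start = timestamp(line)
        elif "Clearing partial-state flag" in line:
            end = timestamp(line)
        elif "403 - Host not in room" in line:
            rejected.append(timestamp(line))
    if start is None or end is None:
        # Without both markers we cannot say anything about the window.
        return []
    return [t for t in rejected if start <= t <= end]


if __name__ == "__main__":
    for t in rejected_during_resync(LOG_LINES):
        print("403 inside the current-state update window at", t.time())
```

Run against the excerpt, the 403 at 16:45:29,312 sits inside the 16:45:29,290 to 16:45:29,324 window, which matches the race described here.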

So really this is an artifact of #13288, though I might see if we can hack around it for now.
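
One possible shape for a test-side workaround is to retry the request while it returns 403, up to a deadline. This is only an illustrative sketch: `get_with_retry` and the `fetch` callable are hypothetical, it is not taken from the Complement changes linked to this issue, and blindly retrying on 403 could hide a genuine "Host not in room" bug.

```python
import time


def get_with_retry(fetch, retry_status=403, timeout=5.0, interval=0.1,
                   sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it stops returning retry_status or the timeout expires.

    fetch is a hypothetical zero-argument callable returning (status, body).
    The last response is returned either way, so the caller can still assert on it.
    """
    deadline = clock() + timeout
    while True:
        status, body = fetch()
        if status != retry_status or clock() >= deadline:
            return status, body
        sleep(interval)


if __name__ == "__main__":
    # Simulated server: rejects twice while "current state" is being updated.
    responses = iter([(403, "Host not in room."), (403, "Host not in room."), (200, "{}")])
    print(get_with_retry(lambda: next(responses), sleep=lambda _: None))
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.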

DMRobertson reopened this Oct 18, 2022
squahtx added the Z-Dev-Wishlist and A-Testing labels Nov 28, 2022

squahtx commented Dec 12, 2022

Failure on develop: https://github.com/matrix-org/synapse/actions/runs/3272273117/jobs/5383194205

https://github.com/matrix-org/synapse/actions/runs/3659144555/jobs/6214941767 is a similar failure of this test.

These last two are 502 Bad Gateway failures, which look like the same class of flake as #14543, where a previous test did not clean up properly.

6 participants