Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more tests for receiving events while resyncing for a partial join #419

Merged
Merged
Changes from 1 commit
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
a07b041
Factor out CreateMessageEvent() function
Jul 21, 2022
7b483be
Factor out `testReceiveEventDuringPartialStateJoin()` function
Jul 21, 2022
bd2b688
Add `handleGetMissingEventsRequests()` function to respond to `/get_m…
Jul 21, 2022
d085f11
Test whether the homeserver thinks it has full state at a received event
Jul 21, 2022
2cbeaae
Allow explicitly specified `/state` and `/state_ids` requests to comp…
Jul 21, 2022
894d755
Add simple test for receiving an event with a missing prev event
Jul 21, 2022
0643979
Add a test for receiving an event with one missing and one known prev…
Jul 21, 2022
5e38528
Add a test for receiving an event with a missing prev event, with one…
Jul 21, 2022
2979d24
fixup: s/sent/created/
Jul 22, 2022
a5b3c71
fixup: define StateIDsResult struct and send /state_id failures down …
Jul 22, 2022
8f5b930
fixup: rename new tests to use parents/grandparents terminology
Jul 22, 2022
49e35b4
Revert "Allow explicitly specified `/state` and `/state_ids` requests…
Jul 22, 2022
1adf5aa
Faster joins tests: make request handlers more specific
richvdh Jul 21, 2022
74dc057
fixup: Add explicit /state_id and /state handlers instead
Jul 22, 2022
5a9832f
Log which /state_ids request is being replied to
Jul 22, 2022
d437bfb
fixup: explain purpose of deferred channel read
Jul 22, 2022
8abc0e8
fixup: link to synapse#13288
Jul 22, 2022
8f0b234
Merge remote-tracking branch 'origin/main' into squah/faster_room_joi…
Jul 22, 2022
98907c0
fixup: Use WithRetryUntil to retry /event
Jul 25, 2022
47ce7ba
Use Context.WithTimeout instead of a SendFederationRequest goroutine
Jul 25, 2022
1a0fa9f
Add comment explaining purpose of early /state_ids request
Jul 25, 2022
e3d8197
fixup: Remove .vscode/settings.json
Jul 26, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 46 additions & 0 deletions tests/federation_room_join_partial_state_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ import (
"github.com/gorilla/mux"
"github.com/tidwall/gjson"

"github.com/matrix-org/gomatrix"
"github.com/matrix-org/gomatrixserverlib"
"github.com/matrix-org/util"

Expand Down Expand Up @@ -489,6 +490,51 @@ func testReceiveEventDuringPartialStateJoin(
t.Fatalf("GET /event failed with %d: %s", res.StatusCode, string(eventResBody))
}

// fire off a /state_ids request for the last event.
// it must either:
// * block because the homeserver does not have full state at the last event
// * or 403 because the homeserver does not have full state yet and does not consider the
// Complement homeserver to be in the room
Comment on lines +603 to +607
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we know which, and test for that specifically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now only the last test case blocks instead of 403s. I'm somewhat reluctant to be more specific in the tests, because it feels like an implementation detail of synapse (which will change when we get around to matrix-org/synapse#13288).

If you're really keen on being more specific, I'm happy to add a boolean parameter or suchlike. (It's not easy to move this part of the test out, since it needs to happen after the /event request above has succeeded)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right. tbh I don't think either blocking or returning a 403 is correct behaviour, though I'm not quite sure what is.

It would be very useful to add a link to #13288 and say that this is somewhat temporary until we solve it properly.

Copy link
Contributor Author

@squahtx squahtx Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, is there a better way to test whether synapse thinks an event has partial state?
this part of the test was intended to check that _resolve_state_at_missing_prevs calculates the partial state flag correctly and it sounds like it's not a good idea to rely on /state_ids

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none that I can immediately think of. The ideal thing would be to check that after the resync, /state_ids returns some result that would have been impossible without the information conveyed during the resync - but that sounds rather fiddly for a test (and, probably, a hard-to-understand test failure if it later turns out not to be true).

stateIdsResponseChan := make(chan *gomatrixserverlib.RespStateIDs)
defer close(stateIdsResponseChan)
go func() {
stateReq := gomatrixserverlib.NewFederationRequest("GET", "hs1",
fmt.Sprintf("/_matrix/federation/v1/state_ids/%s?event_id=%s",
url.PathEscape(psjResult.ServerRoom.RoomID),
url.QueryEscape(event.EventID()),
),
)
var respStateIDs gomatrixserverlib.RespStateIDs
if err := psjResult.Server.SendFederationRequest(deployment, stateReq, &respStateIDs); err != nil {
httpErr, ok := err.(gomatrix.HTTPError)
t.Logf("%v", httpErr)
if ok && httpErr.Code == 403 {
stateIdsResponseChan <- nil
return
}
t.Errorf("/state_ids request returned non-200: %s", err)
return
}
stateIdsResponseChan <- &respStateIDs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than matching for the 403 here and sending a nil magic value down the channel, I suggest you define

type stateIDsResult struct {
    RespStateIDs gomatrixserverlib.RespStateIDs
    Error error
}

... and then send that back down the channel, and do the matching in the main thread.

}()

select {
case <-time.After(1 * time.Second):
t.Logf("/state_ids request for event %s blocked as expected", event.EventID())
defer func() { <-stateIdsResponseChan }()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this defer thing do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the /state_ids request blocks, the send of the response happens some time later, perhaps even after the test ends.
If we don't read from the channel, the goroutine above blocks on sending into the channel and complement gets very unhappy 10 minutes later because it's still not finished.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, so we're making sure we read from the channel so that the goroutine doesn't block. Can you add a comment to that effect?

break
case respStateIDs := <-stateIdsResponseChan:
if respStateIDs == nil {
t.Logf("/state_ids request for event %s returned 403 as expected", event.EventID())
} else {
// since we have not yet given the homeserver the full state at the join event and allowed
// the partial join to complete, it can't possibly know the full state at the last event.
// While it may be possible for the response to be correct by some accident of state res,
// the homeserver is still wrong in spirit.
t.Fatalf("/state_ids request for event %s did not block when it should have", event.EventID())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part tests matrix-org/synapse#13002 . If Synapse does not correctly flag that it does not have partial state in the CanReceiveEventsWithMissingPrevWithHalfMissingPrevsDuringPartialStateJoin test case, it will erroneously return state (which was derived via state res and only happens to be correct) instead of blocking.

}
}

// allow the partial join to complete
psjResult.FinishStateRequest()
alice.MustSyncUntil(t,
Expand Down