Tech Debt Cleanup and Docs Update #219
Conversation
Clarify actor pattern and responsibilities across threads
refactor requestmanager to clearly designate responsibilities in actor pattern
remove the actor pattern from asyncloader, as it's not needed. it works well as simply a locked data structure and is about 10x simpler
update references in architecture doc to reflect IPLD linksystem
al.stateLk.Lock()
defer al.stateLk.Unlock()
this PR adds this single mutex lock around the loader's state. Do we have reasonable confidence that we won't end up with lock contention over it?
The old code sent a message to a queue channel, and a single goroutine owned the state and processed those messages sequentially. So the amount of contention seems basically the same before and after.
One significant difference is the priority and ordering; with a channel, before you would be guaranteed first-come-first-serve. With a mutex, it's always a semi-random race to decide what goroutine gets to grab the mutex, so if incoming messages keep coming, you could have high latencies with older messages getting unlucky and not grabbing the mutex. It's not clear to me whether order matters here.
One last thought: sharing memory does give us more ability to be performant, if we want to. For example, using a RWMutex, or multiple mutexes, or sync/atomic. So in terms of contention, I think it can be better than a single goroutine with a message queue, if we care enough to optimize it.
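For reference, here's a minimal sketch of the two designs being compared; the type and field names are illustrative, not the actual AsyncLoader code:

```go
package example

import "sync"

// Actor style: a single owner goroutine receives messages over a channel and
// applies them to state that only it touches, strictly in arrival order.
type counterActor struct {
	msgs chan string
}

func (a *counterActor) run() {
	counts := make(map[string]int) // owned exclusively by this goroutine
	for key := range a.msgs {
		counts[key]++ // processed first-come-first-served
	}
}

// Locked-struct style: state is shared and callers race for a mutex, so there
// is no ordering guarantee between concurrent callers.
type counterLocked struct {
	mu     sync.Mutex
	counts map[string]int
}

func (c *counterLocked) incr(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[key]++
}
```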
Yea I feel quite confident about this being a better, more reliable strategy.
// to them.
type RequestManager struct {
	ctx    context.Context
	cancel func()
context.CancelFunc?
currently it's just func() everywhere in the code and I should change it all at once. will do in a separate cleanup PR
for cancelMessageChannel != nil || incomingResponses != nil || incomingErrors != nil {
	select {
	case cancelMessageChannel <- &cancelRequestMessage{requestID, false, nil, nil}:
		cancelMessageChannel = nil
should this close the channel as well? a comment might be useful since this pattern looks weird otherwise
If you look a couple of lines above, what we're doing here is using cancelMessageChannel = nil to send exactly once -- but the actual channel is the main message channel for the request manager.
I agree, this is overcomplicated. I think it's probably ok to do the channel send once before the loop, but for this PR I'd rather not change that functionality.
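For anyone unfamiliar with the pattern, here's a simplified sketch of the nil-channel trick (not the real RequestManager loop): after a successful send the channel variable is set to nil, which disables that select case, so the send happens exactly once while the loop keeps draining the other channels:

```go
package example

// drainUntilDone sends one cancel message and keeps reading the other
// channels until they are closed.
func drainUntilDone(cancelMsgs chan<- string, responses <-chan int, errs <-chan error) {
	for cancelMsgs != nil || responses != nil || errs != nil {
		select {
		case cancelMsgs <- "cancel": // can only succeed once...
			cancelMsgs = nil // ...because a nil channel blocks forever in select
		case _, ok := <-responses:
			if !ok {
				responses = nil // channel closed; stop selecting on it
			}
		case _, ok := <-errs:
			if !ok {
				errs = nil
			}
		}
	}
}
```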
	return remainingResponses
}

func (rm *RequestManager) updateLastResponses(responses []gsmsg.GraphSyncResponse) {
what this method (amongst others) does seems not fully reflected in the name. some comments on expected behavior of these methods would be useful
will add in cleanup PR
I left two comments on one of the commits, since the whole diff was pretty large. I'm not sure how well GitHub will deal with that :)
@@ -1,6 +1,6 @@
 module github.com/ipfs/go-graphsync

-go 1.12
+go 1.13
maybe time to get this repo onboard with unified ci :) it would take care of bumping this too
yes agreed.
	stateLk sync.Mutex
I'd explicitly document what fields this mutex protects, even if that's just "all fields below".
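Something like the convention below would make the invariant explicit; the struct and field types here are an illustrative sketch, not the actual AsyncLoader:

```go
package example

import "sync"

// Illustrative only: the fields below stand in for whatever state the
// loader keeps; only the comment convention is the point.
type loaderState struct {
	// stateLk protects every field below it.
	stateLk         sync.Mutex
	activeRequests  map[int]struct{}
	alternateQueues map[string][]string
}
```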
resolved
// Shutdown ends processing for the want manager.
func (rm *RequestManager) Shutdown() {
	rm.cancel()
}
I'm a bit confused; if the user supplies a context when creating a RequestManager, could they not just cancel the context themselves to do a shutdown? Having RequestManager hold the cancel func and expose a Shutdown method feels unnecessary.
One reason we might have for exposing Shutdown methods is to also block until all goroutines have stopped and resources have been freed up. If we think we'll want that later on, then I think leaving these in with a TODO makes sense. Otherwise, I'd personally remove the Shutdown APIs.
Also, if we went with the "we'll eventually want to block until all request goroutines are stopped" route, then I think these methods should gain a context parameter, too. That way, I can say "graceful shutdown for 10s to stop accepting requests, and after that, kill all requests which aren't done yet". This is how net/http does it: https://pkg.go.dev/net/http#Server.Shutdown
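A hypothetical sketch of what that could look like, loosely modeled on net/http's Server.Shutdown; this is not the current RequestManager API:

```go
package example

import (
	"context"
	"sync"
)

type manager struct {
	cancel context.CancelFunc // cancels the manager's own run context
	wg     sync.WaitGroup     // tracks in-flight request goroutines
}

// Shutdown signals all goroutines to stop and waits for them to exit, giving
// up early (and returning the context error) if ctx expires first.
func (m *manager) Shutdown(ctx context.Context) error {
	m.cancel()
	done := make(chan struct{})
	go func() {
		m.wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```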
I agree this is odd. This is the pattern of the repo. I'd rather revisit the shutdown behavior globally in a separate ticket.
defer al.stateLk.Unlock()
_, ok := al.alternateQueues[name]
if !ok {
	return errors.New("unknown persistence option")
include the name? :)
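For example, something along these lines (a sketch of the suggested tweak, not the final wording):

```go
package example

import "fmt"

// Sketch: include the offending name in the error so callers can tell
// which persistence option was missing.
func errUnknownPersistenceOption(name string) error {
	return fmt.Errorf("unknown persistence option %q", name)
}
```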
resolved
@@ -77,3 +77,47 @@ var ResponseCodeToName = map[ResponseStatusCode]string{
	RequestFailedContentNotFound: "RequestFailedContentNotFound",
	RequestCancelled:             "RequestCancelled",
while you're refactoring, you could automate this table and String method with https://pkg.go.dev/golang.org/x/tools/cmd/stringer
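A sketch of what that could look like; the type and constants below are illustrative stand-ins for ResponseStatusCode, and stringer needs to be installed before running `go generate ./...`:

```go
package example

// The go:generate directive next to the enum produces a String() method in a
// generated file, replacing the hand-maintained name table.

//go:generate stringer -type=statusCode

type statusCode int

const (
	requestFailedContentNotFound statusCode = iota
	requestCancelled
)
```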
omg where was this tool all my life! You should check over at go-fil-markets and go-data-transfer for some pretty awesome comedy here.
responsecode.go
case RequestCancelled:
	return RequestCancelledErr{}
default:
	return fmt.Errorf("Unknown")
return fmt.Errorf("unknown response status code: %d", c)
resolved
I've responded to your comments. Some of them relate to unchanged code that has just been moved around. I agree generally with all the comments, but as you can see I've suggested deferring them to minimize this changeset. I've documented these cleanup issues here: If I can get your approvals on this PR, I'd like to go ahead and merge.
Goals
Do a big cleanup pass on go-graphsync to make it easier to reason about and work with. Belatedly update docs for IPLD LinkSystem.
Implementation
THIS PR LOOKS LARGE BUT IT'S NOT THAT BIG. These are the main changes:
I can break this into multiple PRs if need be, but you can also review the individual commits if you like.