Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

properly close connections #128

Merged
merged 11 commits into from
Apr 7, 2020
Merged

properly close connections #128

merged 11 commits into from
Apr 7, 2020

Conversation

dryajov
Copy link
Contributor

@dryajov dryajov commented Apr 3, 2020

No description provided.

@arnetheduck
Copy link
Contributor

what does close mean?

basically, for streams that have a parent stream, it should mean roughly "request underlying stream to close" and then the underlying stream closes and notifies child streams that it closed by firing its close event, "rippling" the close back up the chain, and then the stream cleans up its resources - this way, no matter what caused the stream to close, the cleanup is run at the right time.

So there are two data flows here: from parent to child and from child to parent - what's confusing is that inside the parent-to-child handler, the child-to-parent is invoked, creating a loop - this loop complicates matters because now you have to introduce state (that might be take on unexpected values in out-of-order async execution) to handle it.

it seems that close does two things now: cleanup and notify parent - perhaps that needs splitting up to simplify the machinery? another simplification is of course wip2 which removes some of this complexity altogether.

@dryajov
Copy link
Contributor Author

dryajov commented Apr 5, 2020

what does close mean?

Yes, close will both notify child/parent streams that it's being disposed and reset it's internal state.

what's confusing is that inside the parent-to-child handler, the child-to-parent is invoked, creating a loop

It looks like that, however the way AsyncEvent works it will ensure that it will only fire once. So once the parent fires the event, it won't be triggered again unless it's reset, this alleviates the loop.

it seems that close does two things now: cleanup and notify parent - perhaps that needs splitting up to simplify the machinery?

Ideally yes, but I'm not sure if that's possible. Basically there is only one place where we want to notify the child/parent streams that it's being closed, which is when the close happens, notifying of the close and then disposing of resources actually complicates matters because now you have handle an essentially atomic operation in several places.

I'm not arguing that this is perfect, but right now it seems that tweaking it won't necessarily help us solve this issues, this requires a refactor that as you say will simplify all of this machinery (wip & wip2), this is next in my priority list.

The biggest culprit of this leak seems to be https://github.com/status-im/nim-libp2p/pull/128/files#diff-54deac2904add92d29f284d9eb946cc7R33-R40, the connections where being stored in the transports internal table and never disposed of, this table is probably not needed anymore and it's a left over of an early design assumption that doesn't hold anymore.

@arnetheduck
Copy link
Contributor

well, I guess there are two schools of thought here:

one is where "close" means disconnecting from parent stream: it would mean removing the connection to the parent close event handler as well as cancelling any pending reads and writes etc, so that the child stream is entirely disconnected from the parent. the parent stream is then notified and can do what it pleases because the child no longer cares: keep reading, closing itself etc

the other is that "close" is a notification to the parent stream to "start" closing, but on the child stream nothing else happens: no cleanup, no notification, no event to children of the child streams: this is delayed until the parent stream as carried out the close request.

I know there is a flag in there to break the loop - but the loop can also be broken "structurally" by any of the two methods above - true, there's little distinction, but it's worth thinking about what "principle" close should operate by, and how one ensures that there is no race condition, multiple events etc.

@dryajov
Copy link
Contributor Author

dryajov commented Apr 6, 2020

the other is that "close" is a notification to the parent stream to "start" closing, but on the child stream nothing else happens: no cleanup, no notification, no event to children of the child streams: this is delayed until the parent stream as carried out the close request.

This is the close method on the connection - https://github.com/status-im/nim-libp2p/blob/master/libp2p/connection.nim#L110-L117. I think this does what you describe in the above paragraph.

one is where "close" means disconnecting from parent stream: it would mean removing the connection to the parent close event handler as well as cancelling any pending reads and writes etc, so that the child stream is entirely disconnected from the parent. the parent stream is then notified and can do what it pleases because the child no longer cares: keep reading, closing itself etc

I'm not entirely sure what the advantage of this approach is?

but the loop can also be broken "structurally" by any of the two methods above

I think this is the ideal and all the close event cruft is a consequence of the current approach, which needs simplifying. I'm not sure tho how to fix this structurally without significant refactoring (wip & wip2) right now tho.

@arnetheduck
Copy link
Contributor

I'm not entirely sure what the advantage of this approach is?

it's simpler: you don't have to handle close failures which makes closing more predictable and less risky from a race point of view.

@dryajov dryajov marked this pull request as ready for review April 6, 2020 22:12
@dryajov dryajov changed the title [WIP] properly close connections properly close connections Apr 7, 2020
libp2p/connection.nim Outdated Show resolved Hide resolved
@@ -154,5 +153,10 @@ method newStream*(m: Mplex,

method close*(m: Mplex) {.async, gcsafe.} =
trace "closing mplex muxer"
if not m.connection.closed():
await m.connection.close()

await allFutures(@[allFutures(toSeq(m.remote.values).mapIt(it.reset())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might help to merge this #125 but anyway can be the opposite way too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably more urgent since it fixes some pretty egregious mem leaks. I'd rather get this in first to see how it behaves in NBC.

if not isNil(sconn) and not sconn.closed:
asyncCheck sconn.close()
result = newConnection(newBufferStream(writeHandler))
asyncCheck readLoop(sconn, result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda wanna remove this too and track the future but I guess can be done in another PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to happen in a more focused refactor, we'll do that after we get the current implementation more stable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants