properly close connections #128

dryajov · 2020-04-03T20:37:46Z

No description provided.

arnetheduck · 2020-04-05T08:58:44Z

what does close mean?

basically, for streams that have a parent stream, it should mean roughly "request the underlying stream to close". The underlying stream then closes and notifies its child streams by firing its close event, "rippling" the close back up the chain, and then each stream cleans up its resources. This way, no matter what caused the stream to close, the cleanup runs at the right time.

So there are two data flows here: from parent to child and from child to parent - what's confusing is that inside the parent-to-child handler, the child-to-parent is invoked, creating a loop - this loop complicates matters because now you have to introduce state (that might take on unexpected values in out-of-order async execution) to handle it.

it seems that close does two things now: cleanup and notify parent - perhaps that needs splitting up to simplify the machinery? another simplification is of course wip2 which removes some of this complexity altogether.

dryajov · 2020-04-05T19:38:22Z

> what does close mean?

Yes, close will both notify child/parent streams that it's being disposed and reset its internal state.

> what's confusing is that inside the parent-to-child handler, the child-to-parent is invoked, creating a loop

It looks like that, but the way AsyncEvent works ensures that it only fires once. Once the parent fires the event, it won't be triggered again unless it's reset, which avoids the loop.
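
A minimal synchronous sketch of the fire-once behaviour described here, assuming a hypothetical `OneShotEvent` type (this is not the chronos `AsyncEvent` API, only an illustration of how a "fired" flag stops the re-entrant close):

```nim
# Hypothetical one-shot event, for illustration only.
type
  Handler = proc () {.closure.}
  OneShotEvent = ref object
    fired: bool
    handlers: seq[Handler]

proc subscribe(ev: OneShotEvent, handler: Handler) =
  ev.handlers.add(handler)

proc fire(ev: OneShotEvent) =
  # A second fire is a no-op until the flag is reset.
  if ev.fired:
    return
  ev.fired = true
  for handler in ev.handlers:
    handler()

var closes = 0
let parentClosed = OneShotEvent()

proc onParentClosed() =
  # The child's handler calls back into the parent, forming the loop
  # discussed above; the `fired` flag ends it after one pass.
  inc closes
  parentClosed.fire()

parentClosed.subscribe(onParentClosed)
parentClosed.fire()
doAssert closes == 1
```

The loop still exists in the call graph; it is only cut short by state, which is the concern raised in the review.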

> it seems that close does two things now: cleanup and notify parent - perhaps that needs splitting up to simplify the machinery?

Ideally yes, but I'm not sure if that's possible. Basically there is only one place where we want to notify the child/parent streams that it's being closed, which is when the close happens. Notifying of the close and then disposing of resources separately actually complicates matters, because now you have to handle an essentially atomic operation in several places.

I'm not arguing that this is perfect, but right now it seems that tweaking it won't necessarily help us solve these issues. This requires a refactor that, as you say, will simplify all of this machinery (wip & wip2); this is next on my priority list.

The biggest culprit of this leak seems to be https://github.com/status-im/nim-libp2p/pull/128/files#diff-54deac2904add92d29f284d9eb946cc7R33-R40: the connections were being stored in the transport's internal table and never disposed of. This table is probably not needed anymore; it's a leftover of an early design assumption that no longer holds.
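
To illustrate the kind of leak described here, a small sketch with hypothetical `Conn` and `Transport` types (not the actual nim-libp2p types): a table that keeps a reference to every connection keeps closed connections alive unless the entry is removed on close.

```nim
import std/tables

# Hypothetical types, for illustration only.
type
  Conn = ref object
    id: int
    closed: bool
  Transport = ref object
    conns: Table[int, Conn]  # assumed internal tracking table

proc track(t: Transport, c: Conn) =
  t.conns[c.id] = c

proc close(t: Transport, c: Conn) =
  c.closed = true
  # Without this `del`, the table still references the closed
  # connection, so it can never be collected.
  t.conns.del(c.id)

let t = Transport(conns: initTable[int, Conn]())
for id in 1..3:
  t.track(Conn(id: id))
doAssert t.conns.len == 3

let c = t.conns[2]
t.close(c)
doAssert t.conns.len == 2
```

Not storing the connections at all sidesteps the problem, at the cost of the transport no longer being able to enumerate them.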

arnetheduck · 2020-04-05T20:19:01Z

well, I guess there are two schools of thought here:

one is where "close" means disconnecting from parent stream: it would mean removing the connection to the parent close event handler as well as cancelling any pending reads and writes etc, so that the child stream is entirely disconnected from the parent. the parent stream is then notified and can do what it pleases because the child no longer cares: keep reading, closing itself etc

the other is that "close" is a notification to the parent stream to "start" closing, but on the child stream nothing else happens: no cleanup, no notification, no event to children of the child streams: this is delayed until the parent stream has carried out the close request.

I know there is a flag in there to break the loop - but the loop can also be broken "structurally" by any of the two methods above - true, there's little distinction, but it's worth thinking about what "principle" close should operate by, and how one ensures that there is no race condition, multiple events etc.
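
A rough synchronous sketch of the second option, assuming a hypothetical `Stream` type (names and layering are illustrative, not the library's): `close` on a child only forwards the request to its parent, and cleanup runs solely in the handler that reacts to the parent having closed, so no flag is needed to break a loop.

```nim
# Hypothetical stream hierarchy, for illustration only.
type
  Stream = ref object
    name: string
    parent: Stream
    children: seq[Stream]
    cleanedUp: bool

var events: seq[string]

proc handleClosed(s: Stream) =
  # Cleanup happens only here, in response to the close notification,
  # and the notification then ripples to this stream's children.
  s.cleanedUp = true
  events.add("cleanup " & s.name)
  for child in s.children:
    child.handleClosed()

proc close(s: Stream) =
  if s.parent != nil:
    # Child: only ask the parent to start closing; no cleanup yet.
    s.parent.close()
  else:
    # Root: nothing above it, so it closes and notifies downwards.
    s.handleClosed()

let root = Stream(name: "tcp")
let secure = Stream(name: "secure", parent: root)
let muxed = Stream(name: "mplex", parent: secure)
root.children.add(secure)
secure.children.add(muxed)

muxed.close()
doAssert events == @["cleanup tcp", "cleanup secure", "cleanup mplex"]
```

The request travels up and the notification travels down, each in one direction only. The sketch does not make `close` idempotent; calling it twice would run the cleanup twice.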

dryajov · 2020-04-06T15:11:30Z

> the other is that "close" is a notification to the parent stream to "start" closing, but on the child stream nothing else happens: no cleanup, no notification, no event to children of the child streams: this is delayed until the parent stream has carried out the close request.

This is the close method on the connection - https://github.com/status-im/nim-libp2p/blob/master/libp2p/connection.nim#L110-L117. I think this does what you describe in the above paragraph.

> one is where "close" means disconnecting from parent stream: it would mean removing the connection to the parent close event handler as well as cancelling any pending reads and writes etc, so that the child stream is entirely disconnected from the parent. the parent stream is then notified and can do what it pleases because the child no longer cares: keep reading, closing itself etc

I'm not entirely sure what the advantage of this approach is?

> but the loop can also be broken "structurally" by any of the two methods above

I think this is the ideal, and all the close event cruft is a consequence of the current approach, which needs simplifying. I'm not sure how to fix this structurally right now without significant refactoring (wip & wip2), though.

arnetheduck · 2020-04-06T16:00:39Z

> I'm not entirely sure what the advantage of this approach is?

it's simpler: you don't have to handle close failures which makes closing more predictable and less risky from a race point of view.

libp2p/transports/tcptransport.nim

libp2p/connection.nim

sinkingsugar · 2020-04-07T06:43:22Z

libp2p/muxers/mplex/mplex.nim

@@ -154,5 +153,10 @@ method newStream*(m: Mplex,

 method close*(m: Mplex) {.async, gcsafe.} =
  trace "closing mplex muxer"
+  if not m.connection.closed():
+    await m.connection.close()
+
  await allFutures(@[allFutures(toSeq(m.remote.values).mapIt(it.reset())),


it might help to merge #125 first, but it can go the other way around too

This is probably more urgent since it fixes some pretty egregious mem leaks. I'd rather get this in first to see how it behaves in NBC.

sinkingsugar · 2020-04-07T06:43:59Z

libp2p/protocols/secure/secure.nim

-      if not isNil(sconn) and not sconn.closed:
-        asyncCheck sconn.close()
+  result = newConnection(newBufferStream(writeHandler))
+  asyncCheck readLoop(sconn, result)


I kinda wanna remove this too and track the future but I guess can be done in another PR

this needs to happen in a more focused refactor, we'll do that after we get the current implementation more stable

dryajov force-pushed the fix/mem-leak branch from 20492a7 to c36b5d1 Compare April 5, 2020 02:47

dryajov marked this pull request as ready for review April 6, 2020 22:12

dryajov force-pushed the fix/mem-leak branch from 7f52a46 to ff0b683 Compare April 6, 2020 22:22

dryajov changed the title ~~[WIP] properly close connections~~ properly close connections Apr 7, 2020

dryajov requested a review from sinkingsugar April 7, 2020 00:54

sinkingsugar reviewed Apr 7, 2020


libp2p/transports/tcptransport.nim (outdated)

sinkingsugar suggested changes Apr 7, 2020


dryajov added 11 commits April 7, 2020 07:58

properly close connections

bee7261

more connection closes to fix leaks

46442bc

small cleanup

64a3afb

close connections

8fd182b

disable storing connections on internal table

c1470e4

connection closing tests

e398e5c

remove unused field

f9e0e55

proper connection cleanup

9c7f7f9

formatting

795f93b

reduse usssage of asyncCheck

590d4a8

fix nil condition

8a0e54b

dryajov force-pushed the fix/mem-leak branch from ff0b683 to 8a0e54b Compare April 7, 2020 15:50

dryajov requested a review from sinkingsugar April 7, 2020 15:52

dryajov merged commit 00fbc92 into master Apr 7, 2020

dryajov deleted the fix/mem-leak branch April 7, 2020 18:17

mratsim mentioned this pull request Apr 8, 2020

[ongoing] Network stability status-im/nimbus-eth2#784

Closed

2 tasks

mratsim mentioned this pull request Sep 23, 2020

-d:useMalloc for the default GC nim-lang/Nim#15394

Open
