
Conversation

@fcecin (Contributor) commented Nov 26, 2025

This patch adds an AsyncLock around StreamServer.accept inside HttpServer.accept.

Chronos does not support multiple tasks calling StreamServer.accept on the same socket. To make HttpServer.accept callable by multiple concurrent tasks, the stream accept must be protected by an async lock.

This avoids having to push the full fix (the task-spawning logic) into nim-websock; that logic can instead be pushed up to the caller (e.g. nim-libp2p), which only needs HttpServer.accept to be safe to call from multiple concurrent tasks.
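
For illustration, here is a minimal sketch of the locking pattern, assuming an `acceptLock: AsyncLock` field on the server object; the `ExampleServer` type and field names are illustrative, not the exact nim-websock code:

```nim
import chronos

type
  ExampleServer = ref object
    stream: StreamServer   # the underlying Chronos stream server
    acceptLock: AsyncLock  # serializes access to StreamServer.accept

proc accept(server: ExampleServer): Future[StreamTransport] {.async.} =
  # Chronos does not allow several tasks to queue on StreamServer.accept,
  # so only one task at a time is let past this point.
  await server.acceptLock.acquire()
  try:
    result = await server.stream.accept()
  finally:
    server.acceptLock.release()
```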

@fcecin (Contributor, Author) commented Dec 1, 2025

There's a chance StreamServer.accept just doesn't want to be targeted by multiple tasks at all. I don't know why yet; I'll investigate a bit more to see if I can figure it out.

If that's the case, we would need a short-term patch for the head-of-line blocking issue in whatever form plays nicest with Nim/Chronos:

  • Revert to the previous patch, with a single dispatcher task owning StreamServer.accept inside nim-websock. That should play nicer and is transparent to callers, but is arguably uglier.
  • Offer an alternative nim-websock API: an HttpServer.acceptStream that just accepts the stream (without the subsequent HTTP-header parse and its timeout), plus an HttpServer.parseHeader to be invoked from the outside (which is free to be called in parallel by multiple dispatched async tasks); see the sketch after this list.
  • Another possibility is making HttpServer.accept return a "hollow" HttpRequest that is hydrated later; the caller then has to wait on a future inside the HttpRequest. I tried this one, but it's arguably worse: it needs one additional asyncSpawn inside HttpServer.accept (to fulfill the parsing). The "advantage" is that you don't necessarily have to specify a maximum concurrent accept rate ("100 connections") on the nim-libp2p side -- except you do anyway, if you want to avoid FD exhaustion.
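
As a rough illustration of the second option, here is what the split API and a caller-side dispatcher might look like. The `acceptStream`/`parseHeader` names are the ones proposed above and do not exist in nim-websock at the time of writing; the `HttpServer`/`HttpRequest` types and the proc bodies below are placeholders, not real websock code:

```nim
import chronos

type
  HttpServer = ref object    # placeholder standing in for websock's HttpServer
    stream: StreamServer
  HttpRequest = ref object   # placeholder standing in for websock's HttpRequest
    transp: StreamTransport

proc acceptStream(server: HttpServer): Future[StreamTransport] {.async.} =
  # Accept only: cheap, so a single dispatcher task can loop over it
  # without ever blocking on a slow client.
  result = await server.stream.accept()

proc parseHeader(server: HttpServer,
                 transp: StreamTransport): Future[HttpRequest] {.async.} =
  # Placeholder for the timed HTTP-header parse; a slow client only
  # blocks its own task here, not the accept loop.
  result = HttpRequest(transp: transp)

proc handle(server: HttpServer, transp: StreamTransport) {.async.} =
  let request = await server.parseHeader(transp)
  discard request  # hand off to the websocket upgrade / application code

proc dispatcher(server: HttpServer) {.async.} =
  # One task owns the accept loop; parsing and handling are fanned out.
  while true:
    let transp = await server.acceptStream()
    asyncSpawn server.handle(transp)
```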

HttpError, "Callback already registered - cannot mix callback and accepts styles!"
)

await server.acceptLock.acquire()
Member

Suggested change:

-await server.acceptLock.acquire()
+# StreamServer.accept does not allow queueing multiple accept requests, so we protect access to it with a lock
+await server.acceptLock.acquire()

or some similar comment

@arnetheduck (Member) commented

> There's a chance

Indeed, neither accept nor readXxx allow queueing readers, it turns out, so this lock is needed. It's an open question whether this capability should be something that StreamTransport itself offers or whether it should be done here, but for now I think the solution here is fine. I'm surprised this hasn't been an issue until now, but I guess it just manifests as being a bit slow (which has its own upside, such as limiting the amount of resources allocated to dealing with DoS attacks and thus protecting existing connections).

Someone posting multiple accepts should likely also use a token bucket to slow down the connection rate to something reasonable, but that's not really a problem that websock needs to solve, here.

What's missing from this PR is really a test. It should be (fairly) simple to write one where we queue 10 accepts, then have each client wait 1s before sending its headers; such a test should take ~1s to complete, but would take ~10s under serialized conditions.
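
For what it's worth, a self-contained sketch of that timing structure in plain Chronos (not the actual nim-websock test suite or websock's HttpServer; the proc names, port, and delay are arbitrary). The accepts are serialized by the lock, but the header reads overlap, so the total time should stay around one client delay rather than ten:

```nim
import chronos

let
  clientCount = 10
  headerDelay = 100.milliseconds
  address = initTAddress("127.0.0.1:28800")  # arbitrary local test port

proc acceptAndReadHeaders(server: StreamServer,
                          lock: AsyncLock): Future[string] {.async.} =
  # Only the accept itself is serialized; header reading runs unlocked.
  await lock.acquire()
  var transp: StreamTransport
  try:
    transp = await server.accept()
  finally:
    lock.release()
  let headers = await transp.readLine(sep = "\r\n\r\n")
  await transp.closeWait()
  return headers

proc slowClient() {.async.} =
  let transp = await connect(address)
  await sleepAsync(headerDelay)  # simulate a client that is slow to send
  discard await transp.write("GET / HTTP/1.1\r\nHost: test\r\n\r\n")
  await transp.closeWait()

proc main() {.async.} =
  let server = createStreamServer(address, flags = {ServerFlags.ReuseAddr})
  let lock = newAsyncLock()

  var accepts: seq[Future[string]]
  for _ in 0 ..< clientCount:
    accepts.add(acceptAndReadHeaders(server, lock))

  let start = Moment.now()
  for _ in 0 ..< clientCount:
    asyncSpawn slowClient()
  await allFutures(accepts)
  # Expect roughly one headerDelay, not clientCount * headerDelay.
  echo "elapsed: ", Moment.now() - start

  await server.closeWait()

waitFor main()
```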

@fcecin (Contributor, Author) commented Dec 2, 2025

> Indeed, neither accept nor readXxx allow queueing readers, it turns out, so this lock is needed […]

Amazingly enough, I have a potential counter-argument to merging this PR.

I think there's a nonzero chance that there's a low-level bug involving Chronos and recent changes to Nim (say between Nim 2.0.16 and Nim 2.2.6 for some reason). It's not conclusive, but I'm sure something genuinely strange is going on.

With this lock and multiple async tasks calling accept here, even though the lock serializes StreamServer.accept(), something internal to Chronos suffers just from the fact that it's being called by multiple async tasks. I believe the nim-libp2p side of this patch is implemented correctly, but it hangs or crashes Nim/Chronos in some specific environment combinations. If I add a "sleep 300 ms" between clients in the nim-libp2p wstransport test, it "works." That is, hitting the AsyncLock "gently" makes it work, which makes zero sense unless some lower-level bookkeeping (say, a GC cycle or whatever) could complete in that time.

The point is, even if all of that turns out to be wrong, I think that having multiple tasks call StreamServer.accept() on the same TCP accept socket is an anti-pattern; it is fighting Chronos. So arguably it shouldn't even be supported by HttpServer.accept.

On the other hand, this patch could still be useful as a zero-cost standalone defense. I'm going to submit another PR to nim-websock that exposes an HttpServer.acceptStream and an HttpServer.processHttpHeader (or something similar) for callers that don't want head-of-line blocking, like nwaku via nim-libp2p. That cannot possibly upset Chronos, since the client (nim-libp2p) will have a single task (a global WsTransport dispatcher) pulling sockets out of the StreamServer. Then it has to work, even without this AsyncLock.
