[Enhancement]: Migrate away from handling each packet in a dedicated goroutine #71

Open
jonbarrow opened this issue Nov 9, 2024 · 0 comments
Labels
awaiting-approval: Topic has not been approved or denied
enhancement: An update to an existing part of the codebase

Comments

@jonbarrow
Member

Checked Existing

  • I have checked the repository for duplicate issues.

What enhancement would you like to see?

Currently we handle each request in its own goroutine. This was fine originally when there were fewer requests, but now that we're handling thousands upon thousands of requests it can create an ungodly number of goroutines, which under heavy load means a lot of system resource usage. Goroutines are not "free"; even though they are small, they still take up memory and add context-switching overhead. Under the right circumstances this can leave clients with connection issues or cause large amounts of memory to be used.

To account for this, it may be worth moving to a worker pool-based approach with a channel queue. This would mean creating a pool of premade goroutines, say 100 of them, to handle incoming packets. A channel queue hands each packet to a worker as soon as one becomes free. This would keep resource usage at a more consistent, predefined level.
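
A rough sketch of the idea (Packet, handlePacket, the queue size, and the worker count of 100 are all placeholders here, not the server's actual API):

```go
package main

import "sync"

// Packet stands in for whatever type the server already uses for an
// incoming packet.
type Packet struct {
	Data []byte
}

// handlePacket stands in for the existing per-packet handler.
func handlePacket(p Packet) {
	// ... existing processing logic ...
}

func main() {
	const workerCount = 100

	// Buffered channel acting as the queue between the read loop and
	// the worker pool.
	packetQueue := make(chan Packet, 1024)

	var wg sync.WaitGroup
	for i := 0; i < workerCount; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker pulls the next packet off the queue as soon
			// as it is free.
			for p := range packetQueue {
				handlePacket(p)
			}
		}()
	}

	// The read loop would do `packetQueue <- packet` instead of spawning
	// a goroutine per packet. Closing the channel on shutdown lets the
	// workers drain the queue and exit.
	close(packetQueue)
	wg.Wait()
}
```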

Any other details to share? (OPTIONAL)

Considerations

Some more accurate testing needs to be done to determine how much of an issue the current design is. We've fixed a number of memory-related issues; however, we are still abusing goroutines pretty heavily, which goes against common recommendations, and I do sometimes see the friends server hit fairly high memory usage.

There are a number of edge cases and caveats to consider when using this sort of design as well. For instance, whether or not to buffer the channel.

Buffering the channel gives it a predefined capacity. Once the buffer is full, a plain send blocks, so to avoid stalling the read loop we would use a non-blocking send and drop any new packets, i.e. "load shedding". This has the benefit of keeping memory usage bounded, but it means we begin to drop packets if the channel becomes overloaded. We already handle cases where packets may be dropped, since that can happen no matter what, so maybe that's not a huge deal, but it does mean that if we are consistently under load then the servers will consistently drop packets.
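
As a sketch, dropping on a full buffer is just a non-blocking send (reusing the placeholder Packet and packetQueue names from above):

```go
// tryEnqueue attempts a non-blocking send. If the buffered queue is
// full it drops the packet instead of blocking the read loop ("load
// shedding") and reports whether the packet was queued.
func tryEnqueue(packetQueue chan<- Packet, p Packet) bool {
	select {
	case packetQueue <- p:
		return true // queued for a worker
	default:
		return false // queue full; rely on the client retransmitting
	}
}
```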

The alternative is an effectively unbounded queue. (Go channels themselves are always bounded; an unbuffered channel simply blocks the sender until a worker is ready, so an unbounded queue would need something like a slice-backed buffer feeding the workers.) Under heavy load this means more memory usage as data backs up, and it also means clients may actually have a higher risk of timing out. In a system with a buffered channel, if a packet is dropped then the next time the client resends it there may be fewer packets in the queue, so it has a chance of being handled sooner. With an unbounded queue, the amount of time before a packet is processed is variable. The queue may hold many instances of the same packet if it's resent multiple times, and if the queue is very long it may take a while before any of them gets processed.
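
For illustration, an unbounded queue could look something like this pump goroutine sitting between the read loop and the workers (same placeholder names; the growing backlog slice is exactly where the extra memory usage comes from):

```go
// pump moves packets from in to out through an unbounded in-memory
// backlog, so the read loop never blocks. If the workers fall behind,
// the backlog (and memory use) grows without limit.
func pump(in <-chan Packet, out chan<- Packet) {
	var backlog []Packet
	for {
		// With an empty backlog, just wait for the next packet.
		if len(backlog) == 0 {
			p, ok := <-in
			if !ok {
				close(out)
				return
			}
			backlog = append(backlog, p)
		}
		// Otherwise, accept a new packet or hand the oldest one to a
		// worker, whichever is ready first.
		select {
		case p, ok := <-in:
			if !ok {
				for _, q := range backlog {
					out <- q
				}
				close(out)
				return
			}
			backlog = append(backlog, p)
		case out <- backlog[0]:
			backlog = backlog[1:]
		}
	}
}
```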

Another thing to consider is the number of workers. Unlike in most languages, we can safely run far more workers than CPU cores, since the Go runtime is very good at multiplexing them and context switching. However, since some operations may take a while to complete, we could run into a situation where all workers are busy. Say we have 100 workers and we get 101 requests, all of which take a second or more to process (a real possibility, as we had some friends server requests taking multiple seconds in the past). That 101st request must now wait an extra second or more to be processed because no worker is available.

Alternatives

If we wanted to get really low level, we could do things like what Tailscale does. Tailscale uses lower-level Linux facilities to improve performance and reliability, such as recvmmsg and netpoll, to handle incoming packets and find free goroutines to process them. This can be incredibly powerful and would likely help us a lot here, but it is also fairly complex and would likely break on platforms other than Linux.
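
For reference, Go can reach recvmmsg through golang.org/x/net (ipv4.PacketConn.ReadBatch), which on Linux reads several UDP datagrams per syscall. This is only a rough sketch of that idea, with a placeholder listen address, and not how Tailscale actually structures it:

```go
package main

import (
	"log"
	"net"

	"golang.org/x/net/ipv4"
)

func main() {
	conn, err := net.ListenPacket("udp4", ":60000") // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	pc := ipv4.NewPacketConn(conn)

	// Preallocate a batch of messages; on Linux ReadBatch fills several
	// of them with a single recvmmsg syscall.
	const batchSize = 32
	msgs := make([]ipv4.Message, batchSize)
	for i := range msgs {
		msgs[i].Buffers = [][]byte{make([]byte, 1500)}
	}

	for {
		n, err := pc.ReadBatch(msgs, 0)
		if err != nil {
			log.Fatal(err)
		}
		for i := 0; i < n; i++ {
			payload := msgs[i].Buffers[0][:msgs[i].N]
			_ = payload // hand off to the worker pool / queue
		}
	}
}
```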

jonbarrow added the enhancement and awaiting-approval labels on Nov 9, 2024