
Akka.IO: fix TcpListener connection queue problem #7623

Merged: Aaronontheweb merged 14 commits into akkadotnet:dev from Aaronontheweb:fix-5988-AkkaIOTcp-2 on May 7, 2025

Conversation

@Aaronontheweb
Member

@Aaronontheweb Aaronontheweb commented May 6, 2025

Changes

close #5988

Checklist

For significant changes, please ensure that the following have been completed (delete if not relevant):

Latest dev Benchmarks

                                                                                                                                          
BenchmarkDotNet v0.13.12, Pop!_OS 22.04 LTS
13th Gen Intel Core i7-1360P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 8.0.404
  [Host]  : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2
  LongRun : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2

Job=LongRun  Concurrent=True  Server=True
InvocationCount=1  IterationCount=10  LaunchCount=3
RunStrategy=Monitoring  UnrollFactor=1  WarmupCount=3

| Method | MessageLength | ClientsCount | Mean | Error | StdDev | Req/sec |
|---|---|---|---|---|---|---|
| ClientServerCommunication | 10 | 1 | 27.982 μs | 1.2878 μs | 1.9275 μs | 35,737.37 |
| ClientServerCommunication | 10 | 3 | 14.090 μs | 2.1295 μs | 3.1874 μs | 70,974.46 |
| ClientServerCommunication | 10 | 5 | 10.736 μs | 1.0059 μs | 1.5055 μs | 93,147.14 |
| ClientServerCommunication | 10 | 7 | 8.458 μs | 0.6112 μs | 0.9149 μs | 118,237.67 |
| ClientServerCommunication | 10 | 10 | 6.384 μs | 0.4887 μs | 0.7315 μs | 156,647.67 |
| ClientServerCommunication | 10 | 20 | 4.306 μs | 0.3578 μs | 0.5355 μs | 232,243.96 |
| ClientServerCommunication | 10 | 30 | 4.177 μs | 0.3495 μs | 0.5231 μs | 239,412.74 |
| ClientServerCommunication | 10 | 40 | 4.082 μs | 0.3152 μs | 0.4718 μs | 244,954.21 |
| ClientServerCommunication | 100 | 1 | 25.356 μs | 1.2967 μs | 1.9408 μs | 39,438.36 |
| ClientServerCommunication | 100 | 3 | 16.853 μs | 2.6889 μs | 4.0247 μs | 59,337.20 |
| ClientServerCommunication | 100 | 5 | 11.007 μs | 0.5326 μs | 0.7971 μs | 90,849.56 |
| ClientServerCommunication | 100 | 7 | 8.152 μs | 0.4302 μs | 0.6439 μs | 122,671.38 |
| ClientServerCommunication | 100 | 10 | 6.648 μs | 0.6572 μs | 0.9837 μs | 150,428.57 |
| ClientServerCommunication | 100 | 20 | 4.394 μs | 0.3995 μs | 0.5979 μs | 227,607.99 |
| ClientServerCommunication | 100 | 30 | 4.541 μs | 0.5171 μs | 0.7740 μs | 220,222.12 |
| ClientServerCommunication | 100 | 40 | 4.446 μs | 0.4494 μs | 0.6726 μs | 224,926.92 |

This PR's Benchmarks

                                                                                                                                                                      
BenchmarkDotNet v0.13.12, Pop!_OS 22.04 LTS
13th Gen Intel Core i7-1360P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 8.0.404
  [Host]  : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2
  LongRun : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2

Job=LongRun  Concurrent=True  Server=True
InvocationCount=1  IterationCount=10  LaunchCount=3
RunStrategy=Monitoring  UnrollFactor=1  WarmupCount=3

| Method | MessageLength | ClientsCount | Mean | Error | StdDev | Median | Req/sec |
|---|---|---|---|---|---|---|---|
| ClientServerCommunication | 10 | 1 | 28.613 μs | 1.0532 μs | 1.5763 μs | 28.204 μs | 34,949.41 |
| ClientServerCommunication | 10 | 3 | 13.247 μs | 2.0348 μs | 3.0456 μs | 13.202 μs | 75,489.21 |
| ClientServerCommunication | 10 | 5 | 11.622 μs | 0.9882 μs | 1.4791 μs | 11.611 μs | 86,046.57 |
| ClientServerCommunication | 10 | 7 | 8.414 μs | 0.7337 μs | 1.0982 μs | 8.291 μs | 118,847.94 |
| ClientServerCommunication | 10 | 10 | 6.439 μs | 0.4373 μs | 0.6546 μs | 6.316 μs | 155,312.36 |
| ClientServerCommunication | 10 | 20 | 4.597 μs | 0.8180 μs | 1.2244 μs | 4.230 μs | 217,527.70 |
| ClientServerCommunication | 10 | 30 | 4.022 μs | 0.2507 μs | 0.3752 μs | 3.918 μs | 248,662.30 |
| ClientServerCommunication | 10 | 40 | 3.911 μs | 0.3339 μs | 0.4998 μs | 3.737 μs | 255,716.58 |
| ClientServerCommunication | 100 | 1 | 25.943 μs | 1.4570 μs | 2.1808 μs | 26.077 μs | 38,546.16 |
| ClientServerCommunication | 100 | 3 | 14.730 μs | 2.2137 μs | 3.3134 μs | 14.296 μs | 67,887.66 |
| ClientServerCommunication | 100 | 5 | 11.184 μs | 0.6095 μs | 0.9122 μs | 11.162 μs | 89,415.28 |
| ClientServerCommunication | 100 | 7 | 8.954 μs | 0.9970 μs | 1.4922 μs | 8.868 μs | 111,677.15 |
| ClientServerCommunication | 100 | 10 | 6.562 μs | 0.3587 μs | 0.5368 μs | 6.450 μs | 152,390.84 |
| ClientServerCommunication | 100 | 20 | 4.167 μs | 0.3625 μs | 0.5426 μs | 3.993 μs | 239,966.16 |
| ClientServerCommunication | 100 | 30 | 4.250 μs | 0.4495 μs | 0.6729 μs | 3.902 μs | 235,280.42 |
| ClientServerCommunication | 100 | 40 | 4.285 μs | 0.4918 μs | 0.7360 μs | 3.896 μs | 233,382.51 |

@Aaronontheweb
Member Author

Depends on #7621

@Aaronontheweb
Member Author

Still need to update this to include some changes to the default number of accepted connections.

@Aaronontheweb
Member Author

One change I think I'm going to make is removing the "batch-accept-limit" setting altogether. It's redundant: we already have the TCP backlog for this purpose.

As for the size of the SocketAsyncEventArgs pool - I'll need to do some more work to figure that one out.

@Aaronontheweb Aaronontheweb marked this pull request as ready for review May 7, 2025 19:19
Member Author

@Aaronontheweb Aaronontheweb left a comment

Detailed my changes

 public class Bind : Akka.IO.Tcp.Command
 {
-    public Bind(Akka.Actor.IActorRef handler, System.Net.EndPoint localAddress, int backlog = 100, System.Collections.Generic.IEnumerable<Akka.IO.Inet.SocketOption> options = null, bool pullMode = False) { }
+    public Bind(Akka.Actor.IActorRef handler, System.Net.EndPoint localAddress, int backlog = 1024, System.Collections.Generic.IEnumerable<Akka.IO.Inet.SocketOption> options = null, bool pullMode = False) { }
Member Author

Significantly increased the default backlog to a much more reasonable 1024, which is the default in Akka.Remote.
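
To make the role of the backlog concrete, here is a minimal sketch using only `System.Net.Sockets`, not Akka.IO itself. The class and method names (`BacklogSketch`, `Listen`) are hypothetical; the only thing taken from this PR is the default of 1024, which ends up as the argument to `Socket.Listen`.

```csharp
using System.Net;
using System.Net.Sockets;

// Hypothetical helper, not part of Akka.NET.
public static class BacklogSketch
{
    // Binds a listening TCP socket on an ephemeral loopback port.
    public static Socket Listen(int backlog = 1024)
    {
        var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        socket.Bind(new IPEndPoint(IPAddress.Loopback, 0)); // port 0: let the OS choose a free port
        socket.Listen(backlog); // upper bound on pending, not-yet-accepted connections; the OS may clamp it
        return socket;
    }
}
```

A larger backlog gives the listener more room to absorb connection bursts before clients start seeing refused or timed-out connects.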

/// SocketAsyncActorEventArgs is a wrapper around SocketAsyncEventArgs that allows us to deliver
/// notifications to actors upon completion of the operation.
/// </summary>
internal sealed class SocketAsyncActorEventArgs : SocketAsyncEventArgs
Member Author

Common pattern in Kestrel, DotNetty, and I think we did this in TurboMqtt: wrap the SocketAsyncEventArgs with additional operation-handling context built into it.
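
A rough sketch of that wrapping pattern, assuming a plain callback in place of an actor reference. `CallbackSocketAsyncEventArgs` is a hypothetical stand-in and is not the PR's `SocketAsyncActorEventArgs`; in an actor the callback would presumably `Tell` a completion message to the owning actor.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical stand-in: carries the notification target inside the event args.
public sealed class CallbackSocketAsyncEventArgs : SocketAsyncEventArgs
{
    private readonly Action<SocketAsyncEventArgs> _notify;

    public CallbackSocketAsyncEventArgs(Action<SocketAsyncEventArgs> notify)
    {
        _notify = notify ?? throw new ArgumentNullException(nameof(notify));
    }

    // Invoked by the socket when an asynchronous operation on these args completes.
    protected override void OnCompleted(SocketAsyncEventArgs e)
    {
        _notify(e);          // deliver the completion to the owner
        base.OnCompleted(e); // still raise the Completed event for any subscribers
    }
}
```

Because the context lives in the args object, no per-operation closure or lookup is needed to find out who should be told about the completion.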

-private int _acceptLimit;
-private SocketAsyncEventArgs[] _saeas;
+private readonly int _acceptLimit = DefaultAcceptLimit;
+private SocketAsyncActorEventArgs[]? _acceptPool;
Member Author

Use our custom SAEA implementation

private readonly int _acceptLimit = DefaultAcceptLimit;
private SocketAsyncActorEventArgs[]? _acceptPool;
private bool _binding;
private static readonly EventHandler<SocketAsyncEventArgs> OnCompleted = OnIoCompleted;
Member Author

create a static EventHandler instance

private bool _binding;
private static readonly EventHandler<SocketAsyncEventArgs> OnCompleted = OnIoCompleted;

private sealed record AcceptCompleted(SocketAsyncEventArgs EventArgs) : INoSerializationVerificationNeeded;
Member Author

Internal message types for signaling completion / retry needed

case SocketError.TimedOut:
case SocketError.WouldBlock:
// transient – short back‑off then retry
saea.AcceptSocket = null;
Member Author

All of these error codes are retriable - schedule a short back-off and go again.

Important note: can't use IWithTimers for this because you'd need a unique timer key each time - hence why I'm using the IScheduler. That's something we may want to improve or add to the IWithTimers interface.
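
The retry idea can be sketched outside of Akka as follows. This is an illustration under stated assumptions, not the PR's code: `AcceptRetrySketch`, `IsTransient`, and `RetryAsync` are hypothetical names, `Task.Delay` stands in for the `IScheduler` back-off, and only the two error codes visible in the excerpt above are treated as transient (the PR's full list is not shown here).

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical helper, not part of Akka.NET.
public static class AcceptRetrySketch
{
    // Only the codes visible in the excerpt; the PR may treat more codes as transient.
    public static bool IsTransient(SocketError error)
    {
        switch (error)
        {
            case SocketError.TimedOut:
            case SocketError.WouldBlock:
                return true;
            default:
                return false;
        }
    }

    // Runs attempt(), waiting backOff between tries while the result is transient.
    public static async Task<SocketError> RetryAsync(
        Func<SocketError> attempt, TimeSpan backOff, int maxAttempts)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        var result = attempt();
        for (var i = 1; i < maxAttempts && IsTransient(result); i++)
        {
            await Task.Delay(backOff); // stand-in for a scheduled "retry" message to self
            result = attempt();
        }
        return result;
    }
}
```

Inside an actor, the delay would more naturally be a message scheduled back to `Self`, so that the retry runs on the actor's own message loop rather than on a continuation.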

_socket.Bind(_bind.LocalAddress);
_socket.Listen(_bind.Backlog);
_saeas = Accept(_acceptLimit).ToArray();

Member Author

Allocate the pool and start accepting connections on it - in theory we could do this in a single loop but this code only ever runs once.
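
As a sketch of the "allocate, then arm" shape described here, the following uses plain `SocketAsyncEventArgs` and a callback instead of actor messages. All names (`AcceptPoolSketch`, `StartAccepting`, `Arm`, `Deliver`) are hypothetical, and re-arming directly from the completion callback is a simplification; the PR routes completions through the listener actor.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper, not part of Akka.NET.
public static class AcceptPoolSketch
{
    // First loop allocates poolSize accept operations, second loop starts them.
    public static SocketAsyncEventArgs[] StartAccepting(
        Socket listener, int poolSize, Action<Socket> onAccepted)
    {
        var pool = new SocketAsyncEventArgs[poolSize];
        for (var i = 0; i < pool.Length; i++)
        {
            var args = new SocketAsyncEventArgs();
            args.Completed += (_, e) =>
            {
                if (Deliver(e, onAccepted)) Arm(listener, e, onAccepted);
            };
            pool[i] = args;
        }

        foreach (var args in pool) Arm(listener, args, onAccepted);
        return pool;
    }

    private static void Arm(Socket listener, SocketAsyncEventArgs args, Action<Socket> onAccepted)
    {
        try
        {
            // AcceptAsync returns false when it completed synchronously;
            // the Completed event is not raised in that case, so handle it inline.
            while (!listener.AcceptAsync(args))
            {
                if (!Deliver(args, onAccepted)) return;
            }
        }
        catch (ObjectDisposedException)
        {
            // The listener was closed; stop accepting on this slot.
        }
    }

    private static bool Deliver(SocketAsyncEventArgs args, Action<Socket> onAccepted)
    {
        var accepted = args.AcceptSocket;
        args.AcceptSocket = null; // must be cleared before the args can be reused
        if (args.SocketError != SocketError.Success || accepted == null)
        {
            accepted?.Dispose();
            return false; // no retry logic in this sketch
        }
        onAccepted(accepted);
        return true;
    }
}
```

The caller owns the returned array and should dispose each element when the listener shuts down.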

 protected override bool Receive(object message)
 {
-    throw new NotImplementedException();
+    switch (message)
Member Author

This was, previously, the Initializing behavior.

{
// remove event handler
saea.Completed -= OnCompleted;
saea.Dispose();
Member Author

Properly dispose all SAEA

 initialSocketAsyncEventArgs: config.GetInt("nr-of-socket-async-event-args", 32),
 traceLogging: config.GetBoolean("trace-logging", false),
-batchAcceptLimit: config.GetInt("batch-accept-limit", 10),
+batchAcceptLimit: config.GetString("batch-accept-limit") == "scale-to-cpus"
Member Author

Handle the new scale-to-cpus value we accept for akka.io.tcp.batch-accept-limit
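
The diff line above is cut off after the comparison, so what the PR actually computes for `scale-to-cpus` is not visible here. The sketch below only illustrates the general shape of such a setting: `BatchAcceptLimitSketch.Resolve` is a hypothetical helper, and mapping the keyword to `Environment.ProcessorCount` is an assumption, not the PR's confirmed behavior.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Akka.NET.
public static class BatchAcceptLimitSketch
{
    // Accepts either the keyword "scale-to-cpus" or an explicit integer.
    public static int Resolve(string configured)
    {
        if (string.Equals(configured, "scale-to-cpus", StringComparison.Ordinal))
        {
            // Assumption: one accept slot per logical core.
            return Environment.ProcessorCount;
        }

        return int.Parse(configured, CultureInfo.InvariantCulture);
    }
}
```

For example, `Resolve("10")` yields 10, while `Resolve("scale-to-cpus")` depends on the machine it runs on.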

Contributor

@Arkatufus Arkatufus left a comment

LGTM

@Arkatufus Arkatufus enabled auto-merge (squash) May 7, 2025 20:27
@Aaronontheweb Aaronontheweb disabled auto-merge May 7, 2025 21:06
@Aaronontheweb Aaronontheweb merged commit 49de2de into akkadotnet:dev May 7, 2025
8 of 11 checks passed
@Aaronontheweb Aaronontheweb deleted the fix-5988-AkkaIOTcp-2 branch May 7, 2025 21:06
Development

Successfully merging this pull request may close these issues.

[Akka.IO] TcpListener connection queue problem
