[NEW] Introduce Quality-of-Service for the replication stream to reduce full sync as a result of buffer overruns #1596

Open
xbasel opened this issue Jan 21, 2025 · 2 comments


@xbasel
Member

xbasel commented Jan 21, 2025

When replica nodes are very busy processing client traffic, the replication stream can get starved, and the primary disconnects the replica once its client output buffer limit is exceeded. This often triggers a full sync.

To reduce the likelihood of full syncs caused by client output buffer overruns, we can add a quality-of-service mechanism in the replica to prioritize replication traffic during high load.

Description of the feature
This feature improves the availability of replicas and the stability of primaries by reducing the chances of full syncs.

The replica can detect replication traffic bursts by monitoring the application buffer during reads from the primary socket. For example, if the buffer has been completely filled for the last N reads, the replica can assume the kernel-level TCP receive queue isn't empty and traffic is high. In that case, it can prioritize additional socket reads on the primary file descriptor, helping to drain the primary's shared replication buffer faster and reducing the chances of full syncs.
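A rough sketch of the detection side (names like full_read_streak, REPL_READ_BUF_LEN, and the threshold value are illustrative, not existing code):

#include <sys/types.h>

/* Illustrative sketch only: track whether the last N reads from the
 * primary filled the application buffer completely. */
#define REPL_READ_BUF_LEN   (16 * 1024)  /* stand-in for the replica's read buffer size */
#define FULL_READ_STREAK_N  4            /* hypothetical threshold */

static int full_read_streak = 0;

/* Call after every read() on the primary's fd. */
void trackPrimaryRead(ssize_t nread) {
    if (nread == REPL_READ_BUF_LEN)
        full_read_streak++;   /* buffer filled: kernel rx queue likely non-empty */
    else
        full_read_streak = 0; /* short read: queue drained, reset the streak */
}

/* High replication traffic is assumed once N consecutive reads were full. */
int replicationBurstDetected(void) {
    return full_read_streak >= FULL_READ_STREAK_N;
}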

@zuiderkwast
Contributor

You mean the IP_TOS field in IP packets? https://en.wikipedia.org/wiki/Type_of_service. IIUC, the kernel can prioritize packets according to this and routers can use it too. I think we can set it on replication connections and cluster bus connections. It should be just a setsockopt call on the socket fd.
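For the record, a minimal sketch of that call (the TOS value is just an example; modern networks interpret this field as DSCP and routers may ignore it):

#include <netinet/in.h>
#include <netinet/ip.h>   /* IPTOS_LOWDELAY */
#include <sys/socket.h>

/* Example only: tag a connection's outgoing IPv4 packets with a TOS value.
 * Returns 0 on success, -1 on error (see errno). */
int setConnectionTos(int fd) {
    int tos = IPTOS_LOWDELAY;
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}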

@xbasel
Member Author

xbasel commented Jan 21, 2025

No. I believe the bottleneck lies with the engine itself (CPU), not the network. If the replica processes the primary connection like any other client, and there are many clients sending commands, the replication connection could become starved.

Current implementation:

fds = epoll();
for (fd : fds) {
    buf = read(fd);    /* one read per ready fd; the primary is treated like any other client */
    process(buf);
}

Proposed approach:

fds = epoll();
for (fd : fds) {
    if (fd == primary) {
        handleReplication(fd);   /* give the replication link priority */
    } else {
        buf = read(fd);
        process(buf);
    }
}

handleReplication(fd) {
    reads = 0;
    do {
        buf = read(fd);
        process(buf);
        reads++;
        /* keep draining while every read fills the buffer (the kernel rx
         * queue is likely non-empty), but cap the iterations so other
         * clients aren't starved to an extreme level */
    } while (len(buf) == MAX_BUF_LEN && reads < REPL_READ_BUDGET);
}

If the condition len(buf) == MAX_BUF_LEN is true, the rx queue in the kernel likely has more data to read. This can be observed with netstat: the Recv-Q field would stay well above 0, when it should ideally be at or near 0 at all times, especially for the replication connection.
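For completeness, the same Recv-Q signal can be sampled in code rather than via netstat; a sketch using the Linux-specific SIOCINQ ioctl (FIONREAD from <sys/ioctl.h> behaves the same for TCP sockets):

#include <sys/ioctl.h>
#include <linux/sockios.h>   /* SIOCINQ, Linux-specific */

/* Returns the number of unread bytes sitting in the kernel receive queue
 * for fd (what netstat shows as Recv-Q), or -1 on error. */
int recvQueueBytes(int fd) {
    int pending = 0;
    if (ioctl(fd, SIOCINQ, &pending) == -1)
        return -1;
    return pending;
}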

We can set IP_TOS, although I'm not sure how impactful it would be, especially since many routers might ignore it.

I believe I can demonstrate this in a PR, with a before & after comparison.
