[WIP] Using scheduled start-time instead of actual start-time #56
Currently the FailoverTestRug uses the actual start time of a request to determine its latency.
The consequence is that the system can still suffer from a form of coordinated omission, and therefore the latency numbers will look better than they actually are.
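For illustration, a minimal sketch of the flawed pattern, assuming the timestamp is read from the wall clock at send time (the class and helper names are hypothetical stand-ins, not the actual FailoverTestRug code):

```java
import java.util.function.LongConsumer;

// Hypothetical sketch of the current (flawed) measurement pattern.
final class ActualStartTimeSketch {
    static void send(final LongConsumer echoTransport) {
        // The timestamp is read from the clock at send time, *after* any
        // local stall, so a stall shifts the latency baseline forward.
        final long actualStartMs = System.currentTimeMillis();
        echoTransport.accept(actualStartMs); // timestamp travels with the message
    }

    static long onEcho(final long echoedTimestampMs) {
        // Latency measured against the actual start time: a pre-send stall
        // is silently excluded from this number.
        return System.currentTimeMillis() - echoedTimestampMs;
    }
}
```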
Explanation:
The load generator issues 100 requests/second, so a request is scheduled every 10 ms.
Every request to the remote system is handled in exactly 3 ms.
Imagine the clock is in ms and currently reads 50000, and the next request is scheduled at 50010. If there is a stall of 100 ms just before calling now, then now will return 50100, so the value 50100 is stored in the generationTimestamps and that is the value passed along in the echo message. Since a request takes 3 ms, the latency of that request will be 50103-50100 = 3 ms, because it is based on the actual start time of that request.
But the scheduled start time of the request was 50010, so the actual latency is 50103-50010 = 93 ms. That is a 31x difference.
It doesn't only affect this call, but all calls that should have been made during the 100 ms pause. These calls still get made (which is good), but assuming there are no further stalls, the current code will measure a latency of 3 ms for each of the 9 other calls: 3, 3, 3, ..., 3. In reality, those calls also fire at 50100 and complete at 50103, so their latencies should be 83, 73, 63, ..., 13, 3. The measured latencies of calls that should have happened during the stall are therefore incorrect.
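A back-of-the-envelope check of those numbers, assuming the 10 delayed requests all fire at 50100 when the stall ends and each completes 3 ms later (the class name is illustrative):

```java
public final class StallArithmetic {
    public static void main(final String[] args) {
        final long serviceMs = 3;
        final long stallEndMs = 50_100;                          // stall: 50000..50100
        for (long scheduledMs = 50_010; scheduledMs <= 50_100; scheduledMs += 10) {
            final long completionMs = stallEndMs + serviceMs;    // all fire at 50100
            final long measuredMs = completionMs - stallEndMs;   // actual start: always 3
            final long trueMs = completionMs - scheduledMs;      // scheduled start: 93..3
            System.out.printf("scheduled=%d measured=%dms true=%dms%n",
                    scheduledMs, measuredMs, trueMs);
        }
    }
}
```

It prints a measured latency of 3 ms for every request, while the latency against the scheduled start time decreases from 93 ms down to 3 ms.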
This can be seen as a form of coordinated omission. The main difference is that it isn't caused by the remote system but by the local system, and stalls in the local system do happen.
What should be done is to use the scheduled start time of a request to determine the latency, not the actual start time.
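A minimal sketch of that fix, assuming a single-threaded send loop; busySpinUntil and sendEcho are hypothetical stand-ins for the rig's own code:

```java
import java.util.concurrent.TimeUnit;

public final class ScheduledStartLoop {
    public static void main(final String[] args) {
        final long startNs = System.nanoTime();
        final long intervalNs = TimeUnit.MILLISECONDS.toNanos(10); // 100 requests/s
        for (long sequence = 0; sequence < 1_000; sequence++) {
            // The timestamp is derived from the schedule, never from the clock,
            // so time lost to a local stall is charged to every delayed request.
            final long scheduledNs = startNs + sequence * intervalNs;
            busySpinUntil(scheduledNs); // may return late after a stall; send anyway
            sendEcho(scheduledNs);      // the echo carries the *scheduled* time
        }
    }

    private static void busySpinUntil(final long deadlineNs) {
        while (System.nanoTime() - deadlineNs < 0) { } // spin until the request is due
    }

    private static void sendEcho(final long scheduledNs) {
        // Placeholder: the real rig would send scheduledNs to the remote system
        // and, on the echo's return, compute latency = now - scheduledNs.
    }
}
```

With this scheme, the 100 ms stall in the example above shows up in the latency of all 10 affected requests instead of disappearing.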